Article

Complementary in Time and Space: Optimization on Cost and Performance with Multiple Resources Usage by Server Consolidation in Cloud Data Center

1 School of Information Science, Guangdong University of Finance and Economics, Guangzhou 510320, China
2 Guangdong Intelligent Business Engineering Technology Research Center, Guangdong University of Finance and Economics, Guangzhou 510320, China
3 School of Information Engineering, Hunan Industry Polytechnic, Changsha 410208, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(19), 9654; https://doi.org/10.3390/app12199654
Submission received: 12 August 2022 / Revised: 16 September 2022 / Accepted: 17 September 2022 / Published: 26 September 2022

Abstract: The recent COVID-19 pandemic has accelerated the use of cloud computing. The surge in the number of users presents cloud service providers with severe challenges in managing computing resources. Guaranteeing the QoS of multiple users while reducing the operating cost of the cloud data center (CDC) is a major problem that needs to be solved urgently. To solve this problem, this paper establishes a cost model based on multiple computing resources in the CDC, which comprehensively considers the hosts' energy cost, the virtual machine (VM) migration cost, and the service level agreement violation (SLAV) penalty cost. To minimize this cost, we design the following solution. We employ a convolutional autoencoder-based filter to preprocess the VMs' historical workloads and use an attention-based RNN method to predict the computing resource usage of the VMs in future periods. Based on the predicted results, we trigger VM migration before a host enters an overloaded state to reduce the occurrence of SLAV. A heuristic algorithm based on the complementary use of multiple resources in space and time is proposed to solve the placement problem. Simulations driven by a real VM workload dataset validate the effectiveness of our proposed method. Compared with existing methods, our proposed method reduces the hosts' energy consumption and SLAV and lowers the total cost by 26.1~39.3%.

1. Introduction

The worldwide COVID-19 pandemic has forced people to change the way they live, socialize, and work. To maintain social distance, more human activities than ever have moved online. In December 2021, Kamouri et al. [1] estimated that approximately 56% of the U.S. workforce could move to online or remote work. McKinsey Consumer Pulse [2] shows that the share of e-commerce in major countries grew at two to five times the pre-COVID-19 rate. Compared to pre-lockdown levels, the usage of Internet services increased by 40% to 100% [3]. The usage of video-conferencing services, such as Zoom, increased tenfold [4]. In addition, to contain the pandemic, massive amounts of residential and medical data were sent to data centers for analysis; for example, China uses health QR codes nationwide to monitor and manage population movements with big data. Behind these large-scale digital activities stands the support of cloud data centers (CDCs). To meet the corresponding computing power needs, people have now embraced cloud computing in all aspects.
With the explosive increase in the use of cloud computing, cloud service providers (CSPs) are also facing severe challenges in the management of CDCs. The problem mainly comes from two aspects: the cost of maintaining the operation of the CDC and the guarantee of Quality of Service (QoS) for users. With the surge in the number of users, to maintain the business needs of existing users and satisfy new users' experience of cloud services, CSPs must increase investment in their CDCs. For example, the SaaS provider Salesforce increased its data center cost by 43% (USD 284 million) in 2022 to meet higher customer service requirements [5]. Such a large financial investment is a heavy economic burden on the operation of an enterprise. A feasible way to alleviate this economic pressure is therefore to optimize the management of the CDC and improve the utilization of its various computing resources. However, improving resource utilization intensifies the competition for computing resources among the users' virtualized computing devices. When the competition on a host with limited resources becomes sufficiently intense, the users' QoS inevitably drops significantly. In other words, there is a certain contradiction between reducing the operating cost of the CDC and guaranteeing the users' QoS, and we need a feasible way to achieve a tradeoff between the two.
Users run a wide variety of services in the CDC, and the corresponding virtual machine (VM) workloads change constantly as services evolve over time. During peak business hours, VMs have increased demands on computing resources such as CPU, memory, disk, and network. During slack periods, the demand for these computing resources is greatly reduced. To adapt to such dynamic workloads, the commonly used solution for managing hosts and VMs in CDCs is to perform server consolidation periodically [6]. By leveraging VM live migration, server consolidation migrates some VMs on hosts with high resource usage to hosts with low resource usage to achieve load balancing in the CDC as far as possible. On the other hand, the Service Level Agreement (SLA) is used to quantitatively describe the QoS of VM usage [7]. Hence, to ensure QoS, the SLA needs to be taken into account to a certain extent when performing server consolidation.
In general, a round of server consolidation is implemented in three sub-steps. The first is host workload detection, which picks out overloaded and underloaded hosts in the cluster. Next is VM selection for overloaded hosts: to reduce the workload on these hosts and the occurrence of SLA violations (SLAV), suitable VMs need to be selected for migration. Finally, the VM placement method maps an appropriate destination host to each VM to be migrated. VM placement problems often have specific optimization goals, such as using the smallest number of hosts to mount a given number of VMs. After implementing VM placement, underloaded hosts need to be addressed. By migrating all VMs on an underloaded host to other suitable hosts as far as possible and shutting down or switching these underloaded hosts to an energy-saving state, the host energy cost of the CDC can be further reduced. The operating cost of the CDC involves many aspects, including host energy consumption, VM migration cost in server consolidation, SLAV penalties, cooling, security, storage, and network bandwidth overhead. The parts directly related to the computing resources used by users (CPU, memory, disk, and network) are host energy consumption, VM migration overhead, and the SLAV penalty. Therefore, in this paper, we comprehensively consider the operating cost of a CDC based on multiple computing resources, establish a corresponding cost model, and propose a corresponding server consolidation solution. Our contributions are as follows:
(A) We formally identify the multi-resource-based cost model for server consolidation, which involves host energy consumption, VM migration, and SLAV. Based on the cost model, the optimization problem is given.
(B) A convolutional auto-encoder-based filter is leveraged to denoise the VM workload traces. Then, we propose an attention-based RNN method to predict the future workloads of the VMs. Based on the prediction results, a host workload detection policy is proposed.
(C) To minimize the total cost of server consolidation, we propose a VM selection policy and a VM placement algorithm which consider the multi-resource demands of VMs in the present and the future.
(D) We conduct simulations to evaluate the performance of our proposed solution ARP-TSCP. The simulation results indicate that ARP-TSCP can reduce host energy consumption by 18.5~30.3%, SLAV cost by 38~52%, and total cost by 26.1~39.3% compared to the baseline methods.
The remainder of the paper is organized as follows. In Section 2, we review the related work. In Section 3, we formally propose the cost model and then define the optimization problem to be solved. In Section 4, we propose heuristic algorithms to solve this problem. In Section 5, we evaluate the performance of the proposed method with simulations driven by real VM workload traces. In Section 6, we conclude the paper and discuss future work.

2. Related Work

In this section, we first survey the energy consumption models and cost models in cloud server consolidation, then we review the server consolidation solutions.

2.1. Server Consolidation Cost Models

The cost of server consolidation in the cloud is mainly related to the host energy consumption, VM migration, and SLAV.
The CPU is one of the most important components in a host, so many previous works [8,9,10,11,12,13,14,15,16,17] have proposed host energy models based only on CPU usage or processor performance. However, a host's energy consumption is not determined by the CPU alone. Therefore, several works [18,19,20,21,22,23,24] have established host energy consumption models based on the usage of multiple resources, including CPU, memory, disk, and network. These works are of great help to us in building the cost model proposed in this paper. However, these models only consider the energy consumption of a host running independently and do not consider the additional energy consumption caused by VM migration in server consolidation.
When addressing server consolidation, costs of VM live migration are primarily considered to be related to CPU [6]. Maziku et al. [25] and Dargie et al. [26] respectively pointed out that the duration of VM migration is related to the memory size of the VM being migrated and the network bandwidth of the host. Therefore, in addition to the CPU, other resources used by the VM migration process should also be included in the cost calculation.
Buyya et al. [6] proposed a CPU-based SLAV calculation method, which was widely adopted by many subsequent works [27,28,29,30,31,32,33,34,35]. However, the QoS of using VMs cannot be measured merely by CPU performance, and SLAV must involve the use of multiple resources. Guan et al. [36] proposed a GPU-oriented SLAV for cloud gaming. Some SLAs [37,38] related to network factors have been proposed, but these models are not entirely built in the context of CDCs.

2.2. Server Consolidation Solutions

Multiple solutions have been proposed to handle server consolidation.
Lin et al. [24] proposed a host energy consumption model based on CPU, memory, and disk and a centralized server consolidation structure called DEM. The master node is aware of the energy consumption of the slave nodes that host VMs to trigger server consolidation. However, this work does not design a VM placement algorithm and does not consider VM migration and SLAV. Li et al. [39] proposed a server consolidation method based on multi-resource constraints. The goal of this method is to reduce energy consumption while ensuring user QoS. This work considers multi-resource constraints in host workload detection and in the design of the VM placement algorithm. However, in terms of the host energy consumption model and SLAV, only the CPU factor is considered. Yadav et al. [31] established a server consolidation model based on CPU usage, aiming to reduce energy consumption and SLAV at the same time. They proposed an adaptive host overload detection method and a bandwidth-aware VM selection strategy. Naeen et al. [40] proposed a stochastic-process-based server consolidation policy to minimize data center costs while satisfying QoS requirements. The proposed energy consumption model, SLAV, and heuristic VM algorithm are based on CPU usage. They also take into account the cost of hosts switching between different states. Sayadnavard et al. [41] proposed a server consolidation method based on multi-resource constraints. The goal is to minimize the number of hosts used by VM placement. When selecting a destination host for each VM to be migrated, a Markov chain model is used to determine whether that host would be overloaded soon. Yuan et al. [42] proposed to use the culture multiple-ant-colony algorithm to solve the server consolidation problem, thereby reducing the energy consumption of the hosts. Their energy consumption model is based on the CPU and does not specifically consider SLAV factors. Mamun et al. [43] studied the server consolidation problem in CDCs with wireless network structures. They proposed an energy consumption model consisting of CPU-oriented host energy consumption and the energy consumption of the various network devices in the network topology. A network-aware VM placement method was proposed to reduce energy consumption. However, they did not consider SLAV.

3. Cost Model and Problem Description

In this section, we first formally describe the cost model based on multiple computational resources in the CDC, and based on this, we give the problem definition.

3.1. Cost Model

In a CDC, the cost related to computational resources mainly involves hosts, VM migrations, and SLAV penalties.
Before giving the specific cost model, we first describe the time horizon and objects of the entire system. There are $N$ heterogeneous hosts in the CDC, forming the host set $H = \{h_1, h_2, \ldots, h_N\}$. The total resources provided by a host $h_i$ are denoted as $C_i = (c_i^{cpu}, c_i^{mem}, c_i^{disk}, c_i^{net})$, where $c_i^{cpu}$, $c_i^{mem}$, $c_i^{disk}$, and $c_i^{net}$ are the total available CPU, memory, disk R/W, and network I/O throughput, respectively. There are $M$ running VMs, forming the VM set $V = \{v_1, v_2, \ldots, v_M\}$. When a user creates a VM $v_j$, the submitted resource requirements are denoted as $D_j = (d_j^{cpu}, d_j^{mem}, d_j^{disk}, d_j^{net})$, where $d_j^{cpu}$, $d_j^{mem}$, $d_j^{disk}$, and $d_j^{net}$ are the total requirements of CPU, memory, disk R/W, and network I/O throughput, respectively.
The life cycle of the CDC is divided into $L$ small, equally long, consecutive time segments $\{t_1, t_2, \ldots, t_L\}$, each of length $T$. In a certain time segment $t_k$, if host $h_i$ is in the working state, then $\lambda_{i,k} = 1$; otherwise, $\lambda_{i,k} = 0$. The amount of resources that $h_i$ can provide is denoted as $R_{i,k} = (r_{i,k}^{cpu}, r_{i,k}^{mem}, r_{i,k}^{disk}, r_{i,k}^{net})$. In $t_k$, the resources required by VM $v_j$ are denoted as $S_{j,k} = (s_{j,k}^{cpu}, s_{j,k}^{mem}, s_{j,k}^{disk}, s_{j,k}^{net})$.
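To make the notation concrete, the following is a minimal Python sketch of these objects as we use them in the rest of this section; the class and field names are ours, chosen for illustration only.

from dataclasses import dataclass, field
from typing import Dict, List

RESOURCES = ("cpu", "mem", "disk", "net")

@dataclass
class Host:
    """A host h_i with total capacities C_i = (c_cpu, c_mem, c_disk, c_net)."""
    capacity: Dict[str, float]           # c_i^cpu, c_i^mem, c_i^disk, c_i^net
    active: bool = True                  # lambda_{i,k}: True if working in t_k
    vm_ids: List[int] = field(default_factory=list)

@dataclass
class VM:
    """A VM v_j; usage[k] holds S_{j,k} = (s_cpu, s_mem, s_disk, s_net)."""
    demand: Dict[str, float]             # D_j, requested at creation time
    usage: List[Dict[str, float]] = field(default_factory=list)

def remaining(host: Host, vms: Dict[int, VM], k: int) -> Dict[str, float]:
    """R_{i,k}: resources h_i can still provide in time segment t_k."""
    used = {r: sum(vms[j].usage[k][r] for j in host.vm_ids) for r in RESOURCES}
    return {r: host.capacity[r] - used[r] for r in RESOURCES}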
We summarize the total cost of the CDC over a given life cycle by analyzing the performance of each computing device in each time segment. In addition to the energy consumption cost of the hosts, it is also necessary to consider the cost of VM migration during each server consolidation and the penalty caused by the occurrence of SLAV. We discuss them separately in the following subsections.

3.1.1. Cost of Host Energy Consumption

Given a host $h_i$, its cost $C_{h_i}$ during operation is mainly related to the electricity price $EP$ and its power $p_{h_i}$, which is:

$C_{h_i} = \sum_{k=1}^{L} \left( EP \times p_{h_i} \times T \times \lambda_{i,k} \right).$ (1)
It should be noted that if h i is powered off or in a power-saving state, its power consumption is negligible, so it will not incur any electricity-related costs.
In CDCs, in order to meet the various business demands of a large number of different users, almost all hosts are built from high-power, high-performance components. Therefore, it is impractical to model host power consumption by analyzing the CPU alone. Basmadjian et al. [44] calculated that the CPU accounts for only about 37% of a host's average energy consumption. Note that when users create or apply for a VM in a CDC, they directly specify the relevant parameters of the CPU, memory, disk, and network devices. We therefore consider the host power consumption to be related to four resources: CPU, memory, disk, and network interface card (NIC). In this paper, we build the power consumption model of the host based on these four resources:
  • CPU power model
Buyya et al. [6] leveraged a linear CPU power consumption model, in which the current energy consumption of the CPU is directly proportional to its usage rate: when the CPU usage increases, the power of the host increases linearly. Although the linear model is simple to calculate, its estimates fall below the actual power consumption values and its accuracy is insufficient [24]. Therefore, we adopt an exponential model to fit the power consumption of the CPU.
Under normal circumstances, even if a working host is in the idle state (CPU usage rate of 0), the CPU still generates a certain amount of energy consumption, which needs to be considered. Hence, the CPU power $p_{i,k}^{cpu}$ of host $h_i$ in $t_k$ can be described as:

$p_{i,k}^{cpu} = p_{idle,i}^{cpu} + (p_{peak,i} - p_{idle,i}) \times (u_{i,k}^{cpu})^{\alpha_i^{cpu}},$ (2)

where $p_{idle,i}^{cpu}$ is the CPU power of $h_i$ in the idle state, $p_{peak,i}$ is the CPU power of $h_i$ when running at full load, $u_{i,k}^{cpu}$ is the usage rate of the CPU at any time in $t_k$, and $\alpha_i^{cpu}$ is the fit index. Here, we assume that the CPU usage at any time in $t_k$ is constant and equal to $u_{i,k}^{cpu}$. For a given CPU model, the powers in the idle state and at full load are fixed values. According to [45], the value of $\alpha_i^{cpu}$ varies with the CPU usage; however, in general, the CPU power is described more accurately when $\alpha_i^{cpu} = 0.75$ [45].
  • Memory power model
The power consumption of memory is related to its reading and writing. Accurately monitoring the read and write speed of memory in real time is too expensive. An alternative low-overhead solution is to estimate the throughput of memory through last level cache (LLC) miss counter available in processors. However, this approach is still impractical because the specific instructions associated with LLC misses are different in different CPU models [22].
In fact, in all public host or VM workload trace datasets, the provided memory-related data are the memory usage in certain periods of time. Therefore, we use the current memory footprint (current usage of memory) $u_{i,k}^{mem}$ to estimate the memory power $p_{i,k}^{mem}$ of host $h_i$:

$p_{i,k}^{mem} = p_{idle,i}^{mem} + \alpha_i^{mem} \times u_{i,k}^{mem},$ (3)

where $p_{idle,i}^{mem}$ is the memory power of $h_i$ in the idle state, $\alpha_i^{mem}$ is the fit index, and $u_{i,k}^{mem}$ is the memory usage at any time in $t_k$. Here, we assume that the memory usage at any time in $t_k$ is constant and equal to $u_{i,k}^{mem}$. Statistically, for a DDR memory system, the memory power is estimated more accurately with $\alpha_i^{mem} = 0.3$ W/GB [46].
  • Disk power model
The power consumption of the disk, $p_{i,k}^{disk}$, is directly related to its read and write throughput; we have:

$p_{i,k}^{disk} = \alpha_i^{r} \times u_{i,k}^{r} + \alpha_i^{w} \times u_{i,k}^{w} + p_{idle,i}^{disk},$ (4)

where $u_{i,k}^{r}$ and $u_{i,k}^{w}$ are the number of bytes read from and written to the disk of $h_i$ at any time in $t_k$, respectively; $\alpha_i^{r}$ and $\alpha_i^{w}$ are the disk read and write fit indices, respectively; and $p_{idle,i}^{disk}$ is the disk power in the idle state. Here, we assume that the numbers of disk read and write bytes at any time in $t_k$ are constant. Kansal et al. [22] discovered that the difference between disk read power and write power is negligible. Hence, Equation (4) can be updated as:

$p_{i,k}^{disk} = p_{idle,i}^{disk} + \alpha_i^{disk} \times u_{i,k}^{disk},$ (5)

where $u_{i,k}^{disk} = u_{i,k}^{r} + u_{i,k}^{w}$, and $\alpha_i^{disk}$ is the fit index.
In the actual operation of a host, the power consumption of the disk depends on whether reads and writes are random or sequential: different read/write modes generate different power consumption. Hence, $\alpha_i^{disk}$ is divided into $\alpha_i^{seq}$ (sequential read and write) and $\alpha_i^{rnd}$ (random read and write). In practice, it is impractical to monitor and detect the read/write mode of the disk. Therefore, for convenience, we assume that the probability of sequential reads and writes caused by all users equals that of random reads and writes. We obtain:

$\alpha_i^{disk} = (\alpha_i^{seq} + \alpha_i^{rnd}) / 2.$ (6)

Lin et al. [24] found that setting $\alpha_i^{seq} = 0.07$ W/MB/s and $\alpha_i^{rnd} = 0.22$ W/MB/s estimates the power consumption of the disk more accurately.
  • Network power model
We mainly discuss the power of NICs. Under the framework of the TCP/IP protocol, we model the NIC power $p_{i,k}^{NIC}$ based mainly on the number of IP packets (the size of an IP packet is 500 bytes, denoted as $ip\_size$) that it can handle per second. We have:

$p_{i,k}^{NIC} = p_{idle,i}^{NIC} + \alpha_i^{NIC} \times u_{i,k}^{NIC},$ (7)

where $\alpha_i^{NIC}$ is the power consumption of each IP packet processed by $h_i$'s NIC, $u_{i,k}^{NIC} = s_{j,k}^{net} / ip\_size$ is the number of IP packets processed at any time in $t_k$, and $p_{idle,i}^{NIC}$ is the power of the NIC in the idle state. Here, we assume that the number of IP packets processed by the NIC at any time in $t_k$ is constant. Garcia-Saavedra et al. [47] found that the value of $\alpha_i^{NIC}$ should be $1.26 \times 10^{-5}$ joules/packet.
Hence, combining Equations (2), (3), (5) and (7), we obtain:

$p_{h_i} = p_{i,k}^{cpu} + p_{i,k}^{mem} + p_{i,k}^{disk} + p_{i,k}^{NIC}.$ (8)

Then, combining with Equation (1), we obtain the energy consumption cost of all hosts, $C_H$:

$C_H = \sum_{i=1}^{N} C_{h_i} = \sum_{i=1}^{N} \sum_{k=1}^{L} \left[ EP \times (p_{i,k}^{cpu} + p_{i,k}^{mem} + p_{i,k}^{disk} + p_{i,k}^{NIC}) \times T \times \lambda_{i,k} \right].$ (9)
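As an illustration, the power models in Equations (2)-(9) can be evaluated as in the following sketch. The function and parameter names are ours; the fit constants are the values cited above ($\alpha_i^{cpu} = 0.75$, $\alpha_i^{mem} = 0.3$ W/GB, $\alpha_i^{disk} = (0.07 + 0.22)/2$ W/MB/s, and $\alpha_i^{NIC} = 1.26 \times 10^{-5}$ J/packet).

IP_SIZE = 500  # bytes per IP packet, as assumed in the network model

def cpu_power(u_cpu, p_idle_cpu, p_peak, p_idle, alpha=0.75):
    """Equation (2): exponential CPU power model, u_cpu in [0, 1]."""
    return p_idle_cpu + (p_peak - p_idle) * u_cpu ** alpha

def mem_power(u_mem_gb, p_idle_mem, alpha=0.3):
    """Equation (3): linear memory power model (W per GB of footprint)."""
    return p_idle_mem + alpha * u_mem_gb

def disk_power(u_disk_mbps, p_idle_disk, alpha=(0.07 + 0.22) / 2):
    """Equations (5)-(6): linear disk power model over R/W throughput (MB/s)."""
    return p_idle_disk + alpha * u_disk_mbps

def nic_power(net_bytes_per_s, p_idle_nic, alpha=1.26e-5):
    """Equation (7): NIC power from IP packets handled per second."""
    return p_idle_nic + alpha * (net_bytes_per_s / IP_SIZE)

def host_energy_cost(usage_per_segment, params, ep, T):
    """Equations (8)-(9): energy cost of one host over all time segments.

    usage_per_segment: list of (u_cpu, u_mem_gb, u_disk_mbps, net_Bps, active)
    params: idle/peak powers of the host's components
    ep: electricity price, T: segment length (consistent units assumed)
    """
    cost = 0.0
    for u_cpu, u_mem, u_disk, u_net, active in usage_per_segment:
        if not active:          # lambda_{i,k} = 0: no electricity-related cost
            continue
        p = (cpu_power(u_cpu, params["p_idle_cpu"], params["p_peak"], params["p_idle"])
             + mem_power(u_mem, params["p_idle_mem"])
             + disk_power(u_disk, params["p_idle_disk"])
             + nic_power(u_net, params["p_idle_nic"]))
        cost += ep * p * T
    return cost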

3.1.2. Cost of VM Migration

We assume that at the beginning of each time segment, the CDC performs server consolidation to balance the CSP's cost against the users' performance. VM migration is an important part of server consolidation. The majority of the data transferred during VM live migration comes from the memory of the VM. Although the VM generates several dirty pages during live migration, Dargie et al. [26] found that the energy consumption of VM live migration is positively related to the memory size of the migrating VM: the larger the VM's memory at the time of migration, the longer the migration takes and the more energy it consumes.
In a CDC, migrating a VM $v_j$ from host $h_i$ to another host $h_{i'}$ requires additional resources provided by $h_i$ to support the process. We assume that $h_i$ reserves enough resources to complete the migration of $v_j$ and that $h_{i'}$ also reserves enough resources to receive $v_j$.
The work conducted by Buyya et al. [6] on server consolidation is based on the assumption that a VM consumes an additional 10% of CPU usage during migration. In this paper, we extend this assumption to the VM's usage of all four computing resources during migration. In addition, in order to speed up batch VM migration and reduce its impact on user services, we assume that the CDC deploys an exclusive NIC and network for migration tasks, where the size of the exclusive migration bandwidth of $h_i$ is $MIG\_NET_i$. Therefore, the VM migrations could be completed within a time segment. This part of the cost is denoted as $C_k^{mig\_net}$. In this paper, the total cost of VM migration of a CDC in a given life cycle is denoted as $C^{mig}$. Since the power model of the CPU is a nonlinear exponential model while the power models of the other resources are linear, $C^{mig}$ can be described as:
$C^{mig} = \sum_{k=1}^{L} \left( C_k^{vcpu} + C_k^{mig} + C_k^{mig\_net} \right),$ (10)

where $C_k^{vcpu}$ is the cost caused by the CPU when migrating VMs within $t_k$, and $C_k^{mig}$ is the cost caused by non-CPU resources when migrating VMs within $t_k$.
$C_k^{mig}$ is described as:

$C_k^{mig} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{i'=1}^{N} \left( EP \times \gamma_{j,i,i',k} \times p_{j,k}^{mig} \times t_{j,k}^{mig} \right),$ (11)

where $\gamma_{j,i,i',k}$ is a 0-1 indicator, $p_{j,k}^{mig}$ is the power of migrating $v_j$, and $t_{j,k}^{mig}$ is the time spent migrating $v_j$. If $v_j$ is migrated from $h_i$ to $h_{i'}$, then $\gamma_{j,i,i',k} = 1$; otherwise, $\gamma_{j,i,i',k} = 0$. Furthermore, it is obvious that $t_{j,k}^{mig} < T$. Since VM memory constitutes the majority of the data transferred during migration, we have:

$t_{j,k}^{mig} = \frac{s_{j,k}^{mem}}{mig\_bw_{i,k}},$ (12)

where $mig\_bw_{i,k}$ is the migration bandwidth allocated by $h_i$ to $v_j$. We assume that the dedicated migration bandwidth is evenly allocated to each migrated VM within $t_k$ on $h_i$. Hence, for a given $h_i$ and $h_{i'}$, we have:

$mig\_bw_{i,k} = \frac{MIG\_NET_i}{\sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{i'=1}^{N} \gamma_{j,i,i',k}},$ (13)

and then

$t_{j,k}^{mig} = \frac{s_{j,k}^{mem} \times \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{i'=1}^{N} \gamma_{j,i,i',k}}{MIG\_NET_i}.$ (14)

Then, we substitute Equation (14) into Equation (11).
We let $p_{j,k}^{vmem}$, $p_{j,k}^{vdisk}$, and $p_{j,k}^{vnet}$ be the memory, disk, and network power of $v_j$ within $t_k$, respectively. The migration cost regarding the memory of $v_j$ is $0.1 \times p_{j,k}^{vmem} = 0.1 \times \alpha_i^{mem} \times s_{j,k}^{mem}$, the migration cost regarding the disk of $v_j$ is $0.1 \times p_{j,k}^{vdisk} = 0.1 \times \alpha_i^{disk} \times s_{j,k}^{disk}$, and the migration cost regarding the network of $v_j$ is $0.1 \times p_{j,k}^{vnet} = 0.1 \times \alpha_i^{NIC} \times u_{i,k}^{NIC}$. Hence:

$p_{j,k}^{mig} = 0.1 \times \left( \alpha_i^{mem} \times s_{j,k}^{mem} + \alpha_i^{disk} \times s_{j,k}^{disk} + \alpha_i^{NIC} \times u_{i,k}^{NIC} \right).$ (15)
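The following sketch evaluates Equations (12)-(15) for the VMs leaving one source host in a segment, under the stated assumption that the dedicated migration bandwidth $MIG\_NET_i$ is shared evenly among them; all names and the dictionary layout are illustrative.

def migration_time(vm_mem, n_migrating, mig_net):
    """Equations (12)-(13): migration time when the host's dedicated
    bandwidth mig_net is shared evenly among n_migrating departing VMs."""
    mig_bw = mig_net / n_migrating
    return vm_mem / mig_bw

def migration_power(s_mem, s_disk, u_nic, a_mem=0.3, a_disk=0.145, a_nic=1.26e-5):
    """Equation (15): non-CPU power of migrating v_j, taken as 10% of the
    VM's memory, disk, and network power on the source host."""
    return 0.1 * (a_mem * s_mem + a_disk * s_disk + a_nic * u_nic)

def migration_cost_k(migrating_vms, mig_net, ep):
    """Per-VM terms of Equation (11) for one source host in segment t_k.

    migrating_vms: list of dicts with 'mem', 'disk', 'nic' usage values
    """
    n = len(migrating_vms)
    return sum(ep * migration_power(v["mem"], v["disk"], v["nic"])
               * migration_time(v["mem"], n, mig_net)
               for v in migrating_vms)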
Based on the power model in Equation (7), we discuss $C_k^{mig\_net}$:

$C_k^{mig\_net} = 2 \times \sum_{i=1}^{N} \left( EP \times \lambda_{i,k} \times p_{i,k}^{mig\_net} \times T \right),$ (16)
where $p_{i,k}^{mig\_net}$ is the power of the dedicated NIC for migration on $h_i$. Since $h_i$ sends the VM data and $h_{i'}$ receives the corresponding data, the NIC power consumption of both needs to be considered. We obtain

$p_{i,k}^{mig\_net} \times T = \left( \alpha_i^{NIC} \times u_{i,k}^{mig\_net} + p_{idle,i}^{NIC} \right) \times T,$ (17)

where $u_{i,k}^{mig\_net} \times T$ is the total VM data transferred from $h_i$ to $h_{i'}$ in time segment $t_k$. Then we have:

$u_{i,k}^{mig\_net} \times T = \sum_{j=1}^{M} \left( \gamma_{j,i,i',k} \times s_{j,k}^{mem} \right).$ (18)

We substitute Equation (18) into Equation (16) and have:

$C_k^{mig\_net} = 2 \times \sum_{i=1}^{N} \left[ EP \times \lambda_{i,k} \times \left( \alpha_i^{NIC} \times \sum_{j=1}^{M} \gamma_{j,i,i',k} \times s_{j,k}^{mem} + T \times p_{idle,i}^{NIC} \right) \right].$ (19)
We discuss $C_k^{vcpu}$ in the following paragraph. A 0-1 indicator $\beta_{i,j,k}$ is used to mark whether VM $v_j$ is running on host $h_i$ at the beginning of $t_k$. If $v_j$ is running on $h_i$, then $\beta_{i,j,k} = 1$; otherwise, $\beta_{i,j,k} = 0$. We assume that the power consumption of the hosts in the CDC is mainly used to keep VMs running. Hence, regarding $h_i$, the term $(p_{peak,i} - p_{idle,i}) \times (u_{i,k}^{cpu})^{\alpha_i^{cpu}}$ in Equation (2) can be presented as:

$(p_{peak,i} - p_{idle,i}) \times (u_{i,k}^{cpu})^{\alpha_i^{cpu}} = (p_{peak,i} - p_{idle,i}) \times \left[ \sum_{j=1}^{M} \left( \beta_{i,j,k} \times s_{j,k}^{cpu} \right) \right]^{\alpha_i^{cpu}}.$ (20)

Then, we take the VM migrations on $h_i$ into consideration:

$(p_{peak,i} - p_{idle,i}) \times \left[ \sum_{j=1}^{M} \left( \beta_{i,j,k} \times s_{j,k}^{cpu} \right) + 0.1 \times \sum_{j=1}^{M} \left( \gamma_{j,i,i',k} \times s_{j,k}^{cpu} \right) \right]^{\alpha_i^{cpu}} = (p_{peak,i} - p_{idle,i}) \times \left\{ \sum_{j=1}^{M} \left[ \left( \beta_{i,j,k} + 0.1 \times \gamma_{j,i,i',k} \right) \times s_{j,k}^{cpu} \right] \right\}^{\alpha_i^{cpu}}.$ (21)

It should be noted that if $\gamma_{j,i,i',k} = 1$, then $\beta_{i,j,k} = 1$. Then, we substitute Equation (21) into Equation (2). We denote the new host energy cost $C_H$ resulting from the updated $p_{i,k}^{cpu}$ as $C_H'$.

3.1.3. SLAV Penalty

In CDCs, once SLAV emerges, the CSP must compensate the relevant users in some form. This is an effective method for CSPs to guarantee QoS after users pay for cloud services. This part of the overhead needs to be included in the cost of the CDC.
We extend the CPU-based SLA definition [13] to CPU SLAV, memory SLAV, disk SLAV, and network SLAV, denoted as $SLAV^{cpu}$, $SLAV^{mem}$, $SLAV^{disk}$, and $SLAV^{net}$, respectively.

$SLAV^{cpu} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,cpu}}{T_i^{a,cpu}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{ud_{j,k}^{d,cpu}}{s_{j,k}^{r,cpu}},$ (22)

where $T_i^{s,cpu}$ is the CPU SLAV duration caused by CPU overload on $h_i$, $T_i^{a,cpu}$ is the working duration of $h_i$, $ud_{j,k}^{d,cpu}$ is the size of the unsatisfied demand for CPU resources as a result of $v_j$'s migration in $t_k$, and $s_{j,k}^{r,cpu}$ is the CPU requested by $v_j$ in $t_k$.
Similarly, we formally define $SLAV^{mem}$, $SLAV^{disk}$, and $SLAV^{net}$ as follows:

$SLAV^{mem} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,mem}}{T_i^{a,mem}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{ud_{j,k}^{d,mem}}{s_{j,k}^{r,mem}},$ (23)

$SLAV^{disk} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,disk}}{T_i^{a,disk}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{ud_{j,k}^{d,disk}}{s_{j,k}^{r,disk}},$ (24)

and

$SLAV^{net} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,net}}{T_i^{a,net}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{ud_{j,k}^{d,net}}{s_{j,k}^{r,net}},$ (25)

where the corresponding parameters are defined analogously to those of $SLAV^{cpu}$; no further explanation is given here.
In a given life cycle of the CDC, the penalty cost caused by the appearance of SLAV is

$C^{SLAV} = pun^{cpu} \times SLAV^{cpu} + pun^{mem} \times SLAV^{mem} + pun^{disk} \times SLAV^{disk} + pun^{net} \times SLAV^{net},$ (26)

where $pun^{cpu}$, $pun^{mem}$, $pun^{disk}$, and $pun^{net}$ are the CPU, memory, disk, and network SLAV penalty price indices, respectively.
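A minimal sketch of Equations (22)-(26): each per-resource SLAV multiplies the average fraction of active time that hosts spend in the SLAV state by the average relative unsatisfied demand caused by migrations. The dictionary fields are our own illustrative layout.

def slav(hosts, vms, resource):
    """Equations (22)-(25) for one resource ('cpu', 'mem', 'disk', 'net')."""
    # Average fraction of active time each host spends in the SLAV state.
    overload = sum(h["t_slav"][resource] / h["t_active"] for h in hosts) / len(hosts)
    # Average relative unsatisfied demand due to VM migrations.
    degradation = sum(
        sum(seg["unmet"][resource] / seg["requested"][resource]
            for seg in v["segments"])
        for v in vms
    ) / len(vms)
    return overload * degradation

def slav_cost(hosts, vms, penalty):
    """Equation (26): total SLAV penalty with per-resource price indices."""
    return sum(penalty[r] * slav(hosts, vms, r)
               for r in ("cpu", "mem", "disk", "net"))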

3.2. Problem Description

In Section 3.1, we discussed the cost models involved in the operating cost of the computing equipment in a CDC: the host energy consumption cost $C_H'$ (with the migration-related CPU cost folded in), the VM migration cost $C^{mig}$, and the SLAV penalty cost $C^{SLAV}$. In this paper, our goal is to minimize the associated operating cost $C$ of the CDC. Based on the above models, we have:

$\mathrm{MIN}\ C = C_H' + \sum_{k=1}^{L} \left( C_k^{mig} + C_k^{mig\_net} \right) + C^{SLAV}.$ (27)
We name this problem the Minimizing Computing Resources Cost (MCRC) problem in server consolidation. The constraints of the MCRC problem are:

$\sum_{i=1}^{N} \beta_{i,j,k} = 1, \quad \forall j, k,$ (28)

$\sum_{i'=1}^{N} \gamma_{j,i,i',k} = 1, \quad \forall i \neq i', j, k,$ (29)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{cpu} \leq c_i^{cpu}, \quad \forall i, k,$ (30)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{mem} \leq c_i^{mem}, \quad \forall i, k,$ (31)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{disk} \leq c_i^{disk}, \quad \forall i, k,$ (32)

and

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{net} \leq c_i^{net}, \quad \forall i, k.$ (33)

Equation (28) indicates that any VM runs on exactly one host in any given time segment. Equation (29) indicates that the destination host of any VM migration exists and is unique. Equations (30)–(33) indicate that the CPU, memory, disk, and network resources provided by each host to its VMs cannot exceed the host's own resource limits, respectively.
Now we analyze the complexity of the MCRC problem. Consider a simple case that satisfies the following conditions: the hosts in the CDC are homogeneous, and the resource requirements of any VM $v_j$ in any time segment $t_k$ are fixed values satisfying Equations (30)–(33). Then, the VM migration cost and the SLAV penalty cost are both 0, and the objective function of the MCRC problem becomes:

$\mathrm{MIN}\ C = C_H.$ (34)

Apparently, the MCRC problem in this simple case reduces to the Bin-Packing problem, in which VMs are the items and hosts are the bins. Since the Bin-Packing problem is NP-hard, the MCRC problem is also NP-hard.

4. Solution for MCRC Problem

Since the MCRC problem is NP-hard, we propose a heuristic solution based on the traditional three-step method for server consolidation. The first step is to detect the overloaded and underloaded hosts; the second step is to select suitable VMs to migrate away from the overloaded hosts; the third step is to select an appropriate destination host for each VM to be migrated.
Before performing host overload detection and VM selection, we first predict the future workload changes of the VMs based on their historical workloads. The purpose of this is to balance the hosts' loads before they become overloaded and cause SLAV, thereby reducing cost.

4.1. VM Workload Prediction

Before predicting the future workload of a VM, we first need to preprocess its workload history. In Section 3.1.1, we assume that within a certain time segment $t_k$, the usage of a given resource by a VM is constant. In practice, this is rarely the case. For example, if the sampling interval for workload records is five minutes (the interval used by most public workload trace datasets), the CPU usage of the VM during those five minutes will vary according to business conditions. The following situation is likely to occur: during the 299 s after the $(k-1)$-th sampling, the CPU usage of a VM is extremely low, but at the 300th second (the $k$-th sampling), the CPU usage instantly jumps to 90%. This record does not reflect that the VM's CPU has been under heavy load for the past five minutes. This implies two points: (1) there is a certain deviation between the historical sampling records and the actual resource usage of the VM; (2) the assumption in Section 3.1.1 deviates somewhat from the actual resource usage of the VM. To minimize the impact of these biases on the final result, we can treat the workload records as containing noise. In this paper, we leverage a convolutional auto-encoder (CAE) to build a filter algorithm that preprocesses the workload of the VM and then adopt an attention-based RNN method for prediction.

4.1.1. A Convolutional Auto-Encoder-Based Filter

CAE has been shown to process time series data well [48].
We use the CAE-based filter to process each of the four resource usage series of a VM separately. We take the historical time series of CPU usage by VM $v_j$ as an example to illustrate the proposed CAE-based filter; we do not describe the mathematical principles of the CAE in detail in this paper.

Let the CPU usage record of $v_j$ in $l$ consecutive time periods be the time series $\{s_{j,1}^{cpu}, s_{j,2}^{cpu}, \ldots, s_{j,l}^{cpu}\}$. After being processed by the encoder and decoder of the CAE-based filter, the denoised time series of CPU usage is $\{s_{j,1}^{'cpu}, s_{j,2}^{'cpu}, \ldots, s_{j,l}^{'cpu}\}$. The neural network structure of the CAE-based filter is shown in Figure 1.

As shown in Figure 1, both the encoder and the decoder are three-layer network structures, and the activation function is $\tanh(\cdot)$. In actual use, the input is the data collected over 48 consecutive time periods (4 h), normalized. The data collected over the last 5000 consecutive time periods are used for training. Figure 2 shows the result of denoising the CPU usage of a VM in the Alibaba CDC [49] with the proposed CAE-based filter.
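Below is a minimal PyTorch sketch of such a CAE-based denoising filter. It follows the description above (three-layer encoder and decoder with Tanh activations over a normalized 48-step window, trained on the last 5000 windows); the channel widths and training hyperparameters are illustrative, since Figure 1 is not reproduced here.

import torch
import torch.nn as nn

class CAEFilter(nn.Module):
    """1-D convolutional auto-encoder used as a denoising filter over a
    normalized usage window of 48 samples (4 h at 5-min sampling)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # three conv layers with Tanh
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv1d(8, 16, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.Tanh(),
        )
        self.decoder = nn.Sequential(            # mirrored three layers
            nn.ConvTranspose1d(32, 16, kernel_size=3, padding=1), nn.Tanh(),
            nn.ConvTranspose1d(16, 8, kernel_size=3, padding=1), nn.Tanh(),
            nn.ConvTranspose1d(8, 1, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, x):                        # x: (batch, 1, 48)
        return self.decoder(self.encoder(x))

# Training: fit the CAE to reconstruct its input; the bottleneck discards
# high-frequency noise in the usage trace.
model = CAEFilter()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
windows = torch.rand(5000, 1, 48)   # placeholder for real normalized traces
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(windows), windows)
    loss.backward()
    opt.step()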

4.1.2. An Attention-Based RNN Prediction Method

The time series data denoised by the CAE-based filter are used to predict the resource usage of the VM in the future time period. The proposed prediction method is based on the attention mechanism, is named the ARP algorithm, and builds on our previous work [50]. The ARP algorithm predicts the usage of the four resources by a VM simultaneously, without separate processing.

The ARP algorithm consists of two parts: the encoder and the decoder. In the encoder, an attention module is used to adaptively select the relevant series. Then, in the decoder, the relevant encoder hidden states are selected via another attention module [50]. We show the neural network structures of the encoder and the decoder in Figure 3 and Figure 4, respectively.
Let $S_{j,l} = \{s_{j,l}^{cpu}, s_{j,l}^{mem}, s_{j,l}^{disk}, s_{j,l}^{net}\}$, where $s_{j,l}^{cpu} = \{s_{j,1}^{cpu}, s_{j,2}^{cpu}, \ldots, s_{j,L}^{cpu}\}$ ($L$ is the size of the window), $s_{j,l}^{mem} = \{s_{j,1}^{mem}, s_{j,2}^{mem}, \ldots, s_{j,L}^{mem}\}$, $s_{j,l}^{disk} = \{s_{j,1}^{disk}, s_{j,2}^{disk}, \ldots, s_{j,L}^{disk}\}$, and $s_{j,l}^{net} = \{s_{j,1}^{net}, s_{j,2}^{net}, \ldots, s_{j,L}^{net}\}$. The attention mechanism in the encoder calculates the encoder attention weight $\alpha_l^q$ based on the previous $l-1$ hidden states and then obtains $S_{j,l}' = \{\alpha_l^1 \times s_{j,l}^{cpu}, \alpha_l^2 \times s_{j,l}^{mem}, \alpha_l^3 \times s_{j,l}^{disk}, \alpha_l^4 \times s_{j,l}^{net}\}$. Then, $S_{j,l}'$ is fed to the LSTM unit in the encoder. The attention mechanism in the decoder calculates the attention weight $\beta_l^{l'}$ based on the previous $l-1$ hidden states. At time $l$, the attention weights of the $L$ encoder hidden states are $\beta_l^{1}$ to $\beta_l^{L}$, respectively. The input information is represented as a weighted sum of the encoder hidden states across all the time steps and is then fed to the LSTM unit of the decoder. The output of the LSTM is the prediction for the next time period. For the specific explanation and mathematical description of the model, such as the calculation of the attention weights, please refer to our previous work [50].
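For intuition, the encoder's input-attention step can be sketched in PyTorch as below: at each step $l$, a weight is computed for each of the four resource series from the previous LSTM states, and the re-weighted input $S_{j,l}'$ is fed to the LSTM cell. This is only a compact sketch of the structure described above; the layer sizes and the exact scoring network are illustrative, and the full model (including the decoder's temporal attention) is specified in [50].

import torch
import torch.nn as nn

class InputAttentionEncoder(nn.Module):
    """Encoder with input attention over the 4 resource series."""
    def __init__(self, n_series=4, window=10, hidden=32):
        super().__init__()
        self.n_series, self.window, self.hidden = n_series, window, hidden
        self.lstm = nn.LSTMCell(n_series, hidden)
        self.attn = nn.Linear(2 * hidden + window, 1)  # scores one series

    def forward(self, x):                      # x: (batch, window, 4)
        b = x.size(0)
        h = x.new_zeros(b, self.hidden)
        c = x.new_zeros(b, self.hidden)
        outputs = []
        for l in range(self.window):
            # Score each series q from the previous (h, c) and its window.
            hc = torch.cat([h, c], dim=1)                      # (b, 2*hidden)
            scores = torch.stack([
                self.attn(torch.cat([hc, x[:, :, q]], dim=1)).squeeze(1)
                for q in range(self.n_series)
            ], dim=1)                                          # (b, 4)
            alpha = torch.softmax(scores, dim=1)               # alpha_l^q
            h, c = self.lstm(alpha * x[:, l, :], (h, c))       # feed S'_{j,l}
            outputs.append(h)
        return torch.stack(outputs, dim=1)     # encoder states, (b, L, hidden)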
As examples, Figures 5–7 demonstrate the CPU prediction results of applying the ARP algorithm ($L = 10$) to the resource usage of three different VMs in the Alibaba CDC [49].
After accurately predicting the resource usage of each VM in the next period, we can perform host workload detection and VM selection.

4.2. Host Workload Detection

The purpose of host overload detection is to avoid and eliminate fierce competition among VMs for resources, thereby reducing the occurrence of SLAV. Common host overload detection methods fall into two categories: the static threshold method and the dynamic threshold method. In the static threshold method, the overload thresholds for the various resources are set to fixed values; when the usages exceed the thresholds, the host is overloaded and SLAV occurs, at which point VMs must be migrated to reduce the load. In the dynamic threshold method, various statistical methods are used to analyze the usage of computing resources by VMs or hosts to determine whether the competition for resources is intense and whether the host is overloaded. The advantage of the static threshold method is that host resources are fully utilized; the disadvantage is that, once a host is overloaded, more overhead is required to resolve the SLAV. The advantage of the dynamic threshold method is that it effectively reduces SLAV, but sometimes host resources are underutilized. Therefore, we combine the advantages of the two and propose an ARP-based fixed-threshold overload detection method (the ARP-FT method).
ARP-FT is a dual detection method. At the beginning of time segment $t_k$, ARP-FT first detects whether the usage of the various resources on $h_i$ exceeds the given thresholds; then, based on the prediction result of the ARP algorithm, it judges whether the usage of the various resources on $h_i$ will exceed the given thresholds in the next period.
Let the overload thresholds be $TH_{up} = \{TH_{up}^{cpu}, TH_{up}^{mem}, TH_{up}^{disk}, TH_{up}^{net}\}$, where $TH_{up}^{cpu}$, $TH_{up}^{mem}$, $TH_{up}^{disk}$, and $TH_{up}^{net}$ all lie in the interval $(0, 1)$. Host $h_i$ is overloaded in time segment $t_k$ when at least one of the following inequalities holds:

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{cpu} > TH_{up}^{cpu} \times c_i^{cpu},$ (35)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{mem} > TH_{up}^{mem} \times c_i^{mem},$ (36)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{disk} > TH_{up}^{disk} \times c_i^{disk},$ (37)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{net} > TH_{up}^{net} \times c_i^{net},$ (38)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{cpu} > TH_{up}^{cpu} \times c_i^{cpu},$ (39)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{mem} > TH_{up}^{mem} \times c_i^{mem},$ (40)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{disk} > TH_{up}^{disk} \times c_i^{disk},$ (41)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{net} > TH_{up}^{net} \times c_i^{net},$ (42)

where the usages for $t_{k+1}$ are the values predicted by the ARP algorithm.
Let the underload thresholds be $TH_{down} = \{TH_{down}^{cpu}, TH_{down}^{mem}, TH_{down}^{disk}, TH_{down}^{net}\}$, where $TH_{down}^{cpu}$, $TH_{down}^{mem}$, $TH_{down}^{disk}$, and $TH_{down}^{net}$ all lie in the interval $(0, 1)$. Host $h_i$ is underloaded in time segment $t_k$ when all of the following inequalities hold:

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{cpu} < TH_{down}^{cpu} \times c_i^{cpu},$ (43)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{mem} < TH_{down}^{mem} \times c_i^{mem},$ (44)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{disk} < TH_{down}^{disk} \times c_i^{disk},$ (45)

$\sum_{j=1}^{M} \beta_{i,j,k} \times s_{j,k}^{net} < TH_{down}^{net} \times c_i^{net},$ (46)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{cpu} < TH_{down}^{cpu} \times c_i^{cpu},$ (47)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{mem} < TH_{down}^{mem} \times c_i^{mem},$ (48)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{disk} < TH_{down}^{disk} \times c_i^{disk},$ (49)

$\sum_{j=1}^{M} \beta_{i,j,k+1} \times s_{j,k+1}^{net} < TH_{down}^{net} \times c_i^{net}.$ (50)
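The ARP-FT rules above can be sketched as two predicate functions (reusing the Host/VM structures from the sketch in Section 3.1; helper names are ours). A host is flagged overloaded if any resource exceeds its threshold in either the current segment or the ARP-predicted next one, and underloaded only if every resource falls below the lower threshold in both.

RESOURCES = ("cpu", "mem", "disk", "net")

def usage(host_vms, k, r):
    """Sum of s_{j,k}^r over the VMs currently mapped to the host."""
    return sum(vm.usage[k][r] for vm in host_vms)

def is_overloaded(host, host_vms, k, th_up):
    """Equations (35)-(42): current OR predicted usage above TH_up."""
    return any(
        usage(host_vms, t, r) > th_up[r] * host.capacity[r]
        for t in (k, k + 1)      # index k+1 holds the ARP-predicted usage
        for r in RESOURCES
    )

def is_underloaded(host, host_vms, k, th_down):
    """Equations (43)-(50): current AND predicted usage below TH_down."""
    return all(
        usage(host_vms, t, r) < th_down[r] * host.capacity[r]
        for t in (k, k + 1)
        for r in RESOURCES
    )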

4.3. VM Selection

VM selection targets the overloaded hosts. We use the ARP algorithm precisely to avoid host overloading and SLAV as much as possible, rather than reactively responding to SLAV after it occurs. Therefore, we assume that in time segment $t_{k+1}$, there may still be SLAV and overloaded hosts in the CDC, but not many. Regarding VM selection in $t_k$ for $h_i$, our priority is to select the VMs that may cause $h_i$ to be overloaded during time segment $t_{k+1}$ and form a list of VMs to be migrated. After the migration of these VMs is completed, if $h_i$ is still overloaded in $t_k$, it is processed accordingly.
Given a host $h_i$ that is overloaded at the beginning of a certain future time segment $t_l$, we define its overload type $ol\_type_{i,l}$. We check which of the inequalities in Equations (35)–(38) are satisfied by $h_i$ to determine which of the CPU, memory, disk, and network resources are involved in its overload state. Let the CPU, memory, disk, and network overload marks be $A$, $B$, $C$, and $D$, respectively. In time segment $t_l$, they are denoted as $A_{i,l}$, $B_{i,l}$, $C_{i,l}$, and $D_{i,l}$, respectively, and their corresponding values are $V(A_{i,l}) = \frac{c_i^{cpu} - r_{i,l}^{cpu}}{TH_{up}^{cpu} \times c_i^{cpu}}$, $V(B_{i,l}) = \frac{c_i^{mem} - r_{i,l}^{mem}}{TH_{up}^{mem} \times c_i^{mem}}$, $V(C_{i,l}) = \frac{c_i^{disk} - r_{i,l}^{disk}}{TH_{up}^{disk} \times c_i^{disk}}$, and $V(D_{i,l}) = \frac{c_i^{net} - r_{i,l}^{net}}{TH_{up}^{net} \times c_i^{net}}$. If $h_i$ satisfies inequalities (35) and (37), that is, $V(A_{i,l}) > 1$ and $V(C_{i,l}) > 1$, then its overload type sequence can be denoted as $(A_{i,l}\ C_{i,l})$. We sort the overloaded resources in $h_i$'s overload type sequence in descending order of the overload mark values. For example, if the initial overload type sequence of $h_i$ is $(A_{i,l}\ B_{i,l}\ C_{i,l}\ D_{i,l})$, and $V(A_{i,l}) = 0.1$, $V(B_{i,l}) = 0.3$, $V(C_{i,l}) = 0.05$, and $V(D_{i,l}) = 0.2$, then the sorted overload type sequence is $(B_{i,l}\ D_{i,l}\ A_{i,l}\ C_{i,l})$. The sorted overload type sequence is the $ol\_type_{i,l}$ of $h_i$ in $t_l$; for the above instance, $ol\_type_{i,l} = (B_{i,l}\ D_{i,l}\ A_{i,l}\ C_{i,l})$.
Given a VM $v_j$, we define its workload type $wl\_type_{j,l}$ in $t_l$. Let its CPU, memory, disk, and network usage marks be $A$, $B$, $C$, and $D$, respectively. In time segment $t_l$, they are denoted as $A_{j,l}$, $B_{j,l}$, $C_{j,l}$, and $D_{j,l}$, respectively, and their corresponding values are $V(A_{j,l}) = \frac{s_{j,l}^{cpu}}{c_{max}^{cpu}}$, $V(B_{j,l}) = \frac{s_{j,l}^{mem}}{c_{max}^{mem}}$, $V(C_{j,l}) = \frac{s_{j,l}^{disk}}{c_{max}^{disk}}$, and $V(D_{j,l}) = \frac{s_{j,l}^{net}}{c_{max}^{net}}$, where $c_{max}^{cpu} = \max\{c_i^{cpu} \mid i \in [1, N]\}$, $c_{max}^{mem} = \max\{c_i^{mem} \mid i \in [1, N]\}$, $c_{max}^{disk} = \max\{c_i^{disk} \mid i \in [1, N]\}$, and $c_{max}^{net} = \max\{c_i^{net} \mid i \in [1, N]\}$. Let the initial workload type sequence of VM $v_j$ be $(A_{j,l}\ B_{j,l}\ C_{j,l}\ D_{j,l})$. We sort the initial sequence in descending order of the resource usage mark values, and the sorted workload type sequence is $wl\_type_{j,l}$. For example, if $V(A_{j,l}) = 0.1$, $V(B_{j,l}) = 0.09$, $V(C_{j,l}) = 0.15$, and $V(D_{j,l}) = 0.02$, then $wl\_type_{j,l} = (C_{j,l}\ A_{j,l}\ B_{j,l}\ D_{j,l})$. Then, we define the complementary workload type $cwl\_type_{j,l}$ of $v_j$ in $t_l$ as the reverse sequence of $wl\_type_{j,l}$.
For the overload type of $h_i$, we select as few VMs as possible to migrate so that the values of $A_{i,l}$, $B_{i,l}$, $C_{i,l}$, and $D_{i,l}$ all become less than or equal to 1. For example, if the overload type of $h_i$ is $(D\ A)$, then VMs with the workload type $(D\ A)$ should be selected for migration as far as possible. Therefore, we need to use $(D\ A)$ as the reference to sort the VMs on $h_i$.
Given the marks $A$, $B$, $C$, and $D$, the corresponding values for host $h_i$ and VM $v_j$ in $t_l$ are $V(A_{i,l}) = r_{i,l}^{cpu} - TH_{up}^{cpu} \times c_i^{cpu}$, $V(B_{i,l}) = r_{i,l}^{mem} - TH_{up}^{mem} \times c_i^{mem}$, $V(C_{i,l}) = r_{i,l}^{disk} - TH_{up}^{disk} \times c_i^{disk}$, $V(D_{i,l}) = r_{i,l}^{net} - TH_{up}^{net} \times c_i^{net}$, $V(A_{j,l}) = s_{j,l}^{cpu}$, $V(B_{j,l}) = s_{j,l}^{mem}$, $V(C_{j,l}) = s_{j,l}^{disk}$, and $V(D_{j,l}) = s_{j,l}^{net}$. Given the overload reference $RF = (E_1 \ldots E_p)$, where $E_g \in \{A, B, C, D\}$ and the order of $E_1 \ldots E_p$ corresponds to that of $ol\_type_{i,l}$, the resource usage of $v_j$ based on $RF$ is denoted as $RU_j = (e_{j,1} \ldots e_{j,p})$, where $e_{j,g} \in \{A, B, C, D\}$ and the order of $e_{j,1} \ldots e_{j,p}$ also corresponds to that of $ol\_type_{i,l}$.
For two different VMs $v_j$ and $v_{j'}$, if $v_j$ is better than $v_{j'}$ based on $RF$, denoted as $v_j > v_{j'}$, then the following is met:

$\sum_{g=1}^{p} \alpha_g^{RF} \times |V(E_g) - V(e_{j,g})| > \sum_{g=1}^{p} \alpha_g^{RF} \times |V(E_g) - V(e_{j',g})|,$ (51)

where $\alpha_g^{RF}$ is the overload reference weight. Resources with higher overload levels should be prioritized when reducing the load; hence, the value of the corresponding $\alpha_g^{RF}$ should be larger. We let the value of $\alpha_g^{RF}$ equal the value of the resource mark in $ol\_type_{i,l}$ that corresponds to the $E_g$ with the same subscript $g$. For instance, assuming $ol\_type_{i,l} = (C_{i,l}\ A_{i,l})$, then $RF = (E_1\ E_2) = (C_{i,l}\ A_{i,l})$, $\alpha_1^{RF} = V(C_{i,l})$, and $\alpha_2^{RF} = V(A_{i,l})$.

If $v_j$ is equivalent to $v_{j'}$ based on $RF$, denoted as $v_j = v_{j'}$, then the following is met:

$\sum_{g=1}^{p} \alpha_g^{RF} \times |V(E_g) - V(e_{j,g})| = \sum_{g=1}^{p} \alpha_g^{RF} \times |V(E_g) - V(e_{j',g})|.$ (52)

If $v_j$ is worse than $v_{j'}$ based on $RF$, denoted as $v_j < v_{j'}$, then the following is met:

$\sum_{g=1}^{p} \alpha_g^{RF} \times |V(E_g) - V(e_{j,g})| < \sum_{g=1}^{p} \alpha_g^{RF} \times |V(E_g) - V(e_{j',g})|.$ (53)
According to the above relationships between VMs based on $RF$, we sort the VMs on $h_i$ from 'good' to 'bad' into an ordered list. Subsequently, the VMs are sequentially taken from the ordered list and placed in the list of VMs to be migrated. After each selection, we determine whether $h_i$ would still be overloaded after removing the resources required by the selected VM. If it would be, we continue to select VMs in order; otherwise, we stop the VM selection.
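A sketch of this selection loop is given below, reusing the is_overloaded predicate from the ARP-FT sketch; the weighted key implements the ordering of Equations (51)-(53). Helper names and the data layout are illustrative.

def vm_key(vm_value, host_value, weights, rf):
    """Weighted distance of Equations (51)-(53) for one VM against the
    overload reference RF; a larger value means 'better' (selected earlier)."""
    return sum(weights[r] * abs(host_value[r] - vm_value[r]) for r in rf)

def select_vms(host, host_vms, k, th_up, host_value, weights, rf):
    """Pick as few VMs as possible so the host leaves the overloaded state."""
    ordered = sorted(
        host_vms,
        key=lambda vm: vm_key(vm.usage[k], host_value, weights, rf),
        reverse=True,                        # from 'good' to 'bad'
    )
    selected, remaining = [], list(ordered)
    for vm in ordered:
        # Stop once the host (current and predicted segment) is no longer
        # overloaded after removing the already-selected VMs.
        if not is_overloaded(host, remaining, k, th_up):
            break
        remaining.remove(vm)
        selected.append(vm)
    return selected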

4.4. VM Placement

In order to make full use of a host's resources, we should fully consider the competition of the various VMs for resources when placing them on the same host. This competition has two aspects: the space and the time of resource usage.
In a given period of time, competition should be evenly distributed over the different resources, rather than having multiple VMs scramble for the same one or two resources while leaving the others idle. We illustrate this with a simple example. Consider the case where there are only two resources, CPU and memory. The CPU and memory provided by host H are both 10 units, and the VMs to be placed are V1–V7. The CPU and memory requirements $(s^{cpu}, s^{mem})$ of these VMs are $(2.40, 1.16)$, $(2.46, 1.11)$, $(2.46, 0.97)$, $(0.73, 2.72)$, $(1.05, 2.13)$, $(0.87, 1.91)$, and $(2.65, 0.97)$, respectively. Figure 8 demonstrates two methods of VM placement, shown in Figure 8a,b, respectively. Obviously, the placement in Figure 8a is more reasonable because it allows the VMs to make full use of both resources of host H. In Figure 8b, the VMs' scramble for resources is concentrated on the CPU, so the utilization rate of H's CPU reaches its highest point while the utilization rate of its memory does not exceed half, resulting in waste.
Across two adjacent time periods, the users' business needs change, and the usage of certain resources of the host may change greatly. When we implement VM placement in $t_k$, we must consider the resource usage of the VMs in $t_{k+1}$ to avoid the destination host becoming overloaded in $t_{k+1}$ after migration. We illustrate this with a simple example. For convenience, only the CPU is considered. During time period $t_k$, the CPU usage of host H is 30%; for $t_{k+1}$, the predicted CPU usage of host H is 60%. The VMs to be placed are V1–V5. Their required CPU usage rates in $t_k$ and $t_{k+1}$, $\{s_k, s_{k+1}\}$, are $\{30\%, 15\%\}$, $\{20\%, 10\%\}$, $\{15\%, 10\%\}$, $\{20\%, 30\%\}$, and $\{10\%, 5\%\}$, respectively. Figure 9 demonstrates two methods of VM placement, shown in Figure 9a,b, respectively. Obviously, the placement in Figure 9a is more reasonable: in the two consecutive time periods, H's CPU load does not exceed the upper limit. The VM placement in Figure 9b lets H host more VMs in $t_k$; however, in $t_{k+1}$, the CPU load of H exceeds the upper limit, leaving H in the overloaded state and causing SLAV, so VM migration must be used to reduce the load. The placement in Figure 9b thus increases the operating cost of the CDC.
Based on this competition in resource space and time, we design a resource-complementary-usage-based VM placement strategy. We want to achieve two goals: (1) ensure that the resources of the target host can be fully utilized during $t_k$; and (2) after placing the VMs on a given host $h_i$, the host will not be overloaded in $t_{k+1}$.
Given a host $h_i$, we define its workload type $WL\_type_{i,k}$ in $t_k$. Let its CPU, memory, disk, and network usage marks be $AW$, $BW$, $CW$, and $DW$, respectively. In time segment $t_k$, they are denoted as $AW_{i,k}$, $BW_{i,k}$, $CW_{i,k}$, and $DW_{i,k}$, respectively, and their corresponding values are $V(AW_{i,k}) = \frac{r_{i,k}^{cpu}}{c_{max}^{cpu}}$, $V(BW_{i,k}) = \frac{r_{i,k}^{mem}}{c_{max}^{mem}}$, $V(CW_{i,k}) = \frac{r_{i,k}^{disk}}{c_{max}^{disk}}$, and $V(DW_{i,k}) = \frac{r_{i,k}^{net}}{c_{max}^{net}}$. Let the initial workload type sequence of host $h_i$ be $(AW_{i,k}\ BW_{i,k}\ CW_{i,k}\ DW_{i,k})$. We sort the initial sequence in ascending order of the resource usage mark values, and the ordered workload type sequence is $WL\_type_{i,k}$. For example, if $V(AW_{i,k}) = 0.1$, $V(BW_{i,k}) = 0.09$, $V(CW_{i,k}) = 0.15$, and $V(DW_{i,k}) = 0.02$, then $WL\_type_{i,k} = (DW_{i,k}\ BW_{i,k}\ AW_{i,k}\ CW_{i,k})$. Then, we define the complementary workload type $CWL\_type_{i,k}$ of $h_i$ in $t_k$ as the reverse sequence of $WL\_type_{i,k}$. In the production environment, the power consumption of the four resources in a host, sorted from largest to smallest, is CPU, memory, disk, and network. Therefore, in a host workload type sequence, if two resource usages are equal, these two resources are sorted according to the order of their resource power consumption levels. Hence, there are $4! = 24$ workload types.
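A sketch of this workload-type computation is shown below, including the power-consumption tie-break just described; the complementary type is simply the reversed sequence. Names are illustrative.

ORDER = ("cpu", "mem", "disk", "net")   # power-consumption tie-break order

def workload_type(remaining, c_max):
    """WL_type_{i,k}: resource names sorted ascending by V(.) = r / c_max;
    ties fall back to the CPU > mem > disk > net power ordering."""
    marks = {r: remaining[r] / c_max[r] for r in ORDER}
    return tuple(sorted(ORDER, key=lambda r: (marks[r], ORDER.index(r))))

def complementary_type(wl_type):
    """CWL_type_{i,k}: the reverse of the workload type sequence."""
    return tuple(reversed(wl_type))

# Example from the text: V(AW)=0.1, V(BW)=0.09, V(CW)=0.15, V(DW)=0.02
# workload_type -> ('net', 'mem', 'cpu', 'disk'), i.e., (DW BW AW CW),
# complementary_type -> ('disk', 'cpu', 'mem', 'net'), i.e., (CW AW BW DW).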
We divide all non-overloaded hosts (including the underloaded ones) into 24 groups by workload type. Then, the 24 groups of hosts are arranged in descending order of the number of hosts they contain to form a list:

$HG = \{WLT\_Host\_List_1, WLT\_Host\_List_2, \ldots, WLT\_Host\_List_{24}\}.$ (54)
Next, we sort the hosts within each $WLT\_Host\_List_i$. We compare two hosts of the same workload type as follows. Given the workload type $(W_{i,1}\ W_{i,2}\ W_{i,3}\ W_{i,4})$, where $W_{i,p} \in \{AW, BW, CW, DW\}$, consider two different hosts $h_i$ and $h_{i'}$ of the same workload type:

If $h_i$ is better than $h_{i'}$, denoted as $h_i > h_{i'}$, then the following is met:

$\sum_{p=1}^{4} V(W_{i,p}) > \sum_{p=1}^{4} V(W_{i',p}).$ (55)

If $h_i$ is equivalent to $h_{i'}$, denoted as $h_i = h_{i'}$, then the following is met:

$\sum_{p=1}^{4} V(W_{i,p}) = \sum_{p=1}^{4} V(W_{i',p}).$ (56)

If $h_i$ is worse than $h_{i'}$, denoted as $h_i < h_{i'}$, then the following is met:

$\sum_{p=1}^{4} V(W_{i,p}) < \sum_{p=1}^{4} V(W_{i',p}).$ (57)

The hosts in each group are sorted in descending order according to the above comparison to form an ordered group. All ordered groups are denoted as:

$\{WLT\_Host\_List\_sorted_1, WLT\_Host\_List\_sorted_2, \ldots, WLT\_Host\_List\_sorted_{24}\}.$ (58)
For each host of a given workload type, in order to take full advantage of all its resources, it is best to pick and place VMs whose complementary workload type is the same as the host's workload type. For instance, if the host has $WL\_type_{i,k} = (DW\ BW\ CW\ AW)$, then $CWL\_type_{i,k} = (AW\ CW\ BW\ DW)$, and VMs of the complementary workload type $(A\ C\ B\ D)$ should be selected and placed on this host. Hence, we divide all VMs to be migrated into 24 groups according to their complementary workload types, denoted as:

$VG = \{cwlt\_VM\_List_1, cwlt\_VM\_List_2, \ldots, cwlt\_VM\_List_{24}\}.$ (59)
Next, we sort the VMs within each group. We compare two VMs of the same complementary workload type as follows. Given the workload type $(W_{j,1}\ W_{j,2}\ W_{j,3}\ W_{j,4})$, where $W_{j,p} \in \{A, B, C, D\}$, consider two different VMs $v_j$ and $v_{j'}$ of the same complementary workload type:

If $v_j$ is better than $v_{j'}$, denoted as $v_j > v_{j'}$, then the following is met:

$\sum_{p=1}^{4} V(W_{j,p}) > \sum_{p=1}^{4} V(W_{j',p}).$ (60)

If $v_j$ is equivalent to $v_{j'}$, denoted as $v_j = v_{j'}$, then the following is met:

$\sum_{p=1}^{4} V(W_{j,p}) = \sum_{p=1}^{4} V(W_{j',p}).$ (61)

If $v_j$ is worse than $v_{j'}$, denoted as $v_j < v_{j'}$, then the following is met:

$\sum_{p=1}^{4} V(W_{j,p}) < \sum_{p=1}^{4} V(W_{j',p}).$ (62)

The VMs in each group are sorted in descending order according to the above comparison to form an ordered group. All ordered groups are denoted as:

$VG' = \{cwlt\_VM\_List\_sorted_1, cwlt\_VM\_List\_sorted_2, \ldots, cwlt\_VM\_List\_sorted_{24}\}.$ (63)
We select a host group $WLT\_Host\_List\_sorted_x$ from $HG$ in sequential order and obtain the VM group $cwlt\_VM\_List\_sorted_y$ whose complementary workload type matches the hosts' workload type. We then take a host from $WLT\_Host\_List\_sorted_x$ in sequence and take VMs from $cwlt\_VM\_List\_sorted_y$ in sequence, placing them on the host. At each step, we need to judge whether the host would be overloaded, in both the current time period and the next time period, after the VM is placed. If not, the VM can be placed on this host and is removed from $cwlt\_VM\_List\_sorted_y$; otherwise, we skip to the next VM in order.
After traversing $cwlt\_VM\_List\_sorted_y$ for the current host, we process the next host in order. The above algorithm is named the Time and Space Complementary VM Placement algorithm (TSCP algorithm), and its pseudo-code is shown in Algorithm 1.
Algorithm 1 TSCP algorithm.
Input: hostlist, vmlist
Output: allocation of the VMs
1:  $HG$ ← get_HG(hostlist)
2:  for each $WLT\_Host\_List_i$ in $HG$ do
3:    $WLT\_Host\_List\_sorted_i$ ← get_sorted($WLT\_Host\_List_i$)
4:  end for
5:  $VG$ ← get_VG(vmlist)
6:  $VG'$ ← get_VG'($VG$)
7:  for each $cwlt\_VM\_List_i$ in $VG$ do
8:    $cwlt\_VM\_List\_sorted_i$ ← get_sorted($cwlt\_VM\_List_i$)
9:  end for
10: for each $WLT\_Host\_List\_sorted_i$ do
11:   $cwlt\_VM\_List\_sorted_{same}$ ← FindSameWLT($WLT\_Host\_List\_sorted_i$, $VG'$)
12:   for each HOST in $WLT\_Host\_List\_sorted_i$ do
13:     for each VM in $cwlt\_VM\_List\_sorted_{same}$ do
14:       if HOST is not overloaded when hosting VM in $t_k$ and $t_{k+1}$ then
15:         allocation.add(VM, HOST)
16:         remove VM from $cwlt\_VM\_List\_sorted_{same}$
17:       end if
18:     end for
19:   end for
20: end for
21: return allocation
After all the above operations are completed, if some VMs have still not been migrated, we perform the following steps:
(Step 1) Implement the TSCP algorithm again, but skip the steps of VM grouping and sorting;
(Step 2) If some VMs have still not been migrated, implement the First-Fit VM placement algorithm for these VMs;
(Step 3) If some VMs have still not been migrated, it indicates that the resources of all working hosts are insufficient, and hosts in energy-saving mode should be booted up.
After this, we perform underloaded host detection. If there are underloaded hosts, we leverage the TSCP algorithm to place all the VMs running on them.

5. Performance Evaluation

In this section, we evaluate the performance of our proposed solution with a real VM workload trace-driven simulation.

5.1. Experiment Setup

Based on the energy consumption analysis and statistics of each host component in the data center by Minartz et al. [51] and Jin et al. [52], we simulated hosts of three sizes, large, medium, and small: $H_{large}$, $H_{medium}$, and $H_{small}$, respectively. Their resource parameters are shown in Table 1, and their power parameters are shown in Table 2. We simulated a CDC consisting of 200 $H_{small}$ hosts, 100 $H_{medium}$ hosts, and 50 $H_{large}$ hosts.
The VM workload trace dataset is provided by the Alibaba CDC [49]. The dataset contains workload traces of 4000 VMs over 8 consecutive days, sampled every five minutes. Each trace records the CPU usage, memory usage, disk I/O, and network throughput of a VM at a sampling moment. We selected the one-day traces of 1000 VMs (288 time segments in total) from the dataset to simulate users' business demands for VMs. The simulation is implemented on the CDC simulator CloudMatrix Lite [53], and the CAE-based filter and the ARP algorithm are implemented with PyTorch [54].
We set the electricity price to $EP = \$0.25$/kWh and the SLAV penalty to a static price $pun^{cpu} = pun^{mem} = pun^{disk} = pun^{net} = \$0.01$. A VM migration consumes 10% extra resources; hence, each host reserves 10% of its resources for migrations. Based on this, we set $TH_{up}^{cpu} = TH_{up}^{mem} = TH_{up}^{disk} = TH_{up}^{net} = 0.9$ and $TH_{down}^{cpu} = TH_{down}^{mem} = TH_{down}^{disk} = TH_{down}^{net} = 0.1$.
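The simulation parameters above can be gathered into a single configuration sketch; the names below are a convenience of this illustration, and CloudMatrix Lite's actual configuration interface may differ.
```python
# Experiment configuration as stated in the text (illustrative layout).
RESOURCES = ("cpu", "mem", "disk", "net")

CONFIG = {
    "electricity_price": 0.25,                     # EP, $/kWh
    "slav_penalty": {r: 0.01 for r in RESOURCES},  # pun_*, $ per violation
    "migration_reserve": 0.10,                     # 10% reserved for migrations
    "th_up": {r: 0.9 for r in RESOURCES},          # TH_up thresholds
    "th_down": {r: 0.1 for r in RESOURCES},        # TH_down thresholds
    "hosts": {"small": 200, "medium": 100, "large": 50},
    "vms": 1000,
    "periods": 288,                                # one day, 5-minute samples
}
```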

5.2. Evaluation

In the evaluation, three overload detection algorithms (MAD [6], IQR [6], and LR [6]), three VM selection algorithms (MMT [6,31,39], MC [6,55], and RS [6]), and one VM placement algorithm (PABFD [6]) are combined into nine methods to compare with our proposed solution. The PABFD placement algorithm and its corresponding energy consumption model only take the host's CPU into account; we therefore modify it to suit our multi-resource scenario. The modified algorithm, PABFDM, is shown as pseudo-code in Algorithm 2.
Algorithm 2 PABFDM algorithm.
Input: hostList, vmList
Output: allocation of the VMs

vmList.sortDecreasingUtilization()
for each VM in vmList do
    minPower ← MAX
    allocatedHost ← NULL
    for each host in hostList do
        if no SLAV on this host and this host meets the CPU, memory, disk, and network resource requirements for VM then
            power ← estimatePower(host, VM)
            if power < minPower then
                minPower ← power
                allocatedHost ← host
            end if
        end if
    end for
    if allocatedHost ≠ NULL then
        allocation.add(VM, allocatedHost)
    end if
end for
return allocation
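A Python sketch of PABFDM follows; `meets_demand` and `estimate_power` are assumed callbacks standing in for the SLAV/resource check and estimatePower in Algorithm 2.
```python
# A minimal sketch of the PABFDM placement (power-aware best-fit
# decreasing over four resources); callback signatures are assumptions.

def pabfdm(vm_list, host_list, meets_demand, estimate_power):
    allocation = []
    # Best-fit decreasing: consider VMs in order of decreasing utilization.
    for vm in sorted(vm_list, key=lambda v: v["util"], reverse=True):
        best_host, min_power = None, float("inf")
        for host in host_list:
            # The host must satisfy all four resource demands without SLAV.
            if meets_demand(host, vm):
                power = estimate_power(host, vm)
                if power < min_power:
                    best_host, min_power = host, power
        if best_host is not None:
            allocation.append((vm, best_host))
    return allocation
```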
The naming rule of the baseline methods is 'xx-xx'. For instance, LR-MMT indicates that the combination uses LR for overload detection and MMT for VM selection. We denote our proposed method as ARP-TSCP.
The metrics involved are host-operation-related costs, SLAV cost, and the number of VM migrations. Since the CPU part of VM migration energy consumption is already counted within the hosts' energy consumption, we use the number of VM migrations to measure the migration cost indirectly.
Figure 10 compares the energy consumption of all hosts under ARP-TSCP and the nine baseline methods over 288 consecutive periods, and Figure 11 compares the total energy consumption of all hosts when these ten methods are used to consolidate servers throughout the day. As Figure 10 shows, in most periods the host energy consumption produced by ARP-TSCP is lower than that produced by the baseline methods; consequently, ARP-TSCP also outperforms the baselines in total host energy consumption. As shown in Figure 11, the total host energy consumption generated by executing ARP-TSCP over one day is about 18.5% lower than that of LR-MMT (the best result of all baseline methods) and about 30.3% lower than that of MAD-RS (the worst result of all baseline methods).
Figure 12 compares the total SLAV of CPU, memory, disk, and network generated by ARP-TSCP and the baseline methods in one day. As can be seen from Figure 12, ARP-TSCP clearly improves on all baseline methods in reducing CPU and memory SLAV, which indicates that predicting workloads and eliminating potential overload states in advance is a feasible approach. In particular, the memory SLAV caused by ARP-TSCP is much smaller than that of the other methods. This is because the baseline methods are essentially designed around the CPU resource alone: their host overload detection strategies consider only CPU overload, and the PABFD algorithm assumes the host can provide unlimited memory to VMs. In an actual production environment, the competition among VMs for host memory is fierce, resulting in frequent memory SLAV. Most of the VMs in the Alibaba workload dataset are not disk-I/O intensive and demand mainly CPU and memory resources [49]. In addition, a NIC throughput of 1 GB/s can satisfy most users' business needs most of the time. Therefore, for both disk and network, the SLAV caused by all methods is lower than that for CPU and memory.
As shown in Equations (22)–(25), the SLAV calculation involves the entire life cycle of a host, so we do not show the SLAV generated in each individual time period. In Figure 13, we compare the total SLAV penalty cost of all methods over a full day. With $pun^{cpu} = pun^{mem} = pun^{disk} = pun^{net}$, the SLAV penalty cost caused by ARP-TSCP is about 38% lower than that of LR-MMT (the best result of all baseline methods) and about 52% lower than that of MAD-RS (the worst result of all baseline methods). Given the importance of CPU and memory in practice, as well as their larger share of host power consumption, if $pun^{cpu}$ and $pun^{mem}$ were set higher than $pun^{disk}$ and $pun^{net}$, the SLAV penalty cost advantage of ARP-TSCP would be even more pronounced.
Figure 14 compares the number of VM migrations triggered by all methods in each of the 288 consecutive periods, and Figure 15 compares the total number of VM migrations caused by the ten methods throughout the day. As Figure 14 shows, the number of migrations induced by ARP-TSCP in a given period is sometimes higher and sometimes lower than that induced by the baseline methods. Accordingly, as shown in Figure 15, the total number of migrations induced by ARP-TSCP holds no particular advantage over that of the baseline methods over the whole day; the two are very close.
In each period, ARP-TSCP and the baseline methods trigger VM migrations for different reasons. ARP-TSCP proactively anticipates host overload and SLAV occurrence in the next period and initiates migrations in advance, whereas the baseline methods passively respond to host overload detected in the current period. Although the total numbers of VM migrations are very close, the triggering mechanisms differ, and this difference shows in the final result: the SLAV caused by ARP-TSCP is much smaller than that caused by LR-MMT, as shown in Figure 13.
Compared with the baseline methods, our proposed ARP-TSCP achieves improvements in host energy consumption, SLAV occurrence, and VM migration cost alike, so it also outperforms the other methods in the final total cost comparison. As shown in Figure 16, ARP-TSCP reduces the CDC operating cost by about 26.1% compared to LR-MMT (the best result of all baseline methods) and by about 39.3% compared to MAD-RS (the worst result of all baseline methods).

6. Conclusions

In this paper, we focus on reducing CDC operating costs while ensuring users' SLA requirements. We first established a cost model based on multiple computing resources in the CDC, taking into account the host energy cost, VM migration cost, and SLAV penalty cost, and defined the MCRC problem in server consolidation on this basis. We then devised a step-by-step solution to this problem. A convolutional autoencoder-based filter preprocesses the VM workload records to reduce sampling noise. An attention-based RNN method then predicts the VMs' computing resource usage in the coming period, allowing us to trigger VM migration before a host enters the overloaded state and thereby reduce the occurrence of SLAV. We designed a heuristic algorithm based on the complementary use of multiple resources in space and time, ARP-TSCP, to solve the VM placement problem. Finally, simulations driven by real VM workload traces verify the effectiveness of our proposed method: compared with the existing server consolidation methods, ARP-TSCP reduces host energy consumption by 18.5~30.3%, SLAV cost by 38~52%, and total cost by 26.1~39.3%.
In the future, we will consider a more comprehensive cost model that accounts for multi-core CPUs, the network topology of the CDC, the use of network equipment, and the cooling system. We will also design a forecasting method that predicts the resource usage of equipment over multiple future time periods, thereby further reducing the generation of SLAV.

Author Contributions

Conceptualization, H.L., Y.X. and Y.S.; methodology, H.L. and Y.X.; software, H.L.; validation, H.L. and Y.S.; formal analysis, H.L.; investigation, H.L. and H.X.; resources, H.L. and H.X.; data curation, H.L. and H.X.; writing—original draft preparation, H.L.; writing—review and editing, Y.X. and Y.S.; visualization, H.L.; supervision, Y.X.; project administration, Y.S.; funding acquisition, H.L., Y.X., H.X. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 62002067), the Guangzhou Youth Talent Program (QT20220101174), the Department of Education of Guangdong Province (No. 2020KTSCX039), the SRP of Guangdong Education Dept (2019KZDZX1031), and the Natural Science Foundation of Education of Guizhou Province (No. KY[2017]351).

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. The State of Remote Work 2021. Available online: https://globalworkplaceanalytics.com/whitepapers (accessed on 8 August 2022).
  2. McKinsey Consumer Pulse. Available online: https://www.mckinsey.com/business-functions/growth-marketing-and-sales/our-insights/global-surveys-of-consumer-sentiment-during-the-coronavirus-crisis (accessed on 8 August 2022).
  3. De’, R.; Pandey, N.; Pal, A. Impact of digital surge during Covid-19 pandemic: A viewpoint on research and practice. Int. J. Inf. Manag. 2020, 55, 102171. [Google Scholar]
  4. Branscombe, M. The network impact of the global COVID-19 pandemic. New Stack 2020, 14. Available online: https://thenewstack.io/the-network-impact-of-the-global-covid-19-pandemic/ (accessed on 16 September 2022).
  5. Salesforce Increases Data Center Spend in 2021/22. Available online: https://www.datacenterdynamics.com/en/news/salesforce-increases-data-center-spend-in-202122/ (accessed on 8 August 2022).
  6. Beloglazov, A.; Buyya, R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput. Pract. Exp. 2012, 24, 1397–1420. [Google Scholar] [CrossRef]
  7. Aljoumah, E.; Al-Mousawi, F.; Ahmad, I.; Al-Shammri, M.; Al-Jady, Z. SLA in cloud computing architectures: A comprehensive study. Int. J. Grid Distrib. Comput. 2015, 8, 7–32. [Google Scholar] [CrossRef]
  8. Dhiman, G.; Mihic, K.; Rosing, T. A system for online power prediction in virtualized environments using gaussian mixture models. In Proceedings of the 47th Design Automation Conference, Anaheim, CA, USA, 13–18 June 2010; pp. 807–812. [Google Scholar]
  9. Ham, S.; Kim, M.; Choi, B.; Jeong, J. Simplified server model to simulate data center cooling energy consumption. Energy Build. 2015, 86, 328–339. [Google Scholar] [CrossRef]
  10. Kavanagh, R.; Djemame, K. Rapid and accurate energy models through calibration with IPMI and RAPL. Concurr. Comput. Pract. Exp. 2019, 31, e5124. [Google Scholar] [CrossRef]
  11. Gupta, V.; Nathuji, R.; Schwan, K. An analysis of power reduction in datacenters using heterogeneous chip multiprocessors. ACM Sigmetr. Perform. Eval. Rev. 2011, 39, 87–91. [Google Scholar] [CrossRef]
  12. Lefurgy, C.; Wang, X.; Ware, M. Server-level power control. In Proceedings of the Fourth International Conference on Autonomic Computing (ICAC’07), Jacksonville, FL, USA, 11–15 June 2007; p. 4. [Google Scholar]
  13. Beloglazov, A.; Abawajy, J.; Buyya, R. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener. Comput. Syst. 2012, 28, 755–768. [Google Scholar] [CrossRef]
  14. Rezaei-Mayahi, M.; Rezazad, M.; Sarbazi-Azad, H. Temperature-aware power consumption modeling in Hyperscale cloud data centers. Future Gener. Comput. Syst. 2019, 94, 130–139. [Google Scholar] [CrossRef]
  15. Chen, Y.; Das, A.; Qin, W.; Sivasubramaniam, A.; Wang, Q.; Gautam, N. Managing server energy and operational costs in hosting centers. In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, AB, Canada, 6–10 June 2005; pp. 303–314. [Google Scholar]
  16. Wu, W.; Lin, W.; Peng, Z. An intelligent power consumption model for virtual machines under CPU-intensive workload in cloud environment. Soft Comput. 2017, 21, 5755–5764. [Google Scholar] [CrossRef]
  17. Lien, C.; Bai, Y.; Lin, M. Estimation by software for the power consumption of streaming-media servers. IEEE Trans. Instrum. Meas. 2007, 56, 1859–1870. [Google Scholar] [CrossRef]
  18. Economou, D.; Rivoire, S.; Kozyrakis, C.; Ranganathan, P. Full-system power analysis and modeling for server environments. In Proceedings of the International Symposium on Computer Architecture, Ouro Preto, Brazil, 17–20 October 2006. [Google Scholar]
  19. Alan, I.; Arslan, E.; Kosar, T. Energy-aware data transfer tuning. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, IL, USA, 26–29 May 2014; pp. 626–634. [Google Scholar]
  20. Li, Y.; Wang, Y.; Yin, B.; Guan, L. An online power metering model for cloud environment. In Proceedings of the 2012 IEEE 11th International Symposium on Network Computing and Applications, Cambridge, MA, USA, 23–25 August 2012; pp. 175–180. [Google Scholar]
  21. Lent, R. A model for network server performance and power consumption. Sustain. Comput. Inform. Syst. 2013, 3, 80–93. [Google Scholar] [CrossRef]
  22. Kansal, A.; Zhao, F.; Liu, J.; Kothari, N.; Bhattacharya, A. Virtual machine power metering and provisioning. In Proceedings of the 1st ACM Symposium on Cloud Computing, Indianapolis, IN, USA, 10–11 June 2010; pp. 39–50. [Google Scholar]
  23. Lin, W.; Wang, W.; Wu, W.; Pang, X.; Liu, B.; Zhang, Y. A heuristic task scheduling algorithm based on server power efficiency model in cloud environments. Sustain. Comput. Inform. Syst. 2018, 20, 56–65. [Google Scholar] [CrossRef]
  24. Lin, W.; Wang, H.; Zhang, Y.; Qi, D.; Wang, J.; Chang, V. A cloud server energy consumption measurement system for heterogeneous cloud environments. Inf. Sci. 2018, 468, 47–62. [Google Scholar] [CrossRef]
  25. Maziku, H.; Shetty, S. Towards a network aware VM migration: Evaluating the cost of VM migration in cloud data centers. In Proceedings of the 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet), Luxembourg, 8–10 October 2014; pp. 114–119. [Google Scholar]
  26. Dargie, W. Estimation of the cost of VM migration. In Proceedings of the 2014 23rd International Conference on Computer Communication and Networks (ICCCN), Shanghai, China, 4–7 August 2014; pp. 1–8. [Google Scholar]
  27. Li, H.; Li, W.; Wang, H.; Wang, J. An optimization of virtual machine selection and placement by using memory content similarity for server consolidation in cloud. Future Gener. Comput. Syst. 2018, 84, 98–107. [Google Scholar] [CrossRef]
  28. Li, H.; Li, W.; Zhang, S.; Wang, H.; Pan, Y.; Wang, J. Page-sharing-based virtual machine packing with multi-resource constraints to reduce network traffic in migration for clouds. Future Gener. Comput. Syst. 2019, 96, 462–471. [Google Scholar] [CrossRef]
  29. Li, H.; Li, W.; Feng, Q.; Zhang, S.; Wang, H.; Wang, J. Leveraging content similarity among vmi files to allocate virtual machines in cloud. Future Gener. Comput. Syst. 2018, 79, 528–542. [Google Scholar] [CrossRef]
  30. Li, H.; Wang, S.; Ruan, C. A fast approach of provisioning virtual machines by using image content similarity in cloud. IEEE Access 2019, 7, 45099–45109. [Google Scholar] [CrossRef]
  31. Yadav, R.; Zhang, W.; Kaiwartya, O.; Singh, P.; Elgendy, I.; Tian, Y. Adaptive energy-aware algorithms for minimizing energy consumption and SLA violation in cloud computing. IEEE Access 2018, 6, 55923–55936. [Google Scholar] [CrossRef]
  32. Hieu, N.; Di Francesco, M.; Ylä-Jääski, A. Virtual machine consolidation with multiple usage prediction for energy-efficient cloud data centers. IEEE Trans. Serv. Comput. 2017, 13, 186–199. [Google Scholar] [CrossRef]
  33. Esfandiarpoor, S.; Pahlavan, A.; Goudarzi, M. Structure-aware online virtual machine consolidation for datacenter energy improvement in cloud computing. Comput. Electr. Eng. 2015, 42, 74–89. [Google Scholar] [CrossRef]
  34. Arianyan, E.; Taheri, H.; Sharifian, S. Novel energy and SLA efficient resource management heuristics for consolidation of virtual machines in cloud data centers. Comput. Electr. Eng. 2015, 47, 222–240. [Google Scholar] [CrossRef]
  35. Rodero, I.; Viswanathan, H.; Lee, E.; Gamell, M.; Pompili, D.; Parashar, M. Energy-efficient thermal-aware autonomic management of virtualized HPC cloud infrastructure. J. Grid Comput. 2012, 10, 447–473. [Google Scholar] [CrossRef]
  36. Guan, H.; Yao, J.; Qi, Z.; Wang, R. Energy-efficient SLA guarantees for virtualized GPU in cloud gaming. IEEE Trans. Parallel Distrib. Syst. 2014, 26, 2434–2443. [Google Scholar] [CrossRef]
  37. Sahoo, P.; Mohapatra, S.; Wu, S. SLA based healthcare big data analysis and computing in cloud network. J. Parallel Distrib. Comput. 2018, 119, 121–135. [Google Scholar] [CrossRef]
  38. Sun, C.; Bi, J.; Zheng, Z.; Hu, H. SLA-NFV: An SLA-aware high performance framework for network function virtualization. In Proceedings of the 2016 ACM SIGCOMM Conference, Florianopolis, Brazil, 22–26 August 2016; pp. 581–582. [Google Scholar]
  39. Li, Z.; Yan, C.; Yu, L.; Yu, X. Energy-aware and multi-resource overload probability constraint-based virtual machine dynamic consolidation method. Future Gener. Comput. Syst. 2018, 80, 139–156. [Google Scholar] [CrossRef]
  40. Monshizadeh Naeen, H.; Zeinali, E.; Toroghi Haghighat, A. A stochastic process-based server consolidation approach for dynamic workloads in cloud data centers. J. Supercomput. 2020, 76, 1903–1930. [Google Scholar] [CrossRef]
  41. Sayadnavard, M.; Toroghi Haghighat, A.; Rahmani, A. A reliable energy-aware approach for dynamic virtual machine consolidation in cloud data centers. J. Supercomput. 2019, 75, 2126–2147. [Google Scholar] [CrossRef]
  42. Yuan, C.; Sun, X. Server consolidation based on culture multiple-ant-colony algorithm in cloud computing. Sensors 2019, 19, 2724. [Google Scholar] [CrossRef]
  43. Mamun, S.; Ganguly, A.; Markopoulos, P.; Kwon, M.; Kwasinski, A. NASCon: Network-Aware Server Consolidation for server-centric wireless datacenters. Sustain. Comput. Inform. Syst. 2021, 29, 100452. [Google Scholar] [CrossRef]
  44. Basmadjian, R.; Ali, N.; Niedermeier, F.; De Meer, H.; Giuliani, G. A methodology to predict the power consumption of servers in data centres. In Proceedings of the 2nd International Conference on Energy-efficient Computing and Networking, New York, NY, USA, 31 May–1 June 2011; pp. 1–10. [Google Scholar]
  45. Hsu, C.; Poole, S. Power signature analysis of the SPECpower_ssj2008 benchmark. In Proceedings of the (IEEE ISPASS) IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, USA, 10–12 April 2011; pp. 227–236. [Google Scholar]
  46. Karyakin, A.; Salem, K. An analysis of memory power consumption in database systems. In Proceedings of the 13th International Workshop on Data Management on New Hardware, Chicago, IL, USA, 14–19 May 2017; pp. 1–9. [Google Scholar]
  47. Garcia-Saavedra, A.; Serrano, P.; Banchs, A.; Bianchi, G. Energy consumption anatomy of 802.11 devices and its implication on modeling and design. In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, Nice, France, 10–13 December 2012; pp. 169–180. [Google Scholar]
  48. Yin, C.; Zhang, S.; Wang, J.; Xiong, N. Anomaly detection based on convolutional recurrent autoencoder for IoT time series. IEEE Trans. Syst. Man, Cybern. Syst. 2020, 52, 112–122. [Google Scholar] [CrossRef]
  49. Lu, C.; Ye, K.; Xu, G.; Xu, C.; Bai, T. Imbalance in the cloud: An analysis on alibaba cluster trace. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 2884–2892. [Google Scholar]
  50. Xi, H.; Yan, C.; Li, H.; Xiao, Y. An Attention-based Recurrent Neural Network for Resource Usage Prediction in Cloud Data Center. J. Phys. Conf. Ser. 2021, 2006, 012007. [Google Scholar] [CrossRef]
  51. Minartz, T.; Kunkel, J.; Ludwig, T. Simulation of power consumption of energy efficient cluster hardware. Comput. Sci.-Res. Dev. 2010, 25, 165–175. [Google Scholar] [CrossRef]
  52. Jin, Y.; Wen, Y.; Chen, Q.; Zhu, Z. An empirical investigation of the impact of server virtualization on energy efficiency for green data center. Comput. J. 2013, 56, 977–990. [Google Scholar] [CrossRef]
  53. Li, H.; Xiao, Y. CloudMatrix Lite: A Real Trace Driven Lightweight Cloud Data Center Simulation Framework. In Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 23–25 October 2020; pp. 424–429. [Google Scholar]
  54. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  55. Cao, Z.; Dong, S. Dynamic VM consolidation for energy-aware and SLA violation reduction in cloud computing. In Proceedings of the 2012 13th International Conference on Parallel and Distributed Computing, Applications And Technologies, Beijing, China, 14–16 December 2012; pp. 363–369. [Google Scholar]
Figure 1. The neural network structure of the CAE-based filter.
Figure 2. Example of the CAE-based filter result.
Figure 3. The neural network structure of the encoder of ARP.
Figure 4. The neural network structure of the decoder of ARP.
Figure 5. Example of the ARP algorithm result (1).
Figure 6. Example of the ARP algorithm result (2).
Figure 7. Example of the ARP algorithm result (3).
Figure 8. Example of VM placement on resource space competition: (a) VM placement 1; (b) VM placement 2.
Figure 9. Example of VM placement on time competition: (a) VM placement 1; (b) VM placement 2.
Figure 10. Comparing the energy consumption of hosts by all methods in every time segment.
Figure 11. Comparing the total energy consumption of hosts by all methods.
Figure 12. Comparing the SLAV by all methods regarding four resources.
Figure 13. Comparing the total SLAV penalty cost by all methods.
Figure 14. Comparing the number of VM migrations triggered by all methods in every time segment.
Figure 15. Comparing the total number of VM migrations triggered by all methods.
Figure 16. Comparing the total cost by all methods.
Table 1. Resource parameters of the hosts.

| Host Type | CPU | Memory | Disk Throughput | Network Throughput |
|---|---|---|---|---|
| H_large | 4 × Intel Xeon Northwood CPU (single core) | 8 GB | 399 MB/s | 1 GB/s |
| H_medium | 3 × Intel Xeon Northwood CPU (single core) | 6 GB | 266 MB/s | 1 GB/s |
| H_small | 2 × Intel Xeon Northwood CPU (single core) | 4 GB | 133 MB/s | 1 GB/s |
Table 2. Power parameters of the hosts.

| Host Type | Value | CPU (kW) | Memory (kW) | Disk (kW) | NIC (kW) |
|---|---|---|---|---|---|
| H_large | p_peak | 0.232 | 0.21736 | 0.02106 | 0.002 |
| | p_idle | 0.1124 | 0.17576 | 0.01326 | 0.00078 |
| H_medium | p_peak | 0.174 | 0.10868 | 0.01404 | 0.002 |
| | p_idle | 0.0843 | 0.08788 | 0.00884 | 0.00078 |
| H_small | p_peak | 0.116 | 0.05434 | 0.00702 | 0.002 |
| | p_idle | 0.0562 | 0.04394 | 0.00442 | 0.00078 |
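For reference, the Table 2 parameters can drive a per-host power estimate. The linear interpolation between idle and peak power per component shown below is a common modeling choice and an assumption of this sketch; it may differ in detail from the energy model defined earlier in the paper.
```python
# Sketch: host power from the Table 2 parameters, assuming a linear
# utilization-proportional model per component. Values are the H_large
# row of Table 2, in kW.

PEAK = {"cpu": 0.232, "mem": 0.21736, "disk": 0.02106, "net": 0.002}
IDLE = {"cpu": 0.1124, "mem": 0.17576, "disk": 0.01326, "net": 0.00078}

def host_power_kw(util):
    # util: per-component utilization in [0, 1].
    return sum(IDLE[c] + (PEAK[c] - IDLE[c]) * util[c] for c in PEAK)

# Example: a half-loaded H_large host draws roughly 0.39 kW.
print(round(host_power_kw({"cpu": 0.5, "mem": 0.5, "disk": 0.5, "net": 0.5}), 4))
```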