Article

More than Meets One Core: An Energy-Aware Cost Optimization in Dynamic Multi-Core Processor Server Consolidation for Cloud Data Center

1
School of Information Science, Guangdong University of Finance and Economics, Guangzhou 510320, China
2
Guangdong Intelligent Business Engineering Technology Research Center, Guangdong University of Finance and Economics, Guangzhou 510320, China
3
School of Chinese Language and Literature, Nanjing Xiaozhuang University, Nanjing 211171, China
*
Author to whom correspondence should be addressed.
Electronics 2022, 11(20), 3377; https://doi.org/10.3390/electronics11203377
Submission received: 29 September 2022 / Revised: 14 October 2022 / Accepted: 18 October 2022 / Published: 19 October 2022

Abstract

The massive number of users has brought severe challenges to managing the cloud data centers (CDCs), built on multi-core processor hosts, that underpin cloud service providers. Guaranteeing the quality of service (QoS) of multiple users while reducing the operating costs of CDCs are major problems that need to be solved. To address them, this paper establishes a cost model for multi-core hosts in CDCs that comprehensively considers the hosts' energy costs, virtual machine (VM) migration costs, and service level agreement violation (SLAV) penalty costs. To optimize this objective, we design the following solution. We employ a DAE-based filter to preprocess the VM historical workload and use an SRU-based method to predict the computing resource usage of the VMs in future periods. Based on the predicted results, we trigger VM migrations before hosts move into the overloaded state to reduce the occurrence of SLAV. A multi-core-aware heuristic algorithm is proposed to solve the placement problem. Simulations driven by a real VM workload dataset validate the effectiveness of our proposed method. Compared with the existing baseline methods, our proposed method reduces the total operating cost by 20.9% to 34.4%.

1. Introduction

The world is entering a post-coronavirus era. Since countries and multinational cooperative organizations have still not formed unified, reliable, and effective means of epidemic prevention, a local outbreak could occur at any time and carries a high risk of spreading worldwide. This situation has forced people to further embrace cloud computing, migrating much of their economic, social, and personal activities online. For example, about 82% of Hong Kong businesses plan to maintain remote work in the post-COVID-19 era [1]. This trend has brought cloud computing both opportunities and management pressure. According to estimates, the compound annual growth rate of the Hong Kong data center market value is currently 12.6%, which means that its value will reach HKD 4.12 billion by 2026 [2]. This growth in market value means that operators must commit more investment to cover costs.
Increasing the resource utilization rate of cloud data centers (CDCs) is one of the most effective means to reduce management costs, but there is a conflict between reducing costs and the performance that cloud service customers receive. To improve resource usage, virtual machines (VMs) or containers assigned to users must be highly concentrated on physical hosts. However, a high degree of consolidation brings a high degree of resource competition. When the competition is too intense, a host may become overloaded, thereby degrading the performance and user experience of its VMs. To ensure the user experience, service level agreements (SLAs) are used to quantitatively describe the corresponding quality of service (QoS). If the SLA cannot be maintained, the QoS is threatened and an SLA violation (SLAV) occurs. When an SLAV occurs, cloud service providers (CSPs) need to compensate users as a penalty for failing to meet their performance requirements. Currently, server consolidation is used to dynamically adjust the load balance between hosts in a CDC. Server consolidation periodically checks the load of hosts in the cluster and initiates VM migrations to achieve load balancing, thereby maintaining a balance between resource utilization and performance.
Many works designing server consolidation solutions assume that the physical host is equipped with a single-core CPU, even though multi-core processors have long been prevalent in personal entertainment, scientific research, and data centers. A CPU package consists of multiple dies, and each die encapsulates multiple cores. Due to inter-core communication, inter-die communication, and other CPU components, the power consumption of a multi-core CPU is much higher than that calculated by a single-core CPU power consumption model. Therefore, a server consolidation model based on a single-core processor cannot accurately describe the user's energy demand. In addition, CSPs incur extra overhead to maintain VM migrations in server consolidation and to pay possible SLAV compensation. In this paper, we establish a server consolidation cost model based on multi-core processor and memory resource usage, VM migration, and SLAV compensation and propose corresponding solutions to achieve a balance between cost and performance. Our contributions are as follows:
(A)
We formally define a host power consumption model based on multi-core CPU and memory resource usage and describe the cost of VM migration and SLAV on this basis. After proposing the cost model, we give the corresponding optimization problem.
(B)
A denoising autoencoder (DAE)-based filter is used to denoise the VM workload trace. Subsequently, we use an SRU-based RNN method to predict the workload of VMs. Based on the predicted results, a host load detection strategy is proposed that considers both current and future load conditions.
(C)
To minimize the total cost of server consolidation, we propose a VM selection strategy and a VM placement algorithm. These methods take into account the scheduling and placement of VMs between different cores of the same CPU and between different CPUs of different hosts, as well as the current and future requirements of VMs for different resources.
(D)
We conduct simulations to evaluate the performance of our proposed solution MMCC. The simulation results indicate that MMCC can reduce host energy consumption by 10% to 43.9%, SLAV cost by 33.5% to 51.7%, and total cost by 20.9% to 34.4% compared to the baseline methods.
The remainder of the paper is organized as follows. In Section 2, we survey the related work. In Section 3, we formalize the cost model and define the corresponding optimization problem. In Section 4, we propose a heuristic algorithm to solve this problem. In Section 5, we evaluate the performance of our proposed method using trace-driven simulations based on real VM workloads. In Section 6, we conclude the paper and discuss future work.

2. Related Work

In this section, we survey the CDC cost model related to server consolidation and the corresponding solutions.

2.1. Server Consolidation Cost Models

Based on single-core CPU usage or performance, a large number of works on server consolidation proposed host energy models [3,4,5,6,7,8,9,10,11,12]. Nagadevi et al. [13] proposed a VM placement algorithm based on multi-core processors, but they did not consider factors related to dynamic consolidation, energy consumption, and cost throughout the data center life cycle. The above work also did not consider the energy consumption of the processor at the die level and the chip level.
In addition, the composition of a host’s energy consumption is not only related to the CPU factor. Therefore, several works have proposed multi-resource utilization-oriented host energy models [14,15,16,17,18,19]. However, these models only consider the energy consumption when the host acts as an independent object and do not consider the additional energy consumption of the VM migration due to the increase of the host load during server consolidation.
To ensure user performance and service quality, Buyya et al. [20] proposed a CPU-based SLAV calculation method, which was widely adopted in many subsequent works [21,22,23,24,25,26,27,28,29]. However, the quality of service (QoS) of users when using VMs cannot be measured only by CPU performance, and SLAV must involve the use of multiple resources.

2.2. Server Consolidation Solutions

Buyya et al. [20] first proposed the classic four-step server consolidation solution. The first step is host load detection, which picks out overloaded and underloaded hosts in the cluster. The second step is VM selection for overloaded hosts. To reduce the host load and the occurrences of SLAV, suitable VMs are selected and added to a VM migration list. The third step is VM placement, which selects suitable destination hosts for all objects in the VM migration list. After VM placement, underloaded hosts are handled. By migrating all the VMs on an underloaded host to other suitable hosts as much as possible and shutting down or switching these underloaded hosts to an energy-saving state, the host energy cost of the CDC can be further reduced. At present, most of the specific execution strategies for server consolidation are heuristic. Based on multiple resource constraints, Li et al. [30] proposed a server consolidation method that not only reduces energy consumption but also ensures QoS; however, this method only guarantees the users' QoS in terms of CPU usage. Yadav et al. [25] mainly considered the network overhead and proposed an adaptive host overload detection method and VM selection algorithm. Sayadnavard et al. [31] proposed a server consolidation method based on multiple resource constraints, but its optimization goal is to minimize the number of hosts used by the VM placement, and it ignores other types of costs. Yuan et al. [32] used a culture multiple-ant-colony algorithm to solve the server consolidation problem without SLAV constraints.
None of the models proposed in the above works simultaneously consider the costs associated with multi-resource usage, multi-core processors, multi-resource SLAV, and VM migration.

3. Cost Model and Problem Description

In this section, we first formally describe the multi-core processor-based cost model in server consolidation of CDC and then formulate a problem description based on this.

3.1. Cost Model

In CDC, the cost related to server consolidation mainly involves hosts, VM migrations, and SLAV compensation.
Before giving a specific cost model, we first describe the time and objects of the entire system. There are N heterogeneous hosts in the CDC, forming the host set $H = \{h_1, h_2, \ldots, h_N\}$. The total amount of resources that a host $h_i$ can provide is denoted by the tuple $C_i = (c_i^{cpu}, c_i^{mem})$, where $c_i^{cpu}$ and $c_i^{mem}$ are the total amounts of CPU and memory resources, respectively. The CPU is multi-core; hence we have $c_i^{cpu} = (c_i^{core_1}, c_i^{core_2}, \ldots, c_i^{core_{cn_i}})$, where $cn_i$ is the number of cores in the processor of $h_i$. Generally speaking, we let $c_i^{core_1} = c_i^{core_2} = \cdots = c_i^{core_{cn_i}}$, where $c_i^{core_{cn_i}}$ is the total amount of computing resources that each core can provide. There are M VMs running on these hosts, forming the VM set $V = \{v_1, v_2, \ldots, v_M\}$. When a user requests a VM $v_j$, the submitted resource requirements are denoted by the tuple $D_j = (d_j^{cpu}, d_j^{mem})$, where $d_j^{cpu}$ and $d_j^{mem}$ are the total requirements of $v_j$ for CPU and memory, respectively. We assume that each VM is a single-core task; that is, a given VM can only use the computing resources of a single core.
The life cycle of a CDC $[0, LT]$ is divided into L small and equal-length consecutive time segments $t_1, t_2, \ldots, t_L$, and each time segment has a length of T. In a certain time segment $t_k$, if a host $h_i$ is in the working state, $\lambda_{i,k} = 1$; otherwise $\lambda_{i,k} = 0$. At this time, the amount that the host can provide for each resource is $R_{i,k} = (r_{i,k}^{cpu}, r_{i,k}^{mem})$, where $r_{i,k}^{cpu} = (r_{i,k}^{core_1}, r_{i,k}^{core_2}, \ldots, r_{i,k}^{core_{cn_i}})$ and $r_{i,k}^{core_{cn_i}}$ is the amount of resources that the $cn_i$-th core can provide in $t_k$. In $t_k$, the amount of resources demanded by the VM $v_j$ is denoted as $S_{j,k} = (s_{j,k}^{cpu}, s_{j,k}^{mem})$.
We summarize the total cost of a CDC over a given lifetime by analyzing the behavior of each computing device in each time segment. In general, in addition to the operating cost of the hosts, it is also necessary to consider the cost of VM migration during each server consolidation and the penalty caused by the occurrences of SLAV. We discuss them separately in the following subsections.

Host Cost Model

Given a host $h_i$, its running cost $C_{h_i}$ is mainly determined by the electricity price $EP$ and its power $p_{h_i,t}$ at a given time t, namely:
$C_{h_i} = EP \times \int_{0}^{LT} p_{h_i,t}\, dt$.  (1)
It should be noted that if $h_i$ is powered off or in a power-saving state, its power consumption is negligible, so it does not incur any electricity-related costs. The analysis [33] of VM traces in the Alibaba CDC shows that the demand of VMs for CPU and memory resources far exceeds that for disk and network I/O. In this paper, we consider that the power of a host is related to the CPU, memory, and other basic components (motherboard, network card, disk, etc.). We take the power consumption of the basic components to be a fixed value, so only the power consumption of the CPU and memory is discussed below.
  • CPU power model
Buyya et al. [20] leveraged a single-core-based host power model in server consolidation; that is, the power of the CPU is related to its only core. Modern processors are multi-core architectures. Multiple cores are packaged on multiple CPU dies. The general architecture of a multi-core CPU is shown in Figure 1.
The total power consumption of the processor involves chip-level mandatory components, cores, die-level mandatory components, communication between cores, and communication between dies. In addition, modern processors employ energy-efficient mechanisms (such as Intel's SpeedStep) to optimize the power consumption of the CPU, which means that the power consumption of the CPU is not linearly related to its usage. We describe the power of a given CPU at a given moment as:
$P_{cpu} = (1 - r) \times (P_{cm} + P_{dies} + P_{interdie})$,  (2)
where r is the energy-efficiency factor, $P_{cm}$ is the power consumption of chip-level mandatory components, $P_{dies}$ is the power consumption of the dies, and $P_{interdie}$ is the power consumption of inter-die communication. Next, we give the models of the above factors and power terms, respectively. Without the energy-efficient mechanism, the actual power when the n cores of the processor perform calculations at the same time is $P_{act}$:
$P_{act} = P_{cm} + P_{dies} + P_{interdie}$.  (3)
In addition, we denote the total power of all cores as:
$P_{cores} = \sum_{k=1}^{n} P_{core_k}$,  (4)
where $P_{core_k}$ is the power consumption of the k-th core when it computes alone and all other cores are idle.
Basmadjian et al. [34] performed experiments to analyze the power consumption of chip-level mandatory components, such as voltage regulators, as:
$P_{cm} = P_{cores} - P_{act} = s(v, f)$,  (5)
where s is the capacitance function, v is the voltage, and f is the frequency:
$s(v, f) = c_e \times v^2 \times f$,  (6)
where $c_e$ is the effective capacitance [35].
Communication between dies occurs when cores on different dies access data at the same memory address. The power consumption of inter-die communication is:
$P_{interdie} = \sum_{j=1,\, d_j \in D}^{|D|-1} c_e \times v_j^2 \times f_j$,  (7)
where $v_j$ and $f_j$ are the voltage and frequency of the corresponding cores on $die_j$, $d_j$ is the set of active cores on the j-th die, and D is the set of dies involved in communication, defined as:
$D = \{d_j \mid d_j \neq \emptyset\}$,  (8)
$d_j = \{core_{i,j} \mid u(core_{i,j}) > 0\}$,  (9)
where $core_{i,j}$ is the i-th core on the j-th die, $u(core_{i,j})$ is the current utilization of $core_{i,j}$, $i \in [1, n_j]$, and $n_j$ is the total number of cores on the j-th die. We also have:
$v_j = \max\{v_{core_{i,j}} \mid u(core_{i,j}) > 0\}$,  (10)
$f_j = \max\{f_{core_{i,j}} \mid u(core_{i,j}) > 0\}$.  (11)
Equations (10) and (11) show that when there is only one active core on the j-th die, $v_j$ and $f_j$ of the j-th die are the voltage and frequency of that core.
The power of a single die can be described as:
$P_{die_j} = P_{md}^{j} + P_{cores}^{j} + P_{off}^{j}$,  (12)
where $P_{md}^{j}$ is the power consumption of die-level mandatory components, $P_{cores}^{j}$ is the power consumption of the $n_j$ constituent cores, and $P_{off}^{j}$ is the power consumption of off-chip caches. We leverage Equation (5) to model $P_{md}^{j}$.
Inter-core communication occurs between multiple cores on a single die j. Therefore, the core-level power consumption model is:
$P_{cores}^{j} = P_{dc}^{j} + P_{intercore}^{j}$,  (13)
where $P_{dc}^{j}$ is the power consumption of all active cores on the j-th die, and $P_{intercore}^{j}$ is the inter-core communication power consumption between the active cores.
The power consumption of a single core $core_{i,j}$ is described as:
$P_{core_{i,j}}^{j} = P_{exc}^{core_{i,j}} + P_{on}$,  (14)
where $P_{exc}^{core_{i,j}}$ and $P_{on}$ are the power consumptions of the exclusive components (e.g., ALU) and the on-chip caches of $core_{i,j}$, respectively. Based on the model in [20], we consider that $P_{exc}^{core_{i,j}}$ is linearly related to the utilization of the core; therefore:
$P_{exc}^{core_{i,j}} = P_{max}^{core_{i,j}} \times \frac{u(core_{i,j})}{100}$,  (15)
where $P_{max}^{core_{i,j}}$ is the power consumption of $core_{i,j}$ at maximum utilization, which can be calculated by the model in Equation (5).
The power consumption of on-chip caches is:
$P_{on} = \sum_{i=1}^{s} P_{L_i}$,  (16)
where s is the number of on-chip caches and $P_{L_i}$ is the power consumption of on-chip cache $L_i$, which can be calculated by the model in Equation (5).
Hence, the power consumption of all active cores on the j-th die is:
$P_{dc}^{j} = \sum_{i=1,\, core_{i,j} \in d_j}^{n_j} P_{core_{i,j}}$.  (17)
By dynamically adjusting voltage and frequency and turning off temporarily unused components, the energy-efficient mechanism can effectively optimize processor power consumption. This part of the power consumption reduction is mainly affected by three factors: (1) components and communication between cores, (2) changes in the frequency of a single core, and (3) the number of cores. Here we define the three factors.
The first factor is
$\alpha = 1 - \frac{P_{act}}{P_{cores}}$.  (18)
The second factor is
$\beta = \frac{\alpha}{f}$,  (19)
where f is the given frequency. For a multi-core processor, we have:
$f = \mathrm{average}\{f_{core_{i,j}} \mid core_{i,j} \in d_j,\ j \in [1, m]\}$,  (20)
where m is the number of dies.
The third factor is
$\gamma = \begin{cases} \frac{\alpha}{k}, & k \geq 2, \\ 0, & \text{otherwise}, \end{cases}$  (21)
where $k = \sum_{j=1}^{m} |d_j|$ is the total number of active cores on the processor.
Based on Equations (18), (19), and (21), we obtain the power reduction factor:
$r = \alpha + \beta + \gamma$.  (22)
Based on the above analysis, the processor power consumption $P_{cpu}$ of a given host can be obtained. For the host $h_i$, the power consumption of its processor at a certain time t is denoted as $P_{i,t}^{cpu}$. In other words, $P_{i,t}^{cpu}$ is a function of the current utilization of each core on the processor.
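To make the composition of the multi-core CPU power model concrete, the following Python sketch assembles the reduction factor of Equations (18)-(22) and the resulting processor power of Equation (2) from component powers that are assumed to have already been computed via Equations (3)-(17). All function names and numeric inputs are illustrative assumptions, not values taken from the paper.

def reduction_factor(p_act, p_cores, active_core_freqs):
    # Energy-efficiency reduction factor r = alpha + beta + gamma (Equations (18)-(22)).
    k = len(active_core_freqs)              # number of active cores on the processor
    alpha = 1.0 - p_act / p_cores           # Equation (18)
    f_avg = sum(active_core_freqs) / k      # Equation (20): average frequency of the active cores
    beta = alpha / f_avg                    # Equation (19)
    gamma = alpha / k if k >= 2 else 0.0    # Equation (21)
    return alpha + beta + gamma             # Equation (22)

def cpu_power(p_cm, p_dies, p_interdie, p_cores, active_core_freqs):
    # P_cpu = (1 - r) * (P_cm + P_dies + P_interdie) (Equations (2)-(3)).
    p_act = p_cm + p_dies + p_interdie
    r = reduction_factor(p_act, p_cores, active_core_freqs)
    return (1.0 - r) * p_act

# Hypothetical processor: two of four cores active at 2.4 GHz.
print(cpu_power(p_cm=8.0, p_dies=35.0, p_interdie=1.5,
                p_cores=60.0, active_core_freqs=[2.4, 2.4]))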
  • Memory power model
All current public VM workload traces from CDCs provide the memory usage of the monitored objects over time. Therefore, the current memory footprint $u_{i,t}^{mem}$ is used to estimate the memory power consumption $P_{i,t}^{mem}$ of the host $h_i$ at a given time t:
$P_{i,t}^{mem} = P_{idle,i}^{mem} + \alpha_i^{mem} \times u_{i,t}^{mem}$,  (23)
where $P_{idle,i}^{mem}$ is the memory power consumption when $h_i$ is idle, and $\alpha_i^{mem}$ is the memory power factor. According to the analysis by Esfandiarpoor et al. [27], when $\alpha_i^{mem} = 0.3\,\mathrm{W/GB}$, the power consumption of a DDR memory system can be estimated fairly accurately.
In summary, we obtain the total power of host $h_i$:
$P_{h_i,t} = P_{i,t}^{cpu} + P_{i,t}^{mem} + P_{i,t}^{base}$.  (24)
Combining Equation (24) with Equation (1), we obtain the energy consumption cost $C_H$ of all hosts. We divide the life cycle of the CDC into multiple time segments and analyze the energy consumption in each time segment separately. Then, we have:
$C_H = EP \times \sum_{i=1}^{N} \sum_{k=1}^{L} \int_{0}^{T} \lambda_{i,k} \times P_{h_i,t}\, dt$.  (25)
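The memory and total-power terms translate directly into code. The sketch below follows Equations (23)-(25), approximating the integral of Equation (25) by sampling each host's power once per five-minute segment; the 0.3 W/GB factor follows [27], while the electricity price, segment length, and sample values are assumptions for illustration.

ALPHA_MEM = 0.3            # memory power factor in W per GB [27]
EP = 0.25 / 1000.0         # electricity price in $ per Wh (0.25 $/kWh)
T_HOURS = 5.0 / 60.0       # length of one time segment (five minutes)

def mem_power(p_idle_mem, used_gb):
    return p_idle_mem + ALPHA_MEM * used_gb        # Equation (23)

def host_power(p_cpu, p_mem, p_base):
    return p_cpu + p_mem + p_base                  # Equation (24)

def segment_energy_cost(active_host_powers):
    # One term of Equation (25): powers (in W) of the hosts active in this segment.
    return EP * sum(p * T_HOURS for p in active_host_powers)

# Hypothetical segment with three active hosts of identical load.
powers = [host_power(55.0, mem_power(10.0, 24.0), 60.0) for _ in range(3)]
print(segment_energy_cost(powers))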

3.2. VM Migration Cost

We assume that at the beginning of each time segment, the CDC performs server consolidation to achieve the balance between the CSP’s cost and the user’s performance. VM migration is an important part of server consolidation. In a cluster composed of multi-core processor hosts, there are two types of VM migrations. The first is the inter-core migration on the same host, and the second is the inter-host migration between different hosts. Inter-core migration occurs when the core where the VMs are located is overloaded, and other cores of the same processor have sufficient computing resources. The VM migrates from one core of the processor to another core in a very short period of time through inter-core or inter-die communication. The inter-core migration does not involve memory, and the main impact is the hit rate of the processor cache. Therefore, the energy overhead of inter-core migration is negligible.
Next, we discuss the energy cost of inter-host migration. We use live migration technology to migrate VMs between different hosts. During the live migration of a VM, the memory data of the VM are transmitted. Although VMs generate dirty pages during live migration, the research in [28] indicates that the energy consumption of a VM live migration is positively related to the memory size of that VM. Therefore, we can assume that the larger the VM memory size, the longer the migration time and the higher the energy consumption.
When migrating a VM $v_j$ from host $h_i$ to another host $h_{i'}$, we assume that $h_i$ reserves enough resources to support the migration of $v_j$ and that $h_{i'}$ reserves enough resources to run $v_j$. Buyya et al. [20] assumed that a VM consumes an extra 10% of CPU usage to maintain the migration. In this paper, we extend this assumption to the memory resource usage of VM migration. In addition, we assume that the CDC deploys an exclusive network for VM migrations. We denote the size of the dedicated migration bandwidth of $h_i$ as $MIG\_NET_i$. The total cost of VM migrations in a given life cycle is denoted as $C_{mig}$ and described as:
$C_{mig} = \sum_{k=1}^{L} (C_k^{mig\_cpu} + C_k^{mig\_mem})$,  (26)
where $C_k^{mig\_cpu}$ and $C_k^{mig\_mem}$ are the migration costs caused by CPU and memory in $t_k$, respectively.
$C_k^{mig\_mem}$ is calculated as:
$C_k^{mig\_mem} = \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ EP \times \int_{t=0}^{t_{j,k}^{mig}} \left( \gamma_{j,i,x_i,i',x_{i'},k} \times P_{j,k}^{mig\_mem} \right) dt \right]$,  (27)
where $\gamma_{j,i,x_i,i',x_{i'},k}$ is a 0-1 indicator, $P_{j,k}^{mig\_mem}$ is the power consumption generated by migrating the memory data of $v_j$, and $t_{j,k}^{mig}$ is the time spent migrating $v_j$. If VM $v_j$ needs to be migrated from the $x_i$-th core of the processor of host $h_i$ to the $x_{i'}$-th core of another host $h_{i'}$, then $\gamma_{j,i,x_i,i',x_{i'},k} = 1$; otherwise $\gamma_{j,i,x_i,i',x_{i'},k} = 0$. Since VM memory is the main data transferred during migration, we have:
$t_{j,k}^{mig} = \frac{s_{j,k}^{mem}}{mig\_bw_{i,k}}$,  (28)
where $mig\_bw_{i,k}$ is the migration bandwidth assigned to $v_j$. We consider that the migration bandwidth of $h_i$ is evenly assigned to every VM migrated from it within $t_k$. Hence, for a given source host $h_i$ and a destination host $h_{i'}$, we obtain:
$mig\_bw_{i,k} = \frac{MIG\_NET_i}{\sum_{j=1}^{M} \sum_{i'=1}^{N} \gamma_{j,i,x_i,i',x_{i'},k}}$,  (29)
and then we have
$t_{j,k}^{mig} = \frac{s_{j,k}^{mem} \times \sum_{j=1}^{M} \sum_{i'=1}^{N} \gamma_{j,i,x_i,i',x_{i'},k}}{MIG\_NET_i}$.  (30)
After this, we substitute Equation (30) into Equation (27). We let $p_{j,k}^{vmem}$ be the memory power of $v_j$ within $t_k$; then the memory migration power term $P_{j,k}^{mig\_mem}$ of $v_j$ is $0.1 \times p_{j,k}^{vmem} = 0.1 \times \alpha_i^{mem} \times s_{j,k}^{mem}$.
Next, we discuss $C_k^{mig\_cpu}$. We assume here that the power consumption generated by a host in a CDC is mainly used to keep the VMs running. Since the processor power consumption $P_{i,k}^{cpu}$ is a function of the current utilization of each core, $(c_i^{core_{cn_i}} - r_{i,k}^{core_1}, c_i^{core_{cn_i}} - r_{i,k}^{core_2}, \ldots, c_i^{core_{cn_i}} - r_{i,k}^{core_{cn_i}})$, it can be written as $P_{i,k}^{cpu}(c_i^{core_{cn_i}} - r_{i,k}^{core_1}, c_i^{core_{cn_i}} - r_{i,k}^{core_2}, \ldots, c_i^{core_{cn_i}} - r_{i,k}^{core_{cn_i}})$. For a given core $core_x$ on the processor, where $x \in [1, cn_i]$, if a VM needs to be migrated to another host at this time, the core's CPU utilization $u_{i,k}^{mig\_core_x}$ is:
$u_{i,k}^{mig\_core_x} = c_i^{core_x} - r_{i,k}^{core_x} + 0.1 \times \sum_{j=1}^{M} \left( \gamma_{j,i,x_i,i',x_{i'},k} \times s_{j,k}^{cpu} \right)$.  (31)
Hence, the power consumption of host $h_i$ during inter-host migration is:
$(P_{i,k}^{cpu})' = P_{i,k}^{cpu}(u_{i,k}^{mig\_core_1}, \ldots, u_{i,k}^{mig\_core_{cn_i}})$.  (32)
Then, we combine Equation (32) into Equation (2). We denote the updated host energy consumption cost $C_H$ as $C_H'$.
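The following sketch illustrates the bookkeeping for one inter-host migration under Equations (26)-(30): the migration network of the source host is shared evenly among its concurrently migrating VMs, the transfer time follows from the VM memory size, and a 10% memory-power overhead is charged during the transfer. All numeric inputs and the unit conversions are assumptions for illustration.

ALPHA_MEM = 0.3            # W per GB, memory power factor from [27]
EP = 0.25 / 1000.0         # $ per Wh

def migration_time_s(vm_mem_gb, mig_net_gbps, concurrent_migrations):
    mig_bw_gbps = mig_net_gbps / concurrent_migrations   # even share of the migration network (Equation (29))
    return vm_mem_gb * 8.0 / mig_bw_gbps                  # transfer time in seconds (Equation (28))

def mem_migration_cost(vm_mem_gb, mig_net_gbps, concurrent_migrations):
    t_mig = migration_time_s(vm_mem_gb, mig_net_gbps, concurrent_migrations)
    p_mig_mem = 0.1 * ALPHA_MEM * vm_mem_gb               # 10% extra memory power while migrating
    return EP * p_mig_mem * t_mig / 3600.0                # one term of Equation (27), in $

# Two 8 GB VMs leaving the same host over a 10 Gbps dedicated migration network.
print(mem_migration_cost(vm_mem_gb=8.0, mig_net_gbps=10.0, concurrent_migrations=2))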

SLAV Penalty Cost

In a CDC, to guarantee user QoS, CSPs must provide SLAV compensation to the affected users in some form. This overhead needs to be included in the cost consideration of the CDC. In this paper, we extend the single-core CPU SLAV definition by Buyya et al. [20] to the multi-core CPU and memory. The two metrics are denoted as $SLAV^{cpu}$ and $SLAV^{mem}$, respectively.
For the processor, it is considered overloaded only if all its cores are overloaded. Hence, we have
$SLAV^{cpu} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,cpu}}{T_i^{a,cpu}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{d_{j,k}^{d,cpu}}{s_{j,k}^{r,cpu}}$,  (33)
where $T_i^{s,cpu}$ is the CPU SLAV duration caused by all cores being overloaded on $h_i$, $T_i^{a,cpu}$ is the total working duration of the host, $d_{j,k}^{d,cpu}$ is the size of the unsatisfied CPU resource demand resulting from the migration of $v_j$ in $t_k$, and $s_{j,k}^{r,cpu}$ is the CPU resource requested by $v_j$ in $t_k$.
Likewise, we propose the formal definition of $SLAV^{mem}$:
$SLAV^{mem} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,mem}}{T_i^{a,mem}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{d_{j,k}^{d,mem}}{s_{j,k}^{r,mem}}$.  (34)
We denote the CPU and memory SLAV compensation price indices as $pun^{cpu}$ and $pun^{mem}$, respectively. Then, we have:
$C_{SLAV} = pun^{cpu} \times SLAV^{cpu} + pun^{mem} \times SLAV^{mem}$.  (35)
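The SLAV penalty can be computed from two aggregates per resource: the fraction of time each host spends fully overloaded and the per-VM ratio of unmet to requested resources. The sketch below follows the structure of Equations (33)-(35) with hypothetical input values; the penalty indices match the ones used later in Section 5.

def slav(host_overload_time_ratios, vm_degradation_ratios):
    # host_overload_time_ratios: T^s / T^a per host; vm_degradation_ratios: unmet/requested per VM.
    slatah = sum(host_overload_time_ratios) / len(host_overload_time_ratios)
    pdm = sum(vm_degradation_ratios) / len(vm_degradation_ratios)
    return slatah * pdm

def slav_cost(slav_cpu, slav_mem, pun_cpu=0.01, pun_mem=0.01):
    return pun_cpu * slav_cpu + pun_mem * slav_mem        # Equation (35)

cpu_slav = slav([0.05, 0.02, 0.00], [0.10, 0.03, 0.07])
mem_slav = slav([0.01, 0.00, 0.00], [0.02, 0.01, 0.00])
print(slav_cost(cpu_slav, mem_slav))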

3.3. Problem Description

In the preceding subsections, we analyzed the factors involved in the operating cost of a CDC, namely the host energy consumption cost $C_H$, the VM migration cost $C_{mig}$, and the SLAV penalty cost $C_{SLAV}$. In this paper, our research goal is to minimize the associated operating cost C of the CDC. Combining the above models, we obtain the problem of minimizing the multi-core-host-based cost in server consolidation (MMCC):
$\mathrm{MIN}\ C = C_H' + \sum_{k=1}^{L} C_k^{mig\_mem} + C_{SLAV}$.  (36)
A 0-1 indicator $\beta_{i,j,x_i,k}$ is used to mark whether the VM $v_j$ is running on the $x_i$-th core of host $h_i$'s processor at the beginning of the time period $t_k$. If $v_j$ runs on the $x_i$-th core of the host $h_i$, then $\beta_{i,j,x_i,k} = 1$; otherwise $\beta_{i,j,x_i,k} = 0$. The constraints of the MMCC problem are:
$\sum_{i=1}^{N} \sum_{x_i=1}^{cn_i} \beta_{i,j,x_i,k} = 1, \quad \forall j, k$,  (37)
$\sum_{i'=1}^{N} \sum_{x_{i'}=1}^{cn_{i'}} \gamma_{j,i,x_i,i',x_{i'},k} = 1, \quad \forall i \neq i', j, k$,  (38)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{cpu} \leq r_{i,k}^{core_{x_i}}, \quad \forall i, x_i, k$,  (39)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{mem} \leq r_{i,k}^{mem}, \quad \forall i, k$.  (40)
Constraint (37) means that in any period, any VM can only run on a specific core of a unique specific host. Constraint (38) means that in any period, a VM migrated from any host can only have a unique destination host and a unique core. Constraints (39) and (40) mean that in any period, the CPU and memory resources provided by each host to the VMs cannot exceed its resource upper limits.
In the following, we analyze the complexity of the MMCC problem by considering a simple case of the problem. If the hosts in the CDC are homogeneous, and the resource requirements of any VM $v_j$ in any time segment $t_k$ are fixed values that satisfy constraints (39) and (40), then the VM migration cost and SLAV penalty cost are both zero, and the objective function of the MMCC problem is:
$\mathrm{MIN}\ C = C_H$.  (41)
Obviously, the MMCC problem in this simple case can be reduced to the bin-packing problem. Since the bin-packing problem is NP-hard, the MMCC problem is also NP-hard.

4. Solution for MMCC Problem

Since the MMCC problem is NP-hard, we propose a heuristic based on the traditional four-step method for server consolidation. The first step is host workload detection, the second step is VM selection, and the third and fourth steps are VM placement for VMs from the overloaded and underloaded hosts, respectively. Before performing host overload detection and VM selection, we first predict the future workload trends of the VMs based on their workload history. The purpose of this is to balance the load of hosts before they become overloaded and trigger SLAV occurrences, thereby reducing costs as much as possible.

4.1. VM Workload Prediction

Before predicting the future workload of a VM, we first need to preprocess its workload trace. The sampling frequency and precision cause a certain deviation between the historical sampling records and the actual usage of resources by the VM. To minimize the impact of these biases on the final result, we denoise by assuming that there is noise in the workload’s history. In addition, we do not need to spend high computing power and a lot of time to obtain accurate prediction results. We only need to roughly judge a general trend of the VM’s resource usage in the future.
In this paper, we utilize a classic denoising autoencoder (DAE) [36]-based filter algorithm to preprocess the workload of VMs. Figure 2 shows the general structure of the DAE mechanism.
In Figure 2, x is the initial input, and $\tilde{x} \sim q_D(\tilde{x} \mid x)$ is the stochastic corruption of x. The autoencoder then maps $\tilde{x}$ to y with the encoder $f_\theta$ and generates the reconstruction z with the decoder $g_{\theta'}$. The reconstruction error is measured by the loss function $L_H(x, z)$. In our proposed DAE-based filter, three autoencoders and one compression decoder are assembled, and their network structures are shown in Figure 3, Figure 4, Figure 5 and Figure 6. Figure 7 shows the result of processing a segment of CPU usage records of a VM using the DAE-based filter.
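As a rough illustration of one stage of the DAE-based filter, the PyTorch sketch below corrupts each workload window with Gaussian noise, reconstructs it, and is trained with a mean-squared reconstruction loss. The layer sizes, noise level, and training loop are assumptions for illustration; the actual filter stacks three autoencoders and a compression decoder as described above.

import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, window=32, hidden=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, window)

    def forward(self, x):
        x_noisy = x + 0.05 * torch.randn_like(x)     # stochastic corruption q_D(x~|x)
        return self.decoder(self.encoder(x_noisy))   # reconstruction z = g(f(x~))

model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                               # reconstruction loss L_H(x, z)

windows = torch.rand(64, 32)                         # 64 windows of 32 CPU-usage samples each
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(windows), windows)
    loss.backward()
    optimizer.step()

denoised = model(windows).detach()                   # filtered workload windows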
Traditional RNNs cannot be parallelized across time steps, which makes their training slow. To address this issue, we employ an SRU-based approach to predict the workload of VMs. The simple recurrent unit (SRU) [37] removes the time dependency from most of its operations, enabling parallel processing. Experiments [37] show that the SRU processes sequences more than ten times faster than a traditional LSTM while achieving similar accuracy. Since the SRU has been open-sourced and its usage differs little from LSTM, we do not discuss its theoretical details in this article.
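A minimal predictor built on the open-source SRU implementation could look as follows. It assumes the `sru` package cited in Section 5, feeds the last 12 normalized usage samples of each VM, and regresses the usage of the next time segment from the final hidden state; the network size and window length are illustrative assumptions.

import torch
import torch.nn as nn
from sru import SRU

class SRUPredictor(nn.Module):
    def __init__(self, hidden=32, layers=2):
        super().__init__()
        self.rnn = SRU(input_size=1, hidden_size=hidden, num_layers=layers)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (seq_len, batch, 1) history of normalized resource usage
        out, _ = self.rnn(x)
        return self.head(out[-1])     # predicted usage for the next segment, shape (batch, 1)

model = SRUPredictor()
history = torch.rand(12, 16, 1)       # 12 past segments for 16 VMs
next_usage = model(history)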
After predicting the resource usage of each VM at the next time segment, we can perform host load detection and VM selection.

4.2. Host Workload Detection

The purpose of host overload detection is to avoid and eliminate fierce competition among VMs for resources, thereby reducing the occurrences of SLAV. Common host overload detection methods fall into two categories: static threshold methods and dynamic threshold methods. In a static threshold method, the resource usage thresholds are set as fixed values. When the usage exceeds a threshold, the host is in an overloaded state and SLAV occurs; at this time, VMs must be migrated to reduce the load. In a dynamic threshold method, CSPs analyze the use of computing resources through various statistical methods to determine whether the competition for resources is fierce and whether the hosts are overloaded. The advantage of the static threshold method is that host resources are fully utilized, but the disadvantage is that more overhead is required to reduce the SLAV. The advantage of the dynamic threshold method is that it can effectively reduce the SLAV, but the disadvantage is that host resources are sometimes underutilized. Therefore, we combine the advantages of the two and propose the double-insurance-based fixed threshold overload detection method (DIFT).
In DIFT, the first insurance is that the host must not overload its CPU and memory resources during the current period. The second insurance is that the host must not overload its CPU and memory resources in the next period. For a given host $h_i$, DIFT first detects whether the usage of each resource on $h_i$ exceeds the given threshold at the beginning of the time period $t_k$; then, based on the prediction results of the SRU method, it judges whether the usage of each resource on $h_i$ will exceed the given threshold in the next time period $t_{k+1}$.
Since the VM migrations are divided into inter-core migrations and inter-host migrations, we correspondingly divide the CPU overload of the host into two situations: processor-overloaded and cores-overloaded. When the host is processor-overloaded, all cores on the processor are in an overloaded state. When the host is cores-overloaded, some (but not all) cores on the processor are in the overloaded state.
Let the overload threshold be $TH_{up} = (TH_{up}^{cpu}, TH_{up}^{mem})$, where both $TH_{up}^{cpu}$ and $TH_{up}^{mem}$ lie in the interval (0, 1). When the following inequalities hold for every $core_{x_i}$ on $h_i$ in $t_k$, the host is in the processor-overloaded state:
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{cpu} > TH_{up}^{cpu} \times c_i^{core_{x_i}}$,  (42)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{cpu} > TH_{up}^{cpu} \times c_i^{core_{x_i}}$.  (43)
When Inequalities (42) and (43) hold in $t_k$ for some cores $core_{x_i}$ on $h_i$ but not for all $cn_i$ of them, the host is in the cores-overloaded state.
Host $h_i$ is in the memory-overloaded state when the following inequalities hold in $t_k$:
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{mem} > TH_{up}^{mem} \times c_i^{mem}$,  (44)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{mem} > TH_{up}^{mem} \times c_i^{mem}$.  (45)
When the host is memory-overloaded or processor-overloaded, it must be in the host-overloaded state, and VM inter-host migration is required at this time. The situation where the host has only cores-overloaded is called semi-overloaded, and inter-core migrations can be preferentially leveraged at this time.
For an underloaded host, all VMs on it are migrated to other suitable hosts through inter-host migration; hence there is no need to consider inter-core migration here. Let the underload threshold be $TH_{down} = (TH_{down}^{cpu}, TH_{down}^{mem})$, where both $TH_{down}^{cpu}$ and $TH_{down}^{mem}$ lie in the interval (0, 1). When the following inequalities hold in $t_k$ for every $core_{x_i}$ on $h_i$ and for its memory, the host is in the underloaded state:
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{cpu} < TH_{down}^{cpu} \times c_i^{core_{x_i}}$,  (46)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{cpu} < TH_{down}^{cpu} \times c_i^{core_{x_i}}$,  (47)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{mem} < TH_{down}^{mem} \times c_i^{mem}$,  (48)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{mem} < TH_{down}^{mem} \times c_i^{mem}$.  (49)
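A compact sketch of the DIFT check for one host is given below. It evaluates Inequalities (42)-(45) on both the current usage and the SRU-predicted usage for the next segment and returns the host state; the data layout and helper names are assumptions rather than the authors' implementation.

TH_UP = 0.9

def core_overloaded(used_now, used_next, core_capacity):
    # Both insurances: the threshold is exceeded now (42) and in the next segment (43).
    return used_now > TH_UP * core_capacity and used_next > TH_UP * core_capacity

def classify_host(core_used_now, core_used_next, core_capacity, mem_now, mem_next, mem_capacity):
    over = [core_overloaded(u, p, core_capacity)
            for u, p in zip(core_used_now, core_used_next)]
    mem_over = mem_now > TH_UP * mem_capacity and mem_next > TH_UP * mem_capacity
    if mem_over or all(over):
        return "host-overloaded"      # inter-host migration required
    if any(over):
        return "semi-overloaded"      # inter-core migration tried first
    return "normal"

# One core overloaded in both segments, memory fine: the host is semi-overloaded.
print(classify_host([9.5, 3.0], [9.6, 3.1], 10.0, 50.0, 52.0, 64.0))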

4.3. VM Selection

VM selection is performed for overloaded hosts. The reason why we use the DIFT method is to avoid host overload and SLAV as much as possible, rather than react passively after SLAV occurs. Therefore, we can assume that in the $t_{k+1}$ time segment there would only be slight SLAV and host overload in the CDC. Our priority in VM selection is, at the $t_k$ time segment, to select the VMs on each host $h_i$ that may cause $h_i$ to be overloaded during $t_{k+1}$ and form a list of VMs to be migrated. If, after the migrations of these VMs are completed, $h_i$ is still in the overloaded state during the $t_k$ period, then targeted processing is performed. We discuss the VM selection strategies for the various overload states within $t_l$ (e.g., $l = k+1$) in the following cases.
  • Case 1: Host with semi-overloaded
In this case, we need to reduce the load on each overloaded core. Given the j-th overloaded core $core_{i,j,l}$ on host $h_i$ in $t_l$, we denote the set of n VMs running on it as $V_{i,j,l} = \{v_{i,j,l}^1, v_{i,j,l}^2, \ldots, v_{i,j,l}^n\}$, its total resources as $c_{core_{i,j,l}}$, and its currently available resources as $r_{core_{i,j,l}}$. For a VM $v_{i,j,l}^q \in V_{i,j,l}$, the amount of CPU resources it uses is denoted as $cpu_{v_{i,j,l}^q}$. Then, each selection chooses the VM $v_{i,j,l}^q$ that minimizes $\left| (1 - TH_{up}^{cpu}) \times c_{core_{i,j,l}} - (r_{core_{i,j,l}} + cpu_{v_{i,j,l}^q}) \right|$ and adds it to the inter-core migration list. We select one VM at a time until $r_{core_{i,j,l}} \geq (1 - TH_{up}^{cpu}) \times c_{core_{i,j,l}}$.
  • Case 2: Host with only memory overloaded
Given a memory-overloaded host $h_i$ in $t_l$, we denote the set of n VMs running on it as $V_{i,l} = \{v_{i,l}^1, v_{i,l}^2, \ldots, v_{i,l}^n\}$, the total amount of memory resources it has as $c_{mem_{i,l}}$, and the currently available amount as $r_{mem_{i,l}}$. For a VM $v_{i,l}^q \in V_{i,l}$, the amount of memory resources it uses is recorded as $mem_{v_{i,l}^q}$. Then, each selection chooses the VM $v_{i,l}^q$ that minimizes $\left| (1 - TH_{up}^{mem}) \times c_{mem_{i,l}} - (r_{mem_{i,l}} + mem_{v_{i,l}^q}) \right|$ and adds it to the inter-host migration list. We select one VM at a time until $r_{mem_{i,l}} \geq (1 - TH_{up}^{mem}) \times c_{mem_{i,l}}$.
  • Case 3: Host with only processor overloaded
We select VMs from each core in the same method as proposed in Case 1. All selected VMs are added into the inter-host migration list.
  • Case 4: Host with processor overloaded or cores overloaded and memory overloaded
We first use the method in Case 1 to select VMs from each overloaded core. After the load of all cores drops under the overloaded threshold, if the memory load also drops under the overload threshold, the VM selection is completed; otherwise, the method in Case 2 is used to select VMs to reduce the memory load. All selected VMs are put into the inter-host migration list.
For a given overloaded host, at the beginning of the $t_k$ time segment, the above VM selection strategies are executed for its overload condition in $t_{k+1}$. If the host is still in an overloaded state in the $t_k$ time segment, the above strategies are executed again to reduce the current load.
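The Case 1 rule can be written as a small greedy loop: repeatedly pick the VM whose removal would leave the core's free capacity closest to the $(1 - TH_{up}^{cpu})$ headroom, until the core falls below the overload threshold. The dictionary-based representation and the sample values below are assumptions for illustration.

TH_UP_CPU = 0.9

def select_from_core(core_capacity, free_capacity, vm_cpu_usage):
    # vm_cpu_usage: dict mapping VM id -> CPU usage on this core.
    target = (1.0 - TH_UP_CPU) * core_capacity
    selected = []
    while free_capacity < target and vm_cpu_usage:
        # VM whose migration brings the free capacity closest to the required headroom.
        vm = min(vm_cpu_usage, key=lambda v: abs(target - (free_capacity + vm_cpu_usage[v])))
        free_capacity += vm_cpu_usage.pop(vm)
        selected.append(vm)
    return selected    # inter-core migration list for this core

print(select_from_core(10.0, 0.4, {"vm1": 0.5, "vm2": 1.2, "vm3": 3.0}))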

4.4. VM Placement

To make full use of the resources of the hosts, we should fully consider the space and time competition of different VMs for different resources when placing VMs on hosts.
In the VM selection phase, we obtain the inter-core migration list and inter-host migration list. Regarding a semi-overloaded host, it should be noted that the load of the cores may not be reduced through inter-core migration. Therefore, in the VM placement phase, we first process the inter-core migration list and then add the remaining VMs to the inter-host migration list to process together.
There are two goals of VM placement: (1) to ensure that the resources of the target host can be fully utilized during the t k period; and (2) after the VM is placed on the destination host h i , the host will not be in the overloaded state during the t k + 1 period.
We address the inter-core migration list first. For a semi-overloaded host $h_i$, we sort all non-overloaded cores of $h_i$ in ascending order of load and denote this ordered sequence as ordered_uo_cores_i,k. We arrange the VMs in the inter-core migration list icm_i of $h_i$ in descending order of their current demand for CPU resources to form the list ordered_icm_i. We take the first VM from ordered_icm_i and traverse ordered_uo_cores_i,k in order to find the first core with enough CPU resources for it. If a suitable core cannot be found in ordered_uo_cores_i,k for this VM, we add it to the inter-host migration list. The VM is then removed from ordered_icm_i. We repeat the above operations until ordered_icm_i is empty. Each semi-overloaded host executes this placement algorithm for the VMs in its icm_i. Algorithm 1 shows the pseudocode of the inter-core VM placement algorithm.
Algorithm 1 Inter-Core VM Placement Algorithm
Input: host h_i, inter-core migration list icm_i of h_i
Output: allocation of VMs on certain cores
1: Get_sorted(core_1, ..., core_cn_i) → ordered_uo_cores_i,k
2: Get_sorted(icm_i) → ordered_icm_i
3: for each vm_j in ordered_icm_i do
4:     for each core_p in ordered_uo_cores_i,k do
5:         if core_p is available for vm_j in t_k and t_k+1 then
6:             allocation.add(vm_j, h_i.core_p)
7:             ordered_icm_i.remove(vm_j)
8:             break
9:         end if
10:    end for
11:    if vm_j has not been allocated then
12:        add vm_j to the inter-host migration list
13:    end if
14: end for
15: return allocation
Next, the inter-host migration list is processed. First, all the hosts are divided into two categories according to the intensity of their CPU and memory usage: CPU-intensive hosts and memory-intensive hosts. The following calculation is used to classify a given host $h_i$. We take the workload trace of $h_i$ over 12 consecutive time segments (one hour), where the normalized CPU workload time series is $LD_{i,k}^{cpu} = \{ld_{i,k-10}^{cpu}, \ldots, ld_{i,k}^{cpu}, ld_{i,k+1}^{cpu}\}$, and the normalized memory workload time series is $LD_{i,k}^{mem} = \{ld_{i,k-10}^{mem}, \ldots, ld_{i,k}^{mem}, ld_{i,k+1}^{mem}\}$. Since $h_i$ has a multi-core CPU, its normalized CPU workload in period $t_k$ is:
$ld_{i,k}^{cpu} = \frac{\sum_{x=1}^{cn_i} \left( c_i^{core_{cn_i}} - r_{i,k}^{core_x} \right)}{\max\{c_i^{core_{cn_i}} \mid i \in [1, N]\}}$.  (50)
The denominator is $\max\{c_i^{core_{cn_i}} \mid i \in [1, N]\}$ so that CPUs with different capacities can be compared with each other through normalization. The smaller the value of $ld_{i,k}^{cpu}$, the lower the CPU load of $h_i$ in the time period $t_k$.
In a certain time period $t_k$, its normalized memory workload is:
$ld_{i,k}^{mem} = \frac{c_i^{mem} - r_{i,k}^{mem}}{\max\{c_i^{mem} \mid i \in [1, N]\}}$.  (51)
The smaller the value of $ld_{i,k}^{mem}$, the lower the memory load of $h_i$ in the time period $t_k$.
Based on Equation (50), we calculate the CPU score of $h_i$:
$score_{i,k}^{cpu} = \frac{1}{10} \left( \sum_{y=1}^{12} ld_{i,k+y-11}^{cpu} - \max(LD_{i,k}^{cpu}) - \min(LD_{i,k}^{cpu}) \right)$,  (52)
where $\max(LD_{i,k}^{cpu})$ and $\min(LD_{i,k}^{cpu})$ are the maximum and minimum values in the normalized CPU workload sequence. We remove them from the sum to minimize the impact of possible severe load fluctuations on the score.
Based on Equation (51), we calculate the memory score of $h_i$:
$score_{i,k}^{mem} = \frac{1}{10} \left( \sum_{y=1}^{12} ld_{i,k+y-11}^{mem} - \max(LD_{i,k}^{mem}) - \min(LD_{i,k}^{mem}) \right)$,  (53)
where $\max(LD_{i,k}^{mem})$ and $\min(LD_{i,k}^{mem})$ are the maximum and minimum values in the normalized memory workload sequence.
If $score_{i,k}^{cpu} \geq score_{i,k}^{mem}$, $h_i$ is of the CPU-intensive type; otherwise, $h_i$ is of the memory-intensive type. CPU-intensive hosts have more abundant available memory resources, and memory-intensive hosts have more abundant available CPU resources. Therefore, in the time period $t_k$, the CPU-intensive hosts are arranged in ascending order of their $ld_{i,k}^{mem}$ values, forming the list memordered_cpu_hosts_list, and the memory-intensive hosts are arranged in ascending order of their $ld_{i,k}^{cpu}$ values, forming the list cpuordered_mem_hosts_list. The reason for this sorting is that CPU-intensive hosts have enough remaining memory resources, so VMs that require more memory can be placed on them, while memory-intensive hosts have enough remaining CPU resources, so VMs that require more CPU can be placed on them.
In the following, we sort the VMs in the inter-host migration list by their resource usage requirements. The VMs to be migrated are also divided into the CPU-intensive type and the memory-intensive type. CPU-intensive VMs should be placed on memory-intensive hosts as much as possible, and memory-intensive VMs should be placed on CPU-intensive hosts as much as possible. We use the following calculation to classify a given VM $v_j$. We take the workload trace of $v_j$ over 12 consecutive time segments (one hour), where the normalized CPU workload time series is $VLD_{j,k}^{cpu} = \{vld_{j,k-10}^{cpu}, \ldots, vld_{j,k}^{cpu}, vld_{j,k+1}^{cpu}\}$, and the normalized memory workload time series is $VLD_{j,k}^{mem} = \{vld_{j,k-10}^{mem}, \ldots, vld_{j,k}^{mem}, vld_{j,k+1}^{mem}\}$. In a certain time period $t_k$, the normalized CPU workload of $v_j$ is:
$vld_{j,k}^{cpu} = \frac{s_{j,k}^{cpu} - \min\{s_{x,k}^{cpu} \mid x \in [1, M]\}}{\max\{s_{x,k}^{cpu} \mid x \in [1, M]\} - \min\{s_{x,k}^{cpu} \mid x \in [1, M]\}}$.  (54)
The smaller the value of $vld_{j,k}^{cpu}$, the lower the CPU demand of $v_j$ in the time period $t_k$.
In a certain time period $t_k$, the normalized memory workload of $v_j$ is:
$vld_{j,k}^{mem} = \frac{s_{j,k}^{mem} - \min\{s_{x,k}^{mem} \mid x \in [1, M]\}}{\max\{s_{x,k}^{mem} \mid x \in [1, M]\} - \min\{s_{x,k}^{mem} \mid x \in [1, M]\}}$.  (55)
Based on Equation (54), we calculate the CPU score of $v_j$:
$vscore_{j,k}^{cpu} = \frac{1}{10} \left( \sum_{y=1}^{12} vld_{j,k+y-11}^{cpu} - \max(VLD_{j,k}^{cpu}) - \min(VLD_{j,k}^{cpu}) \right)$,  (56)
where $\max(VLD_{j,k}^{cpu})$ and $\min(VLD_{j,k}^{cpu})$ are the maximum and minimum values in the normalized CPU workload sequence of $v_j$.
Based on Equation (55), we calculate the memory score of $v_j$:
$vscore_{j,k}^{mem} = \frac{1}{10} \left( \sum_{y=1}^{12} vld_{j,k+y-11}^{mem} - \max(VLD_{j,k}^{mem}) - \min(VLD_{j,k}^{mem}) \right)$,  (57)
where $\max(VLD_{j,k}^{mem})$ and $\min(VLD_{j,k}^{mem})$ are the maximum and minimum values in the normalized memory workload sequence of $v_j$.
If $vscore_{j,k}^{cpu} \geq vscore_{j,k}^{mem}$, $v_j$ is of the CPU-intensive type; otherwise, $v_j$ is of the memory-intensive type.
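Both the host and VM classifications reduce to the same trimmed-mean comparison over the last 12 normalized load samples (Equations (52)-(53) and (56)-(57)). The sketch below shows this shared computation with hypothetical sample data.

def trimmed_score(normalized_loads):
    # Average of 12 normalized loads with the maximum and minimum removed.
    assert len(normalized_loads) == 12
    return (sum(normalized_loads) - max(normalized_loads) - min(normalized_loads)) / 10.0

def classify(cpu_loads, mem_loads):
    return "cpu-intensive" if trimmed_score(cpu_loads) >= trimmed_score(mem_loads) else "memory-intensive"

cpu_hist = [0.62, 0.58, 0.65, 0.70, 0.61, 0.66, 0.59, 0.64, 0.68, 0.63, 0.60, 0.67]
mem_hist = [0.35, 0.33, 0.36, 0.40, 0.34, 0.37, 0.32, 0.38, 0.41, 0.36, 0.35, 0.39]
print(classify(cpu_hist, mem_hist))   # -> cpu-intensive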
In the time period $t_k$, the CPU-intensive VMs are arranged in descending order of their $vld_{j,k}^{cpu}$ values, forming the list ordered_cpu_vms_list. We pick the VMs from ordered_cpu_vms_list in turn, traverse cpuordered_mem_hosts_list, and find the first host that can meet the resource requirements of the current VM and will not be overloaded in both $t_k$ and the future $t_{k+1}$.
Since the hosts have multi-core CPUs, we design the following procedure to decide which core of the host $h_i$ will be used by the VM $v_j$. We sort the cores of $h_i$'s processor in descending order of their available resources $r_{i,k}^{core_1}, r_{i,k}^{core_2}, \ldots, r_{i,k}^{core_{cn_i}}$, which constitute the sequence ordered_cores_i,k. Then, the VM $v_j$ is preferentially placed on the foremost core in ordered_cores_i,k that also meets the resource requirements of $v_j$ in the $t_{k+1}$ time period.
In the time period $t_k$, the memory-intensive VMs are arranged in descending order of their $vld_{j,k}^{mem}$ values, forming the list ordered_mem_vms_list. We pick the VMs from ordered_mem_vms_list in turn, traverse memordered_cpu_hosts_list, and find the first host that can meet the resource requirements of the current VM and will not be overloaded in both $t_k$ and the future $t_{k+1}$. On the chosen host, the same multi-core placement method based on ordered_cores_i,k is used.
When the destination host is determined for a given VM to be migrated, this VM is removed from the inter-host migration list. Algorithm 2 shows the pseudocode of the inter-host VM placement algorithm. If, after ordered_mem_vms_list and ordered_cpu_vms_list have been traversed, there are still VMs to be migrated, the first-fit method is used to find available hosts for them in the host list. If there are still VMs to be migrated after that, the hosts in the energy-saving state are powered on one by one until all the VMs to be migrated are assigned destination hosts.
After the above process, we perform underloaded host detection on the hosts in the CDC. If there are still underloaded hosts at this time, the VMs on the underloaded hosts are added to form a VM migration list, and Algorithm 2 is executed.
Algorithm 2 Inter-Host VM Placement Algorithm
Input: host list, inter-host migration list
Output: allocation of VMs
1: Get_classification(host list) → cpu_intensive_hosts_k, mem_intensive_hosts_k
2: Get_classification(inter-host migration list) → cpu_intensive_vms_k, mem_intensive_vms_k
3: Get_sorted(cpu_intensive_hosts_k) → memordered_cpu_hosts_list
4: Get_sorted(mem_intensive_hosts_k) → cpuordered_mem_hosts_list
5: Get_sorted(cpu_intensive_vms_k) → ordered_cpu_vms_list
6: Get_sorted(mem_intensive_vms_k) → ordered_mem_vms_list
7: for each vm_j in ordered_cpu_vms_list do
8:     for each h_i in cpuordered_mem_hosts_list do
9:         Get_sorted(core_1, ..., core_cn_i) → ordered_cores_i,k
10:        for each core_p in ordered_cores_i,k do
11:            if core_p is available for vm_j in t_k and t_k+1 then
12:                allocation.add(vm_j, h_i.core_p)
13:                ordered_cpu_vms_list.remove(vm_j)
14:                break
15:            end if
16:        end for
17:        if vm_j has been allocated then
18:            break
19:        end if
20:    end for
21: end for
22: for each vm_j in ordered_mem_vms_list do
23:    for each h_i in memordered_cpu_hosts_list do
24:        Get_sorted(core_1, ..., core_cn_i) → ordered_cores_i,k
25:        for each core_p in ordered_cores_i,k do
26:            if core_p is available for vm_j in t_k and t_k+1 then
27:                allocation.add(vm_j, h_i.core_p)
28:                ordered_mem_vms_list.remove(vm_j)
29:                break
30:            end if
31:        end for
32:        if vm_j has been allocated then
33:            break
34:        end if
35:    end for
36: end for
37: return allocation

5. Performance Evaluation

In this section, we evaluate the performance of our proposed solution, named MMCC, with a real VM workload trace-driven simulation.

5.1. Experiment Setup

According to the energy consumption analyses and statistics of hosts by Basmadjian et al. [34], Minartz et al. [38], and Jin et al. [39], we simulated three types of hosts, denoted $H_{large}$, $H_{medium}$, and $H_{small}$. Their resource parameters are shown in Table 1, the power parameters are shown in Table 2 and Table 3, and the capacitances of different components of the processor are given in Table 4. The numbers of $H_{small}$, $H_{medium}$, and $H_{large}$ hosts in the simulated CDC are each 100.
The VM workload trace dataset is from the Alibaba CDC [33]. The VM traces in the dataset are recorded by sampling every five minutes. We selected 1000 VMs over one day (a total of 288 time segments) from the dataset to simulate the consumers' demands for cloud services. The simulation was implemented on CloudMatrix Lite [40]. The DAE-based filter and the SRU algorithm (the source code is available at https://github.com/asappresearch/sru, accessed on 19 October 2022) were implemented based on PyTorch [41].
We set the electricity price as $EP = 0.25\,\$/\mathrm{kWh}$. The SLAV penalty is a static value, $pun^{cpu} = pun^{mem} = 0.01\,\$$ [42]. The host should reserve an extra 10% of resources for migrations; thereby, we set $TH_{up}^{cpu} = TH_{up}^{mem} = 0.9$. We also set $TH_{down}^{cpu} = TH_{down}^{mem} = TH_{down}^{disk} = TH_{down}^{net} = 0.1$.
We combined three overload detection algorithms (MAD [20], IQR [20], and LR [20]), three VM selection algorithms (MMT [20,25,30], MC [20,43], and RS [20]), and one VM placement algorithm (PABFD [20]) into nine baseline methods to compare with our proposed solution MMCC. All the abovementioned workload detection and selection algorithms were initially designed for single-core hosts; hence, we modified them to work on multi-core hosts by treating the capacity of the CPU as the sum of the capacities of its cores. Moreover, the PABFD placement algorithm and its corresponding energy consumption model only take into account a single-core CPU per host. Therefore, we modified it here to suit our multi-core (by randomly selecting a core in the CPU for the VM) and multi-resource scenario. The modified placement algorithm is called PABFDM; its pseudocode is shown in Algorithm 3.
Algorithm 3 PABFDM algorithm
Input: hostList, vmList
Output: allocation of the VMs
1: vmList.sortDecreasingUtilization()
2: for each VM in vmList do
3:     minPower ← MAX
4:     allocatedHost ← NONE
5:     for each host in hostList do
6:         if no SLAV on this host and this host meets the CPU and memory resource requirements of VM then
7:             power ← estimatePower(host, VM)
8:             if power < minPower then
9:                 minPower ← power
10:                allocatedHost ← host
11:            end if
12:        end if
13:    end for
14:    if allocatedHost ≠ NONE then
15:        allocation.add(VM, allocatedHost.random(core))
16:    end if
17: end for
18: return allocation

5.2. Evaluation

The metrics involved in the evaluation are host energy consumption cost, SLAV penalty cost, and the number of VM migrations. Since the CPU cost of the VM migration energy consumption belongs to the hosts’ energy consumption during calculation, we used the number of VM migrations to indirectly measure the migration cost.
Figure 8 shows the host energy consumption for each time slice of the day when all the methods are used to perform server consolidation. Figure 9 compares the total host energy consumption over the day when all the methods are used to perform server consolidation. From Figure 8, it can be seen that the host energy consumption generated by MMCC is less than that of the baseline methods in most of the time periods. From Figure 9, it can be seen that the total host energy consumption generated by MMCC in a day is about 10% less than that of LR-MMT (the best in the baseline methods) and is about 43.9% less than that of MAD-RS (the worst in the baseline methods). In a cluster composed of multi-core processor hosts, MMCC can effectively schedule tasks among multiple cores to optimize energy consumption.
The comparison of the CPU and memory SLAV produced by all the methods in a day is shown in Figure 10. The CPU SLAV generated by MMCC is much smaller than that of the baseline methods. For example, MMCC produces about 54% less CPU SLAV than MAD-RS and about 39% less than LR-MMT. Likewise, the memory SLAV produced by MMCC is much smaller than that of the baseline methods. A comparison of the total SLAV cost produced by all methods in one day is shown in Figure 11. MMCC outperforms the baseline methods. For instance, the total SLAV cost generated by MMCC is about 51.7% less than that generated by IQR-RS (the worst of the baseline methods) and about 33.5% less than that generated by LR-MMT (the best of the baseline methods). It can be said that the traditional server consolidation methods represented by the baselines do not perform well in a cluster composed of multi-core processor hosts, while MMCC handles this scenario better.
Figure 12 shows the number of VM migrations triggered in each time slice of the day when all the methods are used to perform server consolidation. Figure 13 compares the total number of VM migrations triggered in a day by all the methods. As can be seen from Figure 13, MMCC does not have a large advantage over the baseline methods in the number of triggered migrations; for example, MMCC triggers only about 9.5% fewer migrations than IQR-RS. However, it should be noted that the VM migrations triggered by MMCC at time t_k mainly deal with hosts that may become overloaded in the future; this is why the SLAV produced by MMCC is much smaller than that of the baseline methods. In addition, part of the migrations caused by MMCC are inter-core migrations, which happen only inside a host, and the cost of an inter-core migration is negligible. The traditional baseline methods do not consider inter-core migration in the case of multi-core processors.
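To make this distinction explicit, the sketch below separates intra-host (inter-core) moves, whose cost we treat as negligible, from inter-host migrations that incur the full live-migration overhead; the migration record fields (src_host, dst_host) are assumed names used only for illustration.

def count_migrations(migrations):
    # Separate inter-core moves (same source and destination host)
    # from inter-host migrations.
    inter_core = sum(1 for m in migrations if m.src_host == m.dst_host)
    inter_host = len(migrations) - inter_core
    return inter_core, inter_host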
Figure 14 shows and compares the total cost generated by all the methods in one day. MMCC outperforms the baseline methods. For instance, the total cost generated by MMCC is about 20.9% less than that of LR-MMT (the best of the baseline methods) and about 34.4% less than that of MAD-RS (the worst of the baseline methods). MMCC can not only optimize the energy consumption in the environment of multi-core processor hosts, but also reduce the SLAV in server consolidation through the host load detection and VM selection strategies based on the prediction method.

6. Conclusions

In this paper, we focus on reducing the total operating cost of server consolidation in a CDC composed of multi-core processor hosts while ensuring consumers' QoS. We established a cost model based on multi-core and multi-resource usage in the CDC, taking into account the host energy cost, the VM migration cost, and the SLAV penalty cost. Based on this model, we defined the MMCC problem in server consolidation and designed a heuristic solution for it. We employ a DAE-based filter to preprocess the VM workload dataset and reduce noise in the workload trace. Subsequently, an SRU-based method is used to predict the usage of computing resources, allowing us to trigger inter-core or inter-host VM migrations before a host enters the overloaded state. We designed a multi-core-aware heuristic algorithm to solve the VM placement problem. Finally, simulations driven by real VM workload traces verify the effectiveness of our proposed method. Compared with the existing server consolidation methods, our proposed MMCC reduces host energy consumption by 10% to 43.9%, the SLAV cost by 33.5% to 51.7%, and the total cost by 20.9% to 34.4% in a cluster of multi-core hosts.
In the future, we will consider a more comprehensive cost model that takes into account, for example, the operational life span of hosts, the network topology of the CDC, and the cooling system.

Author Contributions

Conceptualization, H.L. and Y.S.; methodology, H.L.; software, H.L.; validation, H.L., L.W. and Y.S.; formal analysis, H.L.; investigation, H.L. and Y.L.; resources, H.L. and Y.L.; data curation, H.L.; writing—original draft preparation, H.L., L.W. and Y.L.; writing—review and editing, H.L., L.W., Y.L. and Y.S.; visualization, H.L.; supervision, Y.S.; project administration, Y.S.; funding acquisition, H.L. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No.62002067), the Guangzhou Youth Talent Program (QT20220101174), the Department of Education of Guangdong Province (No.2020KTSCX039), Foundation of The Chinese Education Commission (22YJAZH091), and the SRP of Guangdong Education Dept (2019KZDZX1031).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Almost 82% Hong Kong Businesses Plan to Keep Remote Working Post-COVID-19. Available online: https://hongkongbusiness.hk/information-technology/more-news/almost-82-hong-kong-businesses-plan-keep-remote-working-post-covid- (accessed on 27 September 2022).
  2. Hong Kong Data Center Market—Growth, Trends, COVID-19 Impact, and Forecasts (2021–2026). Available online: https://www.reportlinker.com/p06187432/Hong-Kong-Data-Center-Market-Growth-Trends-COVID-19-Impact-and-Forecasts.html (accessed on 27 September 2022).
  3. Dhiman, G.; Mihic, K.; Rosing, T. A system for online power prediction in virtualized environments using gaussian mixture models. In Proceedings of the 47th Design Automation Conference, Anaheim, CA, USA, 13–18 June 2010; pp. 807–812. [Google Scholar]
  4. Ham, S.; Kim, M.; Choi, B.; Jeong, J. Simplified server model to simulate data center cooling energy consumption. Energy Build. 2015, 86, 328–339. [Google Scholar] [CrossRef]
  5. Kavanagh, R.; Djemame, K. Rapid and accurate energy models through calibration with IPMI and RAPL. Concurr. Comput. Pract. Exp. 2019, 31, e5124. [Google Scholar] [CrossRef]
  6. Gupta, V.; Nathuji, R.; Schwan, K. An analysis of power reduction in datacenters using heterogeneous chip multiprocessors. ACM Sigmetrics Perform. Eval. Rev. 2011, 39, 87–91. [Google Scholar] [CrossRef]
  7. Lefurgy, C.; Wang, X.; Ware, M. Server-level power control. In Proceedings of the Fourth International Conference on Autonomic Computing (ICAC’07), Jacksonville, FL, USA, 11–15 June 2007; p. 4. [Google Scholar]
  8. Beloglazov, A.; Abawajy, J.; Buyya, R. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener. Comput. Syst. 2012, 28, 755–768. [Google Scholar] [CrossRef] [Green Version]
  9. Rezaei-Mayahi, M.; Rezazad, M.; Sarbazi-Azad, H. Temperature-aware power consumption modeling in Hyperscale cloud data centers. Future Gener. Comput. Syst. 2019, 94, 130–139. [Google Scholar] [CrossRef]
  10. Chen, Y.; Das, A.; Qin, W.; Sivasubramaniam, A.; Wang, Q.; Gautam, N. Managing server energy and operational costs in hosting centers. In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, AB, Canada, 6–10 June 2005; pp. 303–314. [Google Scholar]
  11. Wu, W.; Lin, W.; Peng, Z. An intelligent power consumption model for virtual machines under CPU-intensive workload in cloud environment. Soft Comput. 2017, 21, 5755–5764. [Google Scholar] [CrossRef]
  12. Lien, C.; Bai, Y.; Lin, M. Estimation by software for the power consumption of streaming-media servers. IEEE Trans. Instrum. Meas. 2007, 56, 1859–1870. [Google Scholar] [CrossRef]
  13. Raja, K. Multi-core Aware Virtual Machine Placement for Cloud Data Centers with Constraint Programming. In Intelligent Computing; Springer: Cham, Switzerland, 2022; pp. 439–457. [Google Scholar]
  14. Economou, D.; Rivoire, S.; Kozyrakis, C.; Ranganathan, P. Full-system power analysis and modeling for server environments. In Proceedings of the International Symposium on Computer Architecture, Ouro Preto, Brazil, 17–20 October 2006. [Google Scholar]
  15. Alan, I.; Arslan, E.; Kosar, T. Energy-aware data transfer tuning. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, IL, USA, 26–29 May 2014; pp. 626–634. [Google Scholar]
  16. Li, Y.; Wang, Y.; Yin, B.; Guan, L. An online power metering model for cloud environment. In Proceedings of the 2012 IEEE 11th International Symposium on Network Computing and Applications, Cambridge, MA, USA, 23–25 August 2012; pp. 175–180. [Google Scholar]
  17. Lent, R. A model for network server performance and power consumption. Sustain. Comput. Inform. Syst. 2013, 3, 80–93. [Google Scholar] [CrossRef]
  18. Kansal, A.; Zhao, F.; Liu, J.; Kothari, N.; Bhattacharya, A. Virtual machine power metering and provisioning. In Proceedings of the 1st ACM Symposium on Cloud Computing, Indianapolis, IN, USA, 10–11 June 2010; pp. 39–50. [Google Scholar]
  19. Lin, W.; Wang, W.; Wu, W.; Pang, X.; Liu, B.; Zhang, Y. A heuristic task scheduling algorithm based on server power efficiency model in cloud environments. Sustain. Comput. Inform. Syst. 2018, 20, 56–65. [Google Scholar] [CrossRef]
  20. Beloglazov, A.; Buyya, R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput. Pract. Exp. 2012, 24, 1397–1420. [Google Scholar] [CrossRef]
  21. Li, H.; Li, W.; Wang, H.; Wang, J. An optimization of virtual machine selection and placement by using memory content similarity for server consolidation in cloud. Future Gener. Comput. Syst. 2018, 84, 98–107. [Google Scholar] [CrossRef]
  22. Li, H.; Li, W.; Zhang, S.; Wang, H.; Pan, Y.; Wang, J. Page-sharing-based virtual machine packing with multi-resource constraints to reduce network traffic in migration for clouds. Future Gener. Comput. Syst. 2019, 96, 462–471. [Google Scholar] [CrossRef]
  23. Li, H.; Li, W.; Feng, Q.; Zhang, S.; Wang, H.; Wang, J. Leveraging content similarity among vmi files to allocate virtual machines in cloud. Future Gener. Comput. Syst. 2018, 79, 528–542. [Google Scholar] [CrossRef]
  24. Li, H.; Wang, S.; Ruan, C. A fast approach of provisioning virtual machines by using image content similarity in cloud. IEEE Access 2019, 7, 45099–45109. [Google Scholar] [CrossRef]
  25. Yadav, R.; Zhang, W.; Kaiwartya, O.; Singh, P.; Elgendy, I.; Tian, Y. Adaptive energy-aware algorithms for minimizing energy consumption and SLA violation in cloud computing. IEEE Access 2018, 6, 55923–55936. [Google Scholar] [CrossRef]
  26. Hieu, N.; Di Francesco, M.; Ylä-Jääski, A. Virtual machine consolidation with multiple usage prediction for energy-efficient cloud data centers. IEEE Trans. Serv. Comput. 2017, 13, 186–199. [Google Scholar] [CrossRef] [Green Version]
  27. Esfandiarpoor, S.; Pahlavan, A.; Goudarzi, M. Structure-aware online virtual machine consolidation for datacenter energy improvement in cloud computing. Comput. Electr. Eng. 2015, 42, 74–89. [Google Scholar] [CrossRef]
  28. Arianyan, E.; Taheri, H.; Sharifian, S. Novel energy and SLA efficient resource management heuristics for consolidation of virtual machines in cloud data centers. Comput. Electr. Eng. 2015, 47, 222–240. [Google Scholar] [CrossRef]
  29. Rodero, I.; Viswanathan, H.; Lee, E.; Gamell, M.; Pompili, D.; Parashar, M. Energy-efficient thermal-aware autonomic management of virtualized HPC cloud infrastructure. J. Grid Comput. 2012, 10, 447–473. [Google Scholar] [CrossRef]
  30. Li, Z.; Yan, C.; Yu, L.; Yu, X. Energy-aware and multi-resource overload probability constraint-based virtual machine dynamic consolidation method. Future Gener. Comput. Syst. 2018, 80, 139–156. [Google Scholar] [CrossRef]
  31. Sayadnavard, M.; Toroghi Haghighat, A.; Rahmani, A. A reliable energy-aware approach for dynamic virtual machine consolidation in cloud data centers. J. Supercomput. 2019, 75, 2126–2147. [Google Scholar] [CrossRef]
  32. Yuan, C.; Sun, X. Server consolidation based on culture multiple-ant-colony algorithm in cloud computing. Sensors 2019, 19, 2724. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Lu, C.; Ye, K.; Xu, G.; Xu, C.; Bai, T. Imbalance in the cloud: An analysis on alibaba cluster trace. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 2884–2892. [Google Scholar]
  34. Basmadjian, R.; De Meer, H. Evaluating and modeling power consumption of multi-core processors. In Proceedings of the 2012 Third International Conference On Future Systems: Where Energy, Computing and Communication Meet (e-Energy), Madrid, Spain, 9–11 May 2012; pp. 1–10. [Google Scholar]
  35. Brodersen, R. Minimizing Power Consumption in CMOS Circuits; Department of EECS University of California at Berkeley: Berkeley, CA, USA; Available online: https://sablok.tripod.com/verilog/paper.fm.pdf (accessed on 27 September 2022).
  36. Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408. [Google Scholar]
  37. Lei, T.; Zhang, Y.; Wang, S.; Dai, H.; Artzi, Y. Simple recurrent units for highly parallelizable recurrence. arXiv 2017, arXiv:1709.02755. [Google Scholar]
  38. Minartz, T.; Kunkel, J.; Ludwig, T. Simulation of power consumption of energy efficient cluster hardware. Comput. Sci.-Res. Dev. 2010, 25, 165–175. [Google Scholar] [CrossRef]
  39. Jin, Y.; Wen, Y.; Chen, Q.; Zhu, Z. An empirical investigation of the impact of server virtualization on energy efficiency for green data center. Comput. J. 2013, 56, 977–990. [Google Scholar] [CrossRef]
  40. Li, H.; Xiao, Y. CloudMatrix Lite: A Real Trace Driven Lightweight Cloud Data Center Simulation Framework. In Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 23–25 October 2020; pp. 424–429. [Google Scholar]
  41. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035. [Google Scholar]
  42. Aljoumah, E.; Al-Mousawi, F.; Ahmad, I.; Al-Shammri, M.; Al-Jady, Z. SLA in cloud computing architectures: A comprehensive study. Int. J. Grid Distrib. Comput. 2015, 8, 7–32. [Google Scholar] [CrossRef]
  43. Cao, Z.; Dong, S. Dynamic VM consolidation for energy-aware and SLA violation reduction in cloud computing. In Proceedings of the 2012 13th International Conference on Parallel and Distributed Computing, Applications And Technologies, Beijing, China, 14–16 December 2012; pp. 363–369. [Google Scholar]
Figure 1. The general architecture of a multi-core CPU.
Figure 2. Denoise autoencoder.
Figure 3. The network structure of the first autoencoder of the DAE-based filter.
Figure 4. The network structure of the second autoencoder of the DAE-based filter.
Figure 5. The network structure of the third autoencoder of the DAE-based filter.
Figure 6. The network structure of the compression decoder of the DAE-based filter.
Figure 7. Example of the DAE-based filter.
Figure 8. Comparing the energy consumption of hosts by all methods in every time segment.
Figure 9. Comparing the total energy consumption of hosts by all methods.
Figure 10. Comparing the SLAV by all methods regarding CPU and memory.
Figure 11. Comparing the total SLAV penalty cost by all methods.
Figure 12. Comparing the number of VM migrations triggered by all methods in every time segment.
Figure 13. Comparing the total number of VM migrations triggered by all methods.
Figure 14. Comparing the total cost by all methods.
Table 1. Resource parameters of the hosts.

Host Type    CPU                          Memory
H_large      Intel Xeon CPU (16 cores)    8 GB
H_medium     Intel Xeon CPU (8 cores)     6 GB
H_small      Intel Xeon CPU (4 cores)     4 GB
Table 2. Base power of the hosts.

Host Type    Base Value (kW)
H_large      108.2
H_medium     103.8
H_small      98.5
Table 3. Memory power parameters.

Host Type    Value     Memory (kW)
H_large      p_peak    0.21736
             p_idle    0.17576
H_medium     p_peak    0.10868
             p_idle    0.08788
H_small      p_peak    0.05434
             p_idle    0.04394
Table 4. Capacitance of different components of the processor.

Component               Capacitance
Chip-Level Mandatory    0.103
Die-Level Mandatory     0.301
On-chip Cache           0.165
Off-chip Cache          3.759
Inter-die               0.595
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
