Article

More than Meets One Core: An Energy-Aware Cost Optimization in Dynamic Multi-Core Processor Server Consolidation for Cloud Data Center

1
School of Information Science, Guangdong University of Finance and Economics, Guangzhou 510320, China
2
Guangdong Intelligent Business Engineering Technology Research Center, Guangdong University of Finance and Economics, Guangzhou 510320, China
3
School of Chinese Language and Literature, Nanjing Xiaozhuang University, Nanjing 211171, China
*
Author to whom correspondence should be addressed.
Electronics 2022, 11(20), 3377; https://doi.org/10.3390/electronics11203377
Submission received: 29 September 2022 / Revised: 14 October 2022 / Accepted: 18 October 2022 / Published: 19 October 2022

Abstract

The massive number of users has brought severe challenges to managing the cloud data centers (CDCs), built on multi-core processor hosts, that underpin cloud service providers. Guaranteeing the quality of service (QoS) of multiple users while reducing the operating costs of CDCs are major problems that need to be solved. To address them, this paper establishes a cost model for multi-core hosts in CDCs that comprehensively considers the hosts' energy costs, virtual machine (VM) migration costs, and service level agreement violation (SLAV) penalty costs. To optimize this objective, we design the following solution. We employ a DAE-based filter to preprocess the VM historical workload and use an SRU-based method to predict the computing resource usage of the VMs in future periods. Based on the predicted results, we trigger VM migrations before hosts move into the overloaded state to reduce the occurrence of SLAV. A multi-core-aware heuristic algorithm is proposed to solve the placement problem. Simulations driven by a real VM workload dataset validate the effectiveness of our proposed method. Compared with the existing baseline methods, our proposed method reduces the total operating cost by 20.9% to 34.4%.

1. Introduction

The world is entering a post-coronavirus era. Since countries and multinational cooperative organizations have still not formed unified, reliable, and effective means of epidemic prevention, a local outbreak could occur at any time and carries a high risk of spreading worldwide. This situation has forced people to further embrace cloud computing, migrating much of their economic, social, and personal activities online. For example, about 82% of Hong Kong businesses plan to maintain remote work in the post-COVID-19 era [1]. This trend has brought cloud computing both opportunities and management pressure. According to estimates, the compound annual growth rate of the Hong Kong data center market value is currently 12.6%, which means that its value will reach HKD 4.12 billion by 2026 [2]. This growth in market value means that operators must commit more investment to cover costs.
Increasing the resource utilization rate of cloud data centers (CDCs) is one of the most effective means to reduce management costs, but there is a conflict between reducing costs and the performance that cloud service customers receive. To improve resource usage, virtual machines (VMs) or containers assigned to users must be highly concentrated on physical hosts. However, a high degree of consolidation brings a high degree of resource competition. When the competition is too intense, a host may become overloaded, thereby degrading the performance and user experience of its VMs. To ensure the user experience, service level agreements (SLAs) are used to quantitatively describe the corresponding quality of service (QoS). If the SLA cannot be maintained, the QoS is threatened and an SLA violation (SLAV) occurs. When an SLAV occurs, cloud service providers (CSPs) need to compensate users as a penalty for failing to meet their performance requirements. Currently, server consolidation is used to dynamically adjust the load balance between hosts in a CDC. Server consolidation periodically checks the load of hosts in the cluster and initiates VM migrations to achieve load balancing, thereby maintaining a balance between resource utilization and performance.
Many works designing server consolidation solutions assume that the physical host is equipped with a single-core CPU, even though multi-core processors have long been prevalent in personal entertainment, scientific research, and data centers. A CPU package consists of multiple dies, and each die encapsulates multiple cores. Due to inter-core communication, inter-die communication, and other CPU components, the power consumption of a multi-core CPU is much higher than that calculated by a single-core CPU power consumption model. Therefore, a server consolidation model based on a single-core processor cannot accurately describe the user's energy demand. In addition, CSPs incur extra overhead to maintain VM migrations in server consolidation and to pay possible SLAV compensation. In this paper, we establish a server consolidation cost model based on multi-core processor and memory resource usage, VM migration, and SLAV compensation and propose corresponding solutions to achieve a balance between cost and performance. Our contributions are as follows:
(A)
We formally define a host power consumption model based on multi-core CPU and memory resource usage and describe the cost of VM migration and SLAV on this basis. After proposing the cost model, we give the corresponding optimization problem.
(B)
A denoising autoencoder (DAE)-based filter is used to denoise the VM workload trace. Subsequently, we use an SRU-based RNN method to predict the workload of VMs. Based on the predicted results, a host load detection strategy is proposed that considers both current and future load conditions.
(C)
To minimize the total cost of server consolidation, we propose a VM selection strategy and a VM placement algorithm. These methods take into account the scheduling and placement of VMs between different cores of the same CPU and between different CPUs of different hosts, as well as the current and future requirements of VMs for different resources.
(D)
We conduct simulations to evaluate the performance of our proposed solution MMCC. The simulation results indicate that MMCC can reduce host energy consumption by 10% to 43.9%, SLAV cost by 33.5% to 51.7%, and total cost by 20.9% to 34.4% compared to the baseline methods.
The remainder of the paper is organized as follows. In Section 2, we survey the related work. In Section 3, we formalize the cost model and define the corresponding optimization problem. In Section 4, we propose a heuristic algorithm to solve this problem. In Section 5, we evaluate the performance of our proposed method using trace-driven simulations based on real VM workloads. In Section 6, we conclude the paper and discuss future work.

2. Related Work

In this section, we survey the CDC cost model related to server consolidation and the corresponding solutions.

2.1. Server Consolidation Cost Models

Based on single-core CPU usage or performance, a large number of works on server consolidation proposed host energy models [3,4,5,6,7,8,9,10,11,12]. Nagadevi et al. [13] proposed a VM placement algorithm based on multi-core processors, but they did not consider factors related to dynamic consolidation, energy consumption, and cost throughout the data center life cycle. The above work also did not consider the energy consumption of the processor at the die level and the chip level.
In addition, the composition of a host’s energy consumption is not only related to the CPU factor. Therefore, several works have proposed multi-resource utilization-oriented host energy models [14,15,16,17,18,19]. However, these models only consider the energy consumption when the host acts as an independent object and do not consider the additional energy consumption of the VM migration due to the increase of the host load during server consolidation.
To ensure user performance and service quality, Buyya et al. [20] proposed a CPU-based SLAV calculation method, which was widely adopted in many subsequent works [21,22,23,24,25,26,27,28,29]. However, the quality of service (QoS) of users when using VMs cannot be measured only by CPU performance, and SLAV must involve the use of multiple resources.

2.2. Server Consolidation Solutions

Buyya et al. [20] first proposed the classic four-step server consolidation solution. The first step is host load detection, which picks out overloaded and underloaded hosts in the cluster. The second step is VM selection for overloaded hosts. To reduce the host load and the occurrences of SLAV, suitable VMs are selected and added to a VM migration list. The third step is VM placement, which selects suitable destination hosts for all objects in the VM migration list. After VM placement, underloaded hosts are handled. By migrating all the VMs on an underloaded host to other suitable hosts as much as possible and shutting down or switching these underloaded hosts to an energy-saving state, the host energy cost of the CDC can be further reduced. At present, most of the specific execution strategies for server consolidation are heuristic. Based on multiple resource constraints, Li et al. [30] proposed a server consolidation method that not only reduces energy consumption but also ensures QoS; however, this method only guarantees the users' QoS in terms of CPU usage. Yadav et al. [25] mainly considered the network overhead and proposed an adaptive host overload detection method and VM selection algorithm. Sayadnavard et al. [31] proposed a server consolidation method based on multiple resource constraints, but its optimization goal is to minimize the number of hosts used by the VM placement, and it ignores other types of costs. Yuan et al. [32] used a culture multiple-ant-colony algorithm to solve the server consolidation problem without SLAV constraints.
None of the models proposed in the above works simultaneously consider the costs associated with multi-resource usage, multi-core processors, multi-resource SLAV, and VM migration.

3. Cost Model and Problem Description

In this section, we first formally describe the multi-core processor-based cost model in server consolidation of CDC and then formulate a problem description based on this.

3.1. Cost Model

In CDC, the cost related to server consolidation mainly involves hosts, VM migrations, and SLAV compensation.
Before giving a specific cost model, we first describe the time and objects of the entire system. There are N heterogeneous hosts in the CDC, forming the host set $H = \{h_1, h_2, \ldots, h_N\}$. The total amount of resources that a host $h_i$ can provide is denoted by the tuple $C_i = (c_i^{cpu}, c_i^{mem})$, where $c_i^{cpu}$ and $c_i^{mem}$ are the total amounts of CPU and memory resources, respectively. The CPU is multi-core; hence we have $c_i^{cpu} = (c_i^{core_1}, c_i^{core_2}, \ldots, c_i^{core_{cn_i}})$, where $cn_i$ is the number of cores in the processor of $h_i$. Generally speaking, we let $c_i^{core_1} = c_i^{core_2} = \cdots = c_i^{core_{cn_i}}$, where $c_i^{core_{cn_i}}$ is the total amount of computing resources that each core can provide. There are M VMs running on these hosts, forming the VM set $V = \{v_1, v_2, \ldots, v_M\}$. When a user requests a VM $v_j$, the submitted resource requirements are denoted by the tuple $D_j = (d_j^{cpu}, d_j^{mem})$, where $d_j^{cpu}$ and $d_j^{mem}$ are the total requirements of $v_j$ for CPU and memory, respectively. We assume that each VM is a single-core task; that is, a given VM can only use the computing resources of a single core.
The life cycle of a CDC $[0, LT]$ is divided into L small and equal-length consecutive time segments $t_1, t_2, \ldots, t_L$, and each time segment has a length of T. In a certain time segment $t_k$, if a host $h_i$ is in the working state, $\lambda_{i,k} = 1$; otherwise $\lambda_{i,k} = 0$. At this time, the amount that the host can provide for each resource is $R_{i,k} = (r_{i,k}^{cpu}, r_{i,k}^{mem})$, where $r_{i,k}^{cpu} = (r_{i,k}^{core_1}, r_{i,k}^{core_2}, \ldots, r_{i,k}^{core_{cn_i}})$ and $r_{i,k}^{core_{cn_i}}$ is the amount of resources that the $cn_i$-th core can provide in $t_k$. In $t_k$, the amount of resources demanded by the VM $v_j$ is denoted as $S_{j,k} = (s_{j,k}^{cpu}, s_{j,k}^{mem})$.
We summarize the total cost of a CDC over a given lifetime by analyzing the behavior of each computing device in each time segment. In general, in addition to the operating cost of the hosts, it is also necessary to consider the cost of VM migration during each server consolidation and the penalty caused by the occurrences of SLAV. We discuss them separately in the following subsections.

Host Cost Model

Given a host $h_i$, its running cost $C_{h_i}$ is mainly determined by the electricity price $EP$ and its power $p_{h_i,t}$ at a given time t, namely:
$C_{h_i} = EP \times \int_{0}^{LT} p_{h_i,t}\, dt$.  (1)
It should be noted that if $h_i$ is powered off or in a power-saving state, its power consumption is negligible, so it does not incur any electricity-related costs. The analysis [33] of VM traces in the Alibaba CDC shows that the demand of VMs for CPU and memory resources far exceeds that for disk and network I/O. In this paper, we consider that the power of a host is related to the CPU, memory, and other basic components (motherboard, network card, disk, etc.). We take the power consumption of the basic components to be a fixed value, so only the power consumption of the CPU and memory is discussed below.
  • CPU power model
Buyya et al. [20] leveraged a single-core-based host power model in server consolidation; that is, the power of the CPU is related to its only core. Modern processors are multi-core architectures. Multiple cores are packaged on multiple CPU dies. The general architecture of a multi-core CPU is shown in Figure 1.
The total power consumption of the processor involves chip-level mandatory components, cores, die-level mandatory components, communication between cores, and communication between dies. In addition, modern processors employ energy-efficient mechanisms (such as Intel's SpeedStep) to optimize the power consumption of the CPU, which means that the power consumption of the CPU is not linearly related to its usage. We describe the power of a given CPU at a given moment as:
$P_{cpu} = (1 - r) \times (P_{cm} + P_{dies} + P_{interdie})$,  (2)
where r is the energy-efficiency factor, $P_{cm}$ is the power consumption of chip-level mandatory components, $P_{dies}$ is the power consumption of the dies, and $P_{interdie}$ is the power consumption of inter-die communication. Next, we give the models of the above factors and power terms, respectively. Without the energy-efficient mechanism, the actual power when the n cores of the processor perform calculations at the same time is $P_{act}$:
$P_{act} = P_{cm} + P_{dies} + P_{interdie}$.  (3)
In addition, we denote the total power of all cores as:
$P_{cores} = \sum_{k=1}^{n} P_{core_k}$,  (4)
where $P_{core_k}$ is the power consumption of the k-th core when it computes alone and all other cores are idle.
Basmadjian et al. [34] performed experiments to analyze the power consumption of chip-level mandatory components, such as voltage regulators, as:
$P_{cm} = P_{cores} - P_{act} = s(v, f)$,  (5)
where s is the capacitance function, v is the voltage, and f is the frequency:
$s(v, f) = c_e \times v^2 \times f$,  (6)
where $c_e$ is the effective capacitance [35].
Communication between dies occurs when cores on different dies access data at the same memory address. The power consumption of inter-die communication is:
$P_{interdie} = \sum_{j=1,\, d_j \in D}^{|D|-1} c_e \times v_j^2 \times f_j$,  (7)
where $v_j$ and $f_j$ are the voltage and frequency of the corresponding cores on $die_j$, $d_j$ is the set of active cores on the j-th die, and D is the set of dies involved in communication, defined as:
$D = \{d_j \mid d_j \neq \emptyset\}$,  (8)
$d_j = \{core_{i,j} \mid u(core_{i,j}) > 0\}$,  (9)
where $core_{i,j}$ is the i-th core on the j-th die, $u(core_{i,j})$ is the current utilization of $core_{i,j}$, $i \in [1, n_j]$, and $n_j$ is the total number of cores on the j-th die. We also have:
$v_j = \max\{v_{core_{i,j}} \mid u(core_{i,j}) > 0\}$,  (10)
$f_j = \max\{f_{core_{i,j}} \mid u(core_{i,j}) > 0\}$.  (11)
Equations (10) and (11) show that when there is only one active core on the j-th die, $v_j$ and $f_j$ of the j-th die are the voltage and frequency of that core.
The power of a single die can be described as:
$P_{die_j} = P_{md}^{j} + P_{cores}^{j} + P_{off}^{j}$,  (12)
where $P_{md}^{j}$ is the power consumption of die-level mandatory components, $P_{cores}^{j}$ is the power consumption of the $n_j$ constituent cores, and $P_{off}^{j}$ is the power consumption of off-chip caches. We leverage Equation (5) to model $P_{md}^{j}$.
Inter-core communication occurs between multiple cores on a single die j. Therefore, the core-level power consumption model is:
$P_{cores}^{j} = P_{dc}^{j} + P_{intercore}^{j}$,  (13)
where $P_{dc}^{j}$ is the power consumption of all active cores on the j-th die, and $P_{intercore}^{j}$ is the inter-core communication power consumption between the active cores.
The power consumption of a single core $core_{i,j}$ is described as:
$P_{core_{i,j}}^{j} = P_{exc}^{core_{i,j}} + P_{on}$,  (14)
where $P_{exc}^{core_{i,j}}$ and $P_{on}$ are the power consumptions of the exclusive components (e.g., ALU) and the on-chip caches of $core_{i,j}$, respectively. Based on the model in [20], we consider that $P_{exc}^{core_{i,j}}$ is linearly related to the utilization of the core; therefore:
$P_{exc}^{core_{i,j}} = P_{max}^{core_{i,j}} \times \frac{u(core_{i,j})}{100}$,  (15)
where $P_{max}^{core_{i,j}}$ is the power consumption of $core_{i,j}$ at maximum utilization, which can be calculated by the model in Equation (5).
The power consumption of on-chip caches is:
$P_{on} = \sum_{i=1}^{s} P_{L_i}$,  (16)
where s is the number of on-chip caches and $P_{L_i}$ is the power consumption of on-chip cache $L_i$, which can be calculated by the model in Equation (5).
Hence, the power consumption of all active cores on the j-th die is:
$P_{dc}^{j} = \sum_{i=1,\, core_{i,j} \in d_j}^{n_j} P_{core_{i,j}}$.  (17)
By dynamically adjusting voltage and frequency and turning off temporarily unused components, the energy-efficient mechanism can effectively optimize processor power consumption. This part of the power consumption reduction is mainly affected by three factors: (1) components and communication between cores, (2) changes in the frequency of a single core, and (3) the number of cores. Here we define the three factors.
The first factor is
$\alpha = 1 - \frac{P_{act}}{P_{cores}}$.  (18)
The second factor is
$\beta = \frac{\alpha}{f}$,  (19)
where f is the given frequency. For a multi-core processor, we have:
$f = \mathrm{average}\{f_{core_{i,j}} \mid core_{i,j} \in d_j,\ j \in [1, m]\}$,  (20)
where m is the number of dies.
The third factor is
$\gamma = \begin{cases} \frac{\alpha}{k}, & k \geq 2, \\ 0, & \text{otherwise}, \end{cases}$  (21)
where $k = \sum_{j=1}^{m} |d_j|$ is the total number of active cores on the processor.
Based on Equations (18), (19), and (21), we obtain the power reduction factor:
$r = \alpha + \beta + \gamma$.  (22)
Based on the above analysis, the processor power consumption $P_{cpu}$ of a given host can be obtained. For the host $h_i$, the power consumption of its processor at a certain time t is denoted as $P_{i,t}^{cpu}$. In other words, $P_{i,t}^{cpu}$ is a function of the current utilization of each core on the processor.
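To make the composition of the multi-core CPU power model concrete, the following Python sketch assembles the reduction factor of Equations (18)-(22) and the resulting processor power of Equation (2) from component powers that are assumed to have already been computed via Equations (3)-(17). All function names and numeric inputs are illustrative assumptions, not values taken from the paper.

def reduction_factor(p_act, p_cores, active_core_freqs):
    # Energy-efficiency reduction factor r = alpha + beta + gamma (Equations (18)-(22)).
    k = len(active_core_freqs)              # number of active cores on the processor
    alpha = 1.0 - p_act / p_cores           # Equation (18)
    f_avg = sum(active_core_freqs) / k      # Equation (20): average frequency of the active cores
    beta = alpha / f_avg                    # Equation (19)
    gamma = alpha / k if k >= 2 else 0.0    # Equation (21)
    return alpha + beta + gamma             # Equation (22)

def cpu_power(p_cm, p_dies, p_interdie, p_cores, active_core_freqs):
    # P_cpu = (1 - r) * (P_cm + P_dies + P_interdie) (Equations (2)-(3)).
    p_act = p_cm + p_dies + p_interdie
    r = reduction_factor(p_act, p_cores, active_core_freqs)
    return (1.0 - r) * p_act

# Hypothetical processor: two of four cores active at 2.4 GHz.
print(cpu_power(p_cm=8.0, p_dies=35.0, p_interdie=1.5,
                p_cores=60.0, active_core_freqs=[2.4, 2.4]))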
  • Memory power model
All current public VM workload traces from CDCs provide the memory usage of the monitored objects over time. Therefore, the current memory footprint $u_{i,t}^{mem}$ is used to estimate the memory power consumption $P_{i,t}^{mem}$ of the host $h_i$ at a given time t:
$P_{i,t}^{mem} = P_{idle,i}^{mem} + \alpha_i^{mem} \times u_{i,t}^{mem}$,  (23)
where $P_{idle,i}^{mem}$ is the memory power consumption when $h_i$ is idle, and $\alpha_i^{mem}$ is the memory power factor. According to the analysis by Esfandiarpoor et al. [27], when $\alpha_i^{mem} = 0.3\,\mathrm{W/GB}$, the power consumption of a DDR memory system can be estimated fairly accurately.
In summary, we obtain the total power of host $h_i$:
$P_{h_i,t} = P_{i,t}^{cpu} + P_{i,t}^{mem} + P_{i,t}^{base}$.  (24)
Combining Equation (24) with Equation (1), we obtain the energy consumption cost $C_H$ of all hosts. We divide the life cycle of the CDC into multiple time segments and analyze the energy consumption in each time segment separately. Then, we have:
$C_H = EP \times \sum_{i=1}^{N} \sum_{k=1}^{L} \int_{0}^{T} \lambda_{i,k} \times P_{h_i,t}\, dt$.  (25)
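The memory and total-power terms translate directly into code. The sketch below follows Equations (23)-(25), approximating the integral of Equation (25) by sampling each host's power once per five-minute segment; the 0.3 W/GB factor follows [27], while the electricity price, segment length, and sample values are assumptions for illustration.

ALPHA_MEM = 0.3            # memory power factor in W per GB [27]
EP = 0.25 / 1000.0         # electricity price in $ per Wh (0.25 $/kWh)
T_HOURS = 5.0 / 60.0       # length of one time segment (five minutes)

def mem_power(p_idle_mem, used_gb):
    return p_idle_mem + ALPHA_MEM * used_gb        # Equation (23)

def host_power(p_cpu, p_mem, p_base):
    return p_cpu + p_mem + p_base                  # Equation (24)

def segment_energy_cost(active_host_powers):
    # One term of Equation (25): powers (in W) of the hosts active in this segment.
    return EP * sum(p * T_HOURS for p in active_host_powers)

# Hypothetical segment with three active hosts of identical load.
powers = [host_power(55.0, mem_power(10.0, 24.0), 60.0) for _ in range(3)]
print(segment_energy_cost(powers))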

3.2. VM Migration Cost

We assume that at the beginning of each time segment, the CDC performs server consolidation to achieve the balance between the CSP’s cost and the user’s performance. VM migration is an important part of server consolidation. In a cluster composed of multi-core processor hosts, there are two types of VM migrations. The first is the inter-core migration on the same host, and the second is the inter-host migration between different hosts. Inter-core migration occurs when the core where the VMs are located is overloaded, and other cores of the same processor have sufficient computing resources. The VM migrates from one core of the processor to another core in a very short period of time through inter-core or inter-die communication. The inter-core migration does not involve memory, and the main impact is the hit rate of the processor cache. Therefore, the energy overhead of inter-core migration is negligible.
Next, we discuss the energy cost of inter-host migration. We use live migration technology to migrate VMs between different hosts. During the live migration of a VM, the memory data of the VM are transmitted. Although VMs generate dirty pages during live migration, the research in [28] indicates that the energy consumption of a VM live migration is positively related to the memory size of that VM. Therefore, we can assume that the larger the VM memory size, the longer the migration time and the higher the energy consumption.
When migrating a VM $v_j$ from host $h_i$ to another host $h_{i'}$, we assume that $h_i$ reserves enough resources to support the migration of $v_j$ and that $h_{i'}$ reserves enough resources to run $v_j$. Buyya et al. [20] assumed that a VM consumes an extra 10% of CPU usage to maintain the migration. In this paper, we extend this assumption to the memory resource usage of VM migration. In addition, we assume that the CDC deploys an exclusive network for VM migrations. We denote the size of the dedicated migration bandwidth of $h_i$ as $MIG\_NET_i$. The total cost of VM migrations in a given life cycle is denoted as $C_{mig}$ and described as:
$C_{mig} = \sum_{k=1}^{L} (C_k^{mig\_cpu} + C_k^{mig\_mem})$,  (26)
where $C_k^{mig\_cpu}$ and $C_k^{mig\_mem}$ are the migration costs caused by CPU and memory in $t_k$, respectively.
$C_k^{mig\_mem}$ is calculated as:
$C_k^{mig\_mem} = \sum_{i=1}^{N} \sum_{j=1}^{M} \left[ EP \times \int_{t=0}^{t_{j,k}^{mig}} \left( \gamma_{j,i,x_i,i',x_{i'},k} \times P_{j,k}^{mig\_mem} \right) dt \right]$,  (27)
where $\gamma_{j,i,x_i,i',x_{i'},k}$ is a 0-1 indicator, $P_{j,k}^{mig\_mem}$ is the power consumption generated by migrating the memory data of $v_j$, and $t_{j,k}^{mig}$ is the time spent migrating $v_j$. If VM $v_j$ needs to be migrated from the $x_i$-th core of the processor of host $h_i$ to the $x_{i'}$-th core of another host $h_{i'}$, then $\gamma_{j,i,x_i,i',x_{i'},k} = 1$; otherwise $\gamma_{j,i,x_i,i',x_{i'},k} = 0$. Since VM memory is the main data transferred during migration, we have:
$t_{j,k}^{mig} = \frac{s_{j,k}^{mem}}{mig\_bw_{i,k}}$,  (28)
where $mig\_bw_{i,k}$ is the migration bandwidth assigned to $v_j$. We consider that the migration bandwidth of $h_i$ is evenly assigned to every VM migrated from it within $t_k$. Hence, for a given source host $h_i$ and a destination host $h_{i'}$, we obtain:
$mig\_bw_{i,k} = \frac{MIG\_NET_i}{\sum_{j=1}^{M} \sum_{i'=1}^{N} \gamma_{j,i,x_i,i',x_{i'},k}}$,  (29)
and then we have
$t_{j,k}^{mig} = \frac{s_{j,k}^{mem} \times \sum_{j=1}^{M} \sum_{i'=1}^{N} \gamma_{j,i,x_i,i',x_{i'},k}}{MIG\_NET_i}$.  (30)
After this, we substitute Equation (30) into Equation (27). We let $p_{j,k}^{vmem}$ be the memory power of $v_j$ within $t_k$; then the memory migration power term $P_{j,k}^{mig\_mem}$ of $v_j$ is $0.1 \times p_{j,k}^{vmem} = 0.1 \times \alpha_i^{mem} \times s_{j,k}^{mem}$.
Next, we discuss $C_k^{mig\_cpu}$. We assume here that the power consumption generated by a host in a CDC is mainly used to keep the VMs running. Since the processor power consumption $P_{i,k}^{cpu}$ is a function of the current utilization of each core, $(c_i^{core_{cn_i}} - r_{i,k}^{core_1}, c_i^{core_{cn_i}} - r_{i,k}^{core_2}, \ldots, c_i^{core_{cn_i}} - r_{i,k}^{core_{cn_i}})$, it can be written as $P_{i,k}^{cpu}(c_i^{core_{cn_i}} - r_{i,k}^{core_1}, c_i^{core_{cn_i}} - r_{i,k}^{core_2}, \ldots, c_i^{core_{cn_i}} - r_{i,k}^{core_{cn_i}})$. For a given core $core_x$ on the processor, where $x \in [1, cn_i]$, if a VM needs to be migrated to another host at this time, the core's CPU utilization $u_{i,k}^{mig\_core_x}$ is:
$u_{i,k}^{mig\_core_x} = c_i^{core_x} - r_{i,k}^{core_x} + 0.1 \times \sum_{j=1}^{M} \left( \gamma_{j,i,x_i,i',x_{i'},k} \times s_{j,k}^{cpu} \right)$.  (31)
Hence, the power consumption of host $h_i$ during inter-host migration is:
$(P_{i,k}^{cpu})' = P_{i,k}^{cpu}(u_{i,k}^{mig\_core_1}, \ldots, u_{i,k}^{mig\_core_{cn_i}})$.  (32)
Then, we combine Equation (32) into Equation (2). We denote the updated host energy consumption cost $C_H$ as $C_H'$.
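The following sketch illustrates the bookkeeping for one inter-host migration under Equations (26)-(30): the migration network of the source host is shared evenly among its concurrently migrating VMs, the transfer time follows from the VM memory size, and a 10% memory-power overhead is charged during the transfer. All numeric inputs and the unit conversions are assumptions for illustration.

ALPHA_MEM = 0.3            # W per GB, memory power factor from [27]
EP = 0.25 / 1000.0         # $ per Wh

def migration_time_s(vm_mem_gb, mig_net_gbps, concurrent_migrations):
    mig_bw_gbps = mig_net_gbps / concurrent_migrations   # even share of the migration network (Equation (29))
    return vm_mem_gb * 8.0 / mig_bw_gbps                  # transfer time in seconds (Equation (28))

def mem_migration_cost(vm_mem_gb, mig_net_gbps, concurrent_migrations):
    t_mig = migration_time_s(vm_mem_gb, mig_net_gbps, concurrent_migrations)
    p_mig_mem = 0.1 * ALPHA_MEM * vm_mem_gb               # 10% extra memory power while migrating
    return EP * p_mig_mem * t_mig / 3600.0                # one term of Equation (27), in $

# Two 8 GB VMs leaving the same host over a 10 Gbps dedicated migration network.
print(mem_migration_cost(vm_mem_gb=8.0, mig_net_gbps=10.0, concurrent_migrations=2))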

SLAV Penalty Cost

In a CDC, to guarantee user QoS, CSPs must provide SLAV compensation to the affected users in some form. This overhead needs to be included in the cost consideration of the CDC. In this paper, we extend the single-core CPU SLAV definition by Buyya et al. [20] to the multi-core CPU and memory. The two metrics are denoted as $SLAV^{cpu}$ and $SLAV^{mem}$, respectively.
For the processor, it is considered overloaded only if all its cores are overloaded. Hence, we have
$SLAV^{cpu} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,cpu}}{T_i^{a,cpu}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{d_{j,k}^{d,cpu}}{s_{j,k}^{r,cpu}}$,  (33)
where $T_i^{s,cpu}$ is the CPU SLAV duration caused by all cores being overloaded on $h_i$, $T_i^{a,cpu}$ is the total working duration of the host, $d_{j,k}^{d,cpu}$ is the size of the unsatisfied CPU resource demand resulting from the migration of $v_j$ in $t_k$, and $s_{j,k}^{r,cpu}$ is the CPU resource requested by $v_j$ in $t_k$.
Likewise, we propose the formal definition of $SLAV^{mem}$:
$SLAV^{mem} = \frac{1}{N} \sum_{i=1}^{N} \frac{T_i^{s,mem}}{T_i^{a,mem}} \times \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{L} \frac{d_{j,k}^{d,mem}}{s_{j,k}^{r,mem}}$.  (34)
We denote the CPU and memory SLAV compensation price indices as $pun^{cpu}$ and $pun^{mem}$, respectively. Then, we have:
$C_{SLAV} = pun^{cpu} \times SLAV^{cpu} + pun^{mem} \times SLAV^{mem}$.  (35)
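The SLAV penalty can be computed from two aggregates per resource: the fraction of time each host spends fully overloaded and the per-VM ratio of unmet to requested resources. The sketch below follows the structure of Equations (33)-(35) with hypothetical input values; the penalty indices match the ones used later in Section 5.

def slav(host_overload_time_ratios, vm_degradation_ratios):
    # host_overload_time_ratios: T^s / T^a per host; vm_degradation_ratios: unmet/requested per VM.
    slatah = sum(host_overload_time_ratios) / len(host_overload_time_ratios)
    pdm = sum(vm_degradation_ratios) / len(vm_degradation_ratios)
    return slatah * pdm

def slav_cost(slav_cpu, slav_mem, pun_cpu=0.01, pun_mem=0.01):
    return pun_cpu * slav_cpu + pun_mem * slav_mem        # Equation (35)

cpu_slav = slav([0.05, 0.02, 0.00], [0.10, 0.03, 0.07])
mem_slav = slav([0.01, 0.00, 0.00], [0.02, 0.01, 0.00])
print(slav_cost(cpu_slav, mem_slav))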

3.3. Problem Description

In the preceding subsections, we analyzed the factors involved in the operating cost of a CDC, namely the host energy consumption cost $C_H$, the VM migration cost $C_{mig}$, and the SLAV penalty cost $C_{SLAV}$. In this paper, our research goal is to minimize the associated operating cost C of the CDC. Combining the above models, we obtain the problem of minimizing the multi-core-host-based cost in server consolidation (MMCC):
$\mathrm{MIN}\ C = C_H' + \sum_{k=1}^{L} C_k^{mig\_mem} + C_{SLAV}$.  (36)
A 0-1 indicator $\beta_{i,j,x_i,k}$ is used to mark whether the VM $v_j$ is running on the $x_i$-th core of host $h_i$'s processor at the beginning of the time period $t_k$. If $v_j$ runs on the $x_i$-th core of the host $h_i$, then $\beta_{i,j,x_i,k} = 1$; otherwise $\beta_{i,j,x_i,k} = 0$. The constraints of the MMCC problem are:
$\sum_{i=1}^{N} \sum_{x_i=1}^{cn_i} \beta_{i,j,x_i,k} = 1, \quad \forall j, k$,  (37)
$\sum_{i'=1}^{N} \sum_{x_{i'}=1}^{cn_{i'}} \gamma_{j,i,x_i,i',x_{i'},k} = 1, \quad \forall i \neq i', j, k$,  (38)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{cpu} \leq r_{i,k}^{core_{x_i}}, \quad \forall i, x_i, k$,  (39)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{mem} \leq r_{i,k}^{mem}, \quad \forall i, k$.  (40)
Constraint (37) means that in any period, any VM can only run on a specific core of a unique specific host. Constraint (38) means that in any period, a VM migrated from any host can only have a unique destination host and a unique core. Constraints (39) and (40) mean that in any period, the CPU and memory resources provided by each host to the VMs cannot exceed its resource upper limits.
In the following, we analyze the complexity of the MMCC problem by considering a simple case of the problem. If the hosts in the CDC are homogeneous, and the resource requirements of any VM $v_j$ in any time segment $t_k$ are fixed values that satisfy constraints (39) and (40), then the VM migration cost and SLAV penalty cost are both zero, and the objective function of the MMCC problem is:
$\mathrm{MIN}\ C = C_H$.  (41)
Obviously, the MMCC problem in this simple case can be reduced to the bin-packing problem. Since the bin-packing problem is NP-hard, the MMCC problem is also NP-hard.

4. Solution for MMCC Problem

Since the MMCC problem is NP-hard, we propose a heuristic based on the traditional four-step method for server consolidation. The first step is host workload detection, the second step is VM selection, and the third and fourth steps are VM placement for VMs from the overloaded and underloaded hosts, respectively. Before performing host overload detection and VM selection, we first predict the future workload trends of the VMs based on their workload history. The purpose of this is to balance the load of hosts before they become overloaded and trigger SLAV occurrences, thereby reducing costs as much as possible.

4.1. VM Workload Prediction

Before predicting the future workload of a VM, we first need to preprocess its workload trace. The sampling frequency and precision cause a certain deviation between the historical sampling records and the actual usage of resources by the VM. To minimize the impact of these biases on the final result, we denoise by assuming that there is noise in the workload’s history. In addition, we do not need to spend high computing power and a lot of time to obtain accurate prediction results. We only need to roughly judge a general trend of the VM’s resource usage in the future.
In this paper, we utilize a classic denoising autoencoder (DAE) [36]-based filter algorithm to preprocess the workload of VMs. Figure 2 shows the general structure of the DAE mechanism.
In Figure 2, x is the initial input, and $\tilde{x} \sim q_D(\tilde{x} \mid x)$ is the stochastic corruption of x. The autoencoder then maps $\tilde{x}$ to y with the encoder $f_\theta$ and generates the reconstruction z with the decoder $g_{\theta'}$. The reconstruction error is measured by the loss function $L_H(x, z)$. In our proposed DAE-based filter, three autoencoders and one compression decoder are assembled, and their network structures are shown in Figure 3, Figure 4, Figure 5 and Figure 6. Figure 7 shows the result of processing a segment of CPU usage records of a VM using the DAE-based filter.
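As a rough illustration of one stage of the DAE-based filter, the PyTorch sketch below corrupts each workload window with Gaussian noise, reconstructs it, and is trained with a mean-squared reconstruction loss. The layer sizes, noise level, and training loop are assumptions for illustration; the actual filter stacks three autoencoders and a compression decoder as described above.

import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, window=32, hidden=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, window)

    def forward(self, x):
        x_noisy = x + 0.05 * torch.randn_like(x)     # stochastic corruption q_D(x~|x)
        return self.decoder(self.encoder(x_noisy))   # reconstruction z = g(f(x~))

model = DenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                               # reconstruction loss L_H(x, z)

windows = torch.rand(64, 32)                         # 64 windows of 32 CPU-usage samples each
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(windows), windows)
    loss.backward()
    optimizer.step()

denoised = model(windows).detach()                   # filtered workload windows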
Traditional RNNs cannot be parallelized across time steps, which makes their training slow. To address this issue, we employ an SRU-based approach to predict the workload of VMs. The simple recurrent unit (SRU) [37] removes the time dependency from most of its operations, enabling parallel processing. Experiments [37] show that the SRU processes sequences more than ten times faster than a traditional LSTM while achieving similar accuracy. Since the SRU has been open-sourced and its usage differs little from LSTM, we do not discuss its theoretical details in this article.
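A minimal predictor built on the open-source SRU implementation could look as follows. It assumes the `sru` package cited in Section 5, feeds the last 12 normalized usage samples of each VM, and regresses the usage of the next time segment from the final hidden state; the network size and window length are illustrative assumptions.

import torch
import torch.nn as nn
from sru import SRU

class SRUPredictor(nn.Module):
    def __init__(self, hidden=32, layers=2):
        super().__init__()
        self.rnn = SRU(input_size=1, hidden_size=hidden, num_layers=layers)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (seq_len, batch, 1) history of normalized resource usage
        out, _ = self.rnn(x)
        return self.head(out[-1])     # predicted usage for the next segment, shape (batch, 1)

model = SRUPredictor()
history = torch.rand(12, 16, 1)       # 12 past segments for 16 VMs
next_usage = model(history)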
After predicting the resource usage of each VM at the next time segment, we can perform host load detection and VM selection.

4.2. Host Workload Detection

The purpose of host overload detection is to avoid and eliminate fierce competition among VMs for resources, thereby reducing the occurrences of SLAV. Common host overload detection methods fall into two categories: static threshold methods and dynamic threshold methods. In a static threshold method, the resource usage thresholds are set as fixed values. When the usage exceeds a threshold, the host is in an overloaded state and SLAV occurs; at this time, VMs must be migrated to reduce the load. In a dynamic threshold method, CSPs analyze the use of computing resources through various statistical methods to determine whether the competition for resources is fierce and whether the hosts are overloaded. The advantage of the static threshold method is that host resources are fully utilized, but the disadvantage is that more overhead is required to reduce the SLAV. The advantage of the dynamic threshold method is that it can effectively reduce the SLAV, but the disadvantage is that host resources are sometimes underutilized. Therefore, we combine the advantages of the two and propose the double-insurance-based fixed threshold overload detection method (DIFT).
In DIFT, the first insurance is that the host must not overload its CPU and memory resources during the current period. The second insurance is that the host must not overload its CPU and memory resources in the next period. For a given host $h_i$, DIFT first detects whether the usage of each resource on $h_i$ exceeds the given threshold at the beginning of the time period $t_k$; then, based on the prediction results of the SRU method, it judges whether the usage of each resource on $h_i$ will exceed the given threshold in the next time period $t_{k+1}$.
Since the VM migrations are divided into inter-core migrations and inter-host migrations, we correspondingly divide the CPU overload of the host into two situations: processor-overloaded and cores-overloaded. When the host is processor-overloaded, all cores on the processor are in an overloaded state. When the host is cores-overloaded, some (but not all) cores on the processor are in the overloaded state.
Let the overload threshold be $TH_{up} = (TH_{up}^{cpu}, TH_{up}^{mem})$, where both $TH_{up}^{cpu}$ and $TH_{up}^{mem}$ lie in the interval (0, 1). When the following inequalities hold for every $core_{x_i}$ on $h_i$ in $t_k$, the host is in the processor-overloaded state:
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{cpu} > TH_{up}^{cpu} \times c_i^{core_{x_i}}$,  (42)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{cpu} > TH_{up}^{cpu} \times c_i^{core_{x_i}}$.  (43)
When Inequalities (42) and (43) hold in $t_k$ for some cores $core_{x_i}$ on $h_i$ but not for all $cn_i$ of them, the host is in the cores-overloaded state.
Host $h_i$ is in the memory-overloaded state when the following inequalities hold in $t_k$:
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{mem} > TH_{up}^{mem} \times c_i^{mem}$,  (44)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{mem} > TH_{up}^{mem} \times c_i^{mem}$.  (45)
When the host is memory-overloaded or processor-overloaded, it must be in the host-overloaded state, and VM inter-host migration is required at this time. The situation where the host has only cores-overloaded is called semi-overloaded, and inter-core migrations can be preferentially leveraged at this time.
For an underloaded host, all VMs on it are migrated to other suitable hosts through inter-host migration; hence there is no need to consider inter-core migration here. Let the underload threshold be $TH_{down} = (TH_{down}^{cpu}, TH_{down}^{mem})$, where both $TH_{down}^{cpu}$ and $TH_{down}^{mem}$ lie in the interval (0, 1). When the following inequalities hold in $t_k$ for every $core_{x_i}$ on $h_i$ and for its memory, the host is in the underloaded state:
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{cpu} < TH_{down}^{cpu} \times c_i^{core_{x_i}}$,  (46)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{cpu} < TH_{down}^{cpu} \times c_i^{core_{x_i}}$,  (47)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k}^{mem} < TH_{down}^{mem} \times c_i^{mem}$,  (48)
$\sum_{j=1}^{M} \beta_{i,j,x_i,k} \times s_{j,k+1}^{mem} < TH_{down}^{mem} \times c_i^{mem}$.  (49)
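A compact sketch of the DIFT check for one host is given below. It evaluates Inequalities (42)-(45) on both the current usage and the SRU-predicted usage for the next segment and returns the host state; the data layout and helper names are assumptions rather than the authors' implementation.

TH_UP = 0.9

def core_overloaded(used_now, used_next, core_capacity):
    # Both insurances: the threshold is exceeded now (42) and in the next segment (43).
    return used_now > TH_UP * core_capacity and used_next > TH_UP * core_capacity

def classify_host(core_used_now, core_used_next, core_capacity, mem_now, mem_next, mem_capacity):
    over = [core_overloaded(u, p, core_capacity)
            for u, p in zip(core_used_now, core_used_next)]
    mem_over = mem_now > TH_UP * mem_capacity and mem_next > TH_UP * mem_capacity
    if mem_over or all(over):
        return "host-overloaded"      # inter-host migration required
    if any(over):
        return "semi-overloaded"      # inter-core migration tried first
    return "normal"

# One core overloaded in both segments, memory fine: the host is semi-overloaded.
print(classify_host([9.5, 3.0], [9.6, 3.1], 10.0, 50.0, 52.0, 64.0))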

4.3. VM Selection

VM selection is performed for overloaded hosts. The reason why we use the DIFT method is to avoid host overload and SLAV as much as possible, rather than react passively after SLAV occurs. Therefore, we can assume that in the $t_{k+1}$ time segment there would only be slight SLAV and host overload in the CDC. Our priority in VM selection is, at the $t_k$ time segment, to select the VMs on each host $h_i$ that may cause $h_i$ to be overloaded during $t_{k+1}$ and form a list of VMs to be migrated. If, after the migrations of these VMs are completed, $h_i$ is still in the overloaded state during the $t_k$ period, then targeted processing is performed. We discuss the VM selection strategies for the various overload states within $t_l$ (e.g., $l = k+1$) in the following cases.
  • Case 1: Host with semi-overloaded
In this case, we need to reduce the load on each overloaded core. Given the j-th overloaded core $core_{i,j,l}$ on host $h_i$ in $t_l$, we denote the set of n VMs running on it as $V_{i,j,l} = \{v_{i,j,l}^1, v_{i,j,l}^2, \ldots, v_{i,j,l}^n\}$, its total resources as $c_{core_{i,j,l}}$, and its currently available resources as $r_{core_{i,j,l}}$. For a VM $v_{i,j,l}^q \in V_{i,j,l}$, the amount of CPU resources it uses is denoted as $cpu_{v_{i,j,l}^q}$. Then, each selection chooses the VM $v_{i,j,l}^q$ that minimizes $\left| (1 - TH_{up}^{cpu}) \times c_{core_{i,j,l}} - (r_{core_{i,j,l}} + cpu_{v_{i,j,l}^q}) \right|$ and adds it to the inter-core migration list. We select one VM at a time until $r_{core_{i,j,l}} \geq (1 - TH_{up}^{cpu}) \times c_{core_{i,j,l}}$.
  • Case 2: Host with only memory overloaded
Given a memory-overloaded host $h_i$ in $t_l$, we denote the set of n VMs running on it as $V_{i,l} = \{v_{i,l}^1, v_{i,l}^2, \ldots, v_{i,l}^n\}$, the total amount of memory resources it has as $c_{mem_{i,l}}$, and the currently available amount as $r_{mem_{i,l}}$. For a VM $v_{i,l}^q \in V_{i,l}$, the amount of memory resources it uses is recorded as $mem_{v_{i,l}^q}$. Then, each selection chooses the VM $v_{i,l}^q$ that minimizes $\left| (1 - TH_{up}^{mem}) \times c_{mem_{i,l}} - (r_{mem_{i,l}} + mem_{v_{i,l}^q}) \right|$ and adds it to the inter-host migration list. We select one VM at a time until $r_{mem_{i,l}} \geq (1 - TH_{up}^{mem}) \times c_{mem_{i,l}}$.
  • Case 3: Host with only processor overloaded
We select VMs from each core in the same method as proposed in Case 1. All selected VMs are added into the inter-host migration list.
  • Case 4: Host with processor overloaded or cores overloaded and memory overloaded
We first use the method in Case 1 to select VMs from each overloaded core. After the load of all cores drops under the overloaded threshold, if the memory load also drops under the overload threshold, the VM selection is completed; otherwise, the method in Case 2 is used to select VMs to reduce the memory load. All selected VMs are put into the inter-host migration list.
For a given overloaded host, at the beginning of the $t_k$ time segment, the above VM selection strategies are executed for its overload condition in $t_{k+1}$. If the host is still in an overloaded state in the $t_k$ time segment, the above strategies are executed again to reduce the current load.
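The Case 1 rule can be written as a small greedy loop: repeatedly pick the VM whose removal would leave the core's free capacity closest to the $(1 - TH_{up}^{cpu})$ headroom, until the core falls below the overload threshold. The dictionary-based representation and the sample values below are assumptions for illustration.

TH_UP_CPU = 0.9

def select_from_core(core_capacity, free_capacity, vm_cpu_usage):
    # vm_cpu_usage: dict mapping VM id -> CPU usage on this core.
    target = (1.0 - TH_UP_CPU) * core_capacity
    selected = []
    while free_capacity < target and vm_cpu_usage:
        # VM whose migration brings the free capacity closest to the required headroom.
        vm = min(vm_cpu_usage, key=lambda v: abs(target - (free_capacity + vm_cpu_usage[v])))
        free_capacity += vm_cpu_usage.pop(vm)
        selected.append(vm)
    return selected    # inter-core migration list for this core

print(select_from_core(10.0, 0.4, {"vm1": 0.5, "vm2": 1.2, "vm3": 3.0}))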

4.4. VM Placement

To make full use of the resources of the hosts, we should fully consider the space and time competition of different VMs for different resources when placing VMs on hosts.
In the VM selection phase, we obtain the inter-core migration list and inter-host migration list. Regarding a semi-overloaded host, it should be noted that the load of the cores may not be reduced through inter-core migration. Therefore, in the VM placement phase, we first process the inter-core migration list and then add the remaining VMs to the inter-host migration list to process together.
There are two goals of VM placement: (1) to ensure that the resources of the target host can be fully utilized during the t k period; and (2) after the VM is placed on the destination host h i , the host will not be in the overloaded state during the t k + 1 period.
We address the inter-core migration list first. For a semi-overloaded host $h_i$, we sort all non-overloaded cores of $h_i$ in ascending order of load and denote this ordered sequence as ordered_uo_cores_i,k. We arrange the VMs in the inter-core migration list icm_i of $h_i$ in descending order of their current demand for CPU resources to form the list ordered_icm_i. We take the first VM from ordered_icm_i and traverse ordered_uo_cores_i,k in order to find the first core with enough CPU resources for it. If a suitable core cannot be found in ordered_uo_cores_i,k for this VM, we add it to the inter-host migration list. The VM is then removed from ordered_icm_i. We repeat the above operations until ordered_icm_i is empty. Each semi-overloaded host executes this placement algorithm for the VMs in its icm_i. Algorithm 1 shows the pseudocode of the inter-core VM placement algorithm.
Algorithm 1 Inter-Core VM Placement Algorithm
Input: host h_i, inter-core migration list icm_i of h_i
Output: allocation of VMs on certain cores
1: Get_sorted(core_1, ..., core_cn_i) → ordered_uo_cores_i,k
2: Get_sorted(icm_i) → ordered_icm_i
3: for each vm_j in ordered_icm_i do
4:     for each core_p in ordered_uo_cores_i,k do
5:         if core_p is available for vm_j in t_k and t_k+1 then
6:             allocation.add(vm_j, h_i.core_p)
7:             ordered_icm_i.remove(vm_j)
8:             break
9:         end if
10:    end for
11:    if vm_j has not been allocated then
12:        add vm_j to the inter-host migration list
13:    end if
14: end for
15: return allocation
Next, the inter-host migration list is processed. First, all the hosts are divided into two categories according to the intensity of their CPU and memory usage: CPU-intensive hosts and memory-intensive hosts. The following calculation is used to classify a given host $h_i$. We take the workload trace of $h_i$ over 12 consecutive time segments (one hour), where the normalized CPU workload time series is $LD_{i,k}^{cpu} = \{ld_{i,k-10}^{cpu}, \ldots, ld_{i,k}^{cpu}, ld_{i,k+1}^{cpu}\}$, and the normalized memory workload time series is $LD_{i,k}^{mem} = \{ld_{i,k-10}^{mem}, \ldots, ld_{i,k}^{mem}, ld_{i,k+1}^{mem}\}$. Since $h_i$ has a multi-core CPU, its normalized CPU workload in period $t_k$ is:
$ld_{i,k}^{cpu} = \frac{\sum_{x=1}^{cn_i} \left( c_i^{core_{cn_i}} - r_{i,k}^{core_x} \right)}{\max\{c_i^{core_{cn_i}} \mid i \in [1, N]\}}$.  (50)
The denominator is $\max\{c_i^{core_{cn_i}} \mid i \in [1, N]\}$ so that CPUs with different capacities can be compared with each other through normalization. The smaller the value of $ld_{i,k}^{cpu}$, the lower the CPU load of $h_i$ in the time period $t_k$.
In a certain time period $t_k$, its normalized memory workload is:
$ld_{i,k}^{mem} = \frac{c_i^{mem} - r_{i,k}^{mem}}{\max\{c_i^{mem} \mid i \in [1, N]\}}$.  (51)
The smaller the value of $ld_{i,k}^{mem}$, the lower the memory load of $h_i$ in the time period $t_k$.
Based on Equation (50), we calculate the CPU score of $h_i$:
$score_{i,k}^{cpu} = \frac{1}{10} \left( \sum_{y=1}^{12} ld_{i,k+y-11}^{cpu} - \max(LD_{i,k}^{cpu}) - \min(LD_{i,k}^{cpu}) \right)$,  (52)
where $\max(LD_{i,k}^{cpu})$ and $\min(LD_{i,k}^{cpu})$ are the maximum and minimum values in the normalized CPU workload sequence. We remove them from the sum to minimize the impact of possible severe load fluctuations on the score.
Based on Equation (51), we calculate the memory score of $h_i$:
$score_{i,k}^{mem} = \frac{1}{10} \left( \sum_{y=1}^{12} ld_{i,k+y-11}^{mem} - \max(LD_{i,k}^{mem}) - \min(LD_{i,k}^{mem}) \right)$,  (53)
where $\max(LD_{i,k}^{mem})$ and $\min(LD_{i,k}^{mem})$ are the maximum and minimum values in the normalized memory workload sequence.
If $score_{i,k}^{cpu} \geq score_{i,k}^{mem}$, $h_i$ is of the CPU-intensive type; otherwise, $h_i$ is of the memory-intensive type. CPU-intensive hosts have more abundant available memory resources, and memory-intensive hosts have more abundant available CPU resources. Therefore, in the time period $t_k$, the CPU-intensive hosts are arranged in ascending order of their $ld_{i,k}^{mem}$ values, forming the list memordered_cpu_hosts_list, and the memory-intensive hosts are arranged in ascending order of their $ld_{i,k}^{cpu}$ values, forming the list cpuordered_mem_hosts_list. The reason for this sorting is that CPU-intensive hosts have enough remaining memory resources, so VMs that require more memory can be placed on them, while memory-intensive hosts have enough remaining CPU resources, so VMs that require more CPU can be placed on them.
In the following, we sort the VMs in the inter-host migration list by their resource usage requirements. The VMs to be migrated are also divided into the CPU-intensive type and the memory-intensive type. CPU-intensive VMs should be placed on memory-intensive hosts as much as possible, and memory-intensive VMs should be placed on CPU-intensive hosts as much as possible. We use the following calculation to classify a given VM $v_j$. We take the workload trace of $v_j$ over 12 consecutive time segments (one hour), where the normalized CPU workload time series is $VLD_{j,k}^{cpu} = \{vld_{j,k-10}^{cpu}, \ldots, vld_{j,k}^{cpu}, vld_{j,k+1}^{cpu}\}$, and the normalized memory workload time series is $VLD_{j,k}^{mem} = \{vld_{j,k-10}^{mem}, \ldots, vld_{j,k}^{mem}, vld_{j,k+1}^{mem}\}$. In a certain time period $t_k$, the normalized CPU workload of $v_j$ is:
$vld_{j,k}^{cpu} = \frac{s_{j,k}^{cpu} - \min\{s_{x,k}^{cpu} \mid x \in [1, M]\}}{\max\{s_{x,k}^{cpu} \mid x \in [1, M]\} - \min\{s_{x,k}^{cpu} \mid x \in [1, M]\}}$.  (54)
The smaller the value of $vld_{j,k}^{cpu}$, the lower the CPU demand of $v_j$ in the time period $t_k$.
In a certain time period $t_k$, the normalized memory workload of $v_j$ is:
$vld_{j,k}^{mem} = \frac{s_{j,k}^{mem} - \min\{s_{x,k}^{mem} \mid x \in [1, M]\}}{\max\{s_{x,k}^{mem} \mid x \in [1, M]\} - \min\{s_{x,k}^{mem} \mid x \in [1, M]\}}$.  (55)
Based on Equation (54), we calculate the CPU score of $v_j$:
$vscore_{j,k}^{cpu} = \frac{1}{10} \left( \sum_{y=1}^{12} vld_{j,k+y-11}^{cpu} - \max(VLD_{j,k}^{cpu}) - \min(VLD_{j,k}^{cpu}) \right)$,  (56)
where $\max(VLD_{j,k}^{cpu})$ and $\min(VLD_{j,k}^{cpu})$ are the maximum and minimum values in the normalized CPU workload sequence of $v_j$.
Based on Equation (55), we calculate the memory score of $v_j$:
$vscore_{j,k}^{mem} = \frac{1}{10} \left( \sum_{y=1}^{12} vld_{j,k+y-11}^{mem} - \max(VLD_{j,k}^{mem}) - \min(VLD_{j,k}^{mem}) \right)$,  (57)
where $\max(VLD_{j,k}^{mem})$ and $\min(VLD_{j,k}^{mem})$ are the maximum and minimum values in the normalized memory workload sequence of $v_j$.
If $vscore_{j,k}^{cpu} \geq vscore_{j,k}^{mem}$, $v_j$ is of the CPU-intensive type; otherwise, $v_j$ is of the memory-intensive type.
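Both the host and VM classifications reduce to the same trimmed-mean comparison over the last 12 normalized load samples (Equations (52)-(53) and (56)-(57)). The sketch below shows this shared computation with hypothetical sample data.

def trimmed_score(normalized_loads):
    # Average of 12 normalized loads with the maximum and minimum removed.
    assert len(normalized_loads) == 12
    return (sum(normalized_loads) - max(normalized_loads) - min(normalized_loads)) / 10.0

def classify(cpu_loads, mem_loads):
    return "cpu-intensive" if trimmed_score(cpu_loads) >= trimmed_score(mem_loads) else "memory-intensive"

cpu_hist = [0.62, 0.58, 0.65, 0.70, 0.61, 0.66, 0.59, 0.64, 0.68, 0.63, 0.60, 0.67]
mem_hist = [0.35, 0.33, 0.36, 0.40, 0.34, 0.37, 0.32, 0.38, 0.41, 0.36, 0.35, 0.39]
print(classify(cpu_hist, mem_hist))   # -> cpu-intensive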
In the time period $t_k$, the CPU-intensive VMs are arranged in descending order of their $vld_{j,k}^{cpu}$ values, forming the list ordered_cpu_vms_list. We pick the VMs from ordered_cpu_vms_list in turn, traverse cpuordered_mem_hosts_list, and find the first host that can meet the resource requirements of the current VM and will not be overloaded in both $t_k$ and the future $t_{k+1}$.
Since the hosts have multi-core CPUs, we design the following procedure to decide which core of the host $h_i$ will be used by the VM $v_j$. We sort the cores of $h_i$'s processor in descending order of their available resources $r_{i,k}^{core_1}, r_{i,k}^{core_2}, \ldots, r_{i,k}^{core_{cn_i}}$, which constitute the sequence ordered_cores_i,k. Then, the VM $v_j$ is preferentially placed on the foremost core in ordered_cores_i,k that also meets the resource requirements of $v_j$ in the $t_{k+1}$ time period.
In the time period $t_k$, the memory-intensive VMs are arranged in descending order of their $vld_{j,k}^{mem}$ values, forming the list ordered_mem_vms_list. We pick the VMs from ordered_mem_vms_list in turn, traverse memordered_cpu_hosts_list, and find the first host that can meet the resource requirements of the current VM and will not be overloaded in both $t_k$ and the future $t_{k+1}$. On the chosen host, the same multi-core placement method based on ordered_cores_i,k is used.
When the destination host is determined for a given VM to be migrated, this VM is removed from the inter-host migration list. Algorithm 2 shows the pseudocode of the inter-host VM placement algorithm. If, after ordered_mem_vms_list and ordered_cpu_vms_list have been traversed, there are still VMs to be migrated, the first-fit method is used to find available hosts for them in the host list. If there are still VMs to be migrated after that, the hosts in the energy-saving state are powered on one by one until all the VMs to be migrated are assigned destination hosts.
After the above process, we perform underloaded host detection on the hosts in the CDC. If there are still underloaded hosts at this time, the VMs on the underloaded hosts are added to form a VM migration list, and Algorithm 2 is executed.
Algorithm 2 Inter-Host VM Placement Algorithm
Input: host list, inter-host migration list
Output: allocation of VMs
1: Get_classification(host list) → cpu_intensive_hosts_k, mem_intensive_hosts_k
2: Get_classification(inter-host migration list) → cpu_intensive_vms_k, mem_intensive_vms_k
3: Get_sorted(cpu_intensive_hosts_k) → memordered_cpu_hosts_list
4: Get_sorted(mem_intensive_hosts_k) → cpuordered_mem_hosts_list
5: Get_sorted(cpu_intensive_vms_k) → ordered_cpu_vms_list
6: Get_sorted(mem_intensive_vms_k) → ordered_mem_vms_list
7: for each vm_j in ordered_cpu_vms_list do
8:     for each h_i in cpuordered_mem_hosts_list do
9:         Get_sorted(core_1, ..., core_cn_i) → ordered_cores_i,k
10:        for each core_p in ordered_cores_i,k do
11:            if core_p is available for vm_j in t_k and t_k+1 then
12:                allocation.add(vm_j, h_i.core_p)
13:                ordered_cpu_vms_list.remove(vm_j)
14:                break
15:            end if
16:        end for
17:        if vm_j has been allocated then
18:            break
19:        end if
20:    end for
21: end for
22: for each vm_j in ordered_mem_vms_list do
23:    for each h_i in memordered_cpu_hosts_list do
24:        Get_sorted(core_1, ..., core_cn_i) → ordered_cores_i,k
25:        for each core_p in ordered_cores_i,k do
26:            if core_p is available for vm_j in t_k and t_k+1 then
27:                allocation.add(vm_j, h_i.core_p)
28:                ordered_mem_vms_list.remove(vm_j)
29:                break
30:            end if
31:        end for
32:        if vm_j has been allocated then
33:            break
34:        end if
35:    end for
36: end for
37: return allocation

5. Performance Evaluation

In this section, we evaluate the performance of our proposed solution, named MMCC, with a real VM workload trace-driven simulation.

5.1. Experiment Setup

According to the energy consumption analyses and statistics of hosts by Basmadjian et al. [34], Minartz et al. [38], and Jin et al. [39], we simulated three types of hosts, denoted $H_{large}$, $H_{medium}$, and $H_{small}$. Their resource parameters are shown in Table 1, the power parameters are shown in Table 2 and Table 3, and the capacitances of different components of the processor are given in Table 4. The numbers of $H_{small}$, $H_{medium}$, and $H_{large}$ hosts in the simulated CDC are each 100.
The VM workload trace dataset is from the Alibaba CDC [33]. The VM traces in the dataset are recorded by sampling every five minutes. We selected 1000 VMs over one day (a total of 288 time segments) from the dataset to simulate the consumers' demands for cloud services. The simulation was implemented on CloudMatrix Lite [40]. The DAE-based filter and the SRU algorithm (the source code is available at https://github.com/asappresearch/sru, accessed on 19 October 2022) were implemented based on PyTorch [41].
We set the electricity price as $EP = 0.25\,\$/\mathrm{kWh}$. The SLAV penalty is a static value, $pun^{cpu} = pun^{mem} = 0.01\,\$$ [42]. The host should reserve an extra 10% of resources for migrations; thereby, we set $TH_{up}^{cpu} = TH_{up}^{mem} = 0.9$. We also set $TH_{down}^{cpu} = TH_{down}^{mem} = TH_{down}^{disk} = TH_{down}^{net} = 0.1$.
We combined three overload detection algorithms (MAD [20], IQR [20], and LR [20]), three VM selection algorithms (MMT [20,25,30], MC [20,43], and RS [20]), and one VM placement algorithm (PABFD [20]) into nine baseline methods to compare with our proposed solution MMCC. All the abovementioned workload detection and selection algorithms were initially designed for single-core hosts; hence, we modified them to work on multi-core hosts by treating the capacity of the CPU as the sum of the capacities of its cores. Moreover, the PABFD placement algorithm and its corresponding energy consumption model only take into account a single-core CPU per host. Therefore, we modified it here to suit our multi-core (by randomly selecting a core in the CPU for the VM) and multi-resource scenario. The modified placement algorithm is called PABFDM; its pseudocode is shown in Algorithm 3.
Algorithm 3 PABFDM algorithm
Input: hostList, vmList
Output: allocation of the VMs
1: vmList.sortDecreasingUtilization()
2: for each VM in vmList do
3:     minPower ← MAX
4:     allocatedHost ← NONE
5:     for each host in hostList do
6:         if no SLAV on this host and this host meets the CPU and memory resource requirements of VM then
7:             power ← estimatePower(host, VM)
8:             if power < minPower then
9:                 minPower ← power
10:                allocatedHost ← host
11:            end if
12:        end if
13:    end for
14:    if allocatedHost ≠ NONE then
15:        allocation.add(VM, allocatedHost.random(core))
16:    end if
17: end for
18: return allocation

5.2. Evaluation

The metrics involved in the evaluation are host energy consumption cost, SLAV penalty cost, and the number of VM migrations. Since the CPU cost of the VM migration energy consumption belongs to the hosts’ energy consumption during calculation, we used the number of VM migrations to indirectly measure the migration cost.
Figure 8 shows the host energy consumption for each time slice of the day when all the methods are used to perform server consolidation. Figure 9 compares the total host energy consumption over the day when all the methods are used to perform server consolidation. From Figure 8, it can be seen that the host energy consumption generated by MMCC is less than that of the baseline methods in most of the time periods. From Figure 9, it can be seen that the total host energy consumption generated by MMCC in a day is about 10% less than that of LR-MMT (the best in the baseline methods) and is about 43.9% less than that of MAD-RS (the worst in the baseline methods). In a cluster composed of multi-core processor hosts, MMCC can effectively schedule tasks among multiple cores to optimize energy consumption.
The comparison of the CPU and memory SLAV produced by all the methods in a day is shown in Figure 10. The CPU SLAV generated by MMCC is much smaller than that of the baseline methods. For example, MMCC produces about 54% less CPU SLAV than MAD-RS and about 39% less than LR-MMT. Likewise, the memory SLAV produced by MMCC is much smaller than that of the baseline methods. A comparison of the total SLAV cost produced by all methods in one day is shown in Figure 11. MMCC outperforms the baseline methods. For instance, the total SLAV cost generated by MMCC is about 51.7% less than that generated by IQR-RS (the worst of the baseline methods) and about 33.5% less than that generated by LR-MMT (the best of the baseline methods). It can be said that the traditional server consolidation methods represented by the baselines do not perform well in a cluster composed of multi-core processor hosts, while MMCC handles this scenario better.
Figure 12 shows the number of VM migrations triggered in each time slice of the day when all the methods are used to perform server consolidation. Figure 13 compares the total number of VM migrations triggered in a day by all the methods. As can be seen from Figure 13, MMCC does not have a large advantage over the baseline methods in the number of triggered migrations; for example, MMCC triggers only about 9.5% fewer migrations than IQR-RS. However, it should be noted that the VM migrations triggered by MMCC at time t_k mainly deal with hosts that may become overloaded in the future; this is why the SLAV produced by MMCC is much smaller than that of the baseline methods. In addition, part of the migrations caused by MMCC are inter-core migrations, which happen only inside a host, and the cost of an inter-core migration is negligible. The traditional baseline methods do not consider inter-core migration in the case of multi-core processors.
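To make this distinction explicit, the sketch below separates intra-host (inter-core) moves, whose cost we treat as negligible, from inter-host migrations that incur the full live-migration overhead; the migration record fields (src_host, dst_host) are assumed names used only for illustration.

def count_migrations(migrations):
    # Separate inter-core moves (same source and destination host)
    # from inter-host migrations.
    inter_core = sum(1 for m in migrations if m.src_host == m.dst_host)
    inter_host = len(migrations) - inter_core
    return inter_core, inter_host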
Figure 14 shows and compares the total cost generated by all the methods in one day. MMCC outperforms the baseline methods. For instance, the total cost generated by MMCC is about 20.9% less than that of LR-MMT (the best of the baseline methods) and about 34.4% less than that of MAD-RS (the worst of the baseline methods). MMCC can not only optimize the energy consumption in the environment of multi-core processor hosts, but also reduce the SLAV in server consolidation through the host load detection and VM selection strategies based on the prediction method.

6. Conclusions

In this paper, we focus on reducing the total operating cost of server consolidation in a CDC composed of multi-core processor hosts while ensuring consumers' QoS. We established a cost model based on multi-core and multi-resource usage in the CDC, taking into account the host energy cost, the VM migration cost, and the SLAV penalty cost. Based on this model, we defined the MMCC problem in server consolidation and designed a heuristic solution for it. We employ a DAE-based filter to preprocess the VM workload dataset and reduce noise in the workload trace. Subsequently, an SRU-based method is used to predict the usage of computing resources, allowing us to trigger inter-core or inter-host VM migrations before a host enters the overloaded state. We designed a multi-core-aware heuristic algorithm to solve the VM placement problem. Finally, simulations driven by real VM workload traces verify the effectiveness of our proposed method. Compared with the existing server consolidation methods, our proposed MMCC reduces host energy consumption by 10% to 43.9%, the SLAV cost by 33.5% to 51.7%, and the total cost by 20.9% to 34.4% in a cluster of multi-core hosts.
In the future, we will consider a more comprehensive cost model that takes into account, for example, the operational life span of hosts, the network topology of the CDC, and the cooling system.

Author Contributions

Conceptualization, H.L. and Y.S.; methodology, H.L.; software, H.L.; validation, H.L., L.W. and Y.S.; formal analysis, H.L.; investigation, H.L. and Y.L.; resources, H.L. and Y.L.; data curation, H.L.; writing—original draft preparation, H.L., L.W. and Y.L.; writing—review and editing, H.L., L.W., Y.L. and Y.S.; visualization, H.L.; supervision, Y.S.; project administration, Y.S.; funding acquisition, H.L. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No.62002067), the Guangzhou Youth Talent Program (QT20220101174), the Department of Education of Guangdong Province (No.2020KTSCX039), Foundation of The Chinese Education Commission (22YJAZH091), and the SRP of Guangdong Education Dept (2019KZDZX1031).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Almost 82% Hong Kong Businesses Plan to Keep Remote Working Post-COVID-19. Available online: https://hongkongbusiness.hk/information-technology/more-news/almost-82-hong-kong-businesses-plan-keep-remote-working-post-covid- (accessed on 27 September 2022).
  2. Hong Kong Data Center Market—Growth, Trends, COVID-19 Impact, and Forecasts (2021–2026). Available online: https://www.reportlinker.com/p06187432/Hong-Kong-Data-Center-Market-Growth-Trends-COVID-19-Impact-and-Forecasts.html (accessed on 27 September 2022).
  3. Dhiman, G.; Mihic, K.; Rosing, T. A system for online power prediction in virtualized environments using gaussian mixture models. In Proceedings of the 47th Design Automation Conference, Anaheim, CA, USA, 13–18 June 2010; pp. 807–812. [Google Scholar]
  4. Ham, S.; Kim, M.; Choi, B.; Jeong, J. Simplified server model to simulate data center cooling energy consumption. Energy Build. 2015, 86, 328–339. [Google Scholar] [CrossRef]
  5. Kavanagh, R.; Djemame, K. Rapid and accurate energy models through calibration with IPMI and RAPL. Concurr. Comput. Pract. Exp. 2019, 31, e5124. [Google Scholar] [CrossRef]
  6. Gupta, V.; Nathuji, R.; Schwan, K. An analysis of power reduction in datacenters using heterogeneous chip multiprocessors. ACM Sigmetrics Perform. Eval. Rev. 2011, 39, 87–91. [Google Scholar] [CrossRef]
  7. Lefurgy, C.; Wang, X.; Ware, M. Server-level power control. In Proceedings of the Fourth International Conference on Autonomic Computing (ICAC’07), Jacksonville, FL, USA, 11–15 June 2007; p. 4. [Google Scholar]
  8. Beloglazov, A.; Abawajy, J.; Buyya, R. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener. Comput. Syst. 2012, 28, 755–768. [Google Scholar] [CrossRef] [Green Version]
  9. Rezaei-Mayahi, M.; Rezazad, M.; Sarbazi-Azad, H. Temperature-aware power consumption modeling in Hyperscale cloud data centers. Future Gener. Comput. Syst. 2019, 94, 130–139. [Google Scholar] [CrossRef]
  10. Chen, Y.; Das, A.; Qin, W.; Sivasubramaniam, A.; Wang, Q.; Gautam, N. Managing server energy and operational costs in hosting centers. In Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, AB, Canada, 6–10 June 2005; pp. 303–314. [Google Scholar]
  11. Wu, W.; Lin, W.; Peng, Z. An intelligent power consumption model for virtual machines under CPU-intensive workload in cloud environment. Soft Comput. 2017, 21, 5755–5764. [Google Scholar] [CrossRef]
  12. Lien, C.; Bai, Y.; Lin, M. Estimation by software for the power consumption of streaming-media servers. IEEE Trans. Instrum. Meas. 2007, 56, 1859–1870. [Google Scholar] [CrossRef]
  13. Raja, K. Multi-core Aware Virtual Machine Placement for Cloud Data Centers with Constraint Programming. In Intelligent Computing; Springer: Cham, Switzerland, 2022; pp. 439–457. [Google Scholar]
  14. Economou, D.; Rivoire, S.; Kozyrakis, C.; Ranganathan, P. Full-system power analysis and modeling for server environments. In Proceedings of the International Symposium on Computer Architecture, Ouro Preto, Brazil, 17–20 October 2006. [Google Scholar]
  15. Alan, I.; Arslan, E.; Kosar, T. Energy-aware data transfer tuning. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Chicago, IL, USA, 26–29 May 2014; pp. 626–634. [Google Scholar]
  16. Li, Y.; Wang, Y.; Yin, B.; Guan, L. An online power metering model for cloud environment. In Proceedings of the 2012 IEEE 11th International Symposium on Network Computing and Applications, Cambridge, MA, USA, 23–25 August 2012; pp. 175–180. [Google Scholar]
  17. Lent, R. A model for network server performance and power consumption. Sustain. Comput. Inform. Syst. 2013, 3, 80–93. [Google Scholar] [CrossRef]
  18. Kansal, A.; Zhao, F.; Liu, J.; Kothari, N.; Bhattacharya, A. Virtual machine power metering and provisioning. In Proceedings of the 1st ACM Symposium on Cloud Computing, Indianapolis, IN, USA, 10–11 June 2010; pp. 39–50. [Google Scholar]
  19. Lin, W.; Wang, W.; Wu, W.; Pang, X.; Liu, B.; Zhang, Y. A heuristic task scheduling algorithm based on server power efficiency model in cloud environments. Sustain. Comput. Inform. Syst. 2018, 20, 56–65. [Google Scholar] [CrossRef]
  20. Beloglazov, A.; Buyya, R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput. Pract. Exp. 2012, 24, 1397–1420. [Google Scholar] [CrossRef]
  21. Li, H.; Li, W.; Wang, H.; Wang, J. An optimization of virtual machine selection and placement by using memory content similarity for server consolidation in cloud. Future Gener. Comput. Syst. 2018, 84, 98–107. [Google Scholar] [CrossRef]
  22. Li, H.; Li, W.; Zhang, S.; Wang, H.; Pan, Y.; Wang, J. Page-sharing-based virtual machine packing with multi-resource constraints to reduce network traffic in migration for clouds. Future Gener. Comput. Syst. 2019, 96, 462–471. [Google Scholar] [CrossRef]
  23. Li, H.; Li, W.; Feng, Q.; Zhang, S.; Wang, H.; Wang, J. Leveraging content similarity among vmi files to allocate virtual machines in cloud. Future Gener. Comput. Syst. 2018, 79, 528–542. [Google Scholar] [CrossRef]
  24. Li, H.; Wang, S.; Ruan, C. A fast approach of provisioning virtual machines by using image content similarity in cloud. IEEE Access 2019, 7, 45099–45109. [Google Scholar] [CrossRef]
  25. Yadav, R.; Zhang, W.; Kaiwartya, O.; Singh, P.; Elgendy, I.; Tian, Y. Adaptive energy-aware algorithms for minimizing energy consumption and SLA violation in cloud computing. IEEE Access 2018, 6, 55923–55936. [Google Scholar] [CrossRef]
  26. Hieu, N.; Di Francesco, M.; Ylä-Jääski, A. Virtual machine consolidation with multiple usage prediction for energy-efficient cloud data centers. IEEE Trans. Serv. Comput. 2017, 13, 186–199. [Google Scholar] [CrossRef] [Green Version]
  27. Esfandiarpoor, S.; Pahlavan, A.; Goudarzi, M. Structure-aware online virtual machine consolidation for datacenter energy improvement in cloud computing. Comput. Electr. Eng. 2015, 42, 74–89. [Google Scholar] [CrossRef]
  28. Arianyan, E.; Taheri, H.; Sharifian, S. Novel energy and SLA efficient resource management heuristics for consolidation of virtual machines in cloud data centers. Comput. Electr. Eng. 2015, 47, 222–240. [Google Scholar] [CrossRef]
  29. Rodero, I.; Viswanathan, H.; Lee, E.; Gamell, M.; Pompili, D.; Parashar, M. Energy-efficient thermal-aware autonomic management of virtualized HPC cloud infrastructure. J. Grid Comput. 2012, 10, 447–473. [Google Scholar] [CrossRef]
  30. Li, Z.; Yan, C.; Yu, L.; Yu, X. Energy-aware and multi-resource overload probability constraint-based virtual machine dynamic consolidation method. Future Gener. Comput. Syst. 2018, 80, 139–156. [Google Scholar] [CrossRef]
  31. Sayadnavard, M.; Toroghi Haghighat, A.; Rahmani, A. A reliable energy-aware approach for dynamic virtual machine consolidation in cloud data centers. J. Supercomput. 2019, 75, 2126–2147. [Google Scholar] [CrossRef]
  32. Yuan, C.; Sun, X. Server consolidation based on culture multiple-ant-colony algorithm in cloud computing. Sensors 2019, 19, 2724. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Lu, C.; Ye, K.; Xu, G.; Xu, C.; Bai, T. Imbalance in the cloud: An analysis on alibaba cluster trace. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 2884–2892. [Google Scholar]
  34. Basmadjian, R.; De Meer, H. Evaluating and modeling power consumption of multi-core processors. In Proceedings of the 2012 Third International Conference On Future Systems: Where Energy, Computing and Communication Meet (e-Energy), Madrid, Spain, 9–11 May 2012; pp. 1–10. [Google Scholar]
  35. Brodersen, R. Minimizing Power Consumption in CMOS Circuits; Department of EECS University of California at Berkeley: Berkeley, CA, USA; Available online: https://sablok.tripod.com/verilog/paper.fm.pdf (accessed on 27 September 2022).
  36. Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.; Bottou, L. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408. [Google Scholar]
  37. Lei, T.; Zhang, Y.; Wang, S.; Dai, H.; Artzi, Y. Simple recurrent units for highly parallelizable recurrence. arXiv 2017, arXiv:1709.02755. [Google Scholar]
  38. Minartz, T.; Kunkel, J.; Ludwig, T. Simulation of power consumption of energy efficient cluster hardware. Comput. Sci.-Res. Dev. 2010, 25, 165–175. [Google Scholar] [CrossRef]
  39. Jin, Y.; Wen, Y.; Chen, Q.; Zhu, Z. An empirical investigation of the impact of server virtualization on energy efficiency for green data center. Comput. J. 2013, 56, 977–990. [Google Scholar] [CrossRef]
  40. Li, H.; Xiao, Y. CloudMatrix Lite: A Real Trace Driven Lightweight Cloud Data Center Simulation Framework. In Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 23–25 October 2020; pp. 424–429. [Google Scholar]
  41. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035. [Google Scholar]
  42. Aljoumah, E.; Al-Mousawi, F.; Ahmad, I.; Al-Shammri, M.; Al-Jady, Z. SLA in cloud computing architectures: A comprehensive study. Int. J. Grid Distrib. Comput. 2015, 8, 7–32. [Google Scholar] [CrossRef]
  43. Cao, Z.; Dong, S. Dynamic VM consolidation for energy-aware and SLA violation reduction in cloud computing. In Proceedings of the 2012 13th International Conference on Parallel and Distributed Computing, Applications And Technologies, Beijing, China, 14–16 December 2012; pp. 363–369. [Google Scholar]
Figure 1. The general architecture of a multi-core CPU.
Figure 2. Denoise autoencoder.
Figure 3. The network structure of the first autoencoder of the DAE-based filter.
Figure 4. The network structure of the second autoencoder of the DAE-based filter.
Figure 5. The network structure of the third autoencoder of the DAE-based filter.
Figure 6. The network structure of the compression decoder of the DAE-based filter.
Figure 7. Example of the DAE-based filter.
Figure 8. Comparing the energy consumption of hosts by all methods in every time segment.
Figure 9. Comparing the total energy consumption of hosts by all methods.
Figure 10. Comparing the SLAV by all methods regarding CPU and memory.
Figure 11. Comparing the total SLAV penalty cost by all methods.
Figure 12. Comparing the number of VM migrations triggered by all methods in every time segment.
Figure 13. Comparing the total number of VM migrations triggered by all methods.
Figure 14. Comparing the total cost by all methods.
Table 1. Resource parameters of the hosts.

Host Type    CPU                          Memory
H_large      Intel Xeon CPU (16 cores)    8 GB
H_medium     Intel Xeon CPU (8 cores)     6 GB
H_small      Intel Xeon CPU (4 cores)     4 GB
Table 2. Base power of the hosts.

Host Type    Base Value (kW)
H_large      108.2
H_medium     103.8
H_small      98.5
Table 3. Memory power parameters.

Host Type    Value     Memory (kW)
H_large      p_peak    0.21736
             p_idle    0.17576
H_medium     p_peak    0.10868
             p_idle    0.08788
H_small      p_peak    0.05434
             p_idle    0.04394
Table 4. Capacitance of different components of the processor.

Component               Capacitance
Chip-Level Mandatory    0.103
Die-Level Mandatory     0.301
On-chip Cache           0.165
Off-chip Cache          3.759
Inter-die               0.595
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
