EVMC: An Energy-Efficient Virtual Machine Consolidation Approach Based on Deep Q-Networks for Cloud Data Centers

Zhang, Peiying; Gao, Jingfei; Liu, Jing; Tan, Lizhuang

doi:10.3390/electronics14193813

¹ Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China

² Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China

³ Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China

⁴ School of Cultural Heritage and Information Management, Shanghai University, Shanghai 200444, China

⁵ Library of Shanghai Lixin University of Accounting and Finance, Shanghai 201209, China

⁶ Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan 250014, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(19), 3813; https://doi.org/10.3390/electronics14193813

Submission received: 9 August 2025 / Revised: 18 September 2025 / Accepted: 22 September 2025 / Published: 26 September 2025

Abstract

As the mainstream computing paradigm, cloud computing breaks the physical rigidity of traditional resource models and provides heterogeneous computing resources, better meeting the diverse needs of users. However, the frequent creation and termination of virtual machines (VMs) tends to induce resource fragmentation, resulting in resource wastage in cloud data centers. Virtual machine consolidation (VMC) technology effectively improves resource utilization by intelligently migrating virtual machines onto fewer physical hosts. However, most existing approaches lack rational host detection mechanisms and efficient migration strategies, and they often neglect quality of service (QoS) guarantees while optimizing energy consumption, which can easily lead to Service Level Agreement Violations (SLAVs). To address these challenges, this paper proposes an energy-efficient virtual machine consolidation method (EVMC). First, a co-location coefficient model is constructed to detect the VMs that are least suitable for co-location on their hosts. Then, leveraging the environment-aware decision-making capability of the DQN agent, dynamic VM migration strategies are implemented. Experimental results demonstrate that EVMC outperforms existing state-of-the-art approaches in terms of energy consumption and SLAV rate, showcasing its effectiveness and potential for practical application.

Keywords: cloud computing; cloud data centers; virtual machine consolidation; deep reinforcement learning; deep Q-network

1. Introduction

With the rapid development of information and communication technology (ICT), an increasing number of computing tasks are being migrated to cloud environments. As a fundamental enabling technology in cloud computing, virtualization has overcome the rigidity of traditional network infrastructures, which are often fixed, inflexible, and difficult to adapt in terms of architecture, configuration, and resource management [1]. Specifically, virtualization abstracts physical resources—such as computing, storage, and networking components of servers, switches, routers, and other hardware—into logical units that are integrated into a shared resource pool for centralized management and allocation [2]. These virtualized resources can be dynamically provisioned and adjusted based on the real-time status of virtual machines (VMs) and business demands. By decoupling software from hardware, virtualization enables flexible, scalable, and efficient resource management, providing technical support for the efficient operation and elastic scaling of cloud platforms.

Cloud computing, as an efficient service-oriented computing model, features high scalability and flexible billing, among other characteristics. As illustrated in Figure 1, the core component of cloud computing infrastructure is the data center, which consists of large-scale deployments of physical servers and associated hardware to support high-volume computational workloads. Cloud service providers leverage virtualization technology to abstract and uniformly manage the physical resources in data centers. To meet the diverse computing needs of different users, service providers offer various VM resource configurations known as “flavors” [3]. Each flavor defines a specific combination of computing resources, such as two cores with 4 GB of random access memory (RAM), four cores with 8 GB of RAM, etc. Users can select the appropriate flavor based on their needs and use VMs following the “pay-as-you-go” principle, thus achieving efficient and cost-effective resource utilization in the dynamically changing cloud environment. To ensure the quality of service (QoS) of the provided resources, cloud service providers typically establish a Service Level Agreement (SLA) with users [4]. Specifically, the SLA clearly defines key metrics, such as VM availability and response time, and it offers corresponding compensation mechanisms in the event of SLA violations (SLAVs).

However, the variation in VM flavors selected by different users leads to the fragmentation of physical resources into heterogeneous units during long-term mixed deployments. With the frequent creation and termination of VMs over their lifecycle, these fragmented resources—being scattered, non-contiguous, or mismatched in specification—often cannot be effectively allocated to new VMs or used to scale existing ones. This results in substantial resource waste. Low resource utilization is a common issue in cloud data centers [5]. For instance, the average CPU and RAM utilization on Amazon Web Services (AWS) cloud platforms are only 25% and 29%, respectively [6]. Meanwhile, a study by Microsoft [7] reported that 60% of VMs in Azure exhibit CPU utilization below 20%. In addition, the RAM utilization in Google cloud data centers is as low as 20%, while idle servers still consume approximately 70% of their peak power [8]. Cloud data centers require high energy consumption not only for the servers themselves, but also for the cooling systems that support them. According to [9], cloud data centers are predicted to account for 13% of global electricity consumption by 2030. The low resource utilization of cloud data centers reduces economic efficiency, while high energy consumption poses significant pressure on environmental protection. Therefore, improving resource utilization to reduce energy consumption in cloud data centers is an urgent issue that needs to be addressed.

Virtual machine consolidation (VMC) is considered an effective solution for managing resources in cloud data centers [10], aiming to tackle the issues of low resource utilization and high energy consumption. VMC reduces overall energy consumption by performing VM migration from underloaded hosts and consolidating services onto fewer physical servers [11], allowing idle machines to be shut down or put into sleep mode. However, the VM migration process introduces performance overhead. Frequent or unnecessary migrations can increase system load and migration costs, thereby negatively impacting QoS. In addition, overly aggressive consolidation strategies may lead to insufficient resource elasticity on active hosts, reducing the ability of the system to handle dynamic workload fluctuations [12]. This can compromise availability and even result in SLAVs. Therefore, an ideal VMC strategy should, under the constraints of SLA compliance, maximize resource utilization and minimize energy consumption while maintaining sufficient system elasticity to cope with workload variations. Essentially, the goal is to achieve a balanced optimization between performance and energy efficiency.

Although VMC has been extensively studied, existing methods still suffer from several significant limitations. On the one hand, most current VM selection approaches rely solely on static thresholds for detecting overloaded hosts and determining migration priorities [13], without effectively identifying the resource contention and interference among co-located VMs, which may result in persistent resource bottlenecks even after migration. On the other hand, existing VM migration strategies struggle to balance the complex trade-offs among system performance, energy consumption, and SLA compliance [14], and they often lack adaptability to dynamic and heterogeneous cloud environments.

To address these challenges, we propose a two-stage VMC approach. The main contributions of our work are as follows.

An energy-efficient VM consolidation method (EVMC) is proposed, aiming to optimize both energy consumption and SLA violation rate in cloud data centers.
A VM selection strategy based on a co-location coefficient model is proposed. By analyzing the resource utilization correlation among VMs, the strategy evaluates their co-location suitability, thereby identifying the VM least suitable for co-location on an overloaded host. This effectively mitigates resource contention and enhances the accuracy of selection decisions.
A VM migration algorithm based on deep Q-network (DQN) is proposed, in which VM migration and resource allocation are modeled as a Markov Decision Process (MDP). A composite reward function that considers both energy consumption and SLA violation rate was designed to guide the agent in learning optimal migration and allocation strategies through continuous interaction with the environment.
Extensive experiments were conducted to evaluate the performance of EVMC. The results demonstrate that the proposed method outperforms existing approaches in terms of energy consumption, SLAV rate, number of migrations, and multidimensional load balancing.

The remainder of this paper is organized as follows. Section 2 reviews the related work. The problem formulation is presented in Section 3. Section 4 describes the proposed algorithm. Simulation results and their analysis are provided in Section 5. Finally, Section 6 concludes the paper.

2. Related Work

In cloud data centers, VMC has become an effective approach for improving resource utilization. By optimizing VM placement to reduce energy consumption while ensuring QoS, it has emerged as a prominent focus in recent research.

Many existing studies regard VMC as a classical NP-hard problem, the multidimensional bin packing problem, and tackle it through heuristic approaches. A novel Best Fit Decreasing (BFD) algorithm was proposed in [15], which sorts VMs in descending order of resource demands and allocates them to the most suitable servers based on available capacity, thereby effectively reducing the number of active servers. Meanwhile, the authors of [16] integrated a statistical regression model with the First Fit Decreasing (FFD) algorithm to enhance resource utilization efficiency in data centers. In [17], the authors comprehensively evaluated CPU and RAM utilization, migrating all VMs from underloaded hosts for reallocation. However, these methods incur high migration costs, making them unsuitable for practical deployment in large-scale data center environments.

Given the limitations of heuristic methods in complex cloud environments, more effective metaheuristic algorithms have gained increasing attention in VMC research. The authors of [18] proposed a multi-objective Artificial Bee Colony (ABC) algorithm to optimize VM placement in cloud data centers by balancing energy consumption and system reliability, with predictive modeling via a discrete-time Markov chain enhancing decision accuracy. In [19], a secure and multi-objective VMC framework based on a genetic algorithm (GA) was proposed, which significantly reduces power consumption and intercommunication cost in dynamic cloud environments. In [20], the authors proposed an adaptive optimization algorithm, which integrates the global search capability of Particle Swarm Optimization (PSO) with the directional sensing mechanism of Beetle Swarm Optimization (BSO) to enhance convergence speed and solution accuracy, thereby effectively reducing energy consumption in multi-objective VMC. Nevertheless, metaheuristic methods rely heavily on extensive parameter tuning, and the population is prone to premature convergence [21], making it difficult to cope with the demands of heterogeneous and complex cloud environments.

With the continuous advancement of research, machine learning (ML) has increasingly emerged as a prominent paradigm for developing efficient VMC frameworks in recent years. In this context, the authors of [22] proposed a reinforcement learning (RL)-based VMC algorithm, which optimizes VM placement in dynamic cloud data center environments by fully leveraging the capabilities of RL in environmental awareness and dynamic reasoning. Another study [23] applying RL to VMC put forward a failure-aware dynamic strategy: it adopts a distributed multi-agent RL approach to select appropriate operating modes for VMs, and it then combines this with a centralized heuristic migration method to jointly optimize energy efficiency in large-scale cloud data centers. Meanwhile, the authors of [24] proposed an online RL algorithm for real-time migration, which enables scalable virtual resource management through dimensionality reduction without requiring prior knowledge of workloads, incurring minimal execution overhead. However, Q-learning is a tabular value-based RL method that suffers from the curse of dimensionality due to the exponential growth of the Q-table size with increasing problem scale, making it difficult to apply to large-scale cloud data centers.

To address the above challenges, some studies have focused on deep reinforcement learning (DRL), which uses neural networks to approximate the Q-table, effectively tackling the vast state-action space in cloud data centers and greatly expanding the application potential of reinforcement learning in large-scale VMC problems. A notable example is the multi-objective DRL framework proposed in [25], which dynamically selects the optimal placement strategy based on VM characteristics, effectively reducing host failure rates while balancing energy consumption and performance. In [26], the authors proposed a DQN algorithm that integrates Q-learning with deep neural networks to address resource allocation in dynamic VM migration. By modeling the problem as an MDP and adopting an online training approach, the method effectively improves energy optimization in data centers. Another representative study [27] incorporates an LSTM-based state prediction network to accurately identify the VMs that most impact host performance and combines it with a prediction-aware DQN approach for efficient migration, thereby significantly reducing energy consumption in cloud data centers. It is noteworthy that overly aggressive VMC strategies may degrade service performance or even violate SLAs. Therefore, striking the optimal balance between maintaining service quality and reducing energy consumption has become a key focus of current research.

Overall, existing consolidation approaches have achieved remarkable progress by leveraging heuristic, metaheuristic, and learning-based methods. However, most of them either overlook the correlation of resource utilization among VMs during the selection process, or they introduce excessive migration overhead when pursuing energy savings. Moreover, while recent reinforcement learning- and DRL-based methods provide adaptive decision making, they often entail high computational complexity and long training times, which hinder their deployment in practical large-scale cloud data centers. These limitations motivate our study: designing an energy-efficient VM consolidation method that differs from prior work in two aspects. First, we introduce a co-location coefficient model to guide VM selection by explicitly quantifying the suitability of VMs to be hosted together, thereby avoiding resource contention and reducing unnecessary migrations. Second, we integrate this selection strategy into a two-stage migration framework with a DQN-based optimization process, which jointly addresses energy consumption and SLAVs. Accordingly, the proposed approach is designed to explicitly resolve the trade-off between efficiency and overhead that has not been adequately addressed in prior work. A summary comparison with representative approaches is presented in Table 1.

3. Problem Formulation

3.1. Virtual Machine and Physical Machine Modeling

In IaaS environments, cloud data centers consist of a large number of physical hosts. Let the set of physical machines be denoted as $P = \{p_{i} \mid 1 \leq i \leq n\}$, where $p_{i}$ represents the i-th physical host. For each $p_{i} \in P$, the set of hosted VMs is defined as $V = \{v_{i j} \mid 1 \leq j \leq m\}$, where $v_{i j}$ denotes the j-th VM instance deployed on physical machine $p_{i}$.

3.2. Energy Consumption

Energy consumption is a critical issue in cloud data centers. The power consumption of a server is typically linear in its CPU utilization [28]. Assuming the power consumption of a single host i at time t is $E_{i}(t)$, its total energy consumption over time period T can be calculated as follows:

$$E_{i}(T) = \int_{0}^{T} E_{i}(t)\, dt. \tag{1}$$

The total energy consumption of all n physical hosts in the cloud data center during time period T can be expressed as follows:

$$E_{total}(T) = \sum_{i=1}^{n} E_{i}(T). \tag{2}$$
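As an illustration of Equations (1) and (2), the following minimal sketch approximates the integral with the trapezoidal rule over sampled CPU utilization. The linear power model parameters (`p_idle`, `p_max`), the sampling interval, and all function names are assumptions made for this example and are not values taken from the paper.

```python
def linear_power(cpu_util, p_idle=100.0, p_max=250.0):
    """Hypothetical linear power model in watts; p_idle and p_max are assumed values."""
    return p_idle + (p_max - p_idle) * cpu_util


def host_energy(cpu_utils, interval_s):
    """Approximate Eq. (1) with the trapezoidal rule; returns joules."""
    powers = [linear_power(u) for u in cpu_utils]
    return sum((a + b) / 2.0 * interval_s for a, b in zip(powers, powers[1:]))


def total_energy(all_host_utils, interval_s):
    """Eq. (2): sum the energy of every host."""
    return sum(host_energy(utils, interval_s) for utils in all_host_utils)


# Two hosts, three utilization samples each, taken 300 s apart.
traces = [[0.2, 0.4, 0.6], [1.0, 1.0, 1.0]]
energy_joules = total_energy(traces, interval_s=300)
```

The trapezoidal rule is only one possible discretization; a simulator that reports utilization at fixed intervals could equally use a step function.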

3.3. Migration Cost

VM migration refers to the process of dynamically transferring a running VM from a source host to a target host, enabling more flexible resource scheduling. However, migration can introduce latency and performance degradation to applications. The migration duration is calculated as follows:

$$T_{mig} = \frac{M(v_{i j})}{B(p_{i})}, \tag{3}$$

where $M(v_{i j})$ denotes the total amount of memory occupied by $v_{i j}$, and $B(p_{i})$ denotes the available bandwidth of the migration link, which is the smaller of the bandwidths available at $p_{i}$ and at the destination host. In addition, the performance degradation due to migration (PDM) can be expressed as follows:

$$PDM = \frac{1}{m} \sum_{j=1}^{m} \frac{C_{j}^{mig}}{C_{j}^{total}}, \tag{4}$$

where $C_{j}^{mig}$ denotes the performance degradation of VM j caused by migration, and $C_{j}^{total}$ denotes the total CPU capacity requested by VM j during its lifecycle. Studies have shown that $C_{j}^{mig}$ is approximately 10% of the CPU utilization during the migration process [28], and this degradation rate is adopted as the default migration cost metric in CloudSim 4.0.
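The two migration cost quantities in Equations (3) and (4) can be sketched as follows. The units (megabytes for memory, megabits per second for bandwidth) and the function names are assumptions chosen for the example.

```python
def migration_time(vm_memory_mb, src_bw_mbps, dst_bw_mbps):
    """Eq. (3): memory to transfer divided by the bottleneck bandwidth; returns seconds."""
    bandwidth_mb_per_s = min(src_bw_mbps, dst_bw_mbps) / 8.0  # megabits/s -> megabytes/s
    return vm_memory_mb / bandwidth_mb_per_s


def migration_degradation(degraded, requested):
    """Eq. (4): mean ratio of migration-induced degradation to requested CPU capacity."""
    ratios = [c_mig / c_total for c_mig, c_total in zip(degraded, requested)]
    return sum(ratios) / len(ratios)


# A 2048 MB VM moved between hosts with 1000 and 800 Mbit/s links.
t_mig = migration_time(2048, 1000, 800)
# Two VMs: degradation of 10 and 5 units out of 1000 and 250 requested units.
degradation = migration_degradation([10, 5], [1000, 250])
```

Taking the minimum of the two link bandwidths reflects the statement that the slower endpoint bounds the migration link.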

3.4. Service Level Agreement Violation

The SLA is a contractual agreement between cloud service providers and clients, and SLAVs are typically used to assess the provider’s capability to fulfill its service guarantees. In our work, SLAV serves as another key metric, reflecting customer satisfaction with the provided services. The SLAV over a given time period T is calculated as follows:

$$SLAV = SLATAH + PDM, \tag{5}$$

where $SLATAH$ is the average proportion of time during which each active physical host is in violation:

$$SLATAH = \frac{1}{n} \sum_{i=1}^{n} \frac{T_{i}^{vio}}{T_{i}^{total}}, \tag{6}$$

where $T_{i}^{vio}$ is the violation time of host $p_{i}$ and $T_{i}^{total}$ is its total active time.
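A minimal sketch of Equations (5) and (6) is given below. It assumes that the violation time and active time of every host are available as plain numbers in the same unit; the function names are hypothetical.

```python
def slatah(violation_times, active_times):
    """Eq. (6): mean fraction of active time that each host spends in violation."""
    ratios = [vio / total for vio, total in zip(violation_times, active_times)]
    return sum(ratios) / len(ratios)


def slav(slatah_value, migration_degradation):
    """Eq. (5): combine host-level violation time with migration degradation."""
    return slatah_value + migration_degradation


# Two active hosts: 30 s of violation in 300 s, and none in 600 s.
slatah_value = slatah([30, 0], [300, 600])
slav_value = slav(slatah_value, 0.015)
```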

3.5. Correlation of VM Resource Utilization

In multiple regression modeling, the multiple correlation coefficient, usually denoted by R, measures the degree of linear correlation between one variable and a set of other variables. The value of R lies in the range [0, 1]: a value of 0 means that there is no linear relationship between the variables, whereas a value of 1 means that the variables are perfectly correlated. In our work, a regression analysis over the last l resource utilization samples (CPU and RAM) of the VMs on a host allows us to assess the similarity of resource demand patterns among those VMs.

The vector $U_{j}^{d}$ represents the last l resource utilization samples of VM $v_{j}$, and the augmented matrix $W_{i \setminus j}^{d}$ represents the last l resource utilization samples of the $m - 1$ VMs other than $v_{j}$ on the physical host $p_{i}$. The vector $U_{j}^{d}$ and matrix $W_{i \setminus j}^{d}$ are, respectively, represented as follows:

$$U_{j}^{d} = \begin{bmatrix} u_{1, j} \\ \vdots \\ u_{l, j} \end{bmatrix}, \quad W_{i \setminus j}^{d} = \begin{bmatrix} 1 & w_{1, 1} & \dots & w_{1, m - 1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & w_{l, 1} & \dots & w_{l, m - 1} \end{bmatrix}, \quad d \in \{CPU, RAM\}, \tag{7}$$

where $w_{l, m - 1}$ denotes the l-th resource utilization sample of VM $v_{m - 1}$. The predicted resource utilization vector $\hat{U}_{j}^{d}$ for $v_{j}$ is obtained by least squares as follows:

$$\hat{U}_{j}^{d} = W_{i \setminus j}^{d} \left( {W_{i \setminus j}^{d}}^{T} W_{i \setminus j}^{d} \right)^{-1} {W_{i \setminus j}^{d}}^{T} U_{j}^{d}. \tag{8}$$

The resource utilization correlation $R_{i \setminus j}^{d}$ between the j-th VM on host $p_{i}$ and the other VMs on that host is given by

$$R_{i \setminus j}^{d} = \frac{\sum_{k=1}^{l} (u_{k, j} - \bar{u}_{j}) (\hat{u}_{k, j} - \bar{\hat{u}}_{j})}{\sqrt{\sum_{k=1}^{l} (u_{k, j} - \bar{u}_{j})^{2} \sum_{k=1}^{l} (\hat{u}_{k, j} - \bar{\hat{u}}_{j})^{2}}}, \tag{9}$$

where $u_{k, j}$ and $\hat{u}_{k, j}$ are the k-th elements of $U_{j}^{d}$ and $\hat{U}_{j}^{d}$, and $\bar{u}_{j}$ and $\bar{\hat{u}}_{j}$ denote their respective mean values.
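The regression and correlation steps in Equations (7) to (9) can be sketched with the standard library alone. The sketch solves the normal equations by Gaussian elimination and assumes that the matrix of co-located VM samples has full column rank; the function names and the zero fallback for a constant series are choices made for this example.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def multiple_correlation(u, others):
    """u: last l samples of one VM; others: one sample list per co-located VM."""
    # Augmented matrix of Eq. (7): a leading 1 (intercept) followed by the other VMs.
    rows = [[1.0] + [o[k] for o in others] for k in range(len(u))]
    cols = len(rows[0])
    wtw = [[sum(r[i] * r[j] for r in rows) for j in range(cols)] for i in range(cols)]
    wtu = [sum(r[i] * y for r, y in zip(rows, u)) for i in range(cols)]
    beta = solve_linear(wtw, wtu)
    # Eq. (8): fitted values of the least-squares regression.
    u_hat = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_u = sum(u) / len(u)
    mean_hat = sum(u_hat) / len(u_hat)
    num = sum((a - mean_u) * (b - mean_hat) for a, b in zip(u, u_hat))
    den = math.sqrt(sum((a - mean_u) ** 2 for a in u)
                    * sum((b - mean_hat) ** 2 for b in u_hat))
    # Eq. (9); a (near-)constant series carries no correlation information.
    return num / den if den > 1e-12 else 0.0


# A VM whose utilization is exactly twice that of its single neighbor.
r_perfect = multiple_correlation([0.2, 0.4, 0.6, 0.8], [[0.1, 0.2, 0.3, 0.4]])
```

In practice a numerical library's least-squares routine would be more robust than the hand-written solver, particularly when utilization series are nearly collinear.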

4. Proposed Algorithm

This study proposes a two-stage VMC method to improve cloud resource utilization while ensuring service quality. In the VM selection stage, the resource utilization of all physical hosts is first measured, and hosts are marked as overloaded or underloaded according to the thresholds. All VMs on underloaded hosts are added to the migration list. The VM co-location coefficients are then computed from the resource utilization correlation analysis, and the VMs with the lowest co-location coefficients on overloaded hosts are added to the list as well. In the VM migration stage, a DRL decision-making model is constructed, in which a composite reward function of energy consumption and SLA violation rate enables the agent to learn the optimal mapping from VMs to hosts.

4.1. VM Selection

Cloud environments are highly dynamic and involve multiple types of resources, with workloads that fluctuate frequently and demand varying resources. To address this complexity, we introduce adaptive overloaded and underloaded thresholds in the VM selection stage, providing a more robust resource detection mechanism:

$$\begin{cases} \theta_{L} = \theta_{l} + \alpha (\bar{u}_{j} - 0.5) - \beta \sigma, \\ \theta_{H} = \theta_{h} + \alpha (\bar{u}_{j} - 0.5) + \beta \sigma, \end{cases} \tag{10}$$

where $\theta_{h} = 0.8$ and $\theta_{l} = 0.2$ are the baseline thresholds, which are widely used empirical values in the cloud computing literature; $\bar{u}_{j}$ and $\sigma$ represent the mean and standard deviation of the last l resource utilization samples, respectively; and the parameters $\alpha$ and $\beta$, which control the influence of the average utilization and its fluctuation on the thresholds, are set to 0.2 and 0.1, respectively.
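The adaptive thresholds of Equation (10) can be computed as in the following sketch. The baseline thresholds and the values of alpha and beta are those stated above; the use of the population standard deviation and the function name are assumptions of the example.

```python
import statistics


def adaptive_thresholds(history, theta_l=0.2, theta_h=0.8, alpha=0.2, beta=0.1):
    """Return (theta_L, theta_H) from the last l utilization samples, as in Eq. (10)."""
    mean = statistics.fmean(history)
    sigma = statistics.pstdev(history)  # population standard deviation (assumed)
    shift = alpha * (mean - 0.5)
    return theta_l + shift - beta * sigma, theta_h + shift + beta * sigma


# A busy and mildly fluctuating host shifts both thresholds upward and widens the band.
low, high = adaptive_thresholds([0.6, 0.7, 0.8])
```

A host with a steady utilization of 0.5 keeps the baseline thresholds unchanged, since both the shift and the fluctuation term vanish.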

In the VM selection phase, we first construct the list of VMs to be migrated from underloaded hosts and then focus on overloaded physical hosts. Considering the interactions between VMs, we define a co-location coefficient K to evaluate whether a VM is suitable to be placed on the same host as the others. The formula for K is as follows:

$$K = e^{- R_{i \setminus j}^{d}}, \tag{11}$$

where $R_{i \setminus j}^{d}$ denotes the resource utilization correlation between $v_{i j}$ and the other VMs on the same host. A VM exhibiting higher resource utilization correlation with the other VMs yields a lower K value, making it less suitable for co-location with them. In our approach, we aim to host VM combinations with more diversified resource utilization patterns on each physical host to better achieve flexible virtual resource allocation and load balancing.

Based on the above definition, we propose a co-location coefficient-based VM selection (CVMS) algorithm. The algorithm prioritizes migrating the VMs with the lowest K value during the iteration process. The complete process of the CVMS method is shown in Algorithm 1.

Algorithm 1 Co-location coefficient-based VM selection

Input: Overloaded host $p_{i}$;

$migList \leftarrow \emptyset$;
$hostUtil \leftarrow GetUtilization(p_{i})$;
while $hostUtil > \theta_{H}$ do
    $minK \leftarrow +\infty$;
    for each virtual machine $v_{j}$ in $p_{i}$ do
        $curUtil \leftarrow GetUtilization(v_{j})$;
        $prevUtils \leftarrow GetPrevUtilizations(v_{j}, l - 1)$;
        $U_{j}^{d} \leftarrow ConstructVector(curUtil, prevUtils)$;
    end for
    for each virtual machine $v_{j}$ in $p_{i}$ do
        $W_{i \setminus j}^{d} \leftarrow ConstructMatrix(1, \{U_{k}^{d} \mid k \neq j\})$;
        $K \leftarrow CalculateK(U_{j}^{d}, W_{i \setminus j}^{d})$;
        if $K < minK$ then
            $minK \leftarrow K$;
            $vmToMigrate \leftarrow v_{j}$;
        end if
    end for
    $migList \leftarrow migList \cup \{vmToMigrate\}$;
    $p_{i} \leftarrow p_{i} \setminus \{vmToMigrate\}$;
    $hostUtil \leftarrow hostUtil - GetUtilization(vmToMigrate)$;
end while

Output: Migration list $migList$;

In Algorithm 1, the overloaded host $p_{i}$ is taken as the input. Each iteration of the loop selects the VM with the lowest co-location coefficient K for migration until $p_{i}$ is no longer overloaded. Finally, the list of VMs to be migrated, $migList$, is returned for use in the next stage.
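The selection loop described above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' implementation: host utilization is assumed to be the sum of the latest VM samples, the correlation is supplied by the caller, and the loop stops when only one VM remains because no co-located VMs are left to regress against.

```python
import math


def select_vms(vm_utils, threshold_high, correlation):
    """Pick VMs to migrate from an overloaded host.

    vm_utils: {vm_id: list of last l utilization samples, newest last}.
    correlation(target, others): caller-supplied function returning R in [0, 1].
    """
    remaining = dict(vm_utils)
    mig_list = []
    # Assumption: host utilization is the sum of the latest VM samples.
    host_util = sum(samples[-1] for samples in remaining.values())
    while host_util > threshold_high and len(remaining) > 1:
        min_k, vm_to_migrate = math.inf, None
        for vm_id, samples in remaining.items():
            others = [s for other_id, s in remaining.items() if other_id != vm_id]
            k = math.exp(-correlation(samples, others))  # Eq. (11)
            if k < min_k:
                min_k, vm_to_migrate = k, vm_id
        mig_list.append(vm_to_migrate)
        host_util -= remaining.pop(vm_to_migrate)[-1]
    return mig_list


# Stand-in correlation for demonstration only: it treats the latest sample as R.
demo = select_vms(
    {"a": [0.1, 0.5], "b": [0.2, 0.3], "c": [0.1, 0.2]},
    threshold_high=0.8,
    correlation=lambda target, others: target[-1],
)
```

Passing the correlation as a function keeps the selection logic independent of how R is estimated, so the regression-based coefficient can be swapped in without changing the loop.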

4.2. VM Migration

In the data center environment, the irrational allocation of virtual resources leads to a series of energy consumption problems. Moreover, the resource utilization of hosts is in a continuous state of dynamic change, which makes it difficult for traditional optimization learning algorithms to effectively handle the complex and ever-changing scenarios in data centers. When it comes to resource allocation and VM migration, achieving the simultaneous goals of minimizing system energy consumption, maximizing quality of service, and ensuring equitable load distribution is an extremely challenging task.

To address the above-mentioned issues, we propose a VM migration algorithm based on DQN. The architecture of the overall algorithm model is shown in Figure 2. We formulate an optimization problem in which VM migration and resource allocation are modeled as an MDP. Using the DQN-based algorithm, we train the network within the cloud–data center collaborative framework to optimize VM migration and resource allocation. During training, the agent continuously interacts with the environment and updates its model in real time to derive the optimal strategies for VM migration and resource allocation. The following sections define the elements of the MDP in detail, including the state space, action space, and reward function.

4.2.1. State Space

To help the agent learn the migration strategy, the following host features are extracted in this paper.

CPU: CPU utilization is an essential element in our study. On the one hand, it is positively correlated with energy consumption. On the other hand, CPU load is closely linked to SLAVs, making it a crucial reference for VM migration.
RAM: RAM utilization affects the duration of VM migration. In addition, there is a significant correlation between RAM utilization and SLAVs, making it an important metric for evaluating service quality.

At time t, the resource utilization of host i is denoted as $u_{i}^{t} = [u_{i}^{CPU}, u_{i}^{RAM}]^{T}$. Accordingly, the state space of the entire system is defined as follows:

$$S_{t} = [u_{1}^{t}, u_{2}^{t}, \dots, u_{n}^{t}]. \tag{12}$$
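One possible encoding of the state in Equation (12) is a flat list that interleaves the CPU and RAM utilization of every host. Flattening is an assumption made here so that the state can be fed to a fully connected network; the function name is hypothetical.

```python
def build_state(hosts):
    """hosts: list of (cpu_util, ram_util) pairs; returns the flattened state S_t."""
    state = []
    for cpu, ram in hosts:
        state.extend([cpu, ram])
    return state


# Two hosts produce a state vector of length 4.
state = build_state([(0.2, 0.5), (0.9, 0.4)])
```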

4.2.2. Action Space

To migrate each VM in the migration list to an appropriate host, we define the mapping relationship $\delta$ between a VM and the hosts, and the action space of the agent, as follows:

$$A_{t} = [\delta_{1}, \delta_{2}, \dots, \delta_{n}, \delta_{n + 1}], \tag{13}$$

where $\delta_{n + 1}$ means that, when no existing host is suitable for this VM, it is mapped to a newly opened host $p_{n + 1}$. It is worth noting that $\delta_{i}$ needs to satisfy the following constraints:

$$\delta_{i} \in \{0, 1\}, \quad i \in \{1, \dots, n + 1\}, \tag{14}$$

$$\sum_{i=1}^{n+1} \delta_{i} = 1. \tag{15}$$

Here, Constraint (14) indicates whether the VM is migrated to host i, and Constraint (15) stipulates that a VM can only be migrated to one host.
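The action vector and its two constraints can be illustrated with a one-hot encoding. In this sketch, indices are zero-based and the last index stands for the newly opened host; both conventions and the function names are assumptions of the example.

```python
def one_hot_action(target_index, n_hosts):
    """Build [delta_1, ..., delta_{n+1}]; index n_hosts denotes a newly opened host."""
    if not 0 <= target_index <= n_hosts:
        raise ValueError("target_index must lie in [0, n_hosts]")
    return [1 if i == target_index else 0 for i in range(n_hosts + 1)]


def is_valid_action(delta):
    """Check Constraints (14) and (15): binary entries that sum to exactly one."""
    return all(d in (0, 1) for d in delta) and sum(delta) == 1


existing = one_hot_action(2, n_hosts=3)   # place the VM on the third existing host
new_host = one_hot_action(3, n_hosts=3)   # open a new host for the VM
```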

4.2.3. Reward Function

We evaluate the decisions of the agent using a reward function. To enable the agent to continuously reduce the energy consumption and SLA violation rate during VM migration, we construct the per-step changes in energy consumption and SLAV as follows:

$$\Delta_{t}^{E} = E_{t} - E_{t - 1}, \tag{16}$$

$$\Delta_{t}^{SLAV} = SLAV_{t} - SLAV_{t - 1}. \tag{17}$$

To eliminate the disparity in magnitude between the changes in energy consumption and SLA violation rate, we employ Min-Max normalization to scale both quantities to the [0, 1] interval. The normalized values are then summed, and the negative of this sum is used as the reward for the migration operation:

$$R_{t} = - (\Delta_{Scale}^{E} + \Delta_{Scale}^{SLAV}). \tag{18}$$

A larger R value indicates a more significant decrease in energy consumption and SLA violation rate compared to the previous time, which generates a stronger positive incentive for the agent.
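A minimal sketch of the reward computation follows. It rests on two assumptions that should be read as such: each differential is taken as the current value minus the previous one, so that the negated sum is largest when both quantities fall, and the Min-Max bounds are supplied by the caller (for example, as running minima and maxima observed during training).

```python
def min_max(value, low, high):
    """Scale value into [0, 1]; low and high are assumed observed bounds."""
    if high == low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


def reward(e_prev, e_now, slav_prev, slav_now, e_bounds, slav_bounds):
    """Negated sum of the normalized changes in energy and SLAV."""
    # Sign convention assumed in this sketch: differential = current - previous.
    d_e = min_max(e_now - e_prev, *e_bounds)
    d_s = min_max(slav_now - slav_prev, *slav_bounds)
    return -(d_e + d_s)


bounds_e, bounds_s = (-10.0, 10.0), (-0.1, 0.1)
r_saving = reward(100.0, 90.0, 0.05, 0.05, bounds_e, bounds_s)   # energy falls
r_wasting = reward(100.0, 110.0, 0.05, 0.05, bounds_e, bounds_s)  # energy rises
```

Under this convention the reward lies in [-2, 0], and a step that lowers energy consumption receives a higher reward than one that raises it.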

4.2.4. Training Model

The training process of DQN-based VM migration (DVMM) is presented in Algorithm 2. The Q-network is implemented as a deep neural network (DNN) with two hidden layers of 128 units each and ReLU activations. It takes the list of VMs to be migrated ($migList$) and the current training parameters as input and outputs the migration decision for the VM-host mapping. We employ the evaluate network to predict the Q-values under the current state and the target network to generate relatively stable training targets. To reduce oscillations in target values and ensure training stability, the parameters $\theta'$ of the target network are synchronized with $\theta$ every 100 steps.

The training is conducted for 1000 episodes, with a maximum of 400 steps per episode. Mini-batches of size 32 are randomly sampled from a replay buffer of 30,000 interaction experiences to perform gradient updates. The learning rate of the Q-network is set to 0.001, and the discount factor $\gamma$ is set to 0.95. During each episode, the agent selects migration actions based on an $\epsilon$-greedy policy, where $\epsilon$ decays from 1.0 to 0.01 at a rate of 0.995 per episode, ensuring a smooth transition from exploration to exploitation.
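The exploration schedule described above can be reproduced with a few lines of Python. The sketch assumes a multiplicative decay applied once per episode and clipped at the floor value; the function name is hypothetical.

```python
def epsilon_schedule(episodes=1000, start=1.0, floor=0.01, decay=0.995):
    """Return the epsilon used in each episode under multiplicative decay."""
    eps = start
    values = []
    for _ in range(episodes):
        values.append(eps)
        eps = max(floor, eps * decay)
    return values


schedule = epsilon_schedule()
```

With these settings the floor is reached after a little more than 900 episodes, so roughly the last 80 episodes run almost entirely in exploitation mode.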

Algorithm 2 DQN-based VM migration

Input: $migList$, Q-network weight $θ$, exploration probability $ϵ$

Initialize evaluate network Q with $θ$;
Initialize target network $Q^{'}$ with $θ^{'} = θ$;
Initialize environment;
Initialize replay memory D to store experience tuples;
for each episode do
    Initialize state $s_{t}$ from environment;
    while episode is not done do
        if random_uniform(0, 1) $< ϵ$ then
            select a random action $a_{t}$;
        else
            $a_{t} \leftarrow \arg\max_{a} Q(s_{t}, a; θ)$;
        end if
        Execute $a_{t}$ and observe $r_{t}$ and $s_{t + 1}$;
        Store experience $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in D;
        Sample random mini-batch from D;
        Perform gradient descent to minimize loss function;
        Copy $θ$ to $θ^{'}$ at specific fixed step intervals;
        Update $ϵ$;
    end while
end for

Output: The decision of Q-network on VM-host mapping;
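The control flow of Algorithm 2 can be exercised on a toy problem. The sketch below is illustrative only and makes several loud assumptions: a lookup table stands in for the deep Q-network, the environment (`step`, `MIG_LIST`, `CAPACITY`, `N_HOSTS`) and its reward are hypothetical placeholders rather than the reward of Equation (18), and $ϵ$ is decayed once per episode as stated in the text. What it does keep from the algorithm is the $ϵ$-greedy choice, the replay memory, mini-batch updates against a target copy, and the periodic copy of $θ$ to $θ^{'}$.

```python
import random
from collections import defaultdict, deque

CAPACITY = 4             # hypothetical per-host capacity
MIG_LIST = [2, 1, 2, 1]  # hypothetical demands of the VMs to be migrated
N_HOSTS = 3


def step(state, action):
    """Toy transition: place the current VM on host `action`."""
    idx, loads = state
    loads = list(loads)
    reward = -1.0 if loads[action] == 0 else 0.0  # switching on a host costs energy
    loads[action] += MIG_LIST[idx]
    if loads[action] > CAPACITY:
        reward -= 5.0                             # overload stands in for an SLA violation
    return (idx + 1, tuple(loads)), reward, idx + 1 == len(MIG_LIST)


def train(episodes=300, gamma=0.95, lr=0.1, batch=32, sync_every=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)         # evaluate "network" (lookup-table stand-in)
    q_target = defaultdict(float)  # target "network"
    replay = deque(maxlen=30_000)
    eps, steps = 1.0, 0
    for _ in range(episodes):
        state, done = (0, (0,) * N_HOSTS), False
        while not done:
            if rng.random() < eps:
                action = rng.randrange(N_HOSTS)
            else:
                action = max(range(N_HOSTS), key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            replay.append((state, action, reward, next_state, done))
            sample = rng.sample(list(replay), min(batch, len(replay)))
            for s, a, r, s2, d in sample:
                best_next = 0.0 if d else max(q_target[(s2, b)] for b in range(N_HOSTS))
                q[(s, a)] += lr * (r + gamma * best_next - q[(s, a)])
            steps += 1
            if steps % sync_every == 0:
                q_target = defaultdict(float, q)  # hard copy of theta into theta'
            state = next_state
        eps = max(0.01, eps * 0.995)
    return q


def greedy_rollout(q):
    """Follow the learned greedy policy once; return (actions, total reward)."""
    state, done, actions, total = (0, (0,) * N_HOSTS), False, [], 0.0
    while not done:
        action = max(range(N_HOSTS), key=lambda a: q[(state, a)])
        state, reward, done = step(state, action)
        actions.append(action)
        total += reward
    return actions, total


plan, total_reward = greedy_rollout(train())
```

In this toy setting the best achievable return is −2 (two hosts switched on, no overload), so the return of `plan` gives a quick sanity check on whether the learned policy consolidates the VMs.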

4.2.5. Complexity Analysis

The CVMS module constructs the migration list by detecting the VM that is least suitable to be co-located with others on overloaded hosts. With $n$ hosts and $m$ VMs, each VM needs to be evaluated against the remaining $m - 1$ VMs based on their resource utilization over the past $l$ time intervals, resulting in a time complexity of $O(n \cdot m^{2} \cdot l)$. The space complexity is primarily composed of the historical resource utilization data, $O(n \cdot m \cdot l)$, and the VM co-location evaluation matrix, $O(m^{2})$, for a total of $O(n \cdot m \cdot l + m^{2})$.
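To see where the $m^{2} \cdot l$ factor comes from, consider the pairwise evaluation on a single host. The sketch below substitutes plain Pearson correlation for the co-location coefficient, whose exact definition is given earlier in the paper; the function names and the utilization histories are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length utilization histories: O(l)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def pairwise_matrix(histories):
    """Symmetric m x m matrix built from m(m-1)/2 pairwise evaluations."""
    m = len(histories)
    mat = [[1.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        mat[i][j] = mat[j][i] = pearson(histories[i], histories[j])
    return mat


# Hypothetical CPU utilization of three VMs over l = 4 intervals.
histories = [
    [0.2, 0.4, 0.6, 0.8],
    [0.1, 0.2, 0.3, 0.4],  # rises together with VM 0
    [0.8, 0.6, 0.4, 0.2],  # mirrors VM 0
]
matrix = pairwise_matrix(histories)
```

Each of the $m(m - 1)/2$ pairs costs $O(l)$, and the resulting matrix occupies $O(m^{2})$ memory, matching the terms in the analysis above.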

The network of the DVMM module adopts a fixed structure with two hidden layers. The time complexity of network computation for a single sample is determined by the number of units per layer $p$ and is $O(p^{2})$. Considering the batch size $b$, the computation load for single-batch training is $O(b \cdot p^{2})$. Extending to single-episode training with $s$ interaction steps, the time complexity becomes $O(s \cdot b \cdot p^{2})$. For a total of $e$ training episodes, the overall time complexity is $O(e \cdot s \cdot b \cdot p^{2})$. The space complexity mainly comprises the neural network parameter storage, $O(p^{2})$, and the experience replay buffer storing $c$ pieces of experience data with $n$-dimensional states, $O(n \cdot c)$, resulting in $O(p^{2} + n \cdot c)$.
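As a rough illustration, the training settings reported in Section 4.2.4 ($e = 1000$, $s \le 400$, $b = 32$, $p = 128$) can be substituted into this bound. This is back-of-the-envelope arithmetic rather than a measured cost: it treats the 400-step cap as the value of $s$ and ignores constant factors as well as the input and output layers.

\begin{matrix} e \cdot s \cdot b \cdot p^{2} \le 1000 \cdot 400 \cdot 32 \cdot 128^{2} \approx 2.1 \times 10^{11} . \end{matrix}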

5. Performance Evaluation

5.1. Experimental Settings

To validate the effectiveness of the proposed method, we conducted simulation experiments on a cloud data center built on the CloudSim 4.0 platform [29], consisting of 200 heterogeneous hosts. The hardware configurations were designed with reference to Amazon EC2 specifications, incorporating two types of host flavors and four types of VM flavors. These flavors are primarily distinguished by their million instructions per second (MIPS) and RAM capacities. Detailed configurations are provided in Table 2.

In our experiments, we utilized real-world public workload data collected by the PlanetLab monitoring infrastructure provided by the CoMon project. The dataset was recorded at five-minute intervals over a 10-day period during March and April 2011. We randomly selected workload traces from different days as samples, and we then extracted a specified number of data entries from the chosen days for each experiment. Moreover, the widely adopted SPEC power model [30] was employed to estimate power consumption based on CPU utilization. Under this model, Table 3 presents the power consumption of the two host types at different load levels. To ensure the reliability of the results, each experiment was repeated 10 times and the average is presented.
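The power lookup implied by Table 3 can be sketched as follows. The table values are copied from Table 3; the linear interpolation between the measured 10% load levels and the function name `host_power` are assumptions made for illustration.

```python
# Power draw in watts at 0%, 10%, ..., 100% CPU load, copied from Table 3.
POWER_TABLE = {
    "HP ProLiant G4": [86, 89.4, 92.6, 96, 99.5, 102, 106, 108, 112, 114, 117],
    "HP ProLiant G5": [93.7, 97, 101, 105, 110, 116, 121, 125, 129, 133, 135],
}


def host_power(host_type, utilization):
    """Estimated power (W) at a CPU utilization in [0, 1], by linear interpolation."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    levels = POWER_TABLE[host_type]
    position = utilization * 10
    low = min(int(position), 9)  # index of the lower measured level
    fraction = position - low
    return levels[low] + fraction * (levels[low + 1] - levels[low])
```

For example, `host_power("HP ProLiant G5", 0.25)` falls halfway between the 20% and 30% readings. An idle G4 host still draws 86 W, roughly 74% of its 117 W full-load power, which is why shutting down emptied hosts matters for energy savings.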

5.2. Component Validity Analysis

5.2.1. Validation of CVMS

We propose a VM selection method based on the co-location coefficient model, which we term CVMS. To verify its effectiveness, we compared it with three popular virtual machine selection methods: (1) Random Selection (RS), which randomly selects a VM on the host; (2) Maximum Correlation (MC), which selects the VM with the highest correlation to the host load; and (3) Minimum Migration Time (MMT), which selects the VM with the shortest migration time on the host. Different selection methods are used to add VMs to the migration candidate list until the host is no longer overloaded. All candidate lists are then processed through the same DVMM procedure to enable intuitive evaluation and ensure the fairness of the experiments.
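The shared procedure for the baselines, in which VMs are added to the candidate list until the host is no longer overloaded, can be sketched as below. Only RS and MMT are shown. The overload threshold of 0.8, the use of allocated MIPS as the load measure, the bandwidth value, and all function and field names are hypothetical placeholders, not settings taken from the experiments.

```python
import random


def select_random(vms, rng):
    """RS baseline: pick any VM on the host."""
    return rng.choice(vms)


def select_mmt(vms, rng=None, bandwidth_gbps=1.0):
    """MMT baseline: pick the VM whose RAM transfers fastest (RAM / bandwidth)."""
    return min(vms, key=lambda vm: vm["ram_gb"] * 8 / bandwidth_gbps)


def build_migration_list(vms, host_mips, threshold, select, rng=None):
    """Move VMs to the candidate list until host load is at or below `threshold`."""
    remaining, candidates = list(vms), []
    while remaining and sum(v["mips"] for v in remaining) / host_mips > threshold:
        vm = select(remaining, rng)
        remaining.remove(vm)
        candidates.append(vm)
    return candidates


# Hypothetical overloaded host carrying two small instances and one micro instance.
host_vms = [
    {"name": "small-1", "mips": 1000, "ram_gb": 1.70},
    {"name": "small-2", "mips": 1000, "ram_gb": 1.70},
    {"name": "micro-1", "mips": 500, "ram_gb": 0.85},
]
mmt_list = build_migration_list(host_vms, host_mips=2660, threshold=0.8, select=select_mmt)
rs_list = build_migration_list(host_vms, 2660, 0.8, select_random, random.Random(0))
```

Passing the selection rule as a function mirrors the experimental design: only the selection step changes, while the stopping condition and the downstream DVMM procedure stay the same.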

Figure 3 illustrates the energy consumption and SLAVs under different numbers of VMs. It is evident that CVMS achieves the best performance across problems of various scales. When the number of VMs reaches 400, its energy consumption and SLAVs are, on average, 33.57% and 22.73% lower than those of other algorithms, respectively. This improvement can be attributed to the co-location coefficient model constructed based on the resource utilization correlation among VMs on overloaded hosts. By effectively identifying and selecting the VMs that are least suitable for colocation, CVMS enables the host to return to a normal resource utilization level, thereby reducing both energy consumption and SLAVs.

5.2.2. Validation of DVMM

We propose a DQN-based VM migration algorithm, which is denoted DVMM. To verify the effectiveness of this approach, we compared it with three popular VM migration methods: (1) First Fit Decreasing (FFD), which prioritizes migrating a VM to the first host that can accommodate it; (2) Best Fit Decreasing (BFD), which prioritizes the host with the least remaining but sufficient resources; and (3) Power Aware Best Fit Decreasing (PABFD), which prioritizes the host with the lowest energy consumption. To ensure the fairness of the experiments, all data were preprocessed using CVMS, and the final results were obtained through the different migration methods for comparison.
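The two simplest baselines can be sketched in a few lines. This is a single-resource illustration of the FFD and BFD placement rules as described above, not the implementation used in the experiments; the function names, host capacities, and VM demands are hypothetical, and PABFD is omitted because it additionally requires a power model.

```python
def first_fit(vm_demand, free):
    """FFD rule: index of the first host with enough free capacity, else None."""
    for host, capacity in enumerate(free):
        if capacity >= vm_demand:
            return host
    return None


def best_fit(vm_demand, free):
    """BFD rule: index of the feasible host left with the least spare capacity."""
    feasible = [h for h, capacity in enumerate(free) if capacity >= vm_demand]
    return min(feasible, key=lambda h: free[h] - vm_demand) if feasible else None


def place(demands, free, rule):
    """Place VMs in decreasing order of demand, updating free capacity."""
    free, mapping = list(free), {}
    for vm in sorted(range(len(demands)), key=lambda i: -demands[i]):
        host = rule(demands[vm], free)
        mapping[vm] = host
        if host is not None:
            free[host] -= demands[vm]
    return mapping


# Hypothetical free MIPS on three hosts and demands of three VMs.
free_mips = [2000, 1000, 1200]
demands = [500, 1000, 1000]
ffd = place(demands, free_mips, first_fit)
bfd = place(demands, free_mips, best_fit)
```

Both rules are greedy and look only at the current VM, whereas the DQN agent scores each placement by its learned long-term return.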

Figure 4 illustrates the energy consumption and SLAVs under different numbers of VMs. The experimental results confirm the effectiveness of the proposed method. Its advantage is most pronounced when the number of VMs reaches 400, where DVMM achieves an average reduction of 43.22% in energy consumption and 33.48% in SLAVs compared to the other algorithms. This advantage stems from the ability of the agent to make informed decisions based on a reward function derived from energy consumption and SLAVs, which allows it to learn effective migration strategies through continuous interaction with the environment.

5.3. Comparative Evaluation

Based on the validation of the effectiveness of each component, this section evaluates the overall performance of the proposed EVMC algorithm. Several representative and competitive algorithms in this field are selected for comparison, namely SMVMP [19], MEGH [24], DVMP [31], and AVMP [32]. Specifically, SMVMP is an improved multi-objective genetic algorithm; MEGH is a reinforcement learning method for real-time migration; and DVMP and AVMP are two representative DRL approaches based on DQN and A3C, respectively.

5.3.1. Convergence Analysis

Figure 5 shows how the reward evolves over episodes during the training of the DQN agent. Since SMVMP is an evolutionary algorithm, we conducted separate experiments and converted its final results into rewards to serve as a reference baseline. At the beginning, the agent is not fully adapted to the environment, but through continuous exploration, it gradually improves its decision-making ability. The reward curve shows fluctuations due to occasional suboptimal actions; however, once the agent discovers more effective migration strategies, the rewards increase steadily and eventually converge within a stable range. Throughout the training process, the reward curve of EVMC rises the fastest and stabilizes around −0.32 after approximately 200 episodes, exhibiting the smallest fluctuations among all methods. This performance benefits from our two-stage design: the CVMS module accurately evaluates the virtual machines to be migrated, providing more promising initial candidates, while the DVMM module leverages its superior environment interaction capability to make efficient migration decisions.

5.3.2. Performance Metrics Comparison

We first compared the energy consumption of the different algorithms; the experimental results are shown in Figure 6a. As the number of VMs increased, the energy consumption curves fluctuated but generally showed an upward trend. This can be attributed to two factors: on one hand, the increase in the number of VMs leads to higher overall resource usage; on the other hand, the increased complexity of scheduling results in more fragmented resources. It is evident from the figure that EVMC achieved the lowest energy consumption across different problem scales, with an average reduction of 42.7% compared to SMVMP, 39.8% compared to MEGH, 18.43% compared to DVMP, and 27.91% compared to AVMP.

Figure 6b illustrates the SLAV rate of the different algorithms. It can be clearly observed that the SLAV curves of all algorithms exhibited a gentle upward trend, which was due to intensified resource competition as the number of VMs increased, making it harder for the system to meet quality of service requirements. The SLAV optimization effect of EVMC was significantly better than that of other algorithms, being lower than SMVMP, MEGH, DVMP, and AVMP by 29.01%, 29.52%, 18.43%, and 19.96%, respectively. This improvement benefits from our precise evaluation of VMs to be migrated and the deep interaction between the agent and the environment, effectively avoiding the SLAV issues caused by overly aggressive consolidation strategies.

The number of VM migrations is a key optimization metric as it not only affects system service continuity, but is also closely related to performance. Under limited resources, excessive migration operations can degrade overall system performance and occupy significant network bandwidth, causing system bottlenecks. In our problem modeling, we fully considered and quantified the costs caused by migrations to prevent excessive redundant migration operations. Figure 6c presents the migration count curves of the different algorithms. As the number of VMs increased, the optimization effect of EVMC became increasingly significant, reducing migration counts by 24.78%, 20.15%, 13.26%, and 12.27% compared to SMVMP, MEGH, DVMP, and AVMP, respectively.

Since VMs simultaneously consume resources across multiple dimensions, scheduling strategies that focus on a single resource often lead to imbalance in the utilization of other resource dimensions. This imbalance can not only cause a certain resource dimension to become a performance bottleneck, but also generate a large amount of resource fragmentation, reducing overall resource utilization and service quality. To more comprehensively evaluate the effectiveness of resource scheduling, we introduced the metric of multidimensional load balancing to measure the coordination degree of utilization across various resource dimensions. The formula is as follows:

\begin{matrix} MLB = \sum_{r \in D} \sqrt{\sum_{j = 1}^{m} {(u_{j}^{r} - {\bar{u}}^{r})}^{2} / m}, \quad D = \{CPU, RAM\} . \end{matrix}

(19)

As shown in Figure 6d, EVMC achieved the best balance across different scales. Through quantitative analysis, the average multidimensional load balance value was 0.31, which was 23.31% lower than SMVMP, 19.04% lower than MEGH, 10.38% lower than DVMP, and 11.05% lower than AVMP.
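Equation (19) can be computed as sketched below. The sketch assumes that the index $j$ runs over hosts, so each term is the population standard deviation of one resource dimension across hosts; the function name `mlb` and the utilization values are hypothetical.

```python
from math import sqrt


def mlb(utilization):
    """Sum over resource dimensions of the population std. dev. across hosts."""
    total = 0.0
    for values in utilization.values():
        mean = sum(values) / len(values)
        total += sqrt(sum((u - mean) ** 2 for u in values) / len(values))
    return total


# Hypothetical utilization of four hosts in the two dimensions of Equation (19).
balanced = {"CPU": [0.5, 0.5, 0.5, 0.5], "RAM": [0.4, 0.4, 0.4, 0.4]}
skewed = {"CPU": [0.9, 0.1, 0.9, 0.1], "RAM": [0.4, 0.4, 0.4, 0.4]}
```

A perfectly even placement such as `balanced` yields a value of zero, while `skewed` is penalized for its uneven CPU usage, so lower MLB values indicate better balance.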

5.3.3. Statistical Tests

To determine the statistical significance of the performance differences between EVMC and the comparison algorithms, we conducted the Friedman statistical test. Specifically, the performance values (V) of the different algorithms under various scales were compared and ranked, and the test statistic was then calculated from the ranks (R) as follows:

\begin{matrix} R_{j} = \frac{1}{N} \sum_{i = 1}^{N} r_{j}^{i}, \end{matrix}

(20)

\begin{matrix} τ_{χ}^{2} = \frac{12 N}{K (K + 1)} (\sum_{j = 1}^{K} R_{j}^{2} - \frac{K {(K + 1)}^{2}}{4}), \end{matrix}

(21)

\begin{matrix} τ_{F} = \frac{(N - 1) τ_{χ}^{2}}{N (K - 1) - τ_{χ}^{2}}, \end{matrix}

(22)

where $K$ represents the number of algorithms, $N$ represents the number of scenarios of the same scale, and $r_{j}^{i}$ denotes the rank of the $j$-th algorithm in the $i$-th scenario. The statistic $τ_{χ}^{2}$ approximately follows a chi-square distribution with $K - 1$ degrees of freedom. As shown in Table 4, EVMC achieved the best rankings across different metrics. The calculated p-value was 0.0051 (<0.01), demonstrating that the differences among the algorithms were statistically highly significant.
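The test can be retraced from the ranks in Table 4. The sketch below assumes that the four metrics act as the $N$ scenarios and the five algorithms as the $K$ treatments; the function names are hypothetical, and the chi-square tail probability uses a closed form that holds only for an even number of degrees of freedom (here $K - 1 = 4$).

```python
from math import exp, factorial

# Ranks from Table 4: one row per metric (N = 4), one column per algorithm (K = 5)
# in the order SMVMP, MEGH, DVMP, AVMP, EVMC.
RANKS = [
    [5, 4, 2, 3, 1],  # Energy
    [4, 5, 2, 3, 1],  # SLAVs
    [5, 4, 3, 2, 1],  # MigNum
    [5, 4, 2, 3, 1],  # MLB
]


def friedman(ranks):
    """Return the average ranks, the chi-square statistic, and the F statistic."""
    n, k = len(ranks), len(ranks[0])
    mean_ranks = [sum(row[j] for row in ranks) / n for j in range(k)]  # Eq. (20)
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )                                                                  # Eq. (21)
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)                     # Eq. (22)
    return mean_ranks, chi2, f_stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


mean_ranks, chi2, f_stat = friedman(RANKS)
p_value = chi2_sf_even_df(chi2, df=len(RANKS[0]) - 1)
```

Under these assumptions, the computed average ranks match the last row of Table 4, and the chi-square tail probability rounds to the reported p-value of 0.0051.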

6. Conclusions

We propose an efficient virtual machine consolidation method named EVMC, which employs a co-location coefficient model to detect the VMs in a host that are least suitable for sharing resources with others. The DQN agent then dynamically executes migration decisions to move these VMs to appropriate hosts. By consolidating VMs onto fewer hosts, idle hosts can be shut down to save energy. Extensive experimental results demonstrate that EVMC optimizes energy consumption, reduces SLAV rates, achieves the fewest migrations, and achieves the best balance across multiple resource dimensions. However, the current study only considers a limited set of optimization metrics, and the correlation-based detection model may encounter performance bottlenecks when facing highly dynamic and unpredictable workload types. As part of our future work, we plan to improve the predictive and generalization capabilities of the migration model by incorporating advanced machine learning mechanisms, expand the set of QoS metrics to make the method more suitable for complex real-world cloud environments, and enhance the scalability of the migration model in extremely large clusters by adopting a distributed computing architecture.

Author Contributions

Conceptualization, P.Z. and J.L.; methodology, J.G.; software, L.T.; validation, P.Z., J.G. and J.L.; formal analysis, J.G.; investigation, L.T.; resources, P.Z. and L.T.; data curation, L.T.; writing—original draft preparation, J.G. and J.L.; writing—review and editing, P.Z. and J.G.; visualization, J.G.; supervision, P.Z.; project administration, J.L.; funding acquisition, P.Z. and L.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China under Grant 62471493 and 62402257 (for problem model research); partially supported by the Natural Science Foundation of Shandong Province under Grant ZR2023LZH017, ZR2024MF066, and 2023QF025 (for obtaining experimental data); and also partially supported by the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for metrics and experiment verification).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, P.; Zhang, Y.; Kumar, N.; Hsu, C.H. Deep reinforcement learning algorithm for latency-oriented IIoT resource orchestration. IEEE Internet Things J. 2022, 10, 7153–7163.
2. Zhang, P.; Chen, N.; Xu, G.; Kumar, N.; Barnawi, A.; Guizani, M.; Duan, Y.; Yu, K. Multi-target-aware dynamic resource scheduling for cloud-fog-edge multi-tier computing network. IEEE Trans. Intell. Transp. Syst. 2023, 25, 3885–3897.
3. Gudkov, A.; Popov, P.; Romanov, S. BalCon—Resource balancing algorithm for VM consolidation. Future Gener. Comput. Syst. 2023, 147, 265–274.
4. Patel, P.; Ranabahu, A.H.; Sheth, A.P. Service Level Agreement in Cloud Computing; Wright State University: Dayton, OH, USA, 2009.
5. Javadi, S.A.; Suresh, A.; Wajahat, M.; Gandhi, A. Scavenger: A black-box batch workload resource manager for improving utilization in cloud environments. In Proceedings of the ACM Symposium on Cloud Computing, Santa Cruz, CA, USA, 20–23 November 2019; pp. 272–285.
6. Radi, M.; Alwan, A.A.; Gulzar, Y. Genetic-based virtual machines consolidation strategy with efficient energy consumption in cloud environment. IEEE Access 2023, 11, 48022–48032.
7. Cortez, E.; Bonde, A.; Muzio, A.; Russinovich, M.; Fontoura, M.; Bianchini, R. Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, 28–31 October 2017; pp. 153–167.
8. Alshammari, M.M.; Alwan, A.A.; Nordin, A.; Abualkishik, A.Z. Data backup and recovery with a minimum replica plan in a multi-cloud environment. In Research Anthology on Privatizing and Securing Data; IGI Global: Hershey, PA, USA, 2021; pp. 794–814.
9. Avgerinou, M.; Bertoldi, P.; Castellazzi, L. Trends in data centre energy consumption under the european code of conduct for data centre energy efficiency. Energies 2017, 10, 1470.
10. Wang, J.; Gu, H.; Yu, J.; Song, Y.; He, X.; Song, Y. Research on virtual machine consolidation strategy based on combined prediction and energy-aware in cloud computing platform. J. Cloud Comput. 2022, 11, 50.
11. Imran, M.; Ibrahim, M.; Din, M.S.U.; Rehman, M.A.U.; Kim, B.S. Live virtual machine migration: A survey, research challenges, and future directions. Comput. Electr. Eng. 2022, 103, 108297.
12. Li, Z.; Yu, X.; Yu, L.; Guo, S.; Chang, V. Energy-efficient and quality-aware VM consolidation method. Future Gener. Comput. Syst. 2020, 102, 789–809.
13. Pourghebleh, B.; Aghaei Anvigh, A.; Ramtin, A.R.; Mohammadi, B. The importance of nature-inspired meta-heuristic algorithms for solving virtual machine consolidation problem in cloud environments. Clust. Comput. 2021, 24, 2673–2696.
14. Yao, W.; Wang, Z.; Hou, Y.; Zhu, X.; Li, X.; Xia, Y. An energy-efficient load balance strategy based on virtual machine consolidation in cloud environment. Future Gener. Comput. Syst. 2023, 146, 222–233.
15. Tlili, T.; Krichen, S. Best Fit Decreasing Algorithm for Virtual Machine Placement Modeled as a Bin Packing Problem. In Proceedings of the 2023 9th International Conference on Control, Decision and Information Technologies (CoDIT), Rome, Italy, 3–6 July 2023; pp. 1261–1266.
16. Zhao, C.; Liu, J. A virtual machine dynamic consolidation algorithm based dynamic complementation and ffd algorithm. In Proceedings of the 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, India, 4–6 April 2015; pp. 333–338.
17. Tandon, A.; Manju, K.; Patel, S. A New VMP Approach Based on CPU and Memory Using Bin Packing. In Proceedings of the 2024 IEEE Pune Section International Conference (PuneCon), Pune, India, 13–15 December 2024; pp. 1–6.
18. Sayadnavard, M.H.; Haghighat, A.T.; Rahmani, A.M. A multi-objective approach for energy-efficient and reliable dynamic VM consolidation in cloud data centers. Eng. Sci. Technol. Int. J. 2022, 26, 100995.
19. Saxena, D.; Gupta, I.; Kumar, J.; Singh, A.K.; Wen, X. A secure and multiobjective virtual machine placement framework for cloud data center. IEEE Syst. J. 2021, 16, 3163–3174.
20. Hariharan, B.; Siva, R.; Kaliraj, S.; Prakash, P.S. ABSO: An energy-efficient multi-objective VM consolidation using adaptive beetle swarm optimization on cloud environment. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 2185–2197.
21. Singh, S.; Kumar, R.; Singh, D. An empirical investigation of task scheduling and VM consolidation schemes in cloud environment. Comput. Sci. Rev. 2023, 50, 100583.
22. Shaw, R.; Howley, E.; Barrett, E. Applying reinforcement learning towards automating energy efficient virtual machine consolidation in cloud data centers. Inf. Syst. 2022, 107, 101722.
23. Haghshenas, K.; Pahlevan, A.; Zapater, M.; Mohammadi, S.; Atienza, D. Magnetic: Multi-agent machine learning-based approach for energy efficient dynamic consolidation in data centers. IEEE Trans. Serv. Comput. 2019, 15, 30–44.
24. Basu, D.; Wang, X.; Hong, Y.; Chen, H.; Bressan, S. Learn-as-you-go with megh: Efficient live migration of virtual machines. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 1786–1801.
25. Caviglione, L.; Gaggero, M.; Paolucci, M.; Ronco, R. Deep reinforcement learning for multi-objective placement of virtual machines in cloud datacenters. Soft Comput. 2021, 25, 12569–12588.
26. Tong, Z.; Wang, J.; Wang, Y.; Liu, B.; Li, Q. Energy and performance-efficient dynamic consolidate VMs using deep-Q neural network. IEEE Trans. Ind. Inform. 2023, 19, 11030–11040.
27. Zeng, J.; Ding, D.; Kang, K.; Xie, H.; Yin, Q. Adaptive DRL-based virtual machine consolidation in energy-efficient cloud data center. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 2991–3002.
28. Beloglazov, A.; Buyya, R. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurr. Comput. Pract. Exp. 2012, 24, 1397–1420.
29. Calheiros, R.N.; Ranjan, R.; Beloglazov, A.; De Rose, C.A.; Buyya, R. CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software Pract. Exp. 2011, 41, 23–50.
30. Lange, K.D. Identifying shades of green: The SPECpower benchmarks. Computer 2009, 42, 95–97.
31. Panesar, G.S.; Chadha, R. DDPG: Cloud Data Centre Hybrid Optimisation for Virtual Machine Migration and Job Scheduling. In Proceedings of the 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 1–3 November 2023; pp. 150–156.
32. Wei, P.; Zeng, Y.; Yan, B.; Zhou, J.; Nikougoftar, E. VMP-A3C: Virtual machines placement in cloud computing based on asynchronous advantage actor-critic algorithm. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101549.

Figure 1. Cloud computing application scenarios and cloud data centers.

Figure 2. Architecture of the algorithm model.

Figure 3. Performance of CVMS with different VM selection methods. (a) Comparison of energy consumption. (b) Comparison of SLAVs.

Figure 4. Performance of DVMM with different VM migration methods. (a) Comparison of energy consumption. (b) Comparison of SLAVs.

Figure 5. Comparison of the rewards from different algorithms.

Figure 6. Overall performance comparison of the five algorithms. (a) Comparison of energy consumption. (b) Comparison of SLAVs. (c) Comparison of the number of migrations. (d) Comparison of multidimensional load balance.

Table 1. Comparison of the representative VMC approaches and the proposed method.

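Category	Representative	Strategy	Distinction of EVMC
Heuristic	BFD [15]	Greedy strategy	Co-location coefficient-based VM selection avoiding local optimization
Heuristic	FFD [16]	Greedy strategy	Co-location coefficient-based VM selection avoiding local optimization
Meta-heuristic	MOABC [18]	Evolutionary search-based migration	Two-stage integration reducing redundant migrations and accelerating convergence
Meta-heuristic	SMVMP [19]	Evolutionary search-based migration	Two-stage integration reducing redundant migrations and accelerating convergence
Meta-heuristic	ABSO [20]	Evolutionary search-based migration	Two-stage integration reducing redundant migrations and accelerating convergence
ML	RL [22]	Migration decisions based on learning mechanisms	Fewer feature dependencies and lightweight training
ML	MAGNETIC [23]	Migration decisions based on learning mechanisms	Fewer feature dependencies and lightweight training
ML	MEGH [24]	Migration decisions based on learning mechanisms	Fewer feature dependencies and lightweight training
DRL	DRLVMP [25]	Neural network policy and adaptive migration	Guided by co-location coefficient for faster convergence and lower overhead
DRL	DQNVMC [26]	Neural network policy and adaptive migration	Guided by co-location coefficient for faster convergence and lower overhead
DRL	ADVMC [27]	Neural network policy and adaptive migration	Guided by co-location coefficient for faster convergence and lower overhead
EVMC	–	Co-location coefficient-based selection and DQN-based migration	Joint optimization of energy and SLA with fewer migrations and superior energy-performance trade-off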

Table 2. Configurations of the hosts and VM based on Amazon EC2.

Category	Type	MIPS	Cores	RAM (GB)
Host	HP ProLiant G4	1860	2	4
Host	HP ProLiant G5	2660	2	4
VM	Micro instance	500	1	0.85
VM	Small instance	1000	1	1.70
VM	Extra-large instance	2000	1	3.75
VM	High-CPU medium instance	2500	1	0.85

Table 3. Power consumption at different load levels (%) in Watts.

Host	0	10	20	30	40	50	60	70	80	90	100
HP ProLiant G4	86	89.4	92.6	96	99.5	102	106	108	112	114	117
HP ProLiant G5	93.7	97	101	105	110	116	121	125	129	133	135

Table 4. Friedman test on the four metrics between algorithms.

Metric	SMVMP V	SMVMP R	MEGH V	MEGH R	DVMP V	DVMP R	AVMP V	AVMP R	EVMC V	EVMC R
Energy	$1.66 \times 10^{2}$	5	$1.58 \times 10^{2}$	4	$1.17 \times 10^{2}$	2	$1.32 \times 10^{2}$	3	$0.95 \times 10^{2}$	1
SLAVs	$6.85 \times 10^{-2}$	4	$6.90 \times 10^{-2}$	5	$5.48 \times 10^{-2}$	2	$6.06 \times 10^{-2}$	3	$4.86 \times 10^{-2}$	1
MigNum	$1.40 \times 10^{2}$	5	$1.32 \times 10^{2}$	4	$1.22 \times 10^{2}$	3	$1.20 \times 10^{2}$	2	$1.06 \times 10^{2}$	1
MLB	$4.05 \times 10^{-1}$	5	$3.84 \times 10^{-1}$	4	$3.47 \times 10^{-1}$	2	$3.50 \times 10^{-1}$	3	$3.11 \times 10^{-1}$	1
Average rank		4.75		4.25		2.25		2.75		1.00

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
