Article

Strategy for Precopy Live Migration and VM Placement in Data Centers Based on Hybrid Machine Learning

Department of Electrical Engineering, Universitas Indonesia, Depok 16424, Indonesia
*
Author to whom correspondence should be addressed.
Informatics 2025, 12(3), 71; https://doi.org/10.3390/informatics12030071
Submission received: 5 April 2025 / Revised: 5 June 2025 / Accepted: 11 June 2025 / Published: 15 July 2025
(This article belongs to the Section Machine Learning)

Abstract

Data center virtualization has grown rapidly alongside the expansion of application-based services but continues to face significant challenges, such as downtime caused by suboptimal hardware selection, load balancing, power management, incident response, and resource allocation. To address these challenges, this study proposes a hybrid machine learning method that uses a Markov decision process (MDP) to choose which VMs to move, the random forest (RF) method to classify VMs according to load, and NSGA-III to pursue multiple optimization objectives, such as reducing downtime, improving SLA compliance, and increasing energy efficiency. The model was evaluated on the GWA-Bitbrains dataset, on which it achieved a classification accuracy of 98.77%, a MAPE of 7.69% in predicting migration duration, and an energy efficiency improvement of 90.80%. The results of real-world experiments show that the hybrid machine learning strategy can significantly reduce the data center workload, shorten the total migration time, and decrease the downtime. These results affirm the effectiveness of integrating the MDP, the RF method, and NSGA-III to provide holistic solutions in VM placement strategies for large-scale data centers.

1. Introduction

The advancement of virtualization technology in data centers has increased rapidly in recent years and has enabled seamless resource management and dynamic load balancing. A key component of this technology is live migration, which refers to the transfer of a virtual machine (VM) from its original host to a new host without causing any downtime or disruption of services that are running on the VM. This technique is crucial for maintaining high availability and optimizing resource usage in modern data center environments. Common issues in data centers include host machines with too many VMs, limited memory capacity, full storage, and other problems that reduce customer satisfaction; these issues must be addressed appropriately. One way to maintain the performance of a data center is to implement live virtual migration (LVM). This method balances the workload among host machines, manages resources, and helps perform system maintenance with reduced downtime [1].
The LVM technique seamlessly migrates VMs between host machines while maintaining the service continuity of the applications running within them, with the aim of preserving overall system integrity without causing downtime. Failures often occur during migration, so monitoring the conditions of the source host machine, the workload patterns of each VM, and the target host machine is necessary to avoid them. LVM techniques have emerged as solutions for optimizing dynamic data centers; the best-known conventional LVM method is precopy migration. In precopy migration, the VM memory is copied to the destination host during the first phase, and only the altered memory pages (dirty pages) are transmitted in subsequent iterations. When a stopping condition is satisfied, the VM momentarily stops, and the remaining data are copied to the destination host. This technique has been used extensively because of its efficiency in maintaining VM integrity during the transfer process. However, during periods of very high workloads, the precopy method faces significant challenges that result in downtime for data center services and longer total migration times, thus adversely affecting data center performance [1].
Various methods have been proposed for improving data center performance, but many challenges remain. The study in [1] examined precopy live migration with a machine learning-based prediction model and reduced downtime by 64.91% by terminating the migration iteration process once the estimated downtime fell below a predefined threshold; it was also estimated to reduce the total transfer time by 85.81% across various memory configurations. These findings demonstrate that machine learning in the migration process not only improves efficiency but also increases service availability in cloud computing. Applying the K-nearest neighbor (KNN) algorithm to optimize precopy migration has likewise yielded excellent results, achieving a mean absolute percentage error (MAPE) of less than 5% in estimating service downtime and overall migration duration; in addition, the proposed model reduced downtime by as much as 36% compared with existing algorithms [2]. These results demonstrate that AI-driven methodologies, particularly those that emphasize the selection of pertinent features, can significantly improve VM migration performance [3].
In a previous study [4], the WBATimeNet deep learning model was used to identify virtual machines that require live migration in cloud computing, optimizing resource efficiency and VM availability. This method employs multidimensional temporal data on memory, CPUs, and disks, together with CNN and LSTM approaches, to handle uncertainties in sequential data. The findings demonstrate that WBATimeNet outperforms the baseline model and can minimize the workload during live migration; however, it is constrained by its dependence on multidimensional sequential and dynamic data. Another study enhanced the LVM-optimized precopy algorithm (OPCA) with an approach that focuses on memory management efficiency during the LVM process. When applied with the gray-Markov model, the OPCA can predict memory behavior, optimize memory placement, reduce downtime, optimize bandwidth usage, and improve resource utilization efficiency, although it faces technical challenges such as algorithmic complexity, significant computational overhead, and limited prediction accuracy. The OPCA nevertheless provides a solid framework for the future development of VM migration systems [5].
Virtual machine migration and VM placement are significant challenges in data centers because of dynamic workloads, resource limitations, and the need to minimize service disruptions. Inefficient migration decisions can cause various problems, especially workload overload on the host machine, increased migration time, and decreased service quality, which impact performance and energy consumption in data centers.
To address the issues identified in previous research, an adaptive approach is needed for predicting which VMs should be migrated and determining their placement on appropriate hosts. This study proposes a hybrid machine learning system in which the Markov decision process (MDP) algorithm, which can make decisions under uncertainty, predicts when VMs should be moved on the basis of the changing conditions of the host machine. The random forest (RF) algorithm is a classifier that can evaluate migration feasibility, effectively handle imbalanced data, and produce reliable predictions. Moreover, the nondominated sorting genetic algorithm III (NSGA-III) performs multi-objective optimization to balance the workload among host machines and minimize migration costs. This integrated framework aims to improve the accuracy of migration predictions and optimize VM placement strategies, thereby enhancing data center efficiency and reducing significant service disruptions. The key contributions of this study are outlined as follows:
  • A hybrid machine learning model that integrates the MDP, the RF method, and NSGA-III is used to precisely forecast the VM migration time and enhance placement strategies.
  • The MDP, RF method, and NSGA-III are used to improve prediction and efficiency in executing the VM precopy migration process.
  • LVM strategies are used in the proposed model to reduce migration failures and decrease the overall transfer duration and downtime.
The remainder of this paper is organized as follows: Section 2 reviews related studies, with an emphasis on precopy live migration and the application of machine learning in this context. Section 3 describes the methodology, including the hybrid machine learning strategy for workload prediction and virtual machine prioritization. Section 4 details the experimental outcomes, evaluates the proposed model, and discusses the findings on live VM migration optimization in comparison with earlier studies. Section 5 summarizes the key contributions of this study and suggests future directions for improving live VM migration optimization.

2. Background and Literature Review

2.1. Live VM Migration Model

LVM has become essential for modern cloud computing infrastructure management. This review analyses the latest developments (2020 to 2024) in integrating LVM with machine learning approaches. LVM enhances application efficiency during virtual machine migration by improving bandwidth utilization. Precopy, postcopy, and hybrid migration are the three primary methods for moving virtual machines and are essential for controlling VM migration and reducing downtime.
The precopy approach duplicates all memory segments to the target host machine [6]. In precopy migration, the process begins by copying all memory pages of the source VMs to the destination host during the first iteration. In subsequent iterations, only the updated memory pages, known as dirty pages, are transferred. This process continues in multiple iterations until a predefined threshold or stopping condition is met. Since memory pages may be modified multiple times during migration, dirty pages can be duplicated, which leads to repeated transfers. Once the stopping condition is met, the source VMs shut down, and the remaining memory pages are copied to the destination. After this final transfer, the VMs resume operations on the destination host. Downtime occurs when the VMs are shut down and no client activity occurs. Postcopy migration involves pausing the virtual machine on the source host while transferring the critical memory and processor state to the destination. On-demand retrieval of the remaining memory pages from the computer source may result in increased delays because of page faults. Hybrid migration combines the precopy and postcopy strategies: memory pages are copied first in a precopy manner and then in a postcopy manner once the virtual machine has stopped.
While the VMs continue to run on the source, the precopy method progressively replicates the modified memory segments from the VMs to the target. It terminates once a predefined threshold is reached, determined by cycles with stable memory conditions for the target VMs. Once the VMs are halted on the original host machine, the remaining modified memory segments and the CPU state are replicated to the target host machine; this is referred to as the stop-and-copy process, after which the virtual machines continue to run at the destination. Postcopy operations begin by stopping the virtual machines on the original host machine, transferring the processor state and essential memory segments, and restarting the VMs on the target system. The original host then begins pushing memory segments to the destination. During this transfer, if a virtual machine application requires a memory segment that has not yet been migrated, an on-demand request is generated as a page fault, and the page is retrieved from the original host and copied to the target host machine.
This approach results in less downtime; however, excessive page fault requests reduce the efficiency of applications on the virtual machines, as pages are requested and copied over the network, leading to increased migration delays and diminished application performance compared with other migration methods, including precopy [7]. The hybrid approach, which merges the precopy and postcopy methods, initially performs a single precopy migration cycle that transfers the VM memory to the target host system while the VMs continue running on the original host. Following this cycle, the postcopy technique suspends the VMs and transmits the CPU state to the target host machine, and the remaining memory segments are retrieved from the original host system [8]. Precopy migration is the most critical feature of kernel-based virtual machine (KVM) virtualization because it allows for the transfer of virtual machines without interrupting services. Precopy migration has the following stages: premigration, reservation, memory copy, iteration, stop-and-copy, and activation [1,9]. Figure 1 shows the precopy migration process [10].
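The iterative dirty-page loop of precopy migration described above can be sketched as a simplified simulation. The page counts, dirty-page rate, and stop threshold below are illustrative assumptions, not parameters of any real hypervisor:

```python
import random

def precopy_migrate(total_pages=10000, dirty_fraction=0.05,
                    dirty_page_threshold=100, max_iterations=30):
    """Simulate the precopy iteration loop: copy the full memory image
    first, then repeatedly copy only the pages dirtied in the meantime."""
    random.seed(42)
    pages_sent_per_round = []

    # Round 1: copy the entire memory image while the VM keeps running.
    to_send = total_pages
    for _ in range(max_iterations):
        pages_sent_per_round.append(to_send)
        # While this round was being sent, the running VM dirtied pages.
        to_send = int(to_send * dirty_fraction) + random.randint(0, 5)
        # Stop condition: the remaining dirty set is small enough for
        # a short stop-and-copy phase.
        if to_send <= dirty_page_threshold:
            break

    # Stop-and-copy: the VM is paused, and the last dirty pages plus the
    # CPU state are transferred; this interval is the downtime.
    downtime_pages = to_send
    return pages_sent_per_round, downtime_pages

rounds, final = precopy_migrate()
print(f"iterations: {len(rounds)}, pages in stop-and-copy: {final}")
```

Because each round transfers only the pages dirtied during the previous round, the amount of data shrinks rapidly, which is why the final stop-and-copy phase (and thus the downtime) stays short under moderate workloads.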

2.2. Previous Research on LVM and Machine Learning

Previous studies have utilized machine learning algorithms to improve the efficiency of the live migration process, particularly in predicting VM workloads and in the decision-making regarding VM placement. Several approaches use algorithms such as support vector machines (SVMs), decision trees, and artificial neural networks (ANNs) to detect overload conditions on the basis of metrics such as CPU, memory, and network metrics. Reinforcement learning (RL) approaches such as the deep Q-network (DQN) have also been used to form VM migration policies dynamically [11].
However, most previous approaches tend to only address one stage in the migration process, namely detection or placement, without fully integrating prediction and optimization. Furthermore, previous approaches often use limited parameters such as CPU and memory while neglecting other important aspects, including network [12], task execution duration [13], and service level agreement (SLA) compliance [14], which can significantly influence both the urgency and feasibility of VM migration. The absence of parameters that represent these aspects in the decision-making model can result in instability in VM placement in the data center, where migration and placement times are crucial. Therefore, this research emphasizes the incorporation of other parameters to enhance the model’s performance and ensure that decisions align with VM placement and migration scheduling goals.
Reference [14] considered optimizing VM placement rather than focusing on migration criteria. That research aimed to increase resource utilization and performance by determining optimal VM placement strategies and to ensure that the contexts of VM migration and placement were clearly distinguished; its primary drawback was the lack of a selection method for the VM migration times. In line with the research described in [15], the live migration process requires pre- and postcopy processes to guide migration on virtual machines, and the genetic algorithm employed in that study can determine scheduling during the migration process; its major drawback was the absence of a host machine selection process during migration. Research on live migration via machine learning [16] has shown that a neural network (NN) approach can be combined with the fuzzy c-means (FCM) clustering algorithm to determine live migration; the drawback of this approach is the lack of scheduling time for migrating VMs, which makes live migration performance suboptimal. One machine learning study [17] suggested that scheduling must be applied to reduce the excessive workload on the virtual machine when live VM migration is implemented, in line with the scheduling view in [18], which assumed that scheduling must be performed to implement live migration.
The MDP in machine learning has broader uses in decision-making [19]. This conclusion is in line with the conclusion of [20], that the MDP can simplify the path problem in the hybrid flow-shop scheduling problem (HFSSP) branch, determine the significant potential in the branch, save data center resources, and quickly optimize the solution. The characteristics of managing dynamic virtual machines using an MDP approach were studied in [21]. This approach can formulate problems and generate optimal solutions in dynamically changing cloud environments. However, when faced with more complex and large-scale scenarios, the MDP structure is adjusted more specifically to remain effective and efficient in generating appropriate migration policies.
This study suggests a new machine learning model that combines the MDP, the RF algorithm, and NSGA-III to improve accuracy, thus addressing the problems identified in earlier studies. The MDP algorithm works with policies that are based on the transition of host machine statuses in a dynamic data center [19], and the RF algorithm offers reliable classification for migration feasibility. Moreover, NSGA-III can handle multi-objective optimization that balances workload overload in the data center and reduces migration costs [22,23]. This combination was chosen on the basis of previous research that highlighted the limitations of earlier methods in handling migration and VM placement decisions. Unlike previous approaches that treated these processes independently, the proposed model can enhance efficiency in dynamic data centers.

3. Proposed Methodology

3.1. Hybrid Machine Learning Architecture for VM Placement Strategy

This section outlines the hybrid machine learning architecture that was developed [24] to address the challenges of efficiently performing VM migrations in data centers. In this model, three main algorithms are executed one after the other. These algorithms are connected to the MDP algorithm, RF algorithm, and NSGA-III. First, the MDP algorithm is used to analyze the conditions of the host machine and the VM workload to identify the priority of VM migration. The MDP algorithm creates a list of which VMs should be moved first, on the basis of a value and policy model, through a process called value iteration while considering the risk of overloading the host machine. The MDP algorithm outputs a VM priority list, which is used as input for the RF classification model.
Next, the RF algorithm evaluates the migration feasibility of each VM by observing features such as CPU usage, memory, and the network. The RF algorithm predicts which VMs are suitable for migration, and these VMs then proceed to the VM placement optimization stage. In the last step, NSGA-III is used to find the best way to place the VMs, with a focus on balancing the workload across host machines and reducing migration costs. In NSGA-III, various placement solutions are evaluated, and the best Pareto optimal solutions are generated.
This study selected the MDP algorithm, the RF algorithm, and NSGA-III because each has advantages that match the needs for moving and placing VMs. The MDP algorithm is suitable for making decisions under conditions of uncertainty, such as predicting which VMs need to be migrated on the basis of the host’s status. The RF algorithm is effective for classifying imbalanced data, thus enabling the accurate prediction of migration feasibility. Moreover, NSGA-III can handle complex multi-objective optimization problems, such as VM placement, for which it must balance workload and minimize migration costs simultaneously. The conceptual diagram shown in Figure 2 illustrates the detailed flow of the hybrid machine learning architecture in the placement of VMs.
Figure 2 explains how the normalized data are used in the MDP algorithm to generate VM migration candidates on the basis of the statuses of the host machine and VMs. The VM migration candidates are subsequently classified by the RF algorithm, and the feasible prediction results are optimized for VM placement using NSGA-III. These optimization results can be reused for classification adjustment.
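As a rough illustration of this three-stage flow, the pipeline can be sketched as an orchestration skeleton in which each stage is a placeholder function. The heuristics inside each function are simple stand-ins for the actual MDP, RF, and NSGA-III components, and all names and thresholds are hypothetical:

```python
def mdp_select_candidates(hosts):
    """Stage 1 (MDP stand-in): rank VMs on overloaded hosts as candidates."""
    candidates = []
    for host in hosts:
        if host["cpu"] > 0.6 or host["mem"] > 0.6:  # overload heuristic
            # Prioritize the heaviest VMs first (stand-in for value iteration).
            candidates += sorted(host["vms"], key=lambda v: -v["cpu"])
    return candidates

def rf_filter_feasible(candidates):
    """Stage 2 (RF stand-in): keep only VMs deemed feasible to migrate.
    A trained classifier would be consulted here; this rule is a placeholder."""
    return [v for v in candidates if v["cpu"] < 0.9]

def nsga3_place(feasible, targets):
    """Stage 3 (NSGA-III stand-in): assign each VM to the least-loaded
    target host, a greedy substitute for the multi-objective search."""
    plan = {}
    for vm in feasible:
        target = min(targets, key=lambda t: t["cpu"])
        plan[vm["name"]] = target["name"]
        target["cpu"] += vm["cpu"] * 0.1  # rough post-placement load update
    return plan

hosts = [{"name": "h1", "cpu": 0.8, "mem": 0.7,
          "vms": [{"name": "vm1", "cpu": 0.5}, {"name": "vm2", "cpu": 0.95}]}]
targets = [{"name": "h2", "cpu": 0.2}, {"name": "h3", "cpu": 0.4}]

plan = nsga3_place(rf_filter_feasible(mdp_select_candidates(hosts)), targets)
print(plan)  # -> {'vm1': 'h2'}
```

The point of the skeleton is the data flow: the MDP stage emits a priority list, the RF stage filters it to feasible migrations, and the optimizer maps the survivors to target hosts.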

3.2. Dataset Description and Feature Engineering

This study used the GWA-Bitbrains dataset [25,26], a VM dataset from a cloud computing data center that includes CPU usage, memory, and network information for a specific period. Data processing and feature engineering were carried out gradually and systematically. This study initially created a dataset with VM details and host machine names, encompassing crucial metrics such as CPU, memory, and network usage. After clustering, the data were cleaned; in particular, numerical features with missing values were imputed using the median. In the next stage, all the numerical features were normalized using min-max scaling to ensure that they fall within a common range.
Next, feature engineering was carried out to create new attributes that better represent the VM workload, including CPU and memory usage ratios and special features such as SLA compliance. The target label, which indicates the migration decision, was assigned on the basis of threshold rules; specifically, when the CPU or memory usage of a VM exceeded 60%, the VM was classified as requiring migration. Prior studies [27,28,29] adopted this threshold and found it effective in distinguishing workloads prone to overload. Additionally, a sensitivity analysis was performed to ensure that this threshold was balanced, minimizing unnecessary migrations and preventing host overload. The data distribution was imbalanced, so data balancing was performed using the synthetic minority oversampling technique (SMOTE). Finally, the dataset was divided into training and testing data (80% training and 20% testing) using the stratification method to maintain balanced class proportions in both datasets. All the stages described were designed to ensure that the data used to train the machine learning model were of high quality, well-structured, and representative of the data conditions in a data center environment. With this approach, the reliability of the model built to predict efficient VM migration decisions was ensured while maintaining the validity of the results when it is implemented in real-world scenarios. Although this study based the decision to migrate VMs on CPU and memory usage thresholds to make testing simpler and easier, many other factors influence migration decisions under real conditions. The VM usage time, variations in CPU and memory usage, network load, and resources utilized during the migration process are among the factors that impact these decisions. 
In the future, this model will be developed to consider these factors so that the migration predictions will be more accurate, and unnecessary migrations will be reduced, thus making the migration process more efficient. Figure 3 shows a flowchart of the dataset process and feature engineering.

3.3. Learning Modules and Optimization Strategy

The hybrid approach includes three primary learning methods: the MDP, the RF algorithm, and NSGA-III. Each method helps in different ways to decide how to move and place VMs. In the first process, the MDP algorithm evaluates the load status of the host machine to identify the conditions of VMs experiencing overload. By using the value iteration approach, the MDP generates prediction decisions on the basis of reward calculations that consider the overload risk. The RF algorithm then analyzes the selected candidate VM to determine the feasibility of migration on the basis of the CPU and memory workloads. This RF model is trained using 5-fold cross-validation and tuned through RandomizedSearchCV.
This process ensures that the model has good generalizability, with evaluations in terms of accuracy, F1 score, and MAPE. VMs classified as suitable for migration are processed by NSGA-III to determine their optimal placement on the target host machines. By using a multi-objective method that focuses on Pareto dominance, NSGA-III improves how workloads are shared and reduces migration costs, thereby leading to better decisions on where to place VMs. The hybrid machine learning model developed in this study includes a hyperparameter approach, which is detailed in Table 1 below.
Table 1 not only shows the range of values and best options for each parameter but also highlights the thoughtful design of the hybrid machine learning model, which combines classification learning models (RF), multi-objective optimization (NSGA-III), and value-based decision-making (MDP). The parameter values are adjusted on the basis of initial experiments and cross-validation, in consideration of the balance among prediction accuracy, process efficiency, and the model’s ability to generalize to various data center scenarios.
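A minimal sketch of the RF training setup described above, using RandomizedSearchCV with 5-fold cross-validation on synthetic data. The hyperparameter ranges below are illustrative and do not reproduce Table 1:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the preprocessed migration dataset:
# normalized cpu, mem, net features with a threshold-based label.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))
y = ((X[:, 0] > 0.6) | (X[:, 1] > 0.6)).astype(int)

# Hypothetical hyperparameter ranges (not the paper's Table 1 values).
param_distributions = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
}

# Randomized search with 5-fold CV, scored on F1 as in the paper.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=5, cv=5, scoring="f1", random_state=42)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV F1: %.3f" % search.best_score_)
```

Randomized search samples a fixed number of configurations from the ranges rather than exhausting the grid, which keeps tuning affordable while the 5-fold cross-validation guards against overfitting to a single split.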

3.4. Hybrid Machine Learning Evaluation Model

This study assesses how well the hybrid machine learning framework performs by examining measures related to migration decision classification and the efficiency of VM placement. For classification, the confusion matrix is used to determine the numbers of false positives (FPs), false negatives (FNs), true positives (TPs), and true negatives (TNs), where the sum TP + TN + FP + FN represents the overall number of tuples, and the accuracy score and F1 score [30,31,32] are used to assess the precision with which the VMs to be migrated are identified. The mean absolute percentage error (MAPE) is applied to evaluate the downtime, total migration time, and service level agreement (SLA) compliance and to assess the effects of VM placement recommendations on service performance and load distribution in the data center [33,34]; the MAPE is formulated as Equation (1):
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{D_{\mathrm{actual},i} - D_{\mathrm{predicted},i}}{D_{\mathrm{actual},i}} \right| \tag{1}$$
where $n$ represents the number of VMs being evaluated, $D_{\mathrm{actual},i}$ represents the actual downtime, and $D_{\mathrm{predicted},i}$ represents the estimated downtime for $\mathrm{VM}_i$. Next, the downtime per VM is evaluated as the time difference between when the VM stops on the source host and when the VM starts running again on the destination host, which is expressed mathematically for $\mathrm{VM}_i$ by Equation (2):
$$D_i = t_{\mathrm{resume},i} - t_{\mathrm{stop},i} \tag{2}$$
In this context, $t_{\mathrm{stop},i}$ refers to the time (in milliseconds) when the migration process begins the stop-and-copy phase for $\mathrm{VM}_i$, and $t_{\mathrm{resume},i}$ indicates the time when $\mathrm{VM}_i$ becomes active again on the destination host. The average downtime $\bar{D}$ is then calculated as the average over all $N$ migrated VMs. To evaluate the total time required to complete the live migration process for all the recommended VMs, the total migration time is formulated in Equation (3):
$$T_{\mathrm{total}} = \sum_{i=1}^{n} \left( t_{\mathrm{end},i} - t_{\mathrm{start},i} \right) \tag{3}$$
A lower $T_{\mathrm{total}}$ indicates a more efficient migration strategy in terms of time and reflects an overall reduction in service disruptions at the data center. Compliance with the SLA is formulated mathematically as Equation (4), based on the VMs whose downtime does not exceed the established threshold:
$$\mathrm{SLA\ Compliance}\ (\%) = \left( 1 - \frac{V}{N} \right) \times 100\% \tag{4}$$
where $V$ is the number of VMs whose downtime exceeds the established threshold and $N$ is the total number of migrated VMs.
In this study, to calculate energy efficiency [35,36], the total energy consumption of each host machine is first measured before the VM migration process begins, referred to as $E_{\mathrm{before}}$, and after all VMs have been successfully migrated, referred to as $E_{\mathrm{after}}$. In the context of VM placement strategies, this energy value is estimated based on a model that relates resource utilization, such as CPU and memory, to power consumption, without relying on any specific virtualization environment. For instance, the host machine's CPU utilization is converted into an estimated power load (watts) using a predetermined power profile curve. In this way, the energy accumulated during the migration delay period can be calculated as the integral of the power load (in watt-hours) over the same time interval for $E_{\mathrm{before}}$ and $E_{\mathrm{after}}$. Energy efficiency is then calculated via Equation (5):
$$\mathrm{Energy\ Efficiency}\ (\%) = \frac{E_{\mathrm{before}} - E_{\mathrm{after}}}{E_{\mathrm{before}}} \times 100\% \tag{5}$$
This formula represents the percentage reduction in energy consumption achieved by the optimized VM placement strategy. A higher percentage means better efficiency, showing that the hybrid MDP, RF, and NSGA-III algorithm successfully reduces power use on all hosts while still maintaining performance and SLA requirements.
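The five evaluation measures of Equations (1) through (5) can be sketched as plain Python functions; the sample values at the end are illustrative only, not results from this study:

```python
def mape(actual, predicted):
    """Equation (1): mean absolute percentage error of downtime estimates."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

def downtime(t_stop, t_resume):
    """Equation (2): per-VM downtime (same time unit as the inputs)."""
    return t_resume - t_stop

def total_migration_time(intervals):
    """Equation (3): sum of (t_end - t_start) over all migrated VMs."""
    return sum(t_end - t_start for t_start, t_end in intervals)

def sla_compliance(violations, total_vms):
    """Equation (4): share of VMs whose downtime stayed under the threshold."""
    return (1 - violations / total_vms) * 100.0

def energy_efficiency(e_before, e_after):
    """Equation (5): percentage reduction in energy consumption."""
    return (e_before - e_after) / e_before * 100.0

# Illustrative values only:
print(mape([100, 200], [110, 180]))      # -> 10.0
print(sla_compliance(2, 50))             # -> 96.0
print(energy_efficiency(1000.0, 800.0))  # -> 20.0
```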

4. Results and Discussion

4.1. Results of Hybrid Machine Learning

This section describes the performance testing of the hybrid machine learning model using RF classification on a test dataset in which 20% of the original stratified data were separated to maintain class proportions. The evaluation metrics include accuracy, F1 score, and MAPE for estimating downtime, along with results from placement simulations to cover SLA compliance, downtime, energy efficiency during migration, and the number of VMs successfully migrated.

4.1.1. Migration Policy Analysis

Figure 4 shows the VM migration pattern generated by the migration policy that is based on the MDP. Each number represents the frequency of VM migration from the source host machine to the destination host machine during the testing period. A higher value indicates that migration on the host machine occurs frequently and reflects the priorities and migration strategies that are applied in the model to reduce the workload on the host machine when it experiences overload. In contrast, a low number indicates that the migration path does not align with the workload needs or the capacity of the host machine, thus clearly demonstrating how the MDP policy dynamically and adaptively manages the workload distribution.
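As a toy illustration of how an MDP-based policy can map host load states to migrate/hold decisions, the following value iteration sketch uses hypothetical states, transition probabilities, and rewards; none of these values come from the paper:

```python
# States: host load levels; actions: migrate a VM away or hold.
states = ["normal", "busy", "overloaded"]
actions = ["hold", "migrate"]

# P[s][a] -> list of (next_state, probability); values are illustrative.
P = {
    "normal":     {"hold": [("normal", 0.8), ("busy", 0.2)],
                   "migrate": [("normal", 1.0)]},
    "busy":       {"hold": [("busy", 0.5), ("overloaded", 0.5)],
                   "migrate": [("normal", 0.7), ("busy", 0.3)]},
    "overloaded": {"hold": [("overloaded", 1.0)],
                   "migrate": [("busy", 0.8), ("overloaded", 0.2)]},
}
# Rewards penalize overload; each migration carries a small cost.
R = {"normal": 1.0, "busy": 0.0, "overloaded": -2.0}
MIGRATION_COST = 0.3
GAMMA = 0.9

def q(s, a, V):
    """Expected discounted value of taking action a in state s."""
    cost = MIGRATION_COST if a == "migrate" else 0.0
    return R[s] - cost + GAMMA * sum(p * V[s2] for s2, p in P[s][a])

# Value iteration until (approximate) convergence.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: max(q(s, a, V) for a in actions) for s in states}

# Greedy policy extraction from the converged values.
policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
print(policy)
```

With these illustrative numbers, the converged policy holds VMs on a normal host but migrates them away from busy and overloaded hosts, mirroring the overload-driven priorities visible in Figure 4.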

4.1.2. Migration Feasibility Prediction Results

This study evaluates the ability of the RF model to predict the feasibility of virtual machine migration. The model was trained and tested on a dataset that had undergone meticulous preprocessing. The model’s ability to distinguish between VMs that can and cannot be migrated was measured in terms of accuracy, precision, recall, and F1 score. The results from this output demonstrate the model’s ability to reliably identify VM migration priorities with a training time of 86.37 s, which is an essential step in optimizing direct VM migration strategies; see Table 2 below.
Table 2 shows that the RF model performed excellently in predicting VM migration. For the non-migration class, the model achieved a precision of 98.6%, a recall of 100%, and an F1 score of 99.3%, which indicates that the model identified VMs that did not need to be migrated accurately. For the migration class, the model’s precision reached 100%, recall reached 92.2%, F1 score reached 96%, and accuracy reached 98.77%, which indicates that the model detected most VMs that needed to be migrated very well. However, there were a few missed predictions. Overall, this study confirms that the RF model can be relied upon for effective VM migration decisions; see Figure 5 below for the confusion matrix of the RF model.
Figure 5 shows the confusion matrix of the RF model for the prediction of VM migration. The model correctly identified 54 VMs that did not require migration (true negatives), with three false positives in which migration was predicted for VMs that did not need it. It also correctly identified 24 VMs that needed to be migrated (true positives), with two false negatives in which migration should have been predicted but was not.
Although the RF model has shown good performance in predicting migrations, the use of an RF is limited to classifying the feasibility of VM migrations. Therefore, in the proposed approach, the MDP model is integrated to decide migration priorities on the basis of the host machine’s and VM’s current conditions to optimize scheduling in dynamic situations. NSGA-III is then used to optimize the placement of VMs on different host machines by considering multiple objectives such as migration time, downtime, and workload distribution among host machines.

4.1.3. NSGA-III Evolution Metric

This section illustrates the evolution of overload and migration times across generations in the optimization process of NSGA-III. The first plot displays the minimum and average overloads at each generation, which shows the algorithm’s ability to reduce excessive load on the host machine gradually. In the second plot, the minimum and average migration times are shown, which reflect the improvements in efficiency in VM migration scheduling over generations.
In Figure 6a, the evolution of overload values (minimum and average) over generations is shown, while in Figure 6b, the evolution of migration time (minimum and average) over the same generations is displayed. These results show that NSGA-III consistently reduces host overload from the early to the late generations, allowing for a more even distribution of workload. Additionally, the VM migration time also decreased, indicating that the migration scheduling became better and more efficient. These results show that NSGA-III is effective in finding the best solutions for VM migration while keeping downtime low and preventing host overload.
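The per-generation tracking in Figure 6 can be sketched with a toy multi-objective evolutionary search: candidate solutions assign VMs to hosts, two objectives (host overload and total migration time) are evaluated, the nondominated set is carried forward, and the minimum and average overload are recorded each generation. All VM loads, migration costs, and capacities are assumed values; this is a simplified single-front sketch, not the full reference-direction NSGA-III algorithm.

```python
# Toy multi-objective evolutionary search recording the per-generation
# (min, avg) overload, as plotted in Figure 6a. All numbers are assumed.
import random

random.seed(1)
VM_LOAD = [30, 20, 45, 10, 25, 35]          # assumed per-VM CPU demand (%)
MIG_COST = [2.0, 1.5, 3.0, 1.0, 2.0, 2.5]   # assumed migration time (min)
HOSTS, CAPACITY, POP = 3, 70, 40
CURRENT = [0, 0, 1, 1, 2, 2]                # assumed current placement

def objectives(assign):
    load = [0] * HOSTS
    for vm, h in enumerate(assign):
        load[h] += VM_LOAD[vm]
    overload = sum(max(0, l - CAPACITY) for l in load)
    mig_time = sum(MIG_COST[vm] for vm, h in enumerate(assign) if h != CURRENT[vm])
    return overload, mig_time

def dominates(a, b):
    # Pareto dominance: a is no worse in all objectives, better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

pop = [[random.randrange(HOSTS) for _ in VM_LOAD] for _ in range(POP)]
history = []
for gen in range(30):
    scored = [(objectives(ind), ind) for ind in pop]
    front = [ind for f, ind in scored if not any(dominates(g, f) for g, _ in scored)]
    overloads = [f[0] for f, _ in scored]
    history.append((min(overloads), sum(overloads) / len(overloads)))
    # Elitism: keep the nondominated set, fill with mutated offspring.
    pop = [list(ind) for ind in front][:POP]
    while len(pop) < POP:
        child = list(random.choice(front))
        child[random.randrange(len(child))] = random.randrange(HOSTS)
        pop.append(child)

print("gen 1  (min, avg overload):", history[0])
print("gen 30 (min, avg overload):", history[-1])
```

Because the nondominated set is carried forward, the minimum overload never regresses across generations, which is the monotone trend visible in Figure 6a.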

4.2. Evaluation Model Prediction Error Analysis

In this section, the MAPE is used to determine, on average, how far the predictions are from the actual values. By expressing errors as percentages, MAPE offers a clear view of how accurately the model estimates VM downtime and migration duration, which is crucial for effective migration planning and resource management. This analysis also highlights specific areas where the prediction approach can be improved. Figure 7 presents the MAPE evaluation results for downtime, migration time, and SLA compliance.
In Figure 7, the model testing results, with values under 10% for downtime and migration time, indicate that the model’s average prediction error was low, which makes these predictions trustworthy for planning and scheduling migrations. Additionally, the MAPE for SLA compliance was just 3.5%, which shows that the model performed well in predicting whether downtime limits would be met, which is essential for maintaining service quality during migration. These MAPE results suggest that the hybrid MDP, RF, and NSGA-III framework yields accurate and reliable estimates, which can help decision-makers manage data center resources effectively.
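The MAPE used in Figure 7 is the mean absolute prediction error expressed as a percentage of the actual value. In the sketch below, the actual downtimes are taken from Table 6 purely for illustration; the predicted values are assumed, not the model's actual outputs.

```python
# MAPE = 100 * mean(|actual - predicted| / actual), as in Figure 7.
def mape(actual, predicted):
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

actual_downtime    = [42, 64, 84, 39, 61, 32]   # ms, Table 6 values for illustration
predicted_downtime = [45, 60, 80, 42, 58, 35]   # assumed model predictions

print(f"MAPE: {mape(actual_downtime, predicted_downtime):.2f}%")
```

For these illustrative values the error comes out under 10%, in line with the downtime MAPE reported in Figure 7.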

SLA Compliance and Energy Efficiency of VM Migration

This section outlines the results for the NSGA-III parameters within the hybrid machine learning framework. This study aimed to determine the combination of SLA threshold limits, the number of VM candidates, and the minimum memory limits that can achieve an optimal balance between SLA compliance and energy migration efficiency. Figure 8 shows the best configurations in terms of SLA compliance and energy efficiency.
Figure 8 shows the results of the hyperparameter autotuning process for the hybrid VM migration framework. The x-axis displays the percentage of SLA compliance, and the y-axis depicts the percentage of energy efficiency achieved in VM migration. Each bubble represents a distinct parameter combination; its area is proportional to the number of VMs successfully migrated, and its color indicates the tested SLA threshold (30%, 35%, 40%, 45%, 50%, or 60%), as clarified by the legend in the lower-right corner. A red star marks the single best configuration, achieving 93% SLA compliance and 90.8% energy efficiency; a thin dashed red line connects the star back to the cluster of bubbles to highlight this optimal trade-off. This visualization makes clear how different SLA thresholds affect both compliance and energy savings in VM migration.
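The selection step behind Figure 8 can be sketched as scoring each candidate configuration on its two objectives and picking the best trade-off. All candidate tuples except the starred optimum (93% compliance, 90.8% efficiency) are assumed for illustration, and the equal weighting is an assumption, not the paper's scoring rule.

```python
# Sketch of the autotuning selection: among candidate configurations,
# pick the best combined SLA-compliance / energy-efficiency trade-off.
candidates = [
    # (sla_threshold %, sla_compliance %, energy_efficiency %, vms_migrated)
    (30, 88.0, 85.0, 10),   # assumed
    (35, 90.0, 87.5, 12),   # assumed
    (40, 93.0, 90.8, 14),   # reported optimum from Figure 8
    (50, 91.0, 89.0, 13),   # assumed
    (60, 85.0, 92.0, 11),   # assumed
]

# Equal-weight score over the two objectives (weighting is an assumption).
best = max(candidates, key=lambda c: 0.5 * c[1] + 0.5 * c[2])
print(f"best: SLA threshold {best[0]}%, compliance {best[1]}%, efficiency {best[2]}%")
```

With this scoring, the configuration at the 40% SLA threshold wins, matching the red-starred point in Figure 8.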

4.3. Comparison of Machine Learning Models for VM Placement

To situate this research on VM placement alongside other studies that used the same Bitbrains cloud dataset, Table 3 presents a comparison, which demonstrates that the combination of the MDP, the RF method, and NSGA-III performed better than traditional machine learning algorithms in terms of prediction accuracy, F1 score, MAPE, SLA compliance, and energy efficiency.
Table 3 outlines the research conducted by [37] with a PSO Ensemble and by [38] using the Hull-White method and a GA, which report only very low MAPE values of 0.545% and 3.70%, respectively, without including other metrics such as SLA compliance and energy efficiency. A turning point occurred in 2020, when [39] proposed A3C and R2N2 in the edge-cloud environment. Although MAPE and accuracy were not reported, that study increased SLA compliance by 31.9% and saved 14.4% energy. The 14.4% figure indicates that after implementing the migration strategy with the A3C and R2N2 approach, the total energy consumption in the test scenario decreased by 14.4% compared to a conventional approach without load prediction or other optimizations. Meanwhile, [40] contributed a load prediction model based on a Multivariate Bi-LSTM, which reported only RMSE/MAE error measurements rather than percentage metrics.
In 2023–2024, the focus shifted to energy savings as the main metric, as demonstrated by [41] through Multi-Objective RL (VMRL), which recorded an energy savings increase of up to 17%, meaning that the average power consumption in their test cluster decreased by 17% compared to previous research without RL optimization. The study by [42] using the SA-IWDCA algorithm achieved a 25% decrease in energy use compared to the older WCA and IWD algorithms, showing that their methods for placing and moving resources effectively cut power usage by a quarter. Furthermore, this study proposes a hybrid method combining the MDP, RF, and NSGA-III, which achieved an overall energy efficiency level of 90.8%: more than ninety percent of the energy difference before and after migration was saved, showing that the hybrid strategy can greatly lower power use while keeping SLA compliance at 93% and prediction accuracy at 98.77%. Thus, VM migration and placement research now not only prioritizes prediction accuracy but also emphasizes service quality and substantial energy savings within a single integrated framework.

4.4. Experimental Setup

The GWA-Bitbrains dataset [25,26], which contains traces of real workloads from business-critical VMs, was used in the live VM migration scenario. The comprehensive testing environment used in this study comprised hardware based on an AMD Ryzen 5 6600H processor with 16 GB of RAM. The virtualization platform used Proxmox VE 7.4 and a 1 Gbps interconnection network. To ensure fair and meaningful comparisons with the existing approaches, a methodology was implemented that included applying the baseline method to the same hardware configuration, using identical workload patterns in all tests to maintain consistency, and measuring key metrics (downtime and total migration time). The Proxmox VE 7.4 environment was set up on a single physical server, and virtual machines (VMs) and host machines (HMs) were simulated within this environment. The migration scenario was modeled using virtualized networks, where virtual bridges and VLANs were used to simulate network communication. Additionally, NFS shared storage ensured that all the host machines had access to the necessary data.
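For reference, live migration in a Proxmox VE environment such as this one can be triggered from the node CLI; the VM ID and node name below are hypothetical, and the `--online` flag requests a live (pre-copy) migration while the guest keeps running.

```shell
# Hypothetical IDs: live-migrate running VM 101 to cluster node "hm2".
# With NFS shared storage, no local disks need to be moved.
qm migrate 101 hm2 --online

# Check the VM state after the migration completes.
qm status 101
```

The downtime and total migration time reported in Tables 5 and 6 correspond to what the Proxmox task log records for such a migration.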
Figure 9 illustrates the placement and live migration evaluation setup on the VM workload dataset. In this scenario, there are five host machines (HMs), which are denoted HM1, HM2, HM3, HM4, and HM5; blue indicates that the corresponding host machine has an excessive load. HM1 consists of three VMs, namely, VM7 in gray (normal VM conditions) and VM6 and VM8 in red (VM migration conditions); HM3 consists of two VMs, namely, VM1 in gray (normal VM conditions) and VM3 in red (VM migration conditions); and HM5 consists of two VMs, namely, VM2 in gray (normal VM conditions) and VM5 in red (VM migration conditions). In this scenario, the model maps which VMs will be migrated according to a priority determined by the developed machine learning predictions. The machine learning results are then mapped onto the VM process on the target server during the real-time virtual machine migration procedure, which is performed on the basis of the machine learning results and the destination host machine.
Table 4 lists the servers used in the live migration implementation test. Proxmox was used because it is open source; thus, no license is needed for these Proxmox servers. The CPU, memory, storage, and IP address of each Proxmox server and VM are detailed to clarify the capacity and network allocation used in the live migration scenario of this research.

4.5. Results of the Implementation of Live Migration

This section describes our attempt to implement the LVM strategy in the created data center. This study established several LVM testing scenarios on the basis of the developed machine learning scenarios to implement the LVM strategy effectively. Figure 10 shows the conditions under which the host machine experienced overloading. On average, under these conditions, the data center had a CPU workload of 34%, a memory usage of 61%, and a storage usage of 50%. Host machine 1 had a CPU workload of 3% and a memory usage of 62%; host machine 2 had a CPU workload of 2% and a memory usage of 29%; and host machine 3 had an excessive workload, with a CPU usage of 62% and a memory usage of 93%. Under these conditions, the three host machines had to be balanced using the LVM strategy. The first step was to identify the VMs with higher workloads. Second, the storage capacity, memory, and CPU of host machines 1 and 2 were rechecked, because insufficient space could cause live migration to fail or make the total live migration time excessively long.
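The balancing steps above (pick the highest-load VMs on the overloaded host, then verify destination capacity) can be sketched as a greedy selection. All VM loads, host utilizations, and the memory limit below are assumed values, not the measurements from Figure 10.

```python
# Greedy sketch of the balancing step: migrate the heaviest VMs off an
# overloaded host (HM3) to the least-loaded host with spare capacity.
# All numbers are illustrative assumptions.
HOSTS = {"HM1": {"cpu": 3, "mem": 62}, "HM2": {"cpu": 2, "mem": 29}}
VMS_ON_HM3 = {"VM3": {"cpu": 30, "mem": 40}, "VM5": {"cpu": 20, "mem": 30},
              "VM1": {"cpu": 5, "mem": 10}}
MEM_LIMIT = 90   # do not push a destination host above this memory usage (%)

plan = []
# Highest memory consumers first, mirroring "identify the VMs with
# higher workloads".
for vm, load in sorted(VMS_ON_HM3.items(), key=lambda kv: -kv[1]["mem"]):
    # Recheck destination capacity before committing the migration.
    target = min(
        (h for h, u in HOSTS.items() if u["mem"] + load["mem"] <= MEM_LIMIT),
        key=lambda h: HOSTS[h]["mem"], default=None)
    if target:
        HOSTS[target]["mem"] += load["mem"]
        HOSTS[target]["cpu"] += load["cpu"]
        plan.append((vm, target))

print(plan)
```

Note that a VM is skipped when no destination has room, reflecting the observation that insufficient space on the target host can make a live migration fail.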
Table 5 shows the live migration results that were obtained with six virtual machines and without applying the machine learning strategy. Several migration attempts failed, as indicated in the status column. This study calculated the downtime as the duration of the virtual machine’s offline state on the source host during its operation on the destination host. The migration time is the duration from the start to the end of the data transfer process, excluding the downtime. These results highlight the challenges faced without machine learning support.
Table 6 shows the results of the VM migration obtained using the proposed machine learning strategy with six VMs. All the migration attempts were successful, as indicated in the status column, with downtimes that ranged from 32 ms to 84 ms and total migration times between 1.23 and 4.02 min. These results demonstrate the effectiveness of the machine learning-based approach, which increased the migration success rate and minimized service disruptions during the VM migration process. Compared with the results in Table 5, the high success rate in Table 6 highlights how the machine learning strategy facilitates migration and improves decisions about where to place VMs.
The downtime was measured on the basis of when the VM was unavailable to the application during the migration process. The total migration time was computed by calculating the time taken from the start of migration until the VM became fully operational at the destination host.
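The two timing definitions above can be made concrete with timestamped migration events: downtime is the gap between VM suspension on the source and resumption on the destination, while total migration time spans from migration start until the VM is fully operational. The event timestamps below are illustrative, chosen so that the result reproduces the first row of Table 6 (2.30 min, 42 ms).

```python
# Sketch of the timing measurements; timestamps are illustrative.
from datetime import datetime

events = {
    "migration_start":   datetime(2025, 3, 1, 10, 0, 0),
    "vm_suspended":      datetime(2025, 3, 1, 10, 2, 17, 500000),  # on source
    "vm_resumed":        datetime(2025, 3, 1, 10, 2, 17, 542000),  # on destination
    "fully_operational": datetime(2025, 3, 1, 10, 2, 18),
}

# Downtime: VM unavailable between suspend (source) and resume (destination).
downtime_ms = (events["vm_resumed"] - events["vm_suspended"]).total_seconds() * 1000
# Total migration time: start of migration until fully operational.
total_min = (events["fully_operational"] - events["migration_start"]).total_seconds() / 60

print(f"downtime: {downtime_ms:.0f} ms, total migration time: {total_min:.2f} min")
```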
Figure 11 shows the data center after migration: three server nodes (HM1, HM2, and HM3) were online. The overall CPU usage was 1% of the total resources of the six available CPUs. With respect to memory, 3.86 GB of 11.29 GB was used, which was approximately 34% of the total capacity. The amount of storage used was 53.41 GB of 286.45 GB, or approximately 19% of the total storage. The three nodes had sequential IP addresses ranging from 192.168.1.228 to 192.168.1.230, each exhibiting a CPU usage of 1% and a memory usage of 34%. The uptime of each node was approximately 2 min and 12 s.

4.6. Discussion of Hybrid Machine Learning VM Placement

This section explains how we created a hybrid machine learning system using the MDP algorithm, the RF algorithm, and NSGA-III. This system uses the MDP to decide actions on the basis of workload and VM migration priorities, the RF model to identify which VMs need to be moved, and NSGA-III to optimize multiple objectives, such as reducing downtime, meeting SLA requirements, and saving energy during VM migration. With the developed framework, each migration decision is made adaptively to ensure that the VMs being moved are those in the most critical condition in terms of load and risk of SLA violation.
To evaluate the performance of the strategy, this study conducted live migration tests on the Proxmox VE cluster and compared scenarios that used machine learning with those that did not. The test results show that this hybrid framework improves prediction accuracy and energy efficiency and significantly reduces the average data center load, as measured by CPU, memory, and I/O consumption during the migration process. Additionally, it can accelerate migration compared with the non-machine-learning approach. This research emphasizes that applying machine learning models to VM placement and VM migration contributes to reducing the workload and improving energy efficiency.

5. Conclusions

This study developed a hybrid MDP, RF, and NSGA-III framework that successfully integrates adaptive decision-making, VM load classification, and multi-objective optimization, with objectives such as minimizing downtime, achieving SLA compliance, and maximizing energy efficiency in VM placement and live VM migration. In this study, the GWA-Bitbrains dataset was used, with the developed model achieving a classification accuracy of 98.77%, a MAPE of 7.69%, and an energy efficiency of 90.80%, thus addressing the challenges of previous studies that focused only on data prediction. Experiments on the Proxmox VE cluster also revealed that the predictions for migration usually involved less than 3% of the VMs, and the average resource use in the data center decreased substantially compared to unplanned VM placement and migration. However, several limitations need to be considered. First, the MAPE of 7.69% could be further reduced by integrating time series models and multi-objective functions. Second, the computational complexity of NSGA-III should be reduced through surrogate-assisted optimization techniques. Third, future work should implement cross-platform validation using platforms such as PlanetLab and incorporate quality of service (QoS) metrics, additional metrics that influence VM placement and migration, and reinforcement learning models.

Author Contributions

Conceptualization, T.H., K.R. and R.H.; methodology, T.H., K.R. and R.H.; software, T.H., R.H. and K.R.; validation, T.H., K.R. and R.H.; formal analysis, T.H., K.R. and R.H.; investigation T.H., K.R. and R.H.; resources, T.H., K.R. and R.H.; data curation, T.H., R.H. and K.R.; writing—original draft preparation, T.H.; writing—review and editing, T.H., R.H. and K.R.; visualization, T.H., K.R. and R.H.; supervision, K.R.; project administration, T.H.; funding acquisition, T.H., K.R. and R.H. All authors have read and agreed to the published version of the manuscript.

Funding

This publication was supported by Universitas Indonesia through the Hibah Publikasi Terindeks Internasional (PUTI) Kolaborasi Internasional Scheme under Contract PKS-273/UN2.RST/HKP.05.00/2025.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study used Bitbrains cloud datasets, which are available online at https://www.kaggle.com/datasets/gauravdhamane/gwa-bitbrains (accessed on 4 March 2025) [25,26].

Acknowledgments

The Ph.D. Scholarship of Taufik Hidayat was supported by Beasiswa Pendidikan Indonesia (Indonesia Education Scholarship), Pusat Pelayanan Pembiayaan dan Asesmen Pendidikan Tinggi (Center for Higher Education Funding and Assessment) and the Indonesia Endowment Funds for Education (LPDP). The Grid Workloads Archive provided the dataset used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Haris, R.M.; Barhamgi, M.; Nhlabatsi, A.; Khan, K.M. Optimizing pre-copy live virtual machine migration in cloud computing using machine learning-based prediction model. Computing 2024, 106, 3031–3062. [Google Scholar] [CrossRef]
  2. Haris, R.M.; Khan, K.M.; Nhlabatsi, A.; Barhamgi, M. A machine learning-based optimization approach for pre-copy live virtual machine migration. Clust. Comput. 2024, 27, 1293–1312. [Google Scholar] [CrossRef]
  3. Motaki, S.E.; Yahyaouy, A.; Gualous, H. A prediction-based model for virtual machine live migration monitoring in a cloud datacenter. Computing 2021, 103, 2711–2735. [Google Scholar] [CrossRef]
  4. Mangalampalli, A.; Kumar, A. WBATimeNet: A deep neural network approach for VM Live Migration in the cloud. Future Gener. Comput. Syst. 2022, 135, 438–449. [Google Scholar] [CrossRef]
  5. Haris, R.M.; Khan, K.M.; Nhlabatsi, A. Live migration of virtual machine memory content in networked systems. Comput. Netw. 2022, 209, 108898. [Google Scholar] [CrossRef]
  6. Khodaverdian, Z.; Sadr, H.; Edalatpanah, S.A. A shallow deep neural network for selection of migration candidate virtual machines to reduce energy consumption. In Proceedings of the 2021 7th International Conference on Web Research (ICWR), Tehran, Iran, 19–20 May 2021; pp. 191–196. [Google Scholar]
  7. Sadr, H.; Salari, A.; Ashoobi, M.T.; Nazari, M. Cardiovascular disease diagnosis: A holistic approach using the integration of machine learning and deep learning models. Eur. J. Med. Res. 2024, 29, 455. [Google Scholar] [CrossRef]
  8. Zhao, H.; Feng, N.; Li, J.; Zhang, G.; Wang, J.; Wang, Q.; Wan, B. VM performance-aware virtual machine migration method based on ant colony optimization in cloud environment. J. Parallel Distrib. Comput. 2023, 176, 17–27. [Google Scholar] [CrossRef]
  9. Ishiguro, K.; Yasuno, N.; Aublin, P.L.; Kono, K. Revisiting VM-Agnostic KVM vCPU Scheduler for Mitigating Excessive vCPU Spinning. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 2615–2628. [Google Scholar] [CrossRef]
  10. Biswas, M.I.; Parr, G.P.; McClean, S.I.; Morrow, P.J.; Scotney, B.W. A Practical Evaluation in Openstack Live Migration of VMs Using 10Gb/s Interfaces. In Proceedings of the 2016 IEEE Symposium on Service-Oriented System Engineering (SOSE), Oxford, UK, 29 March–2 April 2016; pp. 346–351. [Google Scholar]
  11. Wei, P.; Zeng, Y.; Yan, B.; Zhou, J.; Nikougoftar, E. VMP-A3C: Virtual machines placement in cloud computing based on asynchronous advantage actor-critic algorithm. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101549. [Google Scholar] [CrossRef]
  12. Annadanam, C.S.; Chapram, S.; Ramesh, T. Intermediate node selection for Scatter-Gather VM migration in cloud data center. Eng. Sci. Technol. Int. J. 2020, 23, 989–997. [Google Scholar] [CrossRef]
  13. Singh, S.; Singh, D. A Bio-inspired VM Migration using Re-initialization and Decomposition Based-Whale Optimization. ICT Express 2023, 9, 92–99. [Google Scholar] [CrossRef]
  14. Zhang, B.; Wang, X.; Wang, H. Virtual machine placement strategy using cluster-based genetic algorithm. Neurocomputing 2021, 428, 310–316. [Google Scholar] [CrossRef]
  15. Yazidi, A.; Ung, F.; Haugerud, H.; Begnum, K. Effective live migration of virtual machines using partitioning and affinity aware-scheduling. Comput. Electr. Eng. 2018, 69, 240–255. [Google Scholar] [CrossRef]
  16. Srinivas, B.V.; Mandal, I.; Keshavarao, S. Virtual Machine Migration-Based Intrusion Detection System in Cloud Environment Using Deep Recurrent Neural Network. Cybern. Syst. 2022, 55, 450–470. [Google Scholar] [CrossRef]
  17. Kumar, Y.; Kaul, S.; Hu, Y.-C. Machine learning for energy-resource allocation, workflow scheduling and live migration in cloud computing: State-of-the-art survey. Sustain. Comput. Inform. Syst. 2022, 36, 100780. [Google Scholar] [CrossRef]
  18. Shalu; Singh, D. Artificial neural network-based virtual machine allocation in cloud computing. J. Discret. Math. Sci. Cryptogr. 2021, 24, 1739–1750. [Google Scholar] [CrossRef]
  19. Ma, X.; He, W.; Gao, Y.; Ahmed, G. Virtual Machine Migration Strategy Based on Markov Decision and Greedy Algorithm in Edge Computing Environment. Wirel. Commun. Mob. Comput. 2023, 2023, 6441791. [Google Scholar] [CrossRef]
  20. Guo, J.; Shi, Y.; Chen, Z.; Yu, T.; Shirinzadeh, B.; Zhao, P. Improved SP-MCTS-Based Scheduling for Multi-Constraint Hybrid Flow Shop. Appl. Sci. 2020, 10, 6220. [Google Scholar] [CrossRef]
  21. Han, Z.; Tan, H.; Wang, R.; Chen, G.; Li, Y.; Lau, F.C.M. Energy-Efficient Dynamic Virtual Machine Management in Data Centers. IEEE/ACM Trans. Netw. 2019, 27, 344–360. [Google Scholar] [CrossRef]
  22. Gopu, A.; Thirugnanasambandam, K.; Rajakumar, R.; AlGhamdi, A.S.; Alshamrani, S.S.; Maharajan, K.; Rashid, M. Energy-efficient virtual machine placement in distributed cloud using NSGA-III algorithm. J. Cloud Comput. 2023, 12, 124. [Google Scholar] [CrossRef]
  23. Mohamed, M.F.; Dahshan, M.; Li, K.; Salah, A.; Khosravi, M.R. Virtual Machine Replica Placement Using a Multiobjective Genetic Algorithm. Int. J. Intell. Syst. 2023, 2023, 8378850. [Google Scholar] [CrossRef]
  24. Harwahyu, R.; Erasmus Ndolu, F.H.; Overbeek, M.V. Three layer hybrid learning to improve intrusion detection system performance. Int. J. Electr. Comput. Eng. (IJECE) 2024, 14, 1691–1699. [Google Scholar] [CrossRef]
  25. Shen, S.; Beek, V.V.; Iosup, A. Statistical Characterization of Business-Critical Workloads Hosted in Cloud Datacenters. In Proceedings of the 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, China, 4–7 May 2015; pp. 465–474. [Google Scholar]
  26. Iosup, A.; Li, H.; Jan, M.; Anoep, S.; Dumitrescu, C.; Wolters, L.; Epema, D.H.J. The Grid Workloads Archive. Future Gener. Comput. Syst. 2008, 24, 672–686. [Google Scholar] [CrossRef]
  27. Hidayat, T.; Ramli, K.; Mardian, R.D.; Mahardiko, R. Towards Improving 5G Quality of Experience: Fuzzy as a Mathematical Model to Migrate Virtual Machine Server in The Defined Time Frame. J. Appl. Eng. Technol. Sci. (JAETS) 2023, 4, 711–721. [Google Scholar] [CrossRef]
  28. Alyas, T.; Ghazal, T.M.; Alfurhood, B.S.; Ahmad, M.; Thawabeh, O.A.; Alissa, K.; Abbas, Q. Performance Framework for Virtual Machine Migration in Cloud Computing. Comput. Mater. Contin. 2022, 74, 6289–6305. [Google Scholar] [CrossRef]
  29. Hummaida, A.R.; Paton, N.W.; Sakellariou, R. Scalable Virtual Machine Migration using Reinforcement Learning. J. Grid Comput. 2022, 20, 15. [Google Scholar] [CrossRef]
  30. Kumar, J.; Singh, A.K.; Buyya, R. Ensemble learning based predictive framework for virtual machine resource request prediction. Neurocomputing 2020, 397, 20–30. [Google Scholar] [CrossRef]
  31. Kim, B.; Han, J.; Jang, J.; Jung, J.; Heo, J.; Min, H.; Rhee, D.S. A Dynamic Checkpoint Interval Decision Algorithm for Live Migration-Based Drone-Recovery System. Drones 2023, 7, 286. [Google Scholar] [CrossRef]
  32. Saberi, Z.A.; Sadr, H.; Yamaghani, M.R. An Intelligent Diagnosis System for Predicting Coronary Heart Disease. In Proceedings of the 2024 10th International Conference on Artificial Intelligence and Robotics (QICAR), Qazvin, Iran, 29 February 2024; pp. 131–137. [Google Scholar]
  33. Mosa, A.; Paton, N.W. Optimizing virtual machine placement for energy and SLA in clouds using utility functions. J. Cloud Comput. 2016, 5, 17. [Google Scholar] [CrossRef]
  34. Moocheet, N.; Jaumard, B.; Thibault, P.; Eleftheriadis, L. Minimum-energy virtual machine placement using embedded sensors and machine learning. Future Gener. Comput. Syst. 2024, 161, 85–94. [Google Scholar] [CrossRef]
  35. Zagloel, T.Y.M.; Harwahyu, R.; Maknun, I.J.; Kusrini, E.; Whulanza, Y. Developing Models and Tools for Exploring the Synergies between Energy Transition and the Digital Economy. Int. J. Technol. 2023, 14, 291–319. [Google Scholar] [CrossRef]
  36. Masdari, M.; Khezri, H. Efficient VM migrations using forecasting techniques in cloud computing: A comprehensive review. Clust. Comput. 2020, 23, 2629–2658. [Google Scholar] [CrossRef]
  37. Leka, H.L.; Fengli, Z.; Kenea, A.T.; Hundera, N.W.; Tohye, T.G.; Tegene, A.T. PSO-Based Ensemble Meta-Learning Approach for Cloud Virtual Machine Resource Usage Prediction. Symmetry 2023, 15, 613. [Google Scholar] [CrossRef]
  38. St-Onge, C.; Benmakrelouf, S.; Kara, N.; Tout, H.; Edstrom, C.; Rabipour, R. Generic SDE and GA-based workload modeling for cloud systems. J. Cloud Comput. 2021, 10, 6. [Google Scholar] [CrossRef]
  39. Tuli, S.; Ilager, S.; Ramamohanarao, K.; Buyya, R. Dynamic scheduling for stochastic edge-cloud computing environments using a3c learning and residual recurrent neural networks. IEEE Trans. Mob. Comput. 2020, 21, 940–954. [Google Scholar] [CrossRef]
  40. Dang-Quang, N.-M.; Yoo, M. An Efficient Multivariate Autoscaling Framework Using Bi-LSTM for Cloud Computing. Appl. Sci. 2022, 12, 3523. [Google Scholar] [CrossRef]
  41. Bhatt, C.; Singhal, S. Multi-Objective Reinforcement Learning for Virtual Machines Placement in Cloud Computing. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1051–1058. [Google Scholar] [CrossRef]
  42. Chayan, B.; Sunita, S. Hybrid Metaheuristic Technique for Optimization of Virtual Machine Placement in Cloud. Int. J. Fuzzy Log. Intell. Syst. 2023, 23, 353–364. [Google Scholar] [CrossRef]
Figure 1. Pre-copy migration process.
Figure 2. Flow model of the hybrid machine learning architecture for VM placement.
Figure 3. Dataset and feature engineering VM placement.
Figure 4. MDP policy migration.
Figure 5. Confusion matrix of RF.
Figure 6. NSGA-III evolution overload and migration time: (a) evolution overload; (b) evolution migration time.
Figure 7. MAPE VM migration metrics.
Figure 8. SLA compliance and energy efficiency of VM migration.
Figure 9. Host overload and VM placement scenario.
Figure 10. Host machine overload condition.
Figure 11. Host machine condition after migration.
Table 1. Hyperparameters of the hybrid machine learning framework.

| Component | Hyperparameter | Search Range | Optimal Value |
| --- | --- | --- | --- |
| MDP | gamma | 0.8, 0.9, 0.99 | 0.9 |
| MDP | theta | 0.0001, 0.001, 0.01 | 0.001 |
| RF | n_estimators | 300, 400, 500 | 400 |
| RF | max_depth | 15, 20, None | 20 |
| RF | max_features | ‘sqrt’, ‘log2’, None | ‘sqrt’ |
| RF | min_samples_split | 2, 4, 6 | 4 |
| RF | min_samples_leaf | 1, 2, 4 | 2 |
| RF | class_weight | (0:1, 1:10) | (0:1, 1:10) |
| RF | cv_folds | 5 (cross-validation folds) | 5 |
| NSGA-III | population_size | 50, 100, 150 | 100 |
| NSGA-III | num_generations | 50, 100, 150 | 100 |
| NSGA-III | crossover_probability | 0.6, 0.7, 0.8 | 0.7 |
| NSGA-III | mutation_probability | 0.2, 0.3, 0.4 | 0.3 |
Table 2. RF migration prediction.

| Class | Precision | Recall | F1 Score |
| --- | --- | --- | --- |
| Non-Migration (0) | 0.98 | 1.00 | 0.99 |
| Migration (1) | 1.00 | 0.92 | 0.96 |
Table 3. Comparative performance of VM migration and placement models.

| Author | Machine Learning Model | Accuracy (%) | MAPE (%) | SLA Compliance (%) | Energy Efficiency/Savings (%) |
| --- | --- | --- | --- | --- | --- |
| [37] | PSO Ensemble | - | 0.545 | - | - |
| [38] | Hull-White and GA | - | 3.70 | - | - |
| [39] | A3C and R2N2 | - | - | 31.9 | 14.4 |
| [40] | Multivariate Bi-LSTM | - | - | - | - |
| [41] | Multi-Objective RL (VMRL) | - | - | - | 17 |
| [42] | SA-IWDCA | - | - | - | 25 |
| Proposed Model | Hybrid (MDP, RF, and NSGA-III) | 98.77 | 7.69 | 93 | 90.8 |
Table 4. Scenario testing: precopy live migration of the VM.

| Server Name | CPU (MHz) | Memory (MiB) | Storage (GB) | IP Address |
| --- | --- | --- | --- | --- |
| Server Proxmox HM 1 | 2 | 4048 | 150 | 192.168.1.228/24 |
| Server Proxmox HM 2 | 2 | 4048 | 100 | 192.168.1.229/24 |
| Server Proxmox HM 3 | 2 | 4048 | 100 | 192.168.1.230/24 |
| Server Librenms | 2 | 4048 | 40 | 192.168.1.69/24 |
| VM1 | 1 | 1048 | 10 | 192.168.1.10/24 |
| VM2 | 1 | 2048 | 20 | 192.168.1.11/24 |
| VM3 | 1 | 3048 | 30 | 192.168.1.12/24 |
| VM4 | 1 | 4048 | 40 | 192.168.1.13/24 |
| VM5 | 1 | 3048 | 30 | 192.168.1.14/24 |
| VM6 | 1 | 1048 | 10 | 192.168.1.15/24 |
Table 5. Migration results without the machine learning strategy.

| VM Name | HM Source | HM Destination | Total Migration Time (Minutes) | Downtime (ms) | Status |
| --- | --- | --- | --- | --- | --- |
| VM5 | HM2 | HM3 | 1.33 | 163 | Success |
| VM5 | HM3 | HM1 | 2.48 | 0 | Fail |
| VM4 | HM1 | HM3 | 2.01 | 0 | Fail |
| VM3 | HM3 | HM1 | 6.17 | 0 | Fail |
| VM5 | HM3 | HM2 | 2.14 | 103 | Success |
| VM5 | HM3 | HM2 | 1.24 | 93 | Success |
| VM4 | HM1 | HM3 | 7.25 | 57 | Success |
Table 6. Live migration results with the machine learning strategy.

| VM Name | HM Source | HM Destination | Total Migration Time (Minutes) | Downtime (ms) | Status |
| --- | --- | --- | --- | --- | --- |
| VM2 | HM2 | HM3 | 2.30 | 42 | Success |
| VM5 | HM2 | HM3 | 1.23 | 64 | Success |
| VM2 | HM3 | HM2 | 2.43 | 84 | Success |
| VM3 | HM1 | HM3 | 4.02 | 39 | Success |
| VM3 | HM3 | HM1 | 3.59 | 61 | Success |
| VM4 | HM3 | HM1 | 3.38 | 32 | Success |

Hidayat, T.; Ramli, K.; Harwahyu, R. Strategy for Precopy Live Migration and VM Placement in Data Centers Based on Hybrid Machine Learning. Informatics 2025, 12, 71. https://doi.org/10.3390/informatics12030071