Machine-Learning-Based Approach for Virtual Machine Allocation and Migration

Abstract: Owing to its ability to supply reliable, robust and scalable computational power, cloud computing is becoming increasingly popular in industry, government, and academia. High-speed networks connect both virtual and real machines in cloud computing data centres. The system's dynamic provisioning environment depends on the computing resource requirements of end users, and the operational costs of a particular data centre are therefore relatively high. To meet service level agreements (SLAs), it is essential to assign an appropriate number of resources. Virtualization is a fundamental technology of cloud computing: it assists cloud providers in managing data centre resources effectively and thereby improves resource usage by creating several virtual machine (VM) instances. Furthermore, VMs can be dynamically consolidated onto a few physical nodes based on current resource requirements using live migration, while meeting SLAs. Conversely, unoptimised and inefficient VM consolidation can reduce performance when an application is exposed to varying workloads. This paper introduces a new machine-learning-based approach for dynamically consolidating VMs based on adaptive predictions of usage thresholds to achieve acceptable SLA standards. Dynamic data was generated during runtime to validate the efficiency of the proposed technique compared with other machine learning algorithms.


Introduction
By enhancing isolation between application resource usage, allocation, and management, virtualization offers the prospect of improved efficiency in cloud data centres [1]. One of the most intriguing aspects of virtualization, and one that makes it all the more appealing, is live migration: running virtual machines (VMs) can be relocated effortlessly between physical hosts. Service providers offer hosting of available applications to ensure quality of service, which is defined according to a service level agreement (SLA) and, in most circumstances, expressed in terms of application availability.
Cloud computing has become a requirement for today's IT businesses. The exchange of data and information over the Internet is widespread. Applications generate a large amount of data that must be kept and that is constantly transmitted over the Internet. To address the issue of storage space, numerous kinds of big data are gathered and saved on a cloud server; as a result, anyone can access the required data from any place via the Internet. Amazon, IBM, Google and Microsoft have established themselves as major cloud service providers. According to one study, the average resource utilization of a cloud data centre is about thirty percent. However, idle energy consumption can reach seventy percent of the total data centre power consumption. As a result, much energy is lost unnecessarily [1]. Furthermore, the server cannot be left with a light workload [2]. Consequently, numerous approaches have been proposed for identifying underloaded and overloaded machines to save energy. These approaches include migration, virtualization, job mapping and consolidation. The virtualization technology concept is used in many studies. The power management problem has been overcome using the notion of virtualization, which enables overburdened servers to spread their workload to multiple virtual machines without impacting service performance [3]. As a result, efficient VM usage has played an essential role in providing cost-effective services to clients [4]. Cloud computing is becoming increasingly valuable for storing enormous amounts of data in today's world. Virtual machines (VMs) provide virtual application resources to a large number of users at once. Through an application window on a desktop, a VM allows any organization's operating system to function as if it were a completely independent computer. How a VM behaves is determined by the historical data utilized by the server.
Flexibility and low cost are the primary advantages of VMs. Virtualization technology assists in balancing the load on the server by deploying applications to machines that are less or moderately loaded. This procedure entails offsetting several virtual computers to relieve the workload on the actual device.
The key elements which need to be handled intelligently, and in a balanced way, are service level agreements (SLAs), energy consumption, hosting and the number of migrations. Any imbalance between the given metrics can degrade the performance and service quality of the data centre's services. This means that underutilized and over-utilized equipment should be easily distinguishable [5]. In this regard, investigating real and virtual machines is a time-consuming operation. As a result, machine learning architecture is a viable option. Artificial intelligence and swarm intelligence have been used in combination to handle the critical parameters which affect service quality [6,7]. The authors of the latter studies provided a hybrid optimization strategy-based intelligent VM allocation and migration framework, which was inspired by similar research [8][9][10]. ABC (Artificial Bee Colony) and CS (Cuckoo Search), two well-known bio-inspired algorithms, were also integrated to find the best virtual machine for load balancing, while meeting energy needs and addressing SLA limitations. An overview of virtual machine allocation and migration techniques is provided in Figure 1.
The main contributions of this research, which differentiate it from existing research, are:
1. An optimization methodology is designed which assists in the development of a green computing environment, resulting in an eco-friendly use of resources.
2. Performance evaluation is undertaken in terms of different performance parameters, such as energy consumption, the total number of migrations and SLA violations, for which the proposed approach achieves better values.
3. A novel machine-learning-based approach is introduced to dynamically consolidate VMs based on adaptive predictions of usage thresholds to achieve acceptable service level agreements (SLAs).
4. Dynamic data is generated during runtime to validate the efficiency of the proposed technique compared with other machine learning algorithms.
5. The proposed approach is also useful in cases of dynamic workload, i.e., workloads that change over time. The dynamic data help to create a dynamic architecture as the workload continuously changes in a cloud environment; thus, the most efficient virtual machine can be selected depending on the dynamics of the load and energy.
A novel hybrid technique, comprising a combination of a swarm intelligence algorithm and a machine learning classifier, is proposed for the allocation and migration of virtual machines. In the literature, the focus is either on swarm intelligence algorithms or machine learning classifiers for virtual machine allocation and migration, but the proposed hybrid technique provides better results compared to existing techniques described in the literature.
This article is organized as follows: Section 2 discusses related work, including existing studies addressing virtual machine allocation and migration. Section 3 describes the study methodology. Section 4 details the results and provides related discussion. The paper's conclusions and suggestions for future research are presented in Section 5.

Related Work
It has been found that switching between servers and computing in cloud data centres consumes a significant amount of energy. Although VM migration consumes energy, it improves the data centre's overall execution efficiency. The allocation of machines is the most critical aspect of virtualization, followed by migration. Researchers have used various methodologies to determine the optimum VM to enable minimal migration to save energy throughout the virtualization process. Several applications and factors affect the VM migration process. Some of the problems that have been addressed concerning VM migration include economical VM migration, the heterogeneous nature of cloud resources, system workload, and memory. Another significant challenge in VM migration is security risk [11]. Yakhchi used the Cuckoo Search (CS) optimization technique to detect overused hosts, resulting in more efficient resource management. The minimum migration time policy was then implemented to transfer machines from being overused to underused or moderately used [12]. A simulation analysis revealed that, with a low number of migrations, the average value of energy usage was 19.95 kWh. Dhanoa and Khurmi presented a hybrid approach to accomplish power-efficient VM migration. This project included SLA violations, energy usage, virtual machine migrations and a genetic algorithm that assisted in improving the response time. The primary intent of the study was to optimize live migrations for reduced power consumption in various scenarios.
In contrast to the basic algorithm, simulation investigation revealed that the hybrid VM allocation saved 72 percent of power when delivering quality service [13,14]. Jiang [15] suggested an ABC (Artificial Bee Colony) energy model for making VM consolidation decisions based on global optimization. This model depends on the usage rates of the GPU and the CPU. The study considered two policies for live VMs: the first was called VM selection, and the second VM allocation. Compared to existing models, the ABC model saved 25% to 35% energy [15]. Perumal presented a fuzzy hybrid bio-inspired meta-heuristic strategy to solve the problem of VM placement. The study focused on power usage, resource wastage, and virtual machine placement in the cloud. The experimental reports revealed that the hybrid algorithm outperformed the ACO, Firefly, MMAS, and FFD algorithms [16]. Barlaskar et al. presented Enhanced Cuckoo Search (ECS), a VM placement algorithm inspired by the Cuckoo Search (CS) algorithm, which helped to address the energy consumption problem in cloud data centres. PlanetLab was used to track the workload of the ECS algorithm. The results of comparing ECS to the Optimized Firefly Search (OFS) algorithm, the Ant Colony (AC) method and the Genetic Algorithm (GA) showed that the amount of power utilized by ECS was low, while SLA, as well as VM migration performance, was maintained [16,17].
Ruan [17] presented a performance-to-power ratio-based VM allocation and migration technique. This ratio was calculated using sampled machine usage levels. This data was utilized to ensure that the computers were operating power-efficiently without sacrificing performance. A comparison analysis revealed that the VM selection and allocation framework effectively decreased energy consumption by 69.31 percent [17]. This was achieved by reducing the number of migrations and shutdowns, while maintaining minimum performance loss. The Cuckoo Search with Firefly (CS-FA) method was developed by Kumar et al. utilizing cloud computing for load balancing. It was used to determine a virtual machine's capacity and load. The CS-FA algorithm's primary goal was to complete two tasks: first, to identify the optimal virtual machines (VMs) to allot the work, and second, to migrate tasks from overloaded VMs to under-loaded VMs. The CS-FA algorithm was able to migrate two tasks when the number of tasks was 40, whereas the present technique can migrate six tasks [18,19]. Karthikeyan [20] used hybrid optimization to overcome the energy restrictions. These researchers combined the Bat and Artificial Bee Colony (ABC) algorithms, and then used the Naïve Bayes classifier to assign the VM to meet the energy requirements [20]. To achieve optimal resource use while decreasing energy consumption, Jangra combined an artificial neural network (ANN) and the Cuckoo Search algorithm. The authors sorted the virtual machines depending on the CPU usage, which was calculated using the modified best-fit decreasing (MBFD) method. When comparing the suggested CS-based method to the existing SI-technique-based approach, it was discovered that the proposed CS-based approach consumed 13.15 percent less energy [21]. Talwani and Singla [22] proposed an Enhanced Artificial Bee Colony (E-ABC) algorithm to distribute the load equally to multiple VMs.
The E-ABC algorithm decreased the VM workload to save energy by migrating the task to an underloaded host from an overloaded VM. According to simulation results, the E-ABC algorithm fared better with respect to scalability and the occurrence of fewer VM migrations. Compared to the present proposal, the E-ABC approach saved 15 to 17 percent of the energy while reducing the number of migrations by 10% [23][24][25][26].
The proposed approach is different from existing approaches in terms of the optimization methodology used for the development of a green computing environment, the machine-learning-based approach to dynamically integrate VMs based on adaptive predictions of usage thresholds, and the dynamic data to be generated during runtime to validate the efficiency of the technique [27,28].
The advantages of the existing approaches described in the literature are reduced energy consumption and optimized resource allocation during virtual machine allocation and migration; however, these parameters may be further improved by using swarm intelligence techniques with machine-learning classifiers. There are some disadvantages as well. In some of the existing studies, there is no verification of the overloading of physical machines. Some studies reported a borderline problem. Optimal virtual machine placement and migration still represent an NP-hard problem [29,30].

Proposed Methodology
The concept of the deployment of virtual machines resolves problems associated with the use of physical machines. However, its application also has its problems. The traditional notion is that a virtual machine is created and powered on when the PM is overloaded. Buyya (a co-developer of CloudSim) saw this as a severe problem and developed the migration minimization (MM) method as a VM transfer policy and the MBFD (modified best-fit decreasing) method as an allocation policy. Certain drawbacks were identified in allocation policies, such as misallocation and unnecessary energy consumption, which contribute to the problem of global warming. This research proposes optimization methods that contribute to the development of green computing environments by increasing resource use and reducing energy consumption.
The primary areas of this proposal are allocation policy optimization and the mutual validation of migration systems to optimize QoS. Evaluation of the performance of the proposed approach involved consideration of the following parameters:
• the consumption of energy;
• the total number of migrations;
• SLA violations.
The steps involved in undertaking the research are listed below.

Allocation Process
Definition 1. In the cloud environment, the allocation mechanism is an active architecture. An allocation is denoted A(H, At), where At is the total number of allocated tasks and H denotes the total number of hosts.

Problem Definition
For A(H, At), if sufficient resources are present and the load utilisation "µ" of the next host Hnext is less than the load utilisation of the network, then host selection in A is feasible for handling the incoming tasks. If the considered parameter, i.e., load utilisation, is not satisfied, then the structure must search for a node according to the following lemma.

Lemma 1.
The lemma includes two elements, false and true, represented by 0 and 1, respectively; these are shown in Table 1. The usage of virtual machine resources is requested to evaluate the allocation process. The allocation procedure follows the rules below:
• Sort the VMs based on the requirements of CPU utilization.
• Check the availability of resources on the physical machine (PM).
• If the physical machine meets the resource requirements, assign the VM to the PM.
• Reduce the PM's resources according to the demand covered.
A significant problem faced by the modified best-fit decreasing algorithm is borderline issues, which result in virtual machine migration and energy consumption. Hence, avoiding false migrations and reducing the number of migrations is necessary.
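The allocation rules above can be sketched as a single best-fit decreasing pass over the VM list. The following Python sketch is illustrative only; the data structures, names and CPU-only capacity model are assumptions for exposition, not the paper's implementation:

```python
def allocate_vms(vms, hosts):
    """Assign each VM to the host whose free CPU capacity fits it best.

    vms   -- list of dicts: {"id": str, "cpu": float (requested capacity)}
    hosts -- list of dicts: {"id": str, "cpu_free": float}
    Returns a mapping vm_id -> host_id (None if no host can take the VM).
    """
    placement = {}
    # Rule 1: sort VMs by decreasing CPU requirement (the "decreasing" in MBFD).
    for vm in sorted(vms, key=lambda v: v["cpu"], reverse=True):
        best = None
        # Rule 2: check resource availability on each PM.
        for host in hosts:
            if host["cpu_free"] >= vm["cpu"]:
                # Best fit: prefer the host left with the least slack.
                if best is None or host["cpu_free"] < best["cpu_free"]:
                    best = host
        if best is not None:
            # Rule 3: assign VM -> PM; Rule 4: reduce PM resources by the demand.
            placement[vm["id"]] = best["id"]
            best["cpu_free"] -= vm["cpu"]
        else:
            placement[vm["id"]] = None  # no feasible host: candidate for a new PM
    return placement
```

Note that a tight best fit is exactly what produces the borderline issue discussed above: the chosen host may be left with almost no slack.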

Transition between Physical Machine and Virtual Machine
Physical machines are utilized by the cloud framework to deal with the incoming workload. A physical medium that includes central processing units, processing utilities and random-access memory is known as a physical machine. By considering the capabilities of physical machines, virtual machines are allocated to them for faster execution of parallel processing. A number of computing algorithms, such as MBFD, have been implemented for the allocation process. In the migration process, a virtual machine is migrated from one PM to another when its placement is observed to be unsuitable. Massive energy consumption occurs in the migration process; therefore, to reduce energy consumption, it is also essential to decrease the number of migrations. The primary reason for migration is misallocation. Minimization of Migration (MM) is an effective algorithm for preventing unnecessary migrations. The service level agreement (SLA) is violated, an event known as SLA-V, if the service provider fails to offer the promised service within the specified time interval. SLA-V also increases when the load is over-exhausted, leading to considerable energy consumption. Green-computing-based frameworks are also introduced, which help to reduce energy consumption.

Finding Over-Utilized, Normal-Utilized and Under-Utilized Hosts Using CPU Threshold Policy
The steps are as follows:
Step 1: For each host in the host list
Step 2: Obtain CPU utilization
Step 3: Set minimum and maximum threshold values for CPU utilization
Step 4: If obtained CPU utilization < minimum threshold value, then
Step 5: Host is under-utilized and all VMs of that host are migrated
Step 6: Switch off that host
Step 7: If minimum threshold value < CPU utilization < maximum threshold value, then
Step 8: Host is normal-utilized
Step 9: If obtained CPU utilization > maximum threshold value, then
Step 10: Host is over-utilized
Step 11: Migrate some VMs from that host to new PMs
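The threshold policy above can be sketched in a few lines of Python. The fixed threshold values below are illustrative placeholders only, since the proposed approach sets the thresholds adaptively at runtime:

```python
def classify_hosts(hosts, t_min=0.2, t_max=0.8):
    """Label each host according to the CPU threshold policy.

    hosts -- dict mapping host_id -> CPU utilization in [0, 1]
    t_min, t_max -- illustrative thresholds (adaptive in the paper)
    """
    labels = {}
    for host_id, util in hosts.items():
        if util < t_min:
            # Steps 4-6: migrate all VMs away, then switch the host off.
            labels[host_id] = "under-utilized"
        elif util > t_max:
            # Steps 9-11: migrate some VMs from this host to new PMs.
            labels[host_id] = "over-utilized"
        else:
            # Steps 7-8: the host stays as-is.
            labels[host_id] = "normal-utilized"
    return labels
```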

Selection of VMs for Migration from Over-Utilized Hosts Using Enhanced Artificial Bee Colony Algorithm
The steps are as follows:
Step 1: For every overloaded PM
Step 2: Do
Step 3: Employ the virtual machine selection policy
Step 4: Select an adequate virtual machine for migration using the enhanced Artificial Bee Colony approach
Step 5: Add the VM to the migration list
Step 6: Reallocate the migrated VM to a new PM using the MBFD algorithm
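The overall selection loop (Steps 1-5) can be sketched as follows. The internals of the enhanced ABC search are not detailed in this section, so the candidate-scoring below is a greedy stand-in, not the actual E-ABC fitness function: it picks the smallest VM whose removal brings the host back under the threshold, mimicking the goal of minimal moved load:

```python
def select_vms_for_migration(overloaded, t_max=0.8):
    """Build the migration list for overloaded PMs (Steps 1-5).

    overloaded -- dict: host_id -> {"util": float, "vms": {vm_id: vm_util}}
    Returns a list of (vm_id, source_host_id) pairs. Reallocation of these
    VMs to new PMs (Step 6) would then reuse the MBFD allocation pass.
    """
    migration_list = []
    for host_id, info in overloaded.items():
        util, vms = info["util"], dict(info["vms"])
        while util > t_max and vms:
            # Greedy proxy for the E-ABC fitness: the smallest VM that
            # clears the threshold; otherwise the largest VM available.
            clearing = [v for v in vms if util - vms[v] <= t_max]
            if clearing:
                chosen = min(clearing, key=lambda v: vms[v])
            else:
                chosen = max(vms, key=lambda v: vms[v])
            migration_list.append((chosen, host_id))
            util -= vms.pop(chosen)
    return migration_list
```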

Analysis of Post-Migration Process Using Support Vector Machine Algorithm
The training process is as follows: Step 1: Set the input parameters: Bf, Ids, Pre-trained-structure (Pts), and Label-set (Ls) ∈ {Above Post Performance (APP), Below Post Performance (BPP)}. The Pts contains the data from previous learning over 1000 simulations of the SVM migration process available in the set. Bf is the feature created during the execution of the Energy-Aware ABC algorithm. Ids contains the identity numbers of the VMs in array form.
Step 2: Initialize the support vector machine algorithm with Bf and Ls.
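Step 2 can be sketched with a minimal, standard-library-only linear SVM trained by sub-gradient descent on the hinge loss. This is a pedagogical stand-in, not the system's actual SVM: the feature matrix `Bf` and the APP/BPP label set `Ls` are taken from Step 1, while the pre-trained structure (Pts) is not reproduced here:

```python
def train_linear_svm(Bf, Ls, epochs=200, lr=0.05, lam=0.01):
    """Return (weights, bias) for F(x) = w.x - b separating APP (+1) from BPP (-1).

    Bf -- list of feature vectors; Ls -- list of "APP"/"BPP" labels.
    Uses stochastic sub-gradient descent on the regularized hinge loss.
    """
    y = [1 if label == "APP" else -1 for label in Ls]
    n_feat = len(Bf[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(Bf, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) - b)
            if margin < 1:
                # Point inside the margin: take a hinge-loss gradient step.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b -= lr * yi
            else:
                # Correctly classified with margin: only apply regularization.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b
```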
Each allocation procedure follows the context awareness policy. With this policy, the system collects information about the environment and adapts accordingly, assessing whether the unit's demand meets the fundamental standards. For instance, if the virtual machine requires 500 MB of RAM and the host has 520 MB of available RAM, then the virtual machine is allocated to the host. However, in this case, the host faces the borderline issue, as it is left with only 20 MB of RAM. For smooth functioning, real-time operation cannot depend on the context awareness policy alone: run-time entities may consume more resources than were requested at allocation time. In that case, the task must either wait while the required resource is unavailable and busy performing other work, or the virtual machines must be migrated from their current host. Either outcome leads to unnecessary expenditure and network delay, which are incompatible with any procedure.
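The borderline case in the 500 MB/520 MB example above can be made concrete with a tiny feasibility check. The headroom parameter below is an assumption added for illustration (the plain context awareness policy in the text uses none), showing how a reserve would reject borderline placements:

```python
def context_aware_fit(vm_demand_mb, host_free_mb, headroom_mb=0):
    """Context-awareness check: the VM fits if the host's free RAM covers
    the demand. A non-zero headroom (an illustrative extension, not part
    of the original policy) rejects placements that would leave the host
    with almost no slack, e.g. a 500 MB VM on a host with 520 MB free.
    """
    return host_free_mb - vm_demand_mb >= headroom_mb
```

With the plain policy, `context_aware_fit(500, 520)` allocates the VM and leaves only 20 MB of slack; with a 64 MB headroom, the same placement is rejected.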
In real-life scenarios, the proposed approach helps to improve the utilization of resources, the load balancing of processing nodes, the isolation of applications and fault tolerance in virtual machines, and to increase the portability of nodes and the efficiency of the physical server. A novel contribution of the proposal is the incorporation of a swarm intelligence technique to address SLA violations and the number of migrations. An enhanced ABC is also proposed here to achieve optimal allocation of virtual machines.

Result and Discussion
The proposed methodology was compared with the k-nearest neighbors (KNN) and decision tree classification algorithms with respect to energy consumption, SLA violations and the number of VM migrations. One of the most basic approaches for pattern classification is the k-nearest neighbors (KNN) method. This technique is based on proximity, which enables the grouping of individual data points. When paired with past information, it has been used as a validated classification technique in many disciplines, producing significant results. Every unlabeled example is classified by the majority label among its k nearest neighbors in the training set. As a result, the distance utilized to locate nearest neighbors impacts its classification performance. Most KNN classifiers are not based on any statistical regularities of the data and can be evaluated from a vast set of labelled training examples where prior knowledge is missing. Many KNN classifiers use Euclidean distances between the vector inputs representing the examples to evaluate their similarity.
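The majority-vote rule described above can be sketched with the standard library alone. This is a generic KNN sketch with Euclidean distance, not the experiment's actual classifier or data:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled
    neighbours, using Euclidean distance as described in the text.

    train -- list of (feature_vector, label) pairs
    query -- feature vector to classify
    """
    # Sort training points by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```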
When a class is known in advance during the observation of the training sample, it is efficient to use a decision tree for classification. Classes can be either hypothesis-based or user-provided in the learning sample. Let us assume a parent node with right and left child nodes. Consider a training sample whose variable matrix includes the observations over a large number of variables, and a class vector consisting of the integer class of each observation. The splitting rule is used to build a classification tree: it splits the learning sample into smaller sections. Support vector machines (SVMs) can discover the maximum-margin hyper-plane and enable nonlinear classification. SVM is a supervised machine-learning-based algorithm used for classification. The data points (or training set D) are represented mathematically as D = {(x1, y1), (x2, y2), ..., (xn, yn)}, where xi is an n-dimensional real vector and yi is either 1 or −1, indicating the class to which the point belongs.
During the training process, the SVM computes the classification function F(x), which takes the form F(x) = w · x − b, where w and b are the weights and bias, respectively. To verify that the classifier separates the training set into the correct classes, it is necessary to ensure that the output for every negative data point is negative and the output for every positive data point is positive.
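The sign test described above can be shown directly from the decision function F(x) = w · x − b. The weights and bias below are illustrative values for a one-dimensional example, not parameters learned in the paper's experiments:

```python
def svm_decision(w, b, x):
    """Evaluate F(x) = w.x - b: positive output -> class +1, negative -> class -1."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

# Illustrative 1-D separator: boundary at x = 0.5 (since 4*0.5 - 2 = 0).
w, b = [4.0], 2.0
assert svm_decision(w, b, [0.9]) > 0   # positive-class point
assert svm_decision(w, b, [0.1]) < 0   # negative-class point
```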
A detailed comparison, in terms of the number of migrations, SLA violations and energy consumption, is shown in Tables

Improvement Analysis in Terms of Confusion Matrix
Improvement analysis was also performed in terms of the confusion matrix. A total of six confusion matrices were generated and compared with each other. All six matrices are shown in Table 5. In the case of KNN, ABC and CS, two cases each were wrongly classified; in the case of the decision tree, E-ABC and CS-FA, one case each was wrongly classified; using the proposed approach, all cases were correctly classified.
Table 5. Improvement analysis in terms of the confusion matrix.

Conclusions and Future Scope
In this investigation, the performance of the classification techniques was evaluated in terms of the false negative rate (FNR) and the true positive rate (TPR). The classifier output counted towards the TPR if a VM migration was correctly classified into the class named 'migrated' due to high usage. Similarly, the output counted towards the FNR if a VM migration was classified as 'not migrated' from the current server.
KNN, ABC and CS were able to accurately identify VM migration in the dynamic data with 100 percent TPR and 81.8 percent FNR. The decision tree, E-ABC and CS-FA were able to accurately identify and predict VM migration in the dynamic data with 100 percent TPR and 90.9 percent FNR. The proposed methodology accurately identified and predicted VM migration in the dynamic data with 100 percent TPR and 100 percent FNR. It is clear from the confusion matrix that the performance of the proposed SVM-based methodology was better than that of KNN and the decision tree. In future, other similar optimization algorithms may be investigated to further improve the existing approach. Multi-class ML approaches may also be integrated for performance analysis against other state-of-the-art techniques. Research may also be undertaken on resource constraints in the cloud data centre which is to be accessed by the cloud broker.