Achieving Reliability in Cloud Computing by a Novel Hybrid Approach

Cloud computing (CC) is among the fastest-growing technologies in the computer industry, offering many benefits and opportunities. Its challenges include resource allocation, security, quality of service, availability, privacy, data management, performance compatibility, and fault tolerance. Fault tolerance (FT) refers to a system's ability to continue performing its intended task in the presence of defects. FT challenges include heterogeneity and a lack of standards, the need for automation, cloud downtime reliability, and consideration of recovery point objectives, recovery time objectives, and cloud workload. The proposed research combines machine learning (ML) algorithms, namely naïve Bayes (NB), library support vector machine (LibSVM), multinomial logistic regression (MLR), sequential minimal optimization (SMO), K-nearest neighbor (KNN), and random forest (RF), with a fault-tolerance method known as delta-checkpointing to achieve higher accuracy, lower fault prediction error, and reliability. Secondary data were collected from the homonymous experimental high-performance computing (HPC) system at the Swiss Federal Institute of Technology (ETH), Zurich, and primary data were generated using virtual machines (VMs) to select the best ML classifier. The secondary and primary data were divided using split ratios of 80/20 and 70/30, respectively, and 5-fold cross-validation was used to identify the classifier with the highest accuracy and the lowest fault prediction error in terms of true, false, repair, and failure of virtual machines. On the secondary data, naïve Bayes performed exceptionally well on the CPU-Mem mono and multi blocks, and sequential minimal optimization performed very well on the HDD mono and multi blocks in terms of accuracy and fault prediction.
On the primary data, random forest achieved the highest accuracy and the lowest fault prediction error, but with poor time complexity. Sequential minimal optimization has good time complexity, with accuracy and fault prediction only slightly worse than those of random forest, so we decided to modify it. Finally, the modified sequential minimal optimization (MSMO) algorithm combined with the fault-tolerance delta-checkpointing (D-CP) method is proposed to improve accuracy, fault prediction error, and reliability in cloud computing.


Introduction
Motivation. CC debuted in information technology and has since evolved into a popular business model for providing IT infrastructure, components, and applications [1]. CC has five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. It offers four deployment models (private, community, public, and hybrid clouds) and three service models (SaaS, PaaS, and IaaS) [2].
Sarker [18] describes supervised learning as the process of learning a function that maps inputs to outputs from sample input-output pairs. The function is inferred from labeled training data consisting of a set of training examples. Supervised learning is a task-driven method: it is used when specific goals are to be achieved from a specific set of inputs. The most common supervised tasks are classification (separating data into classes) and regression (fitting data). For example, supervised learning is used to predict the class label or sentiment of a piece of text, such as a tweet or product review.
Butt et al. [19] describe ML as the scientific study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying instead on patterns and inference. It is a branch of artificial intelligence. They argue that ML is so important in the cloud that it will soon be used by all clouds.
Sun et al. [20] observed that ML has recently grown at a rapid pace, attracting a large number of academics and practitioners. It has become one of the most prominent research areas, with applications in a wide range of industries, including machine translation, speech recognition, image recognition, and recommendation systems.
Kochhar et al. [21] describe the NB classifier as one of the most useful ML algorithms. It is based on Bayes' theorem and assumes strong (naïve) independence between the attributes or features (predictors). Because it is easy to build and requires no complicated iterative parameter estimation, the naïve Bayesian classification model is particularly useful for very large datasets. Despite its simplicity, the NB classifier is one of the most widely used algorithms because it often outperforms more sophisticated classification algorithms.
Chang and Lin [22] present LIBSVM, a library for support vector machines (SVMs) whose goal is to make applying SVMs to applications as simple as possible for users. LIBSVM supports a variety of SVM formulations for classification, regression, and distribution estimation, and it has become one of the most widely used SVM programs in ML and many other fields.
Mohamad et al. [23] explain that MLR estimates, from a set of independent variables, the probability of each possible outcome of a categorical dependent variable with more than two categories. The MLR model compares the categories using a combination of binary logit models: for a response variable with k categories, the multinomial logit model is composed of k - 1 binary logit models, each assessing the influence of the predictors on the likelihood of success in that category.
Li and Guo [24] describe SMO as restricting the working set B to only two Lagrange multipliers, which can be computed analytically and require no extra matrix storage. Two heuristics determine which multipliers to optimize. The first-choice heuristic prioritizes unbound multipliers that are more likely to violate the Karush-Kuhn-Tucker (KKT) conditions. After the first Lagrange multiplier has been selected, the second-choice heuristic selects the second Lagrange multiplier that maximizes the difference between the two prediction errors. To save training time, their SMO technique follows a single-program, multiple-data (SPMD) paradigm: it divides the entire dataset into smaller subsets and uses several processors to update the error array of each subset in parallel.
Sen et al. [25] explain that KNN stores all available records and predicts the class of a new instance from its nearest neighbors using a similarity measure. Unlike classification techniques that construct a mapping function or internal model, KNN is a lazy learning method: it simply stores the training data, preferably in efficient data structures such as hash tables that reduce the cost of computing the distance between a new observation and all k stored data points, and only then draws a conclusion about the label of the new data point. The prediction is the simple majority vote of the k nearest neighbors of each new data point.
In the work by Attallah et al. [26], the proposed methodology tolerates VM CPU faults, which can occur during VM operation, to maximize the reliability and availability of the CC infrastructure. The model's main goal is to track changes in CPU utilization and to act when high CPU utilization is detected: it either migrates the faulty VM to a different destination host or manages the load on the destination host so that the faulty VM can be migrated there.
Suguna and Devi [27] proposed virtual machine fault tolerance (VMFT), which tolerates failures based on each VM's reliability. It delivers reliability and availability while shortening service times. When an application is computed on the VMs, the VM that produces the correct logical output in the shortest time is regarded as the best VM and is used for further processing of the application. The VMFT approach was implemented using the CloudSim tool. The reliability of a single-node VM is measured by the time it takes to execute the program, and the node that returns the result on time is designated as the reliable VM.
Sarker [18] describes the RF classifier as a well-known ensemble classification approach used in machine learning and data science across a wide range of application fields. It uses a parallel ensemble: multiple decision tree classifiers are fitted concurrently to different sub-samples of the dataset, and the final result is determined by majority voting or averaging. This reduces over-fitting and improves prediction accuracy and control, so an RF model with multiple decision trees regularly outperforms a single decision tree. RF generates a series of decision trees with controlled variance by combining bootstrap aggregation (bagging) with random feature selection. Table 1 shows a summary of the literature review.

[12] Muhammad Asim Shahid et al., 2020. Contribution: identify the need for FT efficiency metrics in algorithms, which is one of the main concerns in cloud environments. Limitation: do not provide quality of service in terms of reliability.
[13] Vipul Gupta et al., 2019. Contribution: show that the accuracy of their fault-tolerance approach is 79%, which is better than that of the existing method. Limitation: do not provide classification techniques for selecting fault-tolerance nodes based on virtual machine success/failure.
[14] Rakesh et al., 2020. Contribution: found that reactive FT mechanisms are likely to result in failure. Limitation: do not implement ML algorithms for better fault prediction, so they achieve neither high accuracy nor low fault prediction error.
[15] Sam Goundar and Akashdeep Bhardwaj, 2018. Contribution: discuss fault-tolerance systems for cloud computing environments and examine whether they are effective in a cloud environment. Limitation: do not address accuracy and fault prediction to achieve reliability.
[16] Mihiretu Kebede Edemo, 2019. Contribution: created a fault-tolerance architecture that can effectively use versions in real-time cloud computing systems. Limitation: the architecture cannot tolerate faults if an equal number of versions fail in each subpart at the same time, especially if the number of failed versions exceeds the number of error-free versions in all subparts.
[17] Jackson Kamiri and Geoffrey Mariga, 2021. Contribution: investigate current ML research methods, emerging themes, and the implications of those themes for ML research. Limitation: no content analysis of ML applications such as supervised learning, text analytics, classification, and prediction.
[18] Iqbal H. Sarker, 2021. Contribution: a comprehensive overview of ML algorithms that can be used to improve an application's intelligence and capabilities. Limitation: lacks an analysis of the ML algorithms.
[19] Umer Ahmed Butt et al., 2020. Contribution: a review of CC security threats, issues, and solutions that use one or more ML algorithms. Limitation: no proposed solution to achieve reliability based on VM failure.
[20] Shiliang Sun et al., 2019. Contribution: use ML algorithms to improve accuracy. Limitation: lacks a discussion of challenges and open problems in ML optimization methods.
[21] Deepak Kochhar et al., 2017. Contribution: use a proactive fault-tolerance technique and propose the NB classifier to classify the nodes. Limitation: no other classification algorithms are used to improve accuracy and reduce fault prediction error.
[22] Chih-Chung Chang and Chih-Jen Lin, 2022. Contribution: present the implementation of LIBSVM and discuss its issues. Limitation: does not ensure good system reliability.
[23] Nor Amira Mohamad et al., 2016. Contribution: use an MLR model to determine fault prediction. Limitation: no other classification algorithms are used to determine fault prediction.
[24] C.R. Li and J. Guo, 2015. Contribution: propose an improved version of SVM that avoids falling into endless loops. Limitation: could not determine the optimal n-way parameter that can speed up training.
[25] Pratap Chandra Sen et al., 2020. Contribution: compare various types of classification algorithms and provide a thorough review of supervised learning classifiers. Limitation: no proposed solution to achieve reliability based on VM failure.
[26] Salma M.A. Attallah et al., 2020. Contribution: a model that tracks changes in CPU utilization and makes a decision when high CPU utilization is identified. Limitation: no proposed solution to achieve reliability based on VM failure.
[27] S. Suguna and K. Devi, 2015. Contribution: propose a virtual machine fault-tolerance technique to achieve reliability. Limitation: only one virtual machine result was successful, and the remaining were failures.

Problem Statement
Reliability is a continuous metric that changes with each computing step. It is one of the most important service characteristics and must be met in CC for stable operation. Primary-backup replication is a critical FT software strategy used to meet reliability requirements. The reliability of overall task completion depends on the reliability of each individual activity, so across many thousands or millions of computing operations it can quickly become vanishingly small. A cloud system's reliability is an assessment of how effectively the cloud system provides the service to the user based on the criteria listed above [8].
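To make the "vanishingly small" effect concrete, the sketch below assumes, as a simplification not stated in the source, that a task consists of independent steps that must all succeed (a series model), so overall reliability is the product of the per-step reliabilities. The function names are hypothetical.

```python
import math


def task_reliability(step_reliabilities):
    """Reliability of a task that succeeds only if every step succeeds.

    Assumes independent steps in series: overall reliability is the
    product of the per-step reliabilities.
    """
    return math.prod(step_reliabilities)


def uniform_task_reliability(r, n):
    """Series reliability of n identical steps, computed in log space
    so that very large n does not require building a list."""
    return math.exp(n * math.log(r))


for n in (1_000, 1_000_000):
    print(n, uniform_task_reliability(0.99999, n))
```

Even with a per-step reliability of 0.99999, 1,000 steps still give about 0.990, whereas 1,000,000 steps drop the overall reliability to roughly 4.5e-05.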
There is a need to design and implement ML models that resolve FT issues by achieving high accuracy and low fault prediction error, and that achieve optimal reliability, meaning that VMs run successfully without any failure.

Sensors 2023, 23, 1965
Mathematical Equation for Reliability
In complex systems, the analyst requires a mathematical approach to determine the importance of each VM. Reliability importance (RI) is an appropriate measure of the relative importance of each virtual machine to the overall system reliability. The reliability importance, $I_{R_i}$, of component i in an n-VM system is given as follows [8]:
$$I_{R_i}(t) = \frac{\partial R_s(t)}{\partial R_i(t)} \qquad (1)$$
where $R_s(t)$ is the system reliability at a given time t, and $R_i(t)$ is the reliability of VM i at the same time t [8]. RI measures the rate of change of system reliability with respect to VM reliability at time t. RI can also be used to calculate the likelihood that a component will cause a system failure at time t. The reliability importance calculated in Equation (1) is influenced both by the reliability of a system component and by its position in the system [8].
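Reliability importance is a partial derivative of system reliability with respect to one VM's reliability, so it can be approximated numerically for any system structure. The sketch below assumes two illustrative structures (all VMs in series, all VMs in parallel) and hypothetical VM reliabilities; neither structure nor the values come from the source.

```python
from math import prod


def series_reliability(r):
    # All VMs must work: R_s = R_1 * R_2 * ... * R_n
    return prod(r)


def parallel_reliability(r):
    # At least one VM must work: R_s = 1 - (1 - R_1)(1 - R_2)...(1 - R_n)
    return 1.0 - prod(1.0 - ri for ri in r)


def reliability_importance(system, r, i, h=1e-6):
    """Approximate I_Ri(t) = dR_s(t)/dR_i(t) with a central difference."""
    up = list(r)
    up[i] += h
    down = list(r)
    down[i] -= h
    return (system(up) - system(down)) / (2 * h)


vm_reliability = [0.95, 0.90, 0.99]  # hypothetical R_i(t) values at a fixed t
for i in range(len(vm_reliability)):
    print(i,
          reliability_importance(series_reliability, vm_reliability, i),
          reliability_importance(parallel_reliability, vm_reliability, i))
```

In the series case the least reliable VM (index 1) has the highest importance, since improving the weakest link helps most; in the parallel case the most reliable VM (index 2) dominates. This illustrates the source's point that importance depends on a component's position as well as its reliability.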

Research Methodology
This section focuses on the proposed methodology. In this section, the research design, data collection procedure, and data analysis techniques are all explained in detail. The architecture of the data analysis techniques is also incorporated and explained.

Research Design
The following research design was followed.

Data Collected and Generated
The secondary dataset contains trace data collected from the homonymous experimental HPC system at ETH Zurich, and the generated primary dataset contains VM repair and failure data. Both datasets are used to develop an ML-based approach to FT and reliability in CC.

Machine Learning Algorithms
This research is based on supervised ML algorithms, chosen to achieve high accuracy and low fault prediction error. Supervised learning is defined by its use of labeled datasets to train algorithms that classify data or predict outcomes accurately.

Fault-Tolerance Approach
In this research, FT is used to identify the failure and repair of VMs. Virtual machines are used in a CC system to handle user requests for services. A user request cannot be completed if the virtual machine fails. D-CP mechanisms are used to mitigate the impact of VM failure.

Reliability
In CC, reliability is defined as a cloud computing system's capacity to complete the intended job or deliver a required service for a specified period of time under predetermined conditions. In this research, reliability is achieved by combining ML with the FT method, and the system is considered reliable when all VMs run successfully without any failure.

Implementation View of Research Framework
The research framework diagram was designed to show the flow of the proposed research. First, the secondary data were acquired from external sources, and the primary data were generated using the Weibull distribution. The secondary dataset was cleaned and then pre-processed, whereas the generated primary dataset was already clean. Figure 1 shows the resulting framework, which combines ML and D-CP techniques to achieve high accuracy, low fault prediction error, and reliability in cloud computing.

Acquired Secondary Data
We obtained the secondary data, the Antarex HPC fault dataset, from the Zenodo repository; the dataset has been described in published articles. The dataset and all test environment details are publicly available to the community. The Antarex dataset is based on trace data recorded during fault injection on the homonymous experimental HPC system at ETH Zurich, and it is intended for ML-based fault prediction studies.
The dataset consists of four blocks: CPU-Mem mono-core (4005 instances), CPU-Mem multi-core (4380 instances), HDD mono-core (3244 instances), and HDD multi-core (2493 instances). Each block has eight attributes (timestamp, type, args, seqNum, duration, cores, error, and isFault) of numeric and nominal types [28]. Table 2 shows a short overview of the secondary dataset. We applied exploratory data analysis (EDA) to the Antarex dataset. EDA addresses specific tasks such as detecting missing and incorrect data, mapping and understanding the underlying structure of the data, and identifying the most important variables in the dataset. The dataset is divided into two sections: CPU- and memory-related benchmark applications and fault programs, and hard-drive-related applications and fault programs. It is organized into four folders, one for each dataset block (CPU/memory and HDD, each in single-core and multi-core form) [28].

Data Pre-Processing on Secondary Dataset
Data pre-processing is necessary before applying ML algorithms to the secondary dataset. The dataset contains duplicate values in three attributes (args, seqNum, and duration), as well as some None values and empty rows. All duplicate values, None values, and empty rows were removed in Excel using the Remove Duplicates option. After pre-processing, the blocks contained CPU-Mem mono-core (1740 instances), CPU-Mem multi-core (1408 instances), HDD mono-core (568 instances), and HDD multi-core (551 instances).
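The cleaning was done in Excel; for readers who want a scriptable equivalent, the sketch below performs comparable steps on CSV data. The duplicate rule is an assumption: by default two rows count as duplicates only if every attribute matches (as with Excel's Remove Duplicates when all columns are selected), and `clean_rows` and the sample rows are hypothetical.

```python
import csv
import io

MISSING = {"", "none", "nan"}  # cell values treated as missing (assumption)


def clean_rows(rows, key_columns=None):
    """Remove empty rows, rows with missing values, and duplicate rows.

    rows: list of dicts, e.g. from csv.DictReader.
    key_columns: columns that define a duplicate; None means all columns.
    """
    seen = set()
    cleaned = []
    for row in rows:
        values = {k: (v or "").strip() for k, v in row.items()}
        if all(v == "" for v in values.values()):
            continue  # empty row
        if any(v.lower() in MISSING for v in values.values()):
            continue  # row containing a missing ("None") value
        cols = key_columns or sorted(values)
        key = tuple(values[c] for c in cols)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append(values)
    return cleaned


sample = io.StringIO(
    "timestamp,type,args,seqNum,duration,cores,error,isFault\n"
    "1,cpu,a,1,10,1,0,0\n"
    "1,cpu,a,1,10,1,0,0\n"
    ",,,,,,,\n"
    "2,hdd,None,2,12,1,0,1\n"
    "3,hdd,b,3,15,1,0,1\n"
)
rows = clean_rows(list(csv.DictReader(sample)))
print(len(rows))
```

Passing `key_columns=("args", "seqNum", "duration")` would instead treat rows as duplicates whenever those three attributes match. Which rule reproduces the reported instance counts is not stated in the source.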

Generated Primary Data
We generated the primary dataset using the Weibull distribution, which is often employed as a time-to-failure model in reliability engineering. It extends the exponential model by allowing non-constant failure rate functions, covering both increasing and decreasing failure rate curves, and it has been used successfully to describe both early (burn-in) failures and wear-out failures [11]. We implemented the data generator in Java using the parameters summarized in Table 3; Table 4 shows a short overview of the primary dataset. The primary dataset has seven attributes, namely failure host ID (FHID), host failure time (HFT), last failure time (LFT), distribution (Dis), distribution happen time (DHT), failure time/repair time (FTime/RTime), and status, and 1400 instances in total. The attributes are of numeric and nominal types.
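The authors' generator was written in Java and its parameters are in Table 3, which is not reproduced here. As an illustration only, the Python sketch below shows the Weibull failure-rate behavior described above and samples times-to-failure with hypothetical shape and scale values. Note that the standard library's `random.weibullvariate` takes the scale first and the shape second.

```python
import math
import random


def weibull_hazard(t, shape, scale):
    """Failure rate h(t) = (k / lam) * (t / lam) ** (k - 1) of a Weibull(k, lam)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_reliability(t, shape, scale):
    """Survival function R(t) = exp(-(t / lam) ** k)."""
    return math.exp(-((t / scale) ** shape))


def sample_failure_times(n, shape, scale, seed=None):
    """Draw n times-to-failure; random.weibullvariate takes (scale, shape)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


# Hypothetical parameters, not the values from Table 3:
#   shape < 1 -> decreasing failure rate (early / burn-in failures)
#   shape = 1 -> constant failure rate (the exponential model)
#   shape > 1 -> increasing failure rate (wear-out failures)
times = sample_failure_times(5, shape=1.5, scale=100.0, seed=2)
print([round(t, 1) for t in times])
```

With shape equal to 1 the hazard is constant and the model reduces to the exponential distribution. This is the sense in which the Weibull model "extends" it.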

Data Analysis Techniques
Different ML-based techniques were used in this study for fault classification and prediction, namely the NB, LibSVM, MLR, SMO, KNN, and RF classifiers.

Naïve Bayes
The NB classifier represents, employs, and learns well-defined probabilistic knowledge. The method is intended for supervised induction tasks where the performance goal is to correctly predict the class of test cases, and the training examples include class information.
A naïve Bayesian classifier is a type of Bayesian network based on two simplifying assumptions: the predictive attributes are conditionally independent given the class, and no hidden or latent attributes influence the prediction process. Accordingly, Figure 2 depicts the graphical structure of a naïve Bayesian classifier, with all arcs pointing from the class attribute to the observable, predictive attributes [29].

In Equations (2)-(4), Bayes' rule is used to compute the probability of each class given a vector of observed values for the predictive attributes, and the most likely class is then predicted.
Let C be the random variable denoting an instance's class and X a vector of random variables denoting the observed attribute values. Let c be a specific class label and x an observed attribute value vector.
For continuous attributes, we can use the probability density function of a normal (Gaussian) distribution.
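The following sketch applies Bayes' rule with the naïve independence assumption and Gaussian class-conditional densities. It works in log space to avoid underflow. It is a minimal illustration, not the NB implementation used in the study, and the two-attribute "repair"/"failure" data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors P(C=c) and per-attribute mean/variance per class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps variances positive for constant attributes.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[c] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, x):
    """argmax_c  log P(C=c) + sum_i log p(x_i | C=c)  (naïve independence)."""
    def score(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(log_gaussian(xi, m, v)
                                     for xi, m, v in zip(x, means, variances))
    return max(model, key=score)


X = [[0.2, 10], [0.3, 12], [0.9, 80], [0.8, 75]]  # hypothetical attributes
y = ["repair", "repair", "failure", "failure"]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.85, 78]))
```

Dividing by P(X = x) is skipped because it is the same for every class and does not change the argmax.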

Library Support Vector Machine
LIBSVM is an SVM library whose goal is to make applying SVMs to applications as simple as possible for users. It is widely used in machine learning and a variety of other fields. LIBSVM is typically used in two steps: first, a training data set is used to generate a model, and then the model is used to predict information for a testing data set. LIBSVM supports a wide range of SVM formulations for classification, regression, and distribution estimation. Figure 3 depicts the LIBSVM code organization for training [22].
Equation (5) gives the optimization problem that LIBSVM solves for support vector classification:
$$\min_{\alpha} \; \frac{1}{2}\alpha^{T} Q \alpha - e^{T}\alpha \quad \text{subject to} \quad y^{T}\alpha = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \dots, l \qquad (5)$$
where $e = [1, \dots, 1]^{T}$ is the vector of all ones, Q is an l × l positive semidefinite matrix with $Q_{ij} \equiv y_i y_j K(x_i, x_j)$, and the kernel function is $K(x_i, x_j) \equiv \phi(x_i)^{T} \phi(x_j)$.
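To show what the matrix Q and the LIBSVM C-SVC dual objective look like in practice, the sketch below builds Q with an RBF kernel and evaluates the objective and its constraints for a candidate α. The RBF kernel is one of the kernels LIBSVM supports, but the value gamma = 0.5 and the toy data are assumptions. The solver itself (LIBSVM's decomposition method) is not reproduced.

```python
import math


def rbf_kernel(xi, xj, gamma=0.5):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); gamma is a hypothetical value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def build_q(X, y, kernel=rbf_kernel):
    """Q_ij = y_i * y_j * K(x_i, x_j), an l-by-l positive semidefinite matrix."""
    return [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(len(X))]
            for i in range(len(X))]


def dual_objective(alpha, Q):
    """f(alpha) = 1/2 alpha^T Q alpha - e^T alpha, the C-SVC dual to be minimized."""
    quad = sum(alpha[i] * Q[i][j] * alpha[j]
               for i in range(len(alpha)) for j in range(len(alpha)))
    return 0.5 * quad - sum(alpha)


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the constraints y^T alpha = 0 and 0 <= alpha_i <= C."""
    return (abs(sum(a * yi for a, yi in zip(alpha, y))) < tol
            and all(-tol <= a <= C + tol for a in alpha))


X = [[0, 0], [1, 1], [0, 1], [1, 0]]  # toy XOR-like data
y = [1, 1, -1, -1]
Q = build_q(X, y)
alpha = [0.5, 0.5, 0.5, 0.5]
print(is_feasible(alpha, y, C=1.0), dual_objective(alpha, Q))
```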

Multinomial Logistic Regression
MLR is also known as softmax regression, after the softmax hypothesis function it employs. It is a supervised learning technique that can be used to solve a variety of problems, including text categorization, and it generalizes logistic regression to classification problems with multiple possible outcomes [30]. The multinomial logistic classifier is depicted in Figure 4.

Equation (6) gives the objective function employed by the MLR classifier.
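Equation (6) is referenced but not reproduced here. The sketch below therefore uses the usual softmax-regression objective, the average negative log-likelihood (cross-entropy), and minimizes it by full-batch gradient descent. It is an assumption about the form of that objective, not the study's implementation. The data are a hypothetical three-class toy set with a leading bias feature of 1.

```python
import math


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """P(y = k | x) = softmax(W x)_k, with one weight row per class."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])


def cross_entropy(W, X, y):
    """Average negative log-likelihood, the usual softmax-regression objective."""
    return -sum(math.log(predict_proba(W, x)[k]) for x, k in zip(X, y)) / len(X)


def train(X, y, n_classes, lr=0.1, epochs=500):
    """Gradient descent; the gradient for class row k is (p_k - 1[y = k]) * x."""
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = predict_proba(W, x)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                for d, xi in enumerate(x):
                    grad[k][d] += err * xi / len(X)
        W = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]
    return W


X = [[1, 0, 0], [1, 0.5, 0.5], [1, 3, 0], [1, 2.5, 0.5], [1, 0, 3], [1, 0.5, 2.5]]
y = [0, 0, 1, 1, 2, 2]
W = train(X, y, n_classes=3)
print(cross_entropy(W, X, y), predict_proba(W, [1, 3, 0]))
```

With K = 2 classes this objective reduces to ordinary binary logistic regression, which is the sense in which MLR generalizes it.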

Sequential Minimal Optimization
To train an SVM, a very large quadratic programming (QP) optimization problem must be solved. SMO breaks this large QP problem into a series of the smallest possible QP problems. These small QP problems are solved analytically, which avoids a time-consuming numerical QP optimization as an inner loop. The memory required by SMO scales linearly with the training set size, which allows it to handle very large training sets. Because matrix computation is avoided, SMO's training time scales somewhere between linear and quadratic in the training set size on various test problems, whereas the standard chunking SVM algorithm scales somewhere between linear and cubic. Because SVM evaluation dominates SMO's computation time, SMO is fastest for linear SVMs and sparse data sets; on real-world sparse data sets, SMO can be more than 1000 times faster than chunking [32]. Figure 5 depicts the overall architecture of SMO inference and training. The QP problem for training an SVM is given in Equations (7)-(9): maximize
$$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j K(x_i, x_j)\, \alpha_i \alpha_j \qquad (7)$$
subject to
$$0 \le \alpha_i \le C, \quad \forall i, \qquad (8)$$
$$\sum_{i=1}^{l} y_i \alpha_i = 0. \qquad (9)$$
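The core of SMO is the analytic update of a single pair of Lagrange multipliers under the box constraint (8) and the equality constraint (9). The sketch below follows Platt's update for one pair, together with the second-choice heuristic described earlier (pick the partner that maximizes |E1 - E2|). It is a simplified illustration. Pairs with a non-positive second derivative (eta <= 0) are skipped rather than handled as Platt does, and the outer loop and threshold update are omitted.

```python
def second_choice(i, errors):
    """Second-choice heuristic: pick j != i that maximizes |E_i - E_j|."""
    return max((j for j in range(len(errors)) if j != i),
               key=lambda j: abs(errors[i] - errors[j]))


def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """Analytically optimize one pair (alpha_1, alpha_2) of Lagrange multipliers.

    E1, E2 are prediction errors f(x_i) - y_i. Returns the new pair, or the
    old pair unchanged if this simplified sketch cannot make progress.
    """
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12  # curvature along the constraint line
    if L >= H or eta <= 0:
        return a1, a2
    a2_new = a2 + y2 * (E1 - E2) / eta
    a2_new = min(H, max(L, a2_new))        # clip to the box 0 <= alpha <= C
    a1_new = a1 + y1 * y2 * (a2 - a2_new)  # keeps sum(y_i * alpha_i) unchanged
    return a1_new, a2_new


# Two orthogonal points, x1 = (1, 0) labeled +1 and x2 = (0, 1) labeled -1,
# with a linear kernel and all multipliers starting at zero (f(x) = 0).
print(smo_pair_update(0.0, 0.0, 1, -1, E1=-1.0, E2=1.0,
                      K11=1.0, K12=0.0, K22=1.0, C=10.0))
```

For this two-point example the update reaches alpha = (1, 1) in one step, which gives w = (1, -1), the hard-margin solution.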

K-Nearest Neighbor
The KNN classification method is widely used because of its simplicity and quick calculation time [34]. The choice of the value of k is critical in this method, as shown in Figure 6. Two quantities must be available for different values of k: the training and validation error rates [35].

• Determine the parameter K defining the number of nearest neighbors [35].

• Use the majority category among the nearest neighbors as the instance's prediction value [35].

Fine and medium KNN classifiers use the training data to categorize new data points based on similarity measurements.
• Fine and Medium KNN: the fine and medium KNN algorithms use the Euclidean distance function to find the nearest neighbors, as shown in Equations (10) and (11).
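The steps above can be sketched as follows: compute Euclidean distances, keep the k nearest training points, and take a majority vote. The source does not give the k values for "fine" and "medium" KNN. The constants below assume MATLAB Classification Learner's presets (1 and 10 neighbors), and the training points are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """d(a, b) = sqrt(sum_i (a_i - b_i)^2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k):
    """Label x by a simple majority vote among its k nearest training points.

    Counter.most_common breaks ties in insertion order, so on a tie the
    label of the closer neighbor wins.
    """
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Assumed presets (MATLAB naming): fine KNN uses k = 1, medium KNN uses k = 10.
FINE_K, MEDIUM_K = 1, 10

train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8)]
train_y = ["repair"] * 6 + ["failure"] * 6
print(knn_predict(train_X, train_y, (0.1, 0.1), FINE_K),
      knn_predict(train_X, train_y, (5.4, 5.4), MEDIUM_K))
```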

Random Forest
RF builds a large number of decision trees, which serve as the pillars of the algorithm; the forest is a set of decision trees built from the training data. When each tree is grown, the best feature for a split is chosen from a random subset of the features. Together, these trees form a random forest that classifies a new object from its input vector: each tree produces a classification, and the forest combines them. Figure 7 depicts the flowchart of a random forest classifier [36]. The mathematical formula used by RF classifiers to compute node importance is shown in Equation (12):
$$ni_j = w_j C_j - w_{left(j)} C_{left(j)} - w_{right(j)} C_{right(j)} \qquad (12)$$
where $ni_j$ is the importance of node j, $w_j$ is the weighted number of samples reaching node j, $C_j$ is the impurity value of node j, left(j) is the child node from the left split on node j, and right(j) is the child node from the right split on node j.
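Equation (12) can be evaluated directly once an impurity measure is chosen. The sketch below assumes Gini impurity and weights equal to the fraction of training samples reaching each node. It then sums node importances per splitting feature and normalizes them, which is the usual way feature importance is derived for one tree (an RF would average this over its trees). The example tree is hypothetical.

```python
def gini(counts):
    """Gini impurity C_j = 1 - sum_k p_k^2 for the class counts at a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def node_importance(parent, left, right, total):
    """ni_j = w_j C_j - w_left(j) C_left(j) - w_right(j) C_right(j)  (Eq. 12).

    parent, left, right are class-count lists; each weight is the fraction
    of the `total` training samples that reach that node.
    """
    def w(counts):
        return sum(counts) / total

    return (w(parent) * gini(parent)
            - w(left) * gini(left)
            - w(right) * gini(right))


def feature_importances(splits, n_features, total):
    """splits: (feature_index, parent, left, right) for each split of one tree."""
    imp = [0.0] * n_features
    for feature, parent, left, right in splits:
        imp[feature] += node_importance(parent, left, right, total)
    s = sum(imp)
    return [v / s for v in imp] if s else imp


tree_splits = [
    (0, [6, 4], [6, 1], [0, 3]),  # root splits on feature 0
    (1, [6, 1], [6, 0], [0, 1]),  # left child splits on feature 1
]
imps = feature_importances(tree_splits, n_features=2, total=10)
print(imps)
```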

Parameters Configuration of ML Classifiers
The ML classifiers were configured with different parameter values to maximize accuracy and improve per-class fault prediction. Table 5 lists the parameters of each ML classifier and their values.

Modified Sequential Minimal Optimization
On the primary dataset, the original SMO algorithm had lower accuracy and higher fault prediction error than RF. This research aims to resolve FT issues by obtaining high accuracy and low fault prediction error from SMO, and then applying D-CP to achieve reliability. The block diagram of the MSMO classifier is shown in Figure 8. Accuracy and fault prediction error are evaluated on the generated primary dataset through an objective function minimized over α1 and α2, and they were improved by tuning both the algorithm parameters and the kernel parameters. The C parameter controls the trade-off between fitting the training data and maximizing the separating margin; C takes values between 0.01 and 100. The random seed is set to 2. The only parameter of the polynomial kernel is the exponent, which controls the degree of the polynomial. By default, the kernel computes (x·y)^p, where p is the exponent; with p = 1 this is simply the dot product x·y.
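The configuration described above (C between 0.01 and 100, random seed 2, polynomial kernel exponent) can be explored with a simple cross-validated search. The sketch below is hypothetical. The C grid is an illustrative choice, 5-fold cross-validation follows the abstract, and `train_and_score` is a placeholder callback standing in for training and scoring an SMO model; it is not an API from the study.

```python
import random


def poly_kernel(x, y, exponent=1):
    """Polynomial kernel K(x, y) = (x . y) ** exponent; exponent 1 is the dot product.

    An integer exponent is assumed so that negative dot products stay real.
    """
    return sum(a * b for a, b in zip(x, y)) ** exponent


def k_fold_indices(n, k=5, seed=2):
    """Shuffle indices 0..n-1 with a fixed seed and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def select_c(n_samples, train_and_score,
             c_grid=(0.01, 0.1, 1.0, 10.0, 100.0), k=5, seed=2):
    """Return (best C, mean validation score) over k folds.

    train_and_score(train_idx, test_idx, C) is a hypothetical callback that
    trains an SMO/SVM model with complexity C and returns validation accuracy.
    """
    folds = k_fold_indices(n_samples, k, seed)
    best_c, best_score = None, float("-inf")
    for C in c_grid:
        scores = []
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(train_and_score(train_idx, test_idx, C))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_c, best_score = C, mean
    return best_c, best_score
```

Fixing the seed makes the fold assignment, and therefore the selected C, reproducible across runs. This is one practical reason to pin it, as the text does with seed 2.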
Sensors 2023, 23, x FOR PEER REVIEW 17 of 57
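The pairwise optimization over $\alpha_1$ and $\alpha_2$ is the core step of Platt's SMO. The Python sketch below shows one analytic update of a pair of multipliers for standard (unmodified) SMO with a polynomial kernel: it maximizes the dual objective $W(\alpha)$, which is equivalent to minimizing $-W(\alpha)$, while keeping $\sum_i \alpha_i y_i$ fixed and each $\alpha$ inside $[0, C]$. It is not the authors' MSMO; the function names are hypothetical, and the threshold $b$ is left unchanged for brevity.

```python
def poly_kernel(x, z, exponent=1.0):
    """K(x, z) = (x . z)^exponent; exponent 1 gives the plain dot product.

    Use integer-valued exponents: a negative dot product raised to a
    fractional power is not a real number.
    """
    return sum(a * b for a, b in zip(x, z)) ** exponent


def decision(alpha, b, X, y, x, kernel):
    """SVM output f(x) = sum_m alpha_m * y_m * K(x_m, x) + b."""
    return sum(a * ym * kernel(xm, x) for a, ym, xm in zip(alpha, y, X)) + b


def dual_objective(alpha, X, y, kernel):
    """W(alpha) = sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def smo_pair_step(alpha, b, X, y, i, j, C, kernel=poly_kernel):
    """Analytically re-optimize (alpha_i, alpha_j); returns a new alpha list."""
    E_i = decision(alpha, b, X, y, X[i], kernel) - y[i]   # prediction errors
    E_j = decision(alpha, b, X, y, X[j], kernel) - y[j]
    eta = kernel(X[i], X[i]) + kernel(X[j], X[j]) - 2 * kernel(X[i], X[j])
    if eta <= 0:
        return list(alpha)            # degenerate pair: skipped in this sketch
    # Ends of the segment where sum(alpha*y) stays constant and 0 <= alpha <= C.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    a_j = alpha[j] + y[j] * (E_i - E_j) / eta   # unconstrained optimum
    a_j = min(H, max(L, a_j))                   # clip to the feasible segment
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)
    new = list(alpha)
    new[i], new[j] = a_i, a_j
    return new


X = [(2.0, 2.0), (-2.0, -2.0)]
y = [1, -1]
alpha = smo_pair_step([0.0, 0.0], 0.0, X, y, 0, 1, C=1.0)
print(alpha)  # [0.0625, 0.0625]
```

A full SMO solver repeats this step over heuristically chosen pairs and updates $b$ until the KKT conditions hold within a tolerance.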

Delta Checkpointing
Checkpointing (CP) is a common, basic FT mechanism that regularly saves a VM's execution state as an image file. However, because network resources in data centers are limited, transferring a large number of CP image files can quickly congest the network. To address this issue, this study used a D-CP approach, in which the full base system image is saved only at the first CP, and subsequent CP images contain only the incrementally modified pages [9]. The D-CP interval is the amount of time that passes between CPs [37].
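The D-CP idea above, one full base image followed by images that hold only modified pages, can be sketched in Python as follows. VM memory is modeled as a hypothetical dictionary from page number to page contents; page removal, compression, and the network transfer itself are outside the scope of this sketch.

```python
class DeltaCheckpointer:
    """Keeps one full base image plus a chain of page-level deltas (hypothetical helper)."""

    def __init__(self):
        self.base = None     # full image saved at the first CP
        self.deltas = []     # later CPs: {page_no: contents} of modified pages only
        self._last = None    # page contents as of the most recent CP

    def checkpoint(self, memory):
        """Save a CP and return the number of pages written."""
        if self.base is None:
            self.base = dict(memory)
            self._last = dict(memory)
            return len(memory)
        delta = {p: data for p, data in memory.items() if self._last.get(p) != data}
        self.deltas.append(delta)
        self._last.update(delta)
        return len(delta)

    def restore(self):
        """Rebuild the latest state: base image, then every delta in order."""
        if self.base is None:
            raise RuntimeError("no checkpoint has been taken yet")
        image = dict(self.base)
        for delta in self.deltas:
            image.update(delta)
        return image


mem = {0: b"kernel", 1: b"heap-v1", 2: b"stack"}
cp = DeltaCheckpointer()
print(cp.checkpoint(mem))  # 3 pages: the full base image
mem[1] = b"heap-v2"
print(cp.checkpoint(mem))  # 1 page: only the modified page
```

Only the first CP pays for the whole image, which is what reduces the volume of CP data sent over the data-center network.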

Description of D-CP Algorithm
The main benefit of CP is that it allows cloud computing resources to be used for other customers' requests while reducing the profit loss caused by other fault-tolerance methods. Two parameters strongly affect the CP algorithm: the CP interval and the CP latency. The CP interval is the time between one CP and the next, and the CP latency is the time it takes to save a CP.
The CP algorithm does not keep the CP interval fixed while the customer's task is executing. At each CP, the algorithm calculates the next CP interval from the failure history of the VM on which the task runs: if the failure history is poor, the algorithm shortens the CP interval, and if it is good, the algorithm extends it. The values in Equations (13)-(18) are based on the D-CP algorithm.
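The adaptive rule described above (shorter intervals after a poor failure history, longer ones after a good history) can be illustrated with Young's first-order approximation of the optimal checkpoint interval, $h \approx \sqrt{2\,\delta\,M}$, where $\delta$ is the CP latency and $M$ is the mean time between failures. This is a swapped-in, well-known heuristic, not necessarily the rule behind Equations (13)-(18); the function name, the MTBF estimate from a failure count, and the clamping bounds are assumptions.

```python
import math


def next_cp_interval(cp_latency, observed_time, failures,
                     min_interval=10.0, max_interval=3600.0):
    """Next CP interval from the VM's failure history (Young's approximation).

    cp_latency    -- time needed to save one CP
    observed_time -- length of the VM's observed history
    failures      -- failures seen during that history
    """
    # Estimated mean time between failures; with no observed failures,
    # the observation window itself is used as a conservative estimate.
    mtbf = observed_time / max(failures, 1)
    interval = math.sqrt(2.0 * cp_latency * mtbf)
    # Keep the interval within hypothetical operational bounds.
    return min(max_interval, max(min_interval, interval))


print(next_cp_interval(5.0, 1000.0, failures=1))  # 100.0
print(next_cp_interval(5.0, 1000.0, failures=4))  # 50.0: worse history, shorter interval
```

More failures lower the MTBF estimate and therefore shorten the interval, which matches the behavior the D-CP algorithm is described as having.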

• $\tau_{ji}$ (13): the execution time of task $j$ on VM $i$;
• the remaining execution time of task $j$ on VM $i$;
• the probability of no failure of VM $i$;
• $h$: the CP interval;
• $z$ (18): the number of failures during the task execution.

Reliability
The reliability of each VM is assessed, and cloudlets are assigned to the most reliable VM. The cloud broker is responsible for assigning cloudlets to cloud providers. To evaluate VM reliability, we first determine whether each cloudlet executed successfully or failed within the time limit. We then update the reliability of each VM based on its successful and failed cloudlets. Finally, we select the most reliable VM from the list of available VMs and assign cloudlets to it [38].
Equation (19) represents the reliability of a VM, and Equations (20) and (21) represent the reliability of a host, where $MM_i$ is the available memory ratio, $CP_i$ is the available MIPS ratio, $BW_i$ is the available bandwidth ratio, and $R_i$ is the reliability of the $i$th VM.
Reliability of VM.
Equation (20) is used to calculate the available RAM ratio of the VM, $MM_i = \frac{\text{available RAM}}{\text{total RAM}}$.
Equation (21) gives the reliability $R_i$ of the $i$th VM. Similarly, as shown in Equations (22)-(24), the available MIPS ratio $CP_i$ and the available bandwidth ratio $BW_i$ are calculated as the ratio of available MIPS to total MIPS and of available bandwidth to total bandwidth, respectively.
Reliability of host.
Next, Equation (23) is used to obtain the available MIPS ratio of the VM, $CP_i = \frac{\text{available MIPS}}{\text{total MIPS}}$.
Then, Equation (24) is used to find the available bandwidth ratio of the VM, $BW_i = \frac{\text{available bandwidth}}{\text{total bandwidth}}$.
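The VM selection procedure described in this subsection can be sketched in Python. The resource ratios $MM_i$, $CP_i$, and $BW_i$ follow the text (available over total RAM, MIPS, and bandwidth), but the reliability update rule (a Laplace-smoothed success ratio of cloudlets) and the optional minimum-ratio filter are hypothetical stand-ins, because Equations (19), (21), and (22) are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class VmStats:
    """Hypothetical per-VM bookkeeping kept by the cloud broker."""
    name: str
    successes: int = 0     # cloudlets finished within the time limit
    failures: int = 0      # cloudlets that failed or exceeded the limit
    avail_ram: float = 0.0
    total_ram: float = 1.0
    avail_mips: float = 0.0
    total_mips: float = 1.0
    avail_bw: float = 0.0
    total_bw: float = 1.0

    def record(self, succeeded: bool) -> None:
        """Update the history after a cloudlet finishes or fails."""
        if succeeded:
            self.successes += 1
        else:
            self.failures += 1

    def reliability(self) -> float:
        """Hypothetical rule: Laplace-smoothed fraction of successful cloudlets."""
        return (self.successes + 1) / (self.successes + self.failures + 2)

    def resource_ratios(self) -> tuple:
        """(MM_i, CP_i, BW_i): available over total RAM, MIPS, and bandwidth."""
        return (self.avail_ram / self.total_ram,
                self.avail_mips / self.total_mips,
                self.avail_bw / self.total_bw)


def select_vm(vms, min_ratio=0.0):
    """Return the most reliable VM among those with enough free resources."""
    eligible = [vm for vm in vms if min(vm.resource_ratios()) >= min_ratio]
    if not eligible:
        raise ValueError("no VM satisfies the resource threshold")
    return max(eligible, key=lambda vm: vm.reliability())


a = VmStats("vm-a", 9, 1, 512, 2048, 250, 1000, 100, 1000)
b = VmStats("vm-b", 5, 5, 1024, 2048, 600, 1000, 500, 1000)
print(select_vm([a, b]).name)  # vm-a: better success history
```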

Results and Findings
Data analysis is performed in this section. It presents the classification results of NB, LibSVM, MLR, SMO, KNN, and RF, with confusion matrices and graphical representations, followed by the results of MSMO, the main algorithm of this study. The research focuses on a comparative analysis of conventional ML algorithms and FT techniques in terms of accuracy, fault-prediction error, and reliability.
The secondary dataset archive includes four directories, one for each dataset block: CPU/memory and HDD, each in single-core and multi-core variants [10]. The results differed markedly across the four directories: CPU-Mem multi-core produced better results than the remaining directories (CPU-Mem mono, HDD mono, and HDD multi).
Overall, the classifiers achieved better results on the primary dataset than on the secondary dataset; therefore, the primary dataset results were used as the basis for modifying the ML algorithm.
The ML classifiers were applied before the FT D-CP approach. The data were evaluated with 80/20 and 70/30 train/test splits and with 5-fold cross-validation using the NB, LibSVM, MLR, SMO, KNN, and RF classifiers, and the desired classification results were obtained for both the secondary and the primary datasets. The classifiers are compared in terms of accuracy, fault-prediction error, and per-class data validation using Equations (25)-(36). On the secondary dataset (CPU-Mem multi), NB outperformed LibSVM, MLR, SMO, KNN, and RF. On the primary dataset, RF performed best, but its execution time was poor. RF and SMO differed only slightly in their results on the primary dataset, whereas SMO had a better execution time. The software environment was WEKA 3.8.6 with the RemovePercentage filter.
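The 80/20 and 70/30 percentage splits and 5-fold cross-validation can be sketched in plain Python as index generators. This is an illustrative sketch, not WEKA's implementation: the seed, the shuffling, and the function names are assumptions, and unlike WEKA's cross-validation for nominal classes, these folds are not stratified.

```python
import random


def percentage_split(n, train_pct, seed=1):
    """Shuffle indices 0..n-1 and cut them into train/test (e.g. 80/20 or 70/30)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_pct / 100)
    return idx[:cut], idx[cut:]


def k_fold(n, k=5, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation (unstratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]   # k disjoint, near-equal folds
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


train, test = percentage_split(100, 80)
print(len(train), len(test))  # 80 20
for train, test in k_fold(100, k=5):
    pass  # fit on `train`, evaluate on `test`, then average the 5 scores
```

Every instance is used for testing exactly once across the five folds, which is why cross-validation estimates tend to be less dependent on a single lucky or unlucky split than a percentage split is.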
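$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{25}$$
In Equation (25), the accuracy is defined as above, where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives.
$$\text{Recall (True-Positive Rate)} = \frac{TP}{TP + FN} \tag{26}$$
In Equation (26), the recall or true-positive rate is defined as above.
$$\text{True-Negative Rate} = \frac{TN}{TN + FP} \tag{27}$$
In Equation (27), the true-negative rate is defined as above.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{28}$$
In Equation (28), the precision is defined as above.
$$\text{False-Positive Rate} = \frac{FP}{FP + TN} \tag{29}$$
In Equation (29), the false-positive rate is defined as above.
$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{30}$$
In Equation (30), the Matthews correlation coefficient is defined as above.
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{32}$$
In Equation (32), the F1 score is defined as above.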
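Equations (25)-(30) and (32) can be computed directly from the four confusion-matrix counts, as in the following Python sketch. It assumes binary counts for one chosen positive class (for example, the faulty class) and nonzero denominators; the function name is hypothetical.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Equations (25)-(30) and (32) for one positive class; assumes nonzero denominators."""
    precision = tp / (tp + fp)                      # Eq. (28)
    recall = tp / (tp + fn)                         # Eq. (26), true-positive rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),           # Eq. (25)
        "recall": recall,
        "tnr": tn / (tn + fp),                                 # Eq. (27)
        "precision": precision,
        "fpr": fp / (fp + tn),                                 # Eq. (29)
        "mcc": (tp * tn - fp * fn) / mcc_den,                  # Eq. (30)
        "f1": 2 * precision * recall / (precision + recall),   # Eq. (32)
    }


m = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
print(round(m["accuracy"], 2), round(m["f1"], 4))  # 0.85 0.8421
```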
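• The RMSE is a commonly used measure of the difference between the values predicted by a model or estimator and the observed values [39];
• The MAE measures the average magnitude of the differences between two continuous variables, such as predicted and observed values [39];
• The relative absolute error (RAE) normalizes the total absolute error by dividing it by the total absolute error of the simple predictor [40];
• The relative squared error (RSE) normalizes the total squared error by dividing it by the total squared error of the simple predictor [40].
In Equation (33), the RMSE is defined as follows, where $p_i$ and $a_i$ are the predicted and actual values of instance $i$, and $n$ is the number of instances:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - a_i)^2} \tag{33}$$
In Equation (34), the MAE is defined as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert p_i - a_i\rvert \tag{34}$$
In Equation (35), the RAE is defined as follows, where $\bar{a}$ is the mean of the actual values (the simple predictor):
$$\mathrm{RAE} = \frac{\sum_{i=1}^{n}\lvert p_i - a_i\rvert}{\sum_{i=1}^{n}\lvert \bar{a} - a_i\rvert} \tag{35}$$
In Equation (36), the RSE is defined as follows:
$$\mathrm{RSE} = \frac{\sum_{i=1}^{n}(p_i - a_i)^2}{\sum_{i=1}^{n}(\bar{a} - a_i)^2} \tag{36}$$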
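Equations (33)-(36) can likewise be sketched in Python. Here `predicted` and `actual` are numeric sequences (for a classifier, for example, predicted class probabilities against 0/1 targets, which is an assumption about how the values are encoded), and the simple predictor is the mean of the actual values, as in the RAE and RSE definitions. The function name is hypothetical.

```python
import math


def error_measures(predicted, actual):
    """RMSE, MAE, RAE, and RSE following Equations (33)-(36)."""
    n = len(actual)
    mean_a = sum(actual) / n                       # simple predictor
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    abs_base = sum(abs(mean_a - a) for a in actual)
    sq_base = sum((mean_a - a) ** 2 for a in actual)
    return {
        "rmse": math.sqrt(sq_err / n),   # Eq. (33)
        "mae": abs_err / n,              # Eq. (34)
        "rae": abs_err / abs_base,       # Eq. (35)
        "rse": sq_err / sq_base,         # Eq. (36)
    }


e = error_measures([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0])
print({k: round(v, 4) for k, v in e.items()})
```

An RAE or RSE below 1 means the model beats the mean-value predictor; WEKA-style reports often show these relative errors as percentages.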

Simulation Setup of ML Classifiers to Achieve High Accuracy and Less Fault Prediction
WEKA (Waikato Environment for Knowledge Analysis) is software written in Java by the University of Waikato, New Zealand, and distributed under the GNU General Public License. It provides a collection of machine learning algorithms [41,42] together with data pre-processing and transformation tools, including discretization and sampling methods [41]. Table 6 describes the configuration of the experiment.

Comparison of Classification Models on Secondary Dataset
We present the results of the different classifiers using the ISFAULT class attribute of the secondary dataset. For classification models, we opted for NB, LibSVM, MLR, RF, KNN, and SMO with the polynomial kernel.
The secondary data results of each classifier are shown in Figures 9-72 for the 80/20 and 70/30 splits and for 5-fold cross-validation, in terms of accuracy and fault prediction. Furthermore, data validation used 60% of the data for training, 20% for testing, and 20% for validation. On the secondary data, the NB classifier achieved the highest accuracy and the lowest fault-prediction error for CPU-Mem mono, with 77.01% (80/20), 76.05% (70/30), and 74.88% (5-fold cross-validation), and for CPU-Mem multi, with 89.72% (80/20), 90.28% (70/30), and 92.83% (5-fold cross-validation). For HDD mono, the SMO classifier achieved the highest accuracy and the lowest fault-prediction error, with 87.72% (80/20), 89.41% (70/30), and 88.38% (5-fold cross-validation), as it did for HDD multi, with 93.64% (80/20), 90.91% (70/30), and 88.20% (5-fold cross-validation). According to the results, the CPU-Mem multi-core block produced better results than the remaining blocks (CPU-Mem mono, HDD mono, and HDD multi).
The confusion matrix is used to calculate accuracy, precision, recall, and F-measure. It summarizes classification performance by cross-tabulating the actual and predicted categories of a qualitative response. Figures 13-18 show the confusion matrices related to accuracy and fault prediction achieved through NB, LibSVM, MLR, SMO, KNN, and RF for CPU-Mem mono; they indicate that the NB classification model gave the highest accuracy and the lowest fault-prediction error.
Figures 29-34 show the corresponding confusion matrices for CPU-Mem multi, where the NB classification model again gave the highest accuracy and the lowest fault-prediction error.
Figures 45-50 show the corresponding confusion matrices for HDD mono, where the SMO classification model gave the highest accuracy and the lowest fault-prediction error.
Figures 61-66 show the corresponding confusion matrices for HDD multi, where the SMO classification model gave the highest accuracy and the lowest fault-prediction error.
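To make the confusion-matrix description concrete, the following Python sketch builds a multi-class confusion matrix with actual classes as rows and predicted classes as columns, and derives accuracy from its diagonal. The label names are hypothetical placeholders for the ISFAULT values.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["fault", "no-fault"]
actual = ["fault", "fault", "no-fault", "no-fault", "no-fault"]
predicted = ["fault", "no-fault", "no-fault", "no-fault", "fault"]
cm = confusion_matrix(actual, predicted, labels)
print(cm)                        # [[1, 1], [1, 2]]
print(accuracy_from_matrix(cm))  # 0.6
```

For a two-class problem with "fault" as the positive class, the cells map to $TP$ = `cm[0][0]`, $FN$ = `cm[0][1]`, $FP$ = `cm[1][0]`, and $TN$ = `cm[1][1]`.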

Comparison of Classification Models on Primary Dataset
We present the results of the different classifiers using the STATUS class attribute of the primary dataset. For classification models, we opted for NB, LibSVM, MLR, RF, KNN, and SMO with the polynomial kernel.
In the primary data results, the RF classifier gives the highest accuracy and the lowest fault-prediction error, with 97.14% (80/20), 96.19% (70/30), and 95.85% (5-fold cross-validation), but its execution time (0.17 s) is poor. SMO gives the second-highest accuracy and the second-lowest fault-prediction error, with 95.71% for the 80/20 split, the 70/30 split, and 5-fold cross-validation, and its execution time is good (0.3 s). The difference in accuracy and fault prediction between RF and SMO is just 0.13%, and the difference in execution time is 14 s. Figures 73-76 compare NB, LibSVM, MLR, SMO, KNN, and RF on the primary dataset in terms of detailed accuracy by class (repair/failure) and prediction on further test-split data validation.

The confusion matrix is used to calculate accuracy, precision, recall, and F-measure. It summarizes classification performance by cross-tabulating the actual and predicted categories of a qualitative response. Figures 77-82 show the confusion matrices related to accuracy and fault prediction achieved through NB, LibSVM, MLR, SMO, KNN, and RF on the primary dataset. They indicate that the RF classification model gave the highest accuracy and the lowest fault-prediction error on the primary dataset, although its execution time (0.17 s) is poor.

Modified Sequential Minimal Optimization Results
In this subsection, the MSMO classification results on the primary dataset are shown in Figures 89-92.

The confusion matrix is used to calculate accuracy, precision, recall, and F-measure. It summarizes classification performance by cross-tabulating the actual and predicted categories of a qualitative response. Figure 93 shows the confusion matrix related to accuracy and fault prediction achieved through MSMO. It indicates that the MSMO classification model gave the highest accuracy and the lowest fault-prediction error on the primary dataset compared with NB, LibSVM, MLR, SMO, KNN, and RF. Figure 94 visualizes the errors of the classifier, showing the true-positive, true-negative, false-positive, and false-negative values; in Figure 94, the squares represent errors, that is, instances whose predicted class differs from the actual class.

Simulation Setup of D-CP to Achieve Reliability
To achieve reliability, we integrated the MSMO classifier results with the D-CP fault-tolerance technique. D-CP can learn from previous data (e.g., the failure history of a VM) and act on it during execution.
We used the CloudSim 3.0.3 toolkit, a simulation tool that mimics CC scenarios, and extended the simulator with the D-CP fault-tolerance method to achieve reliability. Table 7 shows the hardware specifications for the experiment.

Delta-Checkpointing Results
In this step, we configured the CP configuration file, the number of cloud users, and the cloud simulation library; created a data center, a broker, and a cloudlet; and submitted the VM list to the broker using the D-CP method. The simulated platform included a data center with a recovery scheduler, a CP scheduler, a CP image index, a data center destroyer, VMs, and cloudlets. Each simulated VM has unique properties in our execution environment. Table 8 shows the parameters that affect D-CP reliability through VMs. Table 9 shows the reliability achieved through the combined ML and D-CP approach: all VMs executed successfully without VM failure, because the D-CP mechanism regularly saves each VM's execution state as a CP image during failure-free execution. When a failure occurs, the VM is restarted from an intermediate state using the previously saved CP image, which reduces the amount of computation lost. The execution status (success/failure) determines the reliability of the VMs on the multiple nodes; all VMs on the multiple nodes executed successfully and are therefore reliable.
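The effect described above, restarting from the last CP image instead of from the beginning, can be illustrated with a small discrete simulation of one task on one VM. The work units, failure times, CP interval, and latency are hypothetical inputs, and failures that occur while a checkpoint is being written are ignored for simplicity; this is not the CloudSim extension used in the experiments.

```python
def run_task(work, failure_times, interval=None, latency=0.0):
    """Wall-clock time needed to finish `work` units of computation on one VM.

    interval=None: no checkpointing; every failure restarts the task from zero.
    interval=h:    a CP is written after every h units of work (costing
                   `latency`), and a failure rolls back only to the last CP.
    """
    failures = sorted(failure_times)
    clock, saved, k = 0.0, 0.0, 0
    while True:
        # Skip failures that fall before the current time (e.g. during a CP write).
        while k < len(failures) and failures[k] <= clock:
            k += 1
        remaining = work - saved
        segment = remaining if interval is None else min(interval, remaining)
        seg_end = clock + segment
        if k < len(failures) and failures[k] < seg_end:
            clock = failures[k]       # work done since the last CP is lost
            k += 1
            continue
        clock = seg_end
        saved += segment              # progress is now protected by a CP
        if saved >= work:
            return clock
        clock += latency              # time spent writing the CP image


print(run_task(100, [50, 130]))                            # 230.0
print(run_task(100, [50, 130], interval=20))               # 110.0
print(run_task(100, [50, 130], interval=20, latency=1.0))  # 112.0
```

With 100 units of work and failures at times 50 and 130, restarting from scratch finishes at time 230, whereas checkpointing every 20 units finishes at 110, or at 112 when each checkpoint write costs 1 time unit.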