Cloud computing is a virtualized and scalable computing model that dynamically offers diverse resources and services. It operates through a network of servers, hosting numerous virtualized instances, known as virtual machines, on physical machines to deliver cloud services. Because computing task flows arrive dynamically in the cloud computing environment, task failures may occur due to resource unavailability, execution cost and time overruns, the system running out of memory, over-utilization of resources, and improper installation. These faults prevent cloud computing from meeting the time-sensitive requirements of user-requested applications. Therefore, a novel MRKFL-FTS technique has been developed to guarantee fault-tolerant task scheduling in cloud IoT.
The main objective of the paper is to enhance resource-efficient, fault-tolerant IoT request scheduling in dynamic cloud environments. To achieve this objective, the MRKFL-FTS technique is introduced, which comprises two methods, namely Multi-objective Radial Kernelized Federated Learning (MRKFL) and weighted round-robin scheduling. The MRKFL method identifies resource-efficient virtual machines based on factors such as energy, memory, CPU time, and average failure rate. After classifying the resource-efficient virtual machines, the server assigns incoming IoT requests to the appropriate virtual machines using the weighted round-robin scheduling method.
Figure 1 illustrates the proposed MRKFL-FTS technique for IoT-aware, resource-optimized scheduling in cloud environments. The inputs to the system are IoT service requests from users, while the outputs are the scheduled allocations of these requests to virtual machines based on resource efficiency and fault tolerance. The MRKFL technique is a machine learning approach that combines multiple objectives to select the most suitable virtual machine for task scheduling, leveraging federated learning across decentralized edge devices to enhance the scheduling process. The system measures parameters such as the energy consumption, memory usage, and CPU performance of virtual machines to determine their efficiency and suitability for task allocation. Fault-tolerance mechanisms then ensure reliable task scheduling even in the presence of failures or disruptions. The local training model evaluates the performance of virtual machines based on energy consumption, memory usage, CPU performance, and failure rate using Radial Kernelized Support Vector Regression (RKSVR). The locally trained models are combined into a global aggregation model that represents the overall performance and suitability of virtual machines for task scheduling. Thus, the system identifies virtual machines that are both resource-efficient and have a minimal failure rate, ensuring optimal task allocation and fault tolerance. The task assigner then schedules incoming IoT service requests to the selected virtual machine using a weighted round-robin scheduling method, optimizing resource utilization and enhancing fault tolerance. By following these steps, the MRKFL-FTS technique aims to improve the efficiency, reliability, and fault tolerance of IoT request scheduling in cloud environments.
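The weighted round-robin assignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the VM names and integer weights are assumed values for demonstration only.

```python
from itertools import cycle

def weighted_round_robin(vms, weights):
    """Yield VM ids in proportion to their integer weights (illustrative helper)."""
    # Expand each VM id by its weight, then cycle through the expanded list,
    # so a VM with weight 2 receives twice as many requests as one with weight 1.
    expanded = [vm for vm, w in zip(vms, weights) for _ in range(w)]
    return cycle(expanded)

# Example: VM2 (weight 2) receives twice as many requests as VM1 and VM3.
scheduler = weighted_round_robin(["VM1", "VM2", "VM3"], [1, 2, 1])
assignments = [next(scheduler) for _ in range(8)]
print(assignments)
```

In practice the weights would come from the MRKFL classification results, so that the more resource-efficient machines receive proportionally more of the incoming IoT requests.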
3.2. Multi-Objective Radial Kernelized Federated Learning
Federated learning is a machine learning technique that allows a model to be trained across multiple decentralized edge devices (such as IoT devices or other local servers) without exchanging raw data. It involves training an algorithm locally across multiple resources and obtaining a global aggregation model. In the local training phase, a task assigner evaluates various objective functions of the virtual machine, such as the energy consumption, memory usage, CPU performance, and failure rate, by applying kernelized support vector regression. Then, the locally trained models are combined, and the resulting model is fed into the global aggregation model.
Figure 2 illustrates the structure of the Multi-objective Radial Kernelized Federated Learning (MRKFL) technique. In the figure, three visually identical input blocks and their corresponding local training models are depicted. While these models appear similar, they are deliberately designed to represent three distinct sets of input tasks arriving from different IoT users or application domains. Each input block corresponds to a unique batch of IoT-generated tasks that are dynamically received by the cloud server. These task batches differ in characteristics, such as resource demand, execution priority, and deadline constraints. Once received, each batch is processed independently by a dedicated local training model. These models employ RKSVR to classify the available virtual machines (VMs) based on multiple resource-oriented objectives, including energy availability, memory usage, CPU time, and average failure rate.
Although the three local models share an identical structure, the key distinction lies in the heterogeneity of their input task profiles. Each model is trained on different data, reflecting the unique demands of the incoming requests. After local training, each model outputs its classification of resource-efficient VMs. These locally optimized results are then forwarded to the global aggregation model, which combines them to produce a unified, optimized task scheduling decision.
Consider the virtual machines in a cloud datacenter, together with their energy consumption, memory usage, CPU time, and average failure rates. Proper management of energy resources in virtual machines can improve sustainability and cost-effectiveness. Memory allocation and optimization among the virtual machines are critical for maintaining system stability and preventing resource contention. Effective distribution of CPU time among the virtual machines is necessary for achieving optimal processing power and performance for the various applications running on the cloud infrastructure. Monitoring and addressing the average failure rates of virtual machines are vital for ensuring service delivery and minimizing downtime. By managing these factors, cloud providers can enhance the overall performance of cloud computing services.
Energy plays a major role in task scheduling. Therefore, managing the energy consumption of virtual machines is crucial for optimizing resource utilization and reducing operational costs. First, the energy availability is estimated as follows:

$$E_a(VM_i) = E_T(VM_i) - E_C(VM_i) \tag{1}$$

where $E_a(VM_i)$ indicates the energy availability of the virtual machine $VM_i$, $E_T(VM_i)$ indicates the total energy of the virtual machine $VM_i$, and $E_C(VM_i)$ denotes the consumed energy of the virtual machine $VM_i$.
Memory is a significant resource in task scheduling, measured by the amount of storage space needed to execute tasks. Therefore, memory availability is evaluated to find a virtual machine’s storage capacity:

$$M_a(VM_i) = M_T(VM_i) - M_C(VM_i) \tag{2}$$

where $M_a(VM_i)$ designates the memory availability of the virtual machine $VM_i$, $M_T(VM_i)$ indicates the total memory capacity of the virtual machine $VM_i$, and $M_C(VM_i)$ indicates the memory consumption of the virtual machine $VM_i$.
Next, the available CPU time of $VM_i$ is measured within a specific time window $B$:

$$CT_a(VM_i) = B - \sum_{g=1}^{m} t(T_g) \tag{3}$$

where $CT_a(VM_i)$ denotes the available CPU time of the virtual machine based on the allocated tasks within the time window $B$. This is calculated by discretizing time, where $t(T_g)$ represents the execution time of task $g$ in $VM_i$, and $m$ represents the number of allocated tasks in $VM_i$.
Following that, the average failure rate of a virtual machine is calculated as the ratio of the number of tasks that previously failed to execute on the machine to the total number of tasks submitted for execution on it. This is expressed by the following formula:

$$AFR(VM_i) = \frac{F_t(VM_i)}{S_t(VM_i)} \tag{4}$$

where $AFR(VM_i)$ denotes the average failure rate of the virtual machine, $F_t(VM_i)$ indicates the number of failed tasks of the virtual machine $VM_i$, and $S_t(VM_i)$ indicates the number of submitted tasks of the virtual machine $VM_i$. After computing the multi-objective functions, RKSVR is employed to select the virtual machine.
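The four resource measures above (energy availability, memory availability, available CPU time, and average failure rate) reduce to simple arithmetic. A minimal sketch with illustrative numbers; the units and values are assumptions for demonstration:

```python
def energy_availability(total, consumed):
    # Energy left on the VM: total energy minus consumed energy.
    return total - consumed

def memory_availability(total, consumed):
    # Free memory on the VM: total capacity minus current consumption.
    return total - consumed

def available_cpu_time(window, task_times):
    # CPU time left in the time window after the allocated tasks run.
    return window - sum(task_times)

def average_failure_rate(failed, submitted):
    # Ratio of failed tasks to submitted tasks (0.0 if nothing submitted).
    return failed / submitted if submitted else 0.0

# Example VM profile (illustrative numbers):
e = energy_availability(100.0, 35.0)      # energy units remaining
m = memory_availability(16.0, 6.5)        # GB free
c = available_cpu_time(10.0, [1.5, 2.0])  # seconds left in the window
f = average_failure_rate(2, 50)           # fraction of tasks that failed
print(e, m, c, f)
```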
Radial Kernelized Support Vector Regression (RKSVR) is a machine learning technique that finds a hyperplane to categorize the input into different classes based on the relationship between the resources of a virtual machine and a corresponding threshold value.
The optimal hyperplane $H$ is the decision boundary that is used to make predictions. Therefore, the optimal hyperplane is estimated as follows:

$$R(VM_i) = \beta_1 E_a(VM_i) + \beta_2 M_a(VM_i) + \beta_3 CT_a(VM_i) \tag{5}$$

From Equation (5), $R(VM_i)$ denotes the combined resource metrics of $VM_i$. The equation specifies weights for each performance metric: $\beta_1$ represents the weight of the energy availability metric of $VM_i$, $\beta_2$ represents the weight of the available memory on $VM_i$, and $\beta_3$ represents the weight of the available CPU time on $VM_i$.

$$H = \omega \cdot R(VM_i) + b \tag{6}$$

Equation (6) represents the hyperplane, where $R(VM_i)$ is the value derived from Equation (5) for $VM_i$ and $b$ represents a bias. Following that, the two marginal hyperplanes are selected either above or below the decision boundary, as defined by Equations (7) and (8).
$$H_{+1}: \omega \cdot R(VM_i) + b = +1 \tag{7}$$

$$H_{-1}: \omega \cdot R(VM_i) + b = -1 \tag{8}$$

where $H_{+1}$ and $H_{-1}$ indicate the marginal hyperplanes for classifying $VM_i$. If the value from Equation (7) is greater than 0, then $VM_i$ is classified as resource-efficient (i.e., +1). If the value from Equation (8) is less than 0, then $VM_i$ is classified as not resource-efficient (i.e., −1).
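The linear scoring and sign-based classification described above can be sketched as follows. The metric weights, hyperplane weight, and bias are illustrative assumptions, not values fitted by the paper's model:

```python
def resource_score(e_a, m_a, ct_a, betas):
    # Weighted combination of energy, memory, and CPU-time availability.
    b1, b2, b3 = betas
    return b1 * e_a + b2 * m_a + b3 * ct_a

def classify(score, omega, bias):
    # Sign of the hyperplane value: +1 resource-efficient, -1 otherwise.
    h = omega * score + bias
    return +1 if h > 0 else -1

# Illustrative VM: 65 energy units, 9.5 GB memory, 6.5 s CPU time available.
score = resource_score(65.0, 9.5, 6.5, betas=(0.5, 0.3, 0.2))
label = classify(score, omega=1.0, bias=-20.0)
print(score, label)
```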
The hyperplanes use the radial kernel function to measure the similarity between the estimated resources of the virtual machine and its threshold value. The similarity is estimated as given in Equation (9):

$$K(VM_i) = \exp\left(-\frac{\left(R(VM_i) - \tau\right)^2}{2\sigma^2}\right) \tag{9}$$

From Equation (9), $K(VM_i)$ is the radial kernel function of $VM_i$, $R(VM_i) - \tau$ is the difference between the resources of virtual machine $VM_i$ and the threshold $\tau$, and $\sigma$ indicates a deviation parameter. The output of the radial kernel function varies between 0 and 1. Based on this similarity, the hyperplanes classify $VM_i$ as above or below the decision boundary. If the output is 1, then the hyperplanes classify $VM_i$ as resource-efficient (+1). If the kernel output is 0, then the hyperplane classifies $VM_i$ as not resource-efficient (−1).
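A minimal sketch of the radial (Gaussian) kernel similarity described above. The resource scores, threshold, and deviation parameter are illustrative assumptions:

```python
import math

def radial_kernel(resource, threshold, sigma):
    # Gaussian similarity between a VM's resource score and the threshold:
    # approaches 1 when they match, decays toward 0 as they diverge.
    diff = resource - threshold
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))

# A VM whose score is close to the threshold yields a similarity near 1;
# a VM far from the threshold yields a similarity near 0.
near = radial_kernel(36.0, 35.0, sigma=5.0)
far = radial_kernel(80.0, 35.0, sigma=5.0)
print(near, far)
```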
Figure 3 illustrates the classification results generated by the RKSVR model using actual test data obtained from the MRKFL-FTS framework. In this output, real virtual machine (VM) metrics, such as CPU utilization, memory usage, and task execution patterns, are used as input features to train and test the classifier. The RKSVR model applies a radial kernel to map these input features into a higher-dimensional space where a separating hyperplane can be constructed. This hyperplane effectively distinguishes between resource-efficient and non-efficient virtual machines. In this context, a classification value of +1 indicates that the virtual machine is resource-efficient and suitable for scheduling tasks, while a value of −1 denotes that the machine does not meet the efficiency criteria. Unlike a schematic diagram, this figure visualizes actual classification boundaries and support vectors derived from empirical data, providing a more accurate and realistic view of how the model behaves in practice. The resulting plot shows a clear separation between the two classes, validating the effectiveness of MRKFL-FTS in identifying optimal VMs for task assignment in a dynamic cloud environment. This contributes directly to minimizing computational overhead and improving scheduling efficiency.
A cross-entropy loss function is employed to evaluate the model’s performance. This loss function measures the difference between the actual and predicted probabilities for the classification task. Specifically, it quantifies the uncertainty in the predictions made by the radial kernelized model from Equation (9), where the output is either resource-efficient or not resource-efficient. The cross-entropy loss function, denoted as $L$, is defined as follows:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right] \tag{10}$$

Here, $L$, the cross-entropy loss function, measures the error between the actual and predicted classifications of the virtual machines’ resource efficiency; $y_i$ represents the actual classification label (resource-efficient or not) for virtual machine $VM_i$; $p_i$ is the predicted probability that virtual machine $VM_i$ is resource-efficient; and $n$ is the total number of virtual machines being evaluated.
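The cross-entropy loss over the VM classifications can be computed directly. This is a sketch; the labels and predicted probabilities below are illustrative:

```python
import math

def cross_entropy(labels, probs):
    # Mean binary cross-entropy: labels are 0/1 (not / resource-efficient),
    # probs are the model's predicted probabilities of being resource-efficient.
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / n

# Three VMs: two truly efficient, one not; predictions are mostly correct,
# so the loss is small but nonzero.
loss = cross_entropy([1, 0, 1], [0.9, 0.2, 0.8])
print(loss)
```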
In the proposed MRKFL-FTS technique, the global aggregator model is essential to unify the outputs of the local models trained with local IoT service data. Each VM computes its radial kernel output, reflecting localized insights into task scheduling and fault tolerance. The global aggregator combines these outputs using a weighted average to create a more generalized model that enhances classification accuracy across all VMs. This aggregation process addresses the challenge of data heterogeneity among VMs, ensuring that the model benefits from diverse training data while preserving data privacy. Furthermore, the global model minimizes the cross-entropy loss function, reducing discrepancies between predicted and actual classifications for resource-efficient task assignments. By incorporating the global aggregator, the MRKFL-FTS technique achieves robust and resource-aware task scheduling, enabling efficient IoT service management even in the presence of faults or non-uniform data distributions across VMs.

$$A = \frac{\sum_{i=1}^{n} w_i \, K(VM_i)}{\sum_{i=1}^{n} w_i} \tag{11}$$

where $A$ represents the global aggregator model, $K(VM_i)$ is the radial kernel output of the virtual machine $VM_i$, and $w_i$ denotes the weight of the local model result for the virtual machine $VM_i$.
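The weighted-average aggregation described above can be sketched as follows; the kernel outputs and local-model weights are illustrative assumptions:

```python
def aggregate(kernel_outputs, weights):
    # Weighted average of the local models' radial kernel outputs,
    # giving more influence to local models with larger weights.
    total = sum(weights)
    return sum(w * k for w, k in zip(weights, kernel_outputs)) / total

# Three local models; the first carries double weight.
a = aggregate([0.9, 0.4, 0.7], [2.0, 1.0, 1.0])
print(a)
```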
The federated learning framework aims to minimize the global loss function across all virtual machines. This objective function optimizes the overall learning accuracy:

$$Z = \arg\min_{w} \sum_{i=1}^{n} L(VM_i) \tag{12}$$

where $Z$ denotes the global objective function and $L(VM_i)$ denotes the cross-entropy loss function, which measures the difference between the actual and predicted resource-efficient classifications, i.e., the loss that the aggregation model aims to reduce in the federated learning process. The term $\arg\min$ is a mathematical notation that identifies the parameter values (resource allocation strategies or weights) that minimize the global loss function. Minimizing this loss ensures that the aggregated model optimizes its performance in terms of resource efficiency. By reducing the loss, the system iteratively improves its predictions for resource-efficient scheduling, leading to balanced utilization of energy, memory, and CPU time. This minimization process is critical to ensuring that the federated learning framework not only enhances classification accuracy but also operates effectively within the constraints of the cloud IoT environment.
Stochastic gradient descent is employed to optimize the loss. In this process, each virtual machine $VM_i$ is associated with a weight $w_i(t)$ at time $t$, which reflects its efficiency or suitability for handling incoming IoT service requests. The weight is iteratively updated using gradient descent to minimize the loss $L(VM_i)$ specific to that VM. The update rule is defined as follows:

$$w_i(t+1) = w_i(t) - \eta \frac{\partial L(VM_i)}{\partial w_i(t)} \tag{13}$$

Here, $w_i(t+1)$ denotes the updated weight for $VM_i$, $w_i(t)$ is the current weight, and $\eta$ represents the learning rate ($0 < \eta < 1$), controlling the update step size. The term $\partial L(VM_i)/\partial w_i(t)$ refers to the partial derivative of the local loss function $L(VM_i)$ with respect to $w_i(t)$, which quantifies how sensitive the loss is to the weight adjustment for $VM_i$. This process is repeated iteratively until convergence or the maximum number of iterations is reached. By applying this adaptive update mechanism, the model continuously refines its classification of resource-efficient virtual machines. Consequently, after identifying the most efficient VM, the task assigner schedules the incoming IoT service requests to this selected VM, thereby improving overall resource utilization, reducing the execution loss, and enhancing fault-tolerant scheduling in cloud environments.
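The iterative weight update can be illustrated with a toy loss. The quadratic loss and learning rate below are assumptions for demonstration only, not the paper's actual local loss:

```python
def sgd_update(weight, grad, lr=0.1):
    # One gradient-descent step: move the weight against the gradient,
    # scaled by the learning rate (the update-step size).
    return weight - lr * grad

# Toy local loss L(w) = (w - 2)^2 with gradient 2*(w - 2);
# repeated updates drive the weight toward the minimizer w = 2.
w = 5.0
for _ in range(100):
    w = sgd_update(w, grad=2.0 * (w - 2.0), lr=0.1)
print(w)
```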
Algorithm 1 describes the processing steps for efficient IoT task scheduling in cloud computing using a Multi-objective Radial Kernelized Federated Learning (MRKFL) technique. The federated learning process begins with deploying a local training model across a set of virtual machines. Within the local training model, support vector regression constructs a hyperplane as a decision boundary for analyzing the resources of each virtual machine and determining a threshold value based on the radial kernel function. Depending on the kernel outcome, the hyperplane categorizes a virtual machine as either resource-efficient or not. The global aggregation model then combines all the local training output results through a weighted average model. For each output result, an objective function is defined. The weights are updated based on the loss value. The updated model weight is sent back to the local training model for training on the newly generated input. Finally, the classification of resource-efficient virtual machines is obtained.
Algorithm 1 Multi-objective Radial Kernelized Federated Learning (MRKFL)

1: Input: Number of virtual machines
2: Output: Classification of virtual machines
3: Begin
4: for each incoming IoT request do
5:   for each virtual machine do
6:     Estimate the resources: energy availability, memory availability, CPU time, and average failure rate
7:     Apply RKSVR in local training
8:     Construct the hyperplane using Equation (6)
9:     Find the two marginal hyperplanes using Equations (7) and (8)
10:    Measure the radial kernel between the resources and the threshold using Equation (9)
11:    if the kernel output is 1 then
12:      Classify the virtual machine as resource-efficient
13:    else
14:      Classify the virtual machine as not resource-efficient
15:    end if
16:    Find the global aggregation model by weighted average using Equation (11)
17:    Minimize the loss function for the local training model using Equation (12)
18:    Update the weight using Equation (13)
19:    if the minimum loss is reached then
20:      Convergence is met
21:    else
22:      Go to Step 16
23:    end if
24:    Return: Output results (classified virtual machines)
25:  end for
26: End