Cloud computing is a virtualized and scalable computing model that dynamically offers diverse resources and services. It operates through a network of servers, hosting numerous virtualized instances, known as virtual machines, on physical machines to deliver cloud services. Because computing task flows arrive dynamically in the cloud computing environment, task failures may occur due to resource unavailability, execution cost and time overruns, the system running out of memory, over-utilization of resources, and improper installation. These faults prevent cloud computing from meeting the time-sensitive requirements of user-requested applications. Therefore, a novel MRKFL-FTS technique has been developed to guarantee fault-tolerant task scheduling in cloud IoT.
The main objective of the paper is to enhance resource-efficient, fault-tolerant IoT request scheduling in dynamic cloud environments. To achieve this objective, the MRKFL-FTS technique is introduced, which comprises two methods, namely Multi-objective Radial Kernelized Federated Learning (MRKFL) and weighted round-robin scheduling. The MRKFL method identifies resource-efficient virtual machines based on factors such as energy, memory, CPU time, and average failure rate. After classifying the resource-efficient virtual machines, the server assigns incoming IoT requests to the appropriate virtual machines using the weighted round-robin scheduling method.
Figure 1 illustrates the proposed MRKFL-FTS technique for IoT-aware, resource-optimized scheduling in cloud environments. The inputs to the system are IoT service requests from users, while the outputs are the scheduled allocations of these requests to virtual machines based on resource efficiency and fault tolerance. The MRKFL technique is a machine learning approach that combines multiple objectives to select the most suitable virtual machine for task scheduling, leveraging federated learning across decentralized edge devices to enhance the scheduling process. The system measures parameters such as the energy consumption, memory usage, and CPU performance of virtual machines to determine their efficiency and suitability for task allocation. Fault-tolerance mechanisms then ensure reliable task scheduling even in the presence of failures or disruptions. The local training model evaluates the performance of virtual machines based on energy consumption, memory usage, CPU performance, and failure rate using Radial Kernelized Support Vector Regression (RKSVR). The locally trained models are combined into a global aggregation model that represents the overall performance and suitability of virtual machines for task scheduling. Thus, the system identifies virtual machines that are both resource-efficient and have a minimal failure rate, ensuring optimal task allocation and fault tolerance. The task assigner then schedules incoming IoT service requests to the selected virtual machine using a weighted round-robin scheduling method, optimizing resource utilization and enhancing fault tolerance. By following these steps, the MRKFL-FTS technique aims to improve the efficiency, reliability, and fault tolerance of IoT request scheduling in cloud environments.
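The weighted round-robin assignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the VM names and integer weights are assumed values for demonstration only.

```python
from itertools import cycle

def weighted_round_robin(vms, weights):
    """Yield VM ids in proportion to their integer weights (illustrative helper)."""
    # Expand each VM id by its weight, then cycle through the expanded list,
    # so a VM with weight 2 receives twice as many requests as one with weight 1.
    expanded = [vm for vm, w in zip(vms, weights) for _ in range(w)]
    return cycle(expanded)

# Example: VM2 (weight 2) receives twice as many requests as VM1 and VM3.
scheduler = weighted_round_robin(["VM1", "VM2", "VM3"], [1, 2, 1])
assignments = [next(scheduler) for _ in range(8)]
print(assignments)
```

In practice the weights would come from the MRKFL classification results, so that the more resource-efficient machines receive proportionally more of the incoming IoT requests.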
3.2. Multi-Objective Radial Kernelized Federated Learning
Federated learning is a machine learning technique that allows a model to be trained across multiple decentralized edge devices (such as IoT devices or other local servers) without exchanging raw data. It involves training an algorithm locally across multiple resources and obtaining a global aggregation model. In the local training phase, a task assigner evaluates various objective functions of the virtual machine, such as the energy consumption, memory usage, CPU performance, and failure rate, by applying kernelized support vector regression. Then, the locally trained models are combined, and the resulting model is fed into the global aggregation model.
Figure 2 illustrates the structure of the Multi-objective Radial Kernelized Federated Learning (MRKFL) technique. In the figure, three visually identical input blocks and their corresponding local training models are depicted. While these models appear similar, they are deliberately designed to represent three distinct sets of input tasks arriving from different IoT users or application domains. Each input block corresponds to a unique batch of IoT-generated tasks that are dynamically received by the cloud server. These task batches differ in characteristics, such as resource demand, execution priority, and deadline constraints. Once received, each batch is processed independently by a dedicated local training model. These models employ RKSVR to classify the available virtual machines (VMs) based on multiple resource-oriented objectives, including energy availability, memory usage, CPU time, and average failure rate.
Although the three local models share an identical structure, the key distinction lies in the heterogeneity of their input task profiles. Each model is trained on different data, reflecting the unique demands of the incoming requests. After local training, each model outputs its classification of resource-efficient VMs. These locally optimized results are then forwarded to the global aggregation model, which combines them to produce a unified, optimized task scheduling decision.
Consider the virtual machines in a cloud datacenter, together with their energy consumption, memory usage, CPU time, and average failure rates. Proper management of energy resources in virtual machines can improve sustainability and cost-effectiveness. Memory allocation and optimization among the virtual machines are critical for maintaining system stability and preventing resource contention. Effective distribution of CPU time among the virtual machines is necessary for achieving optimal processing power and performance for the various applications running on the cloud infrastructure. Monitoring and addressing the average failure rates of virtual machines are vital for ensuring service delivery and minimizing downtime. By managing these factors, cloud providers can enhance the overall performance of cloud computing services.
Energy plays a major role in task scheduling. Therefore, managing the energy consumption of virtual machines is crucial for optimizing resource utilization and reducing operational costs. First, the energy availability is estimated as follows:

$$E_a(VM_i) = E_T(VM_i) - E_C(VM_i) \tag{1}$$

where $E_a(VM_i)$ indicates the energy availability of the virtual machine $VM_i$, $E_T(VM_i)$ indicates the total energy of the virtual machine $VM_i$, and $E_C(VM_i)$ denotes the consumed energy of the virtual machine $VM_i$.
Memory is a significant resource in task scheduling, measured by the amount of storage space needed to execute tasks. Therefore, memory availability is evaluated to find a virtual machine’s storage capacity:

$$M_a(VM_i) = M_T(VM_i) - M_C(VM_i) \tag{2}$$

where $M_a(VM_i)$ designates the memory availability of the virtual machine $VM_i$, $M_T(VM_i)$ indicates the total memory capacity of the virtual machine $VM_i$, and $M_C(VM_i)$ indicates the memory consumption of the virtual machine $VM_i$.
Next, the available CPU time of $VM_i$ is measured within a specific time window $B$:

$$CT_a(VM_i) = B - \sum_{g=1}^{m} t(T_g) \tag{3}$$

where $CT_a(VM_i)$ denotes the available CPU time of the virtual machine based on the allocated tasks within the time window $B$. This is calculated by discretizing time, where $t(T_g)$ represents the execution time of task $g$ in $VM_i$, and $m$ represents the number of allocated tasks in $VM_i$.
Following that, the average failure rate of a virtual machine is calculated as the ratio of the number of tasks that previously failed to execute on the machine to the total number of tasks submitted for execution on it. This is expressed by the following formula:

$$AFR(VM_i) = \frac{F_t(VM_i)}{S_t(VM_i)} \tag{4}$$

where $AFR(VM_i)$ denotes the average failure rate of the virtual machine, $F_t(VM_i)$ indicates the number of failed tasks of the virtual machine $VM_i$, and $S_t(VM_i)$ indicates the number of submitted tasks of the virtual machine $VM_i$. After computing the multi-objective functions, RKSVR is employed to select the virtual machine.
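The four resource measures above (energy availability, memory availability, available CPU time, and average failure rate) reduce to simple arithmetic. A minimal sketch with illustrative numbers; the units and values are assumptions for demonstration:

```python
def energy_availability(total, consumed):
    # Energy left on the VM: total energy minus consumed energy.
    return total - consumed

def memory_availability(total, consumed):
    # Free memory on the VM: total capacity minus current consumption.
    return total - consumed

def available_cpu_time(window, task_times):
    # CPU time left in the time window after the allocated tasks run.
    return window - sum(task_times)

def average_failure_rate(failed, submitted):
    # Ratio of failed tasks to submitted tasks (0.0 if nothing submitted).
    return failed / submitted if submitted else 0.0

# Example VM profile (illustrative numbers):
e = energy_availability(100.0, 35.0)      # energy units remaining
m = memory_availability(16.0, 6.5)        # GB free
c = available_cpu_time(10.0, [1.5, 2.0])  # seconds left in the window
f = average_failure_rate(2, 50)           # fraction of tasks that failed
print(e, m, c, f)
```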
Radial Kernelized Support Vector Regression (RKSVR) is a machine learning technique that finds a hyperplane to categorize the input into different classes based on the relationship between the resources of a virtual machine and a corresponding threshold value.
The optimal hyperplane $H$ is the decision boundary that is used to make predictions. Therefore, the optimal hyperplane is estimated as follows:

$$R(VM_i) = \beta_1 E_a(VM_i) + \beta_2 M_a(VM_i) + \beta_3 CT_a(VM_i) \tag{5}$$

From Equation (5), $R(VM_i)$ denotes the combined resource metrics of $VM_i$. The equation specifies weights for each performance metric: $\beta_1$ represents the weight of the energy availability metric of $VM_i$, $\beta_2$ represents the weight of the available memory on $VM_i$, and $\beta_3$ represents the weight of the available CPU time on $VM_i$.

$$H = \omega \cdot R(VM_i) + b \tag{6}$$

Equation (6) represents the hyperplane, where $R(VM_i)$ is the value derived from Equation (5) for $VM_i$ and $b$ represents a bias. Following that, the two marginal hyperplanes are selected either above or below the decision boundary, as defined by Equations (7) and (8).
$$H_{+1}: \omega \cdot R(VM_i) + b = +1 \tag{7}$$

$$H_{-1}: \omega \cdot R(VM_i) + b = -1 \tag{8}$$

where $H_{+1}$ and $H_{-1}$ indicate the marginal hyperplanes for classifying $VM_i$. If the value from Equation (7) is greater than 0, then $VM_i$ is classified as resource-efficient (i.e., +1). If the value from Equation (8) is less than 0, then $VM_i$ is classified as not resource-efficient (i.e., −1).
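The linear scoring and sign-based classification described above can be sketched as follows. The metric weights, hyperplane weight, and bias are illustrative assumptions, not values fitted by the paper's model:

```python
def resource_score(e_a, m_a, ct_a, betas):
    # Weighted combination of energy, memory, and CPU-time availability.
    b1, b2, b3 = betas
    return b1 * e_a + b2 * m_a + b3 * ct_a

def classify(score, omega, bias):
    # Sign of the hyperplane value: +1 resource-efficient, -1 otherwise.
    h = omega * score + bias
    return +1 if h > 0 else -1

# Illustrative VM: 65 energy units, 9.5 GB memory, 6.5 s CPU time available.
score = resource_score(65.0, 9.5, 6.5, betas=(0.5, 0.3, 0.2))
label = classify(score, omega=1.0, bias=-20.0)
print(score, label)
```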
The hyperplanes use the radial kernel function to measure the similarity between the estimated resources of the virtual machine and its threshold value. The similarity is estimated as given in Equation (9):

$$K(VM_i) = \exp\left(-\frac{\left(R(VM_i) - \tau\right)^2}{2\sigma^2}\right) \tag{9}$$

From Equation (9), $K(VM_i)$ is the radial kernel function of $VM_i$, $R(VM_i) - \tau$ is the difference between the resources of virtual machine $VM_i$ and the threshold $\tau$, and $\sigma$ indicates a deviation parameter. The output of the radial kernel function varies between 0 and 1. Based on this similarity, the hyperplanes classify $VM_i$ as above or below the decision boundary. If the output is 1, then the hyperplanes classify $VM_i$ as resource-efficient (+1). If the kernel output is 0, then the hyperplane classifies $VM_i$ as not resource-efficient (−1).
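A minimal sketch of the radial (Gaussian) kernel similarity described above. The resource scores, threshold, and deviation parameter are illustrative assumptions:

```python
import math

def radial_kernel(resource, threshold, sigma):
    # Gaussian similarity between a VM's resource score and the threshold:
    # approaches 1 when they match, decays toward 0 as they diverge.
    diff = resource - threshold
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))

# A VM whose score is close to the threshold yields a similarity near 1;
# a VM far from the threshold yields a similarity near 0.
near = radial_kernel(36.0, 35.0, sigma=5.0)
far = radial_kernel(80.0, 35.0, sigma=5.0)
print(near, far)
```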
Figure 3 illustrates the classification results generated by the RKSVR model using actual test data obtained from the MRKFL-FTS framework. In this output, real virtual machine (VM) metrics, such as CPU utilization, memory usage, and task execution patterns, are used as input features to train and test the classifier. The RKSVR model applies a radial kernel to map these input features into a higher-dimensional space where a separating hyperplane can be constructed. This hyperplane effectively distinguishes between resource-efficient and non-efficient virtual machines. In this context, a classification value of +1 indicates that the virtual machine is resource-efficient and suitable for scheduling tasks, while a value of −1 denotes that the machine does not meet the efficiency criteria. Unlike a schematic diagram, this figure visualizes actual classification boundaries and support vectors derived from empirical data, providing a more accurate and realistic view of how the model behaves in practice. The resulting plot shows a clear separation between the two classes, validating the effectiveness of MRKFL-FTS in identifying optimal VMs for task assignment in a dynamic cloud environment. This contributes directly to minimizing computational overhead and improving scheduling efficiency.
A cross-entropy loss function is employed to evaluate the model’s performance. This loss function measures the difference between the actual and predicted probabilities for the classification task. Specifically, it quantifies the uncertainty in the predictions made by the radial kernelized model from Equation (9), where the output is either resource-efficient or not resource-efficient. The cross-entropy loss function, denoted as $L$, is defined as follows:

$$L = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right] \tag{10}$$

Here, $L$, the cross-entropy loss function, measures the error between the actual and predicted classifications of the virtual machines’ resource efficiency; $y_i$ represents the actual classification label (resource-efficient or not) for virtual machine $VM_i$; $p_i$ is the predicted probability that virtual machine $VM_i$ is resource-efficient; and $n$ is the total number of virtual machines being evaluated.
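The cross-entropy loss over the VM classifications can be computed directly. This is a sketch; the labels and predicted probabilities below are illustrative:

```python
import math

def cross_entropy(labels, probs):
    # Mean binary cross-entropy: labels are 0/1 (not / resource-efficient),
    # probs are the model's predicted probabilities of being resource-efficient.
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / n

# Three VMs: two truly efficient, one not; predictions are mostly correct,
# so the loss is small but nonzero.
loss = cross_entropy([1, 0, 1], [0.9, 0.2, 0.8])
print(loss)
```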
In the proposed MRKFL-FTS technique, the global aggregator model is essential to unify the outputs of the local models trained with local IoT service data. Each VM computes its radial kernel output, reflecting localized insights into task scheduling and fault tolerance. The global aggregator combines these outputs using a weighted average to create a more generalized model that enhances classification accuracy across all VMs. This aggregation process addresses the challenge of data heterogeneity among VMs, ensuring that the model benefits from diverse training data while preserving data privacy. Furthermore, the global model minimizes the cross-entropy loss function, reducing discrepancies between predicted and actual classifications for resource-efficient task assignments. By incorporating the global aggregator, the MRKFL-FTS technique achieves robust and resource-aware task scheduling, enabling efficient IoT service management even in the presence of faults or non-uniform data distributions across VMs.

$$A = \frac{\sum_{i=1}^{n} w_i \, K(VM_i)}{\sum_{i=1}^{n} w_i} \tag{11}$$

where $A$ represents the global aggregator model, $K(VM_i)$ is the radial kernel output of the virtual machine $VM_i$, and $w_i$ denotes the weight of the local model result for the virtual machine $VM_i$.
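The weighted-average aggregation described above can be sketched as follows; the kernel outputs and local-model weights are illustrative assumptions:

```python
def aggregate(kernel_outputs, weights):
    # Weighted average of the local models' radial kernel outputs,
    # giving more influence to local models with larger weights.
    total = sum(weights)
    return sum(w * k for w, k in zip(weights, kernel_outputs)) / total

# Three local models; the first carries double weight.
a = aggregate([0.9, 0.4, 0.7], [2.0, 1.0, 1.0])
print(a)
```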
The federated learning framework aims to minimize the global loss function across all virtual machines. This objective function optimizes the overall learning accuracy:

$$Z = \arg\min_{w} \sum_{i=1}^{n} L(VM_i) \tag{12}$$

where $Z$ denotes the global objective function and $L(VM_i)$ denotes the cross-entropy loss function, which measures the difference between the actual and predicted resource-efficient classifications, i.e., the loss that the aggregation model aims to reduce in the federated learning process. The term $\arg\min$ is a mathematical notation that identifies the parameter values (resource allocation strategies or weights) that minimize the global loss function. Minimizing this loss ensures that the aggregated model optimizes its performance in terms of resource efficiency. By reducing the loss, the system iteratively improves its predictions for resource-efficient scheduling, leading to balanced utilization of energy, memory, and CPU time. This minimization process is critical to ensuring that the federated learning framework not only enhances classification accuracy but also operates effectively within the constraints of the cloud IoT environment.
Stochastic gradient descent is employed to optimize the loss. In this process, each virtual machine $VM_i$ is associated with a weight $w_i(t)$ at time $t$, which reflects its efficiency or suitability for handling incoming IoT service requests. The weight is iteratively updated using gradient descent to minimize the loss $L(VM_i)$ specific to that VM. The update rule is defined as follows:

$$w_i(t+1) = w_i(t) - \eta \frac{\partial L(VM_i)}{\partial w_i(t)} \tag{13}$$

Here, $w_i(t+1)$ denotes the updated weight for $VM_i$, $w_i(t)$ is the current weight, and $\eta$ represents the learning rate ($0 < \eta < 1$), controlling the update step size. The term $\partial L(VM_i)/\partial w_i(t)$ refers to the partial derivative of the local loss function $L(VM_i)$ with respect to $w_i(t)$, which quantifies how sensitive the loss is to the weight adjustment for $VM_i$. This process is repeated iteratively until convergence or the maximum number of iterations is reached. By applying this adaptive update mechanism, the model continuously refines its classification of resource-efficient virtual machines. Consequently, after identifying the most efficient VM, the task assigner schedules the incoming IoT service requests to this selected VM, thereby improving overall resource utilization, reducing the execution loss, and enhancing fault-tolerant scheduling in cloud environments.
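The iterative weight update can be illustrated with a toy loss. The quadratic loss and learning rate below are assumptions for demonstration only, not the paper's actual local loss:

```python
def sgd_update(weight, grad, lr=0.1):
    # One gradient-descent step: move the weight against the gradient,
    # scaled by the learning rate (the update-step size).
    return weight - lr * grad

# Toy local loss L(w) = (w - 2)^2 with gradient 2*(w - 2);
# repeated updates drive the weight toward the minimizer w = 2.
w = 5.0
for _ in range(100):
    w = sgd_update(w, grad=2.0 * (w - 2.0), lr=0.1)
print(w)
```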
Algorithm 1 describes the processing steps for efficient IoT task scheduling in cloud computing using a Multi-objective Radial Kernelized Federated Learning (MRKFL) technique. The federated learning process begins with deploying a local training model across a set of virtual machines. Within the local training model, support vector regression constructs a hyperplane as a decision boundary for analyzing the resources of each virtual machine and determining a threshold value based on the radial kernel function. Depending on the kernel outcome, the hyperplane categorizes a virtual machine as either resource-efficient or not. The global aggregation model then combines all the local training output results through a weighted average model. For each output result, an objective function is defined. The weights are updated based on the loss value. The updated model weight is sent back to the local training model for training on the newly generated input. Finally, the classification of resource-efficient virtual machines is obtained.
Algorithm 1 Multi-objective Radial Kernelized Federated Learning (MRKFL)

1: Input: Number of virtual machines
2: Output: Classification of virtual machines
3: Begin
4: for each incoming IoT request do
5:   for each virtual machine do
6:     Estimate the resources: energy availability, memory availability, CPU time, and average failure rate
7:     Apply RKSVR in local training
8:     Construct the hyperplane using Equation (6)
9:     Find the two marginal hyperplanes using Equations (7) and (8)
10:    Measure the radial kernel between the resources and the threshold using Equation (9)
11:    if the kernel output is 1 then
12:      Classify the virtual machine as resource-efficient
13:    else
14:      Classify the virtual machine as not resource-efficient
15:    end if
16:    Find the global aggregation model by weighted average using Equation (11)
17:    Minimize the loss function for the local training model using Equation (12)
18:    Update the weight using Equation (13)
19:    if the minimum loss is reached then
20:      Convergence is met
21:    else
22:      Go to Step 16
23:    end if
24:    Return: Output results (classified virtual machines)
25:  end for
26: End