A Proposal for a Tokenized Intelligent System: A Prediction for an AI-Based Scheduling, Secured Using Blockchain

: Intelligent systems are being proposed every day as advances in cloud systems are increasing. Mostly, the services offered by these cloud systems are dependent only on their providers, without the inclusion of services from other providers, specialized third parties, or individuals. This ‘vendor lock-in’ issue and the limitations related to offering tailored services could be resolved by allowing multiple providers or individuals to collaborate through intelligent task scheduling. To address such real-world systems’ limitations in provisioning and executing heterogeneous services, we employed Blockchain and Deep Reinforcement Learning here; the first is used for the token-based secured communication between parties, and the latter is to predict the appropriate task scheduling; hence, we guarantee the quality of not only the immediate decision but also the long-term. The empirical results show a high reward achieved, meaning that it accurately selected the candidates and adaptably assigned the tasks based on job nature and executors’ individual computing capabilities, with 95 s less than the baseline in job completion time to maintain the Quality of Service. The successful collaboration between parties in this tokenized system while securing transactions through Blockchain and predicting the right scheduling of tasks makes it a promising intelligent system for advanced use cases.


Introduction
We are currently depending on cloud computing systems for our everyday tasks, whether it is for work, personal, commerce, financial transactions, or data storage tasks, to name a few.Cloud services can be offered through a single provider or multiple clouds federated as a single one with a variety of services, called federated cloud, which is the interconnection and cooperation of multiple cloud computing environments [1].The IEEE 2302-2021 Standard for Intercloud Interoperability and Federation (SIIF) highlights the importance of establishing common mechanisms and APIs to enable resource sharing among different stakeholders, fostering the testing of complex technologies in a federated environment [2].The need for more secured heterogeneous cloud services from federated clouds and the demand for high efficiency in performing tasks on multiple clouds are increasing nowadays, especially in cases that require high confidentiality and reliability [3][4][5][6].
As known, there are potential risks associated with cloud computing, like exposure or leakage of data, unauthorized access to data, insider threats, regulatory violations, and others [7][8][9][10].Also, there are additional challenges that need to be managed to make it more suitable for advanced use cases.For example, sensitive data or computing tasks may require a careful selection of cloud resources to process these tasks securely and efficiently, or they might need specialized software or hardware to properly execute these tasks.Therefore, to minimize the centralization issue of cloud platforms on their respective providers, we can benefit from the evolution of web 3.0, which allows us to utilize the capabilities of emerging technologies to overcome these challenges to some extent and achieve the desired enhancements.One of these technologies is Blockchain technology, with its unique features that include immutability, decentralization, enhanced security, and faster settlement through its smart contract system [11].
Blockchain is like a distributed state machine that operates over a consensus, it is a place in which to store information, and we do not need central authorities or external parties to ensure the integrity of that data.It is considered as a database that operates over different kinds of technologies that allow you to store and to process data without the intervention of third parties, where you can just trust the network itself that is going to be completing the job [12].Blockchain is suitable for handling reliable interactions and secured sharing of messages between parties in the Blockchain.This distributed system is immutable, so all transactions of all parties are recorded in a reliable way, either in a permissioned or permissionless Blockchain. Figure 1 shows the Blockchain architecture.
Systems 2024, 12, x FOR PEER REVIEW 2 of 21 tasks.Therefore, to minimize the centralization issue of cloud platforms on their respective providers, we can benefit from the evolution of web 3.0, which allows us to utilize the capabilities of emerging technologies to overcome these challenges to some extent and achieve the desired enhancements.One of these technologies is Blockchain technology, with its unique features that include immutability, decentralization, enhanced security, and faster settlement through its smart contract system [11].
Blockchain is like a distributed state machine that operates over a consensus, it is a place in which to store information, and we do not need central authorities or external parties to ensure the integrity of that data.It is considered as a database that operates over different kinds of technologies that allow you to store and to process data without the intervention of third parties, where you can just trust the network itself that is going to be completing the job [12].Blockchain is suitable for handling reliable interactions and secured sharing of messages between parties in the Blockchain.This distributed system is immutable, so all transactions of all parties are recorded in a reliable way, either in a permissioned or permissionless Blockchain. Figure 1 shows the Blockchain architecture.Blockchain-enabled systems provide more enhanced capabilities as compared to conventional cloud-based systems, as shown in Figure 2, and allow for open-ended opportunities and capabilities for Society 5.0 and Industry 4.0 [13].Blockchain-enabled systems provide more enhanced capabilities as compared to conventional cloud-based systems, as shown in Figure 2, and allow for open-ended opportunities and capabilities for Society 5.0 and Industry 4.0 [13].One use case of utilizing such capabilities, which is the proposed system here in this paper, is to split a job into multiple tasks or chunks to be processed separately through multiple different nodes "executors" that are connected though a Blockchain network to One use case of utilizing such capabilities, which is the proposed system here in this paper, is to split a job into multiple tasks or chunks to be processed separately through multiple different nodes "executors" that are connected though a Blockchain network to execute the required tasks without involving any intermediary, making use of its smart contracts to guarantee a quick and secured settlement and transparent task execution.We employed Blockchain technology here to immutably store all transactions to avoid the possibility of data manipulation or the violation of Quality of Service (QoS), also to link the service requester to the distributed executors, whether they are cloud resources or individual service providers.This way we can achieve confidentiality through executing fragmented sensitive tasks, perform compute-intensive tasks that are not possible using a single machine, and execute tailored services that require specialized hardware or software, all without the constraints of a central provider.In this case, the individual computation capability of each executor impacts the overall execution efficiency, which will be considered before being selected to execute the required job.Also, there should be a mechanism of reputation to be considered before selecting an executor to ensure execution trustworthiness, instead of randomly choosing executors or choosing them only based on stakes.It is known that task scheduling affects the efficiency of execution, so designing their schemes properly is an important challenge.Therefore, we want to design a mechanism that can address the aforementioned issues in an efficient way.
It is difficult to use traditional optimization mechanisms that are based on static optimization, which may not be suitable for the dynamic and complex features of distributed environments such as cloud computing [14].Optimizing the performance on the long term requires designing an efficient policy that considers the challenges related to the dynamicity of the system, like workload nature, QoS requirements, and the characteristics of candidates (potential executors), which usually have an enormous state space.To tackle the issue of dynamicity and complexity of the federated cloud environment, we can involve artificial intelligence (AI) here to further enhance the proposed system.Nowadays, AI is being utilized in many different fields, with endless applications including prediction, Internet-of-Things (IoT), classification, detection, recognition, etc. [15][16][17][18][19][20].The AI technique that we employed here is Deep Reinforcement Learning (DRL), a promising approach in handling advanced control problems [21][22][23].It comprises Reinforcement Learning (RL) and Deep Learning (DL), where RL learns to make the right decisions through taking the best-known actions then, based on the action taken, receives penalties or rewards as feedback.Whereas, DL approximates the complex functions where deep neural networks learn to represent the value of actions in different states to correlate state and action that are correspondent with the value function.The incorporation of such techniques is also called hyper-heuristics, which are high-level search methodologies [24][25][26].
The decisions in DRL are made under the guidance of deep neural network (DNN), which provides precise prediction and estimation that lead to a long-term optimal solution.As it is not an easy-to-handle problem, we employed the Deep Q-Learning (DQL) technique, which is a specific algorithm within DRL that specifically addresses the problem of learning the action value for a given state to tackle the challenges of selecting the proper executor and resource allocation, which helps in reducing the huge size of the action space.This significant reduction resulted from applying Deep Q-Network (DQN) that eliminates the need for a table to store Q-values for each state-action pair; instead, it utilizes the DNN to estimate these Q-values.
The contribution of this work is to build a proof-of-concept system that allows service requestors to benefit from broad range of services offered heterogeneously by cloud providers and, later, individuals in a decentralized collaborative way that minimizes the centralization issue of single cloud platforms.Also, it enhances the overall process security in terms of immutability and integrity of all related transactions.This incentive-based system integrates DQL and Blockchain to predict the appropriate task scheduling for executors that are available on federated clouds, avoiding potential vulnerabilities and taking advantage of the smart contracts feature.This proof-of-concept system is validated on a federated cloud environment and successfully performs a job through choosing suitable executors for that job by considering their computing capabilities and job type.This system is applicable in different fields including, but not limited to, trading systems, E-commerce, E-government, open-source intelligence (OSINT) and others.
The rest of the paper is structured as follows: Section 2 summarizes the literature review, Section 3 describes the proposed system, Section 4 discusses the results of the implementation, and finally, Section 5 concludes the paper and briefly mentions the planned future work related to this topic.

Literature Review
Blockchain technology has recently gained significant attention, as it fits in almost everywhere while still providing its distinct features.While investigating the related work for this research, we found that most of the recent studies that combine Blockchain with DRL for cloud systems were mostly applied in the two fields of Internet-of-Things (IoT) and Industrial IoT [13,16,[27][28][29][30].On the other hand, only a few studies have been completed combining Blockchain and DRL in the field of token-based task scheduling prediction on federated cloud systems and for participants' evaluation, which is our focus in this work.Hence, we are highlighting some of this partially-completed related work that inspired us to work on this proof-of-concept.
Xiao, H et al. [31] have proposed a decentralized architecture to develop a Blockchainsupported Internet of Vehicles (IoV) system using an algorithm that is based on Deep Reinforcement Learning for resource optimization.They formulated a resource optimization problem and exploited the Deep Reinforcement Learning algorithm to determine the allocation scheme.
Gao, S et al. [32] have also built a resource allocation and task offloading mechanism for a video frame resolution scaling that is based on DQL, they formulated an optimization problem of cost minimization of energy, delay, and accuracy to improve the experience quality of the user equipment; they utilized DQN for dimensionality reduction while selecting the action for offloading and resource allocation, resulting in an accelerated training speed through overcoming the sparseness of single-layer reward functions.
In the field of cloud-edge-end cooperation environments, Fang, C et al. [33] have solved the issue of the overloaded mobile traffic that is resulted from the Internet services by designing a DRL-based optimization mechanism to allocate resources in heterogeneous environments with cloud-edge-end collaboration, focused on improving the distribution of content, where they schedule the tasks of the arrived content requests according to the historical requests from users.In another study supported by edge computing, Quan, T et al. [34] have designed a seismic data query mechanism using DQN for mobile edge computing; they formulated a minimization optimization problem aiming to minimize the task delay in the long-term by considering the computing capacity constraints of edge servers, resulting in outstanding performance results.
Most of the related studies have considered their proposed ideas as optimization problems, either minimization or maximization, to minimize the cost of any aspects: delay, energy, resources, operating expenses, etc., or to maximize the reward, performance, etc. Todorović, M et al. [35] have proposed a consensus mechanism based on PoUW that assumes solving instances of optimization problems for more efficient utilization of computational resources and provision incentives for the Blockchain participants.
In [36], they designed a multi-objective mechanism to optimize the response time for task scheduling and energy consumption in an edge-cloud collaborative environment.This design is based on the A3C algorithm to propose a task scheduling policy optimization algorithm, with acceptable simulation results confirming its ability to reduce the overall response time and the calculated energy consumption of their proposed system.
For the purpose of handling the problem of the unpredictability of tasks and QoS requirements of users in Fog Computing, the authors in [37] have proposed three different DRL-based solutions, with the aim of maximizing rewards with reference to resource utilization, designed as the Markov Decision Process, by taking into consideration the task priority, energy consumption, and latency as the QoS factors.Another effort has been made in [38] for a trusted supervision of federated learning and malicious node attacks by smart contract; they proposed the concept of a decentralized trusted computing sandbox, which is a federated learning multi-task scheduling mechanism using DRL to solve the resource scheduling optimization problem.Both of these two studies have not incorporated Blockchain in their proposed ideas of resolving the scheduling optimization problem.On the other hand, ref. [39] has proposed a DRL-aware Blockchain-based task scheduling system for healthcare applications through a solution that provides makespan efficient scheduling to solve the issues of task scheduling, security, and the cost related to processing tasks in the IIoT healthcare paradigms.We have also been inspired by the work carried out in [40], where they have introduced a multi-agent collaborative DRL-based scheduling algorithm with DQN in Mobile Edge Computing (MEC) to solve the issues of low latency, offloading, and task scheduling in MEC using Karush-Kuhn-Tucker (KKT) to minimize the total latency and DQN to reduce the energy consumption.
To the best of our knowledge, and compared to IoT-related contributions, a few studies have been carried out for enhancing the utilization of federated clouds, and none of the studies in the literature review have proposed an incentive-based system integrating the Blockchain features with DQL in federated clouds, nor have they utilized the DRL capabilities to select nodes (executors) reliably by predicting efficient task scheduling.In addition, the effectiveness of Blockchain-enabled cooperative execution is not yet studied thoroughly, and there should be more evaluation mechanisms for selecting the proper distributed executors.Therefore, we are trying here to fill this research gap and propose this system as a proof-of-concept applicable in use cases that minimize cloud providers' lock-in issue by allowing for heterogeneous services from federated cloud and, later, the individuals who offer specialized token-based services.

Methodology
The methodology we followed here was used to build a proof-of-concept solution that integrates Blockchain technology with a DRL approach with the purpose of enabling a collaboration among the distributed executors using a rewarding mechanism as an incentive, also for the purpose of creating an evaluation mechanism for selecting the proper executors.This methodology should allow for secured transactions on federated clouds by taking advantage of the immutable Blockchain and the smart contract features, while also utilizing the DRL approach to predict appropriate task scheduling.
The notations used in this section are tabulated in Table 1 below, with a description of each notation to help readers easily understand the abbreviations and algorithms.In the following subsections, we explain the workflow and the design of the proposed concept and its architecture, in addition to how we defined the state space, action space, and reward function.

Workflow of the Proposed System
The requester submits a job request with the dedicated reward and the number of tokens Q needed to pay for the successfully executed job.This request transaction includes the workload (number of tasks N), the QoS requirement (assumed as timeout T ), and the reward Q.To handle these transactions, as we adopted DPoS here, the block creators (elected by the stakeholders) will generate a block and append it to the Blockchain; then, after reaching a consensus, they will add the produced block permanently to the Blockchain.This job request transaction is broadcasted through the BCM and received by the Predictor, the DQL agent, which will assess the job type and perform the algorithm to select the suitable executor set θ E from the already available candidate set θ C .This selection is carried out through the evaluation process of the candidates, which evaluates the S, V, and F values of each candidate.Ideally, any service providers or nodes on the network who have submitted an execution availability transaction can be considered as candidates, but here, the Predictor collects and groups all resources (including VMs as executors) that are available in the federated cloud into a candidate set θ C .
After candidate evaluation and executor selection, and based on the selected executors, the Predictor will design the appropriate policy of task scheduling on the selected executors and send it to the requester; the result of this selection is also appended to the Blockchain.Considering that the selected executors in θ E are VM instances, the FCM will initiate and initialize the designated instances to be ready by uploading the related folders with scripts to execute the workload.The requester should then dispatch the required job tasks to the executor and send the output IDs to the Blockchain to announce job acceptance and the executor acknowledgment for the job approval, which, in turn, will execute the job based on the specified policy.
For the executors, to obtain the highest reward, they should allocate the related computational resources to successfully execute the job and meet the QoS requirements efficiently.After successful job execution, they send a completion announcement on the Blockchain when it is completed.This process will either return true if successful (no violations of QoS) or false if otherwise.Finally, and only if successful, the smart contract will release the allocated tokens, and the executor will obtain the reward.
All these transactions, along with the token ledger, are secured by and immutably recorded on the Blockchain, guaranteeing the transparency and integrity of transactions for both the requester and executor, where all parties are connected through the Blockchain network.Figure 3 illustrates how different parties are connected in the proposed concept.violations of QoS) or false if otherwise.Finally, and only if successful, the smart contract will release the allocated tokens, and the executor will obtain the reward.All these transactions, along with the token ledger, are secured by and immutably recorded on the Blockchain, guaranteeing the transparency and integrity of transactions for both the requester and executor, where all parties are connected through the Blockchain network.Figure 3 illustrates how different parties are connected in the proposed concept.The workflow of the proposed system is formulated in Algorithm 1 below, whereas Algorithm 2 explains the steps taken by the Predictor (Deep Reinforcement Learning agent) to select the appropriate executors and build task scheduling and resource allocation policy by solving problem 1, as shown in (9).These steps utilize a DQL (Deep Q-Network) that employs a neural network with a pre-trained main Q-Network to predict The workflow of the proposed system is formulated in Algorithm 1 below, whereas Algorithm 2 explains the steps taken by the Predictor (Deep Reinforcement Learning agent) to select the appropriate executors and build task scheduling and resource allocation policy by solving problem P1, as shown in (9).These steps utilize a DQL (Deep Q-Network) that employs a neural network with a pre-trained main Q-Network to predict the Q-value for a given state, through two phases: 1.
Estimating the "state-action" value function that is correspondent to an action-value function (offline DNN construction, steps 4 to 6 in Algorithm 2); 2.
Action selection and dynamic network updating (online dynamic Deep Q-Learning, steps from 7 to 23 in Algorithm 2).
The "offline" phase here refers to setting up the DNN before the interaction with the environment, where we decide the architecture in terms of the number of layers, the configured neurons in each layer, and the applied activation functions.First, the initialization of the Main Q-Network weights with small random values to break the symmetry and allow for more proper learning.Then, the initialized Main Q-Network is duplicated by creating a Target Q-Network, ensuring both have the same initial weights.The use of two networks reduces the correlations between the Q-values being predicted and the targets used for training to stabilize learning.The Experience Replay is also initialized in this phase, which will store the Predictor's experiences in terms of state, action, reward, and next state tuples.Lastly, in this phase, we pre-trained the network using the Azure 2019 workload dataset to speed up learning.
The "online" phase involves the actual learning process, where the Predictor interacts with the environment and updates the DNN based on its experiences.Using an ε-greedy strategy, the Predictor decides whether to take a random action (explore) or the action suggested by the current policy (exploit).For each action taken, the agent receives a new state and a reward from the environment, and the tuple (current state, action, reward, next state) is stored in the Experience Replay.Periodically, a batch of experiences from Experience Replay are sampled to allow the DNN to learn from a more diverse set of experiences, reducing the correlation between consecutive learning updates and improving stability.Using sampled experiences, the Predictor calculates target Q-values using the Bellman equation, and these targets reflect the expected future rewards.To minimize the loss between the predicted Q-values and the target Q-values, we use the loss function.The weights from the DNN are copied, after several learning updates, to the Target Q-Network to update the targets.These steps allow the DQN to effectively learn policies that map the states to the actions to maximize the cumulative reward, accounting for both immediate and future rewards.Using sampled experiences, the Predictor calculates target Q-values using the Bellman equation, and these targets reflect the expected future rewards.To minimize the loss between the predicted Q-values and the target Q-values, we use the loss function.The weights from the DNN are copied, after several learning updates, to the Target Q-Network to update the targets.These steps allow the DQN to effectively learn policies that map the states to the actions to maximize the cumulative reward, accounting for both immediate and future rewards.

System Design and Architecture
We designed the architecture of this system to be based on DRL to solve the complexity of the solution and to maintain high scalability, benefitting from its distinct performance features from RL and DNN.RL adopts a Q-table to store Q-values to determine the best way to get the best results by adhering to a specific methodology; hence, it generates near-optimal control actions through immediate reward feedback by interacting with its environment [41,42].This makes RL effective for resource allocation in dynamic systems such as the federated cloud environment.However, sometimes the decisions becom complex for the RL approach, making it harder for the Q-table to accommodate; therefore, the DNN is applied to help with estimating the potential states rather than mapping every solution to determine the reward path, which reduces the action space size.The resultant approach is DRL, which we utilized here in our system design to handle the dynamicity of tackling advanced control problems that have high-dimensional state space [43]; particularly, we adopted DQL for selecting actions and dynamically updating the network.The architecture of the proposed system is shown in Figure 4.
Some of the quality attributes we can achieve through this design are the following: accommodate; therefore, the DNN is applied to help with estimating the potential states rather than mapping every solution to determine the reward path, which reduces the action space size.The resultant approach is DRL, which we utilized here in our system design to handle the dynamicity of tackling advanced control problems that have highdimensional state space [43]; particularly, we adopted DQL for selecting actions and dynamically updating the network.The architecture of the proposed system is shown in Figure 4. Some of the quality attributes we can achieve through this design are the following: The role of the task scheduling and resource allocation module, the Predictor, is to perform the DQL algorithms to predict the appropriate task scheduling and resource allocation policy.For that, we designed a mechanism of evaluation to assist in the process of executor selection from the available candidates by considering the computation capability of the candidates in addition to their stakes and reputation.The DQL approach is used here to deal with the complexity of the system and the huge action space related to the execution workloads, QoS requirements, and characteristics of the candidates, as explained below, which requires us to design an approach to create a manageable solution space in the decision process.To some extent, this methodology handles the enormous state space and efficiently meets the workload adaptation and QoS requirements [43,44].
The Blockchain Manger (BCM), on the other hand, manages all Blockchain-related transactions and appends them to the distributed ledger, including the transactions from the Predictor and both parties, the Requestor and the executor.This makes a Blockchain-enabled system that allows service providers to be the "executors" (for some tokens) and offers its services to the "requesters", who pay the tokens to avail these services or to carry out the completion of their tasks, leading to a managed token-based economy.In this system, we have adopted the Delegated Proof-of-Stake (DPoS) as the consensus mechanism, as it is flexible, efficient, and fast as compared to PoW and PoS [45].In this consensus mechanism, stakeholders do not create the blocks themselves; they vote to elect the delegates, also referred to as witnesses or block producers, giving them the right of creating blocks; hence, there is no computational power consumption for the stakeholders.There is also a Federated Cloud Manager (FCM) that manages all connected clouds in the federated cloud with their resources, of any type.It translates all actions related to resource allocation to the respective clouds, and coordinates with BCM for a distributed collaborative environment hosting the Blockchain ledger.The services available in the federated cloud are collected and grouped by a module called the Resource Collector (RC).A process diagram of these modules is simplified in Figure 5 below.system, we have adopted the Delegated Proof-of-Stake (DPoS) as the consen mechanism, as it is flexible, efficient, and fast as compared to PoW and PoS [45].In t consensus mechanism, stakeholders do not create the blocks themselves; they vote to el the delegates, also referred to as witnesses or block producers, giving them the right creating blocks; hence, there is no computational power consumption for the stakeholde There is also a Federated Cloud Manager (FCM) that manages all connected clou in the federated cloud with their resources, of any type.It translates all actions related resource allocation to the respective clouds, and coordinates with BCM for a distribu collaborative environment hosting the Blockchain ledger.The services available in federated cloud are collected and grouped by a module called the Resource Collector (R A process diagram of these modules is simplified in Figure 5 below.In the following, we define the state space of the proposed solution, how we set the action space in the simplest way, and, finally, how we formulated the reward function. State space: In a decision epoch T i (i = 1, 2, 3, . ..), the state S i is the union of the following: job workload (number of tasks N i ), QoS requirement (here, timeout T i ), candidate's stake S i m , candidate's reputation value V i m , and computational resource F i m , where candidate m belongs to the candidate set θ C .Hence, the state space is represented as follows: i here, and in the rest of the paper, denotes the variable in decision epoch T i .This is not to be confused with T , which is the timeout as the QoS requirement.For the candidate's stake, S i m , it comprises its own stake and the stake of others who delegate towards it; the higher the stake the better, as the candidates with higher stakes technically have more influence and a higher probability of being selected as an executor.For the reputation value, V i m , we have adopted a similar concept to the reputation-based rating mechanism proposed by Liu, G et al. [46] to determine the candidates' reputation value.
Action space: The Predictor here should properly select a set of executors out of the candidates available, which will be carried out just after a job is released with a detailed workload.While choosing the executors, it should also design the task scheduling policy for the selected executors, along with the resource allocation policy to execute the job successfully.Thus, we represent the action in decision epoch T i as: Here, , where a i m = 1 indicates that the candidate U C m is selected as the executor, and a i m = 0 indicates that candidate U C m is not selected as the executor.Once the action a i is taken, the Predictor chooses the executors set θ i E from the available candidates set θ C .This executors set is represented as and, therefore, θ i E has a cardinality of θ i E as the number of elements inside.The execution scheduling profile is and the computational resources required by the executors to perform the job are denoted as f = { f m }, f m ≤ F m , where F = {F m } is the computational resources available, so the computational resource allocation profile is Reward function: The reward which executors are receiving through executing the required job collaboratively is what we mean by reward here.This involves two things: the income of completing the job (which affects the executor's reputation value), and the received block reward as a stakeholder (which affects the executor's stake).The function can be formed as: where Q is the reward (token/task), wS i j is the weight of stake, and V max is the reputation value, which should be ≤1.But first, we must ensure that the executor has satisfied the assumed QoS requirement, the timeout; so, the executer must complete the assigned n i j job tasks before the timeout to receive the reward.Hence, we will consider the duration D as a constraint: n i j D i j ≤ T i ∀U c j ∈ θ E .This constraint is calculated by: where ψ I represents the individual size of each job task in bytes, X is the computation intensity in CPU cycle/byte.The reward function here should represent the maximum reward achievable while also satisfying the QoS requirements.As a result, the reward function will be rewritten to be as follows: Task scheduling: After determining the executor set θ E that has been selected out of the candidate set θ C , the Predictor should maximize the cumulative rewards while satisfying the constrains related to the QoS requirements, i.e., it should build an efficient decision of task scheduling and resource allocation, taking into consideration the constraints.This decision will be considered as an optimization problem that should be solved to establish the best resource allocation strategy suitable for the executors selected.Usually, the optimization problem of the decision leads to a non-convex NP problem, meaning that determining the best strategy for allocating resources is difficult, especially in decentralized environments [32].If we look at the action space A i in (2) and the reward function R i S i , A i in (8), they form a difficult problem to solve due to the large space size and the constrained reward function.Hence, we formulated the scheduling problem into an optimization problem to satisfy the constraints and to maximize the achieved reward, Q, to the highest possible, as follows: The objective function is derived from (8), where constraint C1 is the validity of the task scheduling, as already represented in (4).C2 is the satisfaction of the QoS (the timeout T ), as formulated in (7).Finally, C3 is the computational resource allocation, as defined in (5).Since the problem includes constraints and involves randomness and uncertainty, it is computationally expensive to use traditional methods to find the best option to allocate the resources; this is because of the non-convex optimization problem [23].For that, we employed the Deep Q-Learning approach to address this optimization problem for task scheduling and resource allocation.Deep Q-Learning was adopted to predict the workload in the federated clouds.We have built on the work carried out by Ahmed, Z. et al. [47]; they used DQL to predict the workload and demonstrated a significant improvement over popular existing scheduling methods that include Genetic Algorithm, Heteroscedastic Gaussian Processes, Median Attribute Deviation Minimum Migration Time, and Robust Logistic Regression Minimum Migration Time algorithms [47].
As part of this problem representation, we can see how our defined state space, action space, and reward function encourage the desired outcome.To maximize the reward resulting from successful task scheduling, the reward could be positive for successfully completing tasks within the QoS constraints and negative for failing to schedule a task, violating QoS constraints, or inefficient resource allocation.We incorporated logic in the environment that only allows for valid actions as part of C1, and implemented checks that verify if a task's scheduling and execution meet its QoS requirements to satisfy C2, in addition to ensuring that actions respect the computational resource constraints in terms of attempts to allocate more resources than available, which should be penalized or invalidated, to meet constraint C3.
We used the DNN here and initialized the Q-Network that takes the state as input and outputs the predicted Q-values for all actions in the action space, which, in turn, assesses the quality of a particular action taken in each state.This is, with the help of the initialized experience replay to store past experiences (state, action, reward, next state), used to train the Q-Network by sampling batches of experiences to reduce correlations in the observation sequence, in addition to the implementation of the Epsilon-Greedy policy for exploration and exploitation.We trained the model as described in Algorithm 2, and after sufficient training, the policy for task scheduling and resource allocation is implicitly defined by the Q-Network.For any given state, the action with the highest predicted Q-value is considered the best action under the learned policy.
As a result, the DQL can learn the optimal strategies by interacting with the environment, adapt to dynamic changes, and learn from the consequences of its actions.This solves the optimization problem P1 and satisfies the formulated constraints.

Results
We implemented and evaluated the proposed system to prove the concept and its validity.We have four main modules in this system, the Federated Cloud Manager (FCM), Blockchain Manager (BCM), Predictor, and Resource Collector (RC).The FCM module is responsible for managing everything related to the federated clouds.In this experiment, we chose clouds from three providers: Azure, Google Cloud Platform (GCP), and Amazon Web Services (AWS).The BCM module manages all transactions related to the Blockchain.Here, we chose Tron Blockchain, an open-source platform that supports decentralized applications (dApps) and enables us to write Blockchain applications in a customizable and flexible way [48].For the test purposes, we used TronPy and the public "Testnet" network in Tron Blockchain; we built and tested all related smart contracts for validation in this public Testnet network.As mentioned earlier, we adopted the DPoS consensus mechanism in this network, as it is more scalable and needs fewer computational resources [49].The Predictor module is where the Deep Reinforcement Learning agent acts.It performs the DQL algorithms to predict the appropriate task scheduling and resource allocation policy.Finally, the RC module is responsible for collecting and unifying configuration data from the three cloud infrastructure providers, Azure, GCP, and AWS.This RC module dynamically collects and maintains the available resources from these cloud providers (comprising the federated cloud).Here in this experiment, there are about 1200 collected resources; we categorized them by provider, machine type, memory (GB), max disk size (TB), SSD availability, number of GPUs, GPU memory, GPU model, CPUs, CPU frequency (GHz), and machine family.There are five types of machine family: memory-optimized, compute-optimized, graphic rendering, general-purpose, and high-performance compute (HPC) machine families.All of these 1200 resources are considered as the candidate set θ C ; this set is managed dynamically by RC, which act as a manager for all API calls.
The following subsections show the details of the experiment, the related settings, and the evaluation of the performance.

Experimental Details and Settings
Table 2 shows the configurations and parameters used in the implementation.α controls the step size during the Q-value updates and how much the neural network weights are updated during training, while γ determines the importance of future rewards, where values close to 1 give more weight to future rewards, whereas values close to 0 focus on immediate rewards.ε determines the probability of choosing a random action for the exploration-exploitation strategy; over time, ε is reduced to favor exploitation using ε-decay.For the Experience Replay, its size is crucial, as it should be large enough to store a representative set of experiences.The Minibatch Size should be chosen properly when sampling from the Experience Replay.It specifies the number of experiences in each Minibatch.The frequency of updating the Target Network is every 10,000 steps to avoid instability, and the mean squared error (MSE) is used as the loss function between the predicted and target Q-values.
These hyperparameters were experimented, fine-tuned, and adapted after gaining insights from training and evaluation to achieve the best performance in terms of balancing between exploration and exploitation, which, in turn, optimizes the performance of learning and the convergence of the Deep Q-Learning algorithm.
We have configured the neural network here with four layers, where the two hidden layers have 512 and 64 neurons, using the Rectified Linear Unit activation function and the Epsilon-Greedy action selection algorithm to choose between exploration and exploitation randomly [50].We used a workload dataset, Azur workload 2019, to train the model [51].It contains the following information: min, max, and avg CPU utilization, VM id, VM category, VM core count, VM memory, count VMs created, deployment size, and Timestamp VM created and deleted.We have manipulated this dataset to merge the 1200 resources we collected using the RC module, as per the categorization mentioned above, along with the assumed reputation values for the experiment purpose and to prove the concept.Therefore, the Predictor learned how to differentiate between the available candidates to select the most appropriate one to be the executor for the job.The job here is assumed to be any of the workloads from the dataset.The experiment was conducted using a system with an Intel Core i7 CPU @ 2.30 GHz processor and 16 GB RAM, and we used several technical tools and frameworks including: Python (3.7), Tronpy (0.4.0), Azure-core (1.29.5),Google-api-core (2.12.0),AWS Boto3 (1.28.61),Gymnasium (0.

Performance Evaluation
Figure 6 shows the performance of the proposed Predictor, where the total reward is noticed to be low at the beginning of learning then becomes higher as the number of decision epochs increases.We can notice that after around 400 decision epochs, it becomes stable.This fluctuation is a result of the changes in the environment, which is the state space, affecting the obtained reward in different decision epochs.
We validated the proposed mechanism for selecting suitable executors θ E from the candidate set θ C .We took a snapshot of θ E from the available θ C at one decision epoch.Figures 7-9 show how the proposed algorithm has selected only five suitable executors from the seven candidates available, based on the workload nature, the executors' stake wS, and the reputation value V. Hence, the executor selection action is a = {1, 0, 1, 1, 1, 0, 1}.This ensures a trustworthy evaluation of the potential candidates before selecting them.
At the same decision epoch, candidate 5 is assigned with 20 tasks (chunks of the job workload), as shown in Figure 8, and this candidate utilizes 2.8 GHz of computing power to completely execute the job, as shown in Figure 9.The same goes for candidate 7, where only 10 tasks were assigned to this candidate, and it utilizes 1.7 GHz of computing power.This proves that while the Predictor aims to achieve the highest total execution reward possible, it also carefully evaluates the candidates and schedules tasks accordingly.
Figure 6 shows the performance of the proposed Predictor, where the total reward noticed to be low at the beginning of learning then becomes higher as the number o decision epochs increases.We can notice that after around 400 decision epochs, it become stable.This fluctuation is a result of the changes in the environment, which is the sta space, affecting the obtained reward in different decision epochs.We validated the proposed mechanism for selecting suitable executors ⍬ from th candidate set ⍬ .We took a snapshot of ⍬ from the available ⍬ at one decision epoch Figures 7-9 show how the proposed algorithm has selected only five suitable executor from the seven candidates available, based on the workload nature, the executors' stak  , and the reputation value  .Hence, the executor selection action is  {1, 0, 1, 1, 1, 0, 1}.This ensures a trustworthy evaluation of the potential candidates befor selecting them.At the same decision epoch, candidate 5 is assigned with 20 tasks (chunks of the jo workload), as shown in Figure 8, and this candidate utilizes 2.8 GHz of computing powe to completely execute the job, as shown in Figure 9.The same goes for candidate 7, wher only 10 tasks were assigned to this candidate, and it utilizes 1.7 GHz of computing powe This proves that while the Predictor aims to achieve the highest total execution rewar possible, it also carefully evaluates the candidates and schedules tasks accordingly.We validated the proposed mechanism for selecting suitable executors ⍬ candidate set ⍬ .We took a snapshot of ⍬ from the available ⍬ at one decisio Figures 7-9 show how the proposed algorithm has selected only five suitable e from the seven candidates available, based on the workload nature, the executo  , and the reputation value  .Hence, the executor selection action {1, 0, 1, 1, 1, 0, 1}.This ensures a trustworthy evaluation of the potential candidat selecting them.At the same decision epoch, candidate 5 is assigned with 20 tasks (chunks o workload), as shown in Figure 8, and this candidate utilizes 2.8 GHz of computin to completely execute the job, as shown in Figure 9.The same goes for candidate only 10 tasks were assigned to this candidate, and it utilizes 1.7 GHz of computin This proves that while the Predictor aims to achieve the highest total execution possible, it also carefully evaluates the candidates and schedules tasks according    We can also interpret from this snapshot that because of the different co capabilities of the different executors, the resource allocation policy is not unifor 3 summarizes the snapshot and shows how executors are selected based assessment of individual candidates' capabilities, showing how dynamic and sele Predictor is and that its policies are not uniform.To test the computational resource allocation capability of the Predictor and t of the QoS requirements, we examined the performance in terms of the total co computational resources.Figure 10 depicts how the Predictor managed to consumed computational resources with varying timeout  constraints.Evident computing resources are usually required when we increase the timeout and flexible QoS.We can also interpret from this snapshot that because of the different computing capabilities of the different executors, the resource allocation policy is not uniform.Table 3 summarizes the snapshot and shows how executors are selected based on the assessment of individual candidates' capabilities, showing how dynamic and selective the Predictor is and that its policies are not uniform.To test the computational resource allocation capability of the Predictor and the effect of the QoS requirements, we examined the performance in terms of the total consumed computational resources.Figure 10 depicts how the Predictor managed to save the consumed computational resources with varying timeout T constraints.Evidently, fewer computing resources are usually required when we increase the timeout and have a flexible QoS.To validate the proposed idea in a real-world federated cloud environment, w assumed the required job is related to "dataset processing", which is resource-intensiv and requires high memory.The dataset used in this test case was the Titanic Dataset [52 as a CSV file contains all passengers' data with the purpose of processing and analyzin To validate the proposed idea in a real-world federated cloud environment, we assumed the required job is related to "dataset processing", which is resource-intensive and requires high memory.The dataset used in this test case was the Titanic Dataset [52], as a CSV file contains all passengers' data with the purpose of processing and analyzing by pulling out different fields and calculating the average of different subfields, so multiple mathematical operations were required on the data.We evaluated the performance here by just completing the job successfully on a federated cloud that included AWS, GCP, and Azure cloud providers, as this proves that the Predictor has chosen suitable executors (here, VMs) for the job by considering their computing capabilities and job type, ensuring a successful job execution to minimize the possibility of increased execution time that might violate the assumed QoS requirement, here T ≤ 600 s, hence avoiding aborting the job before completion.So, here we initialized 15 VMs as candidates θ C , with hypothetical wS and V values for each, 5 VMs from each cloud provider, and all being participants in Tron Blockchain with the help of a tool called Kaleido [53] to securely manage and share files in Blockchain.We measured the total completion time for the job and compared it with a flexible case as a baseline where there is no constraint on the timeout.
This indicates that the proposed DQL-based Predictor has optimized its selection of θ E from the 15 candidates available in θ C and met the required QoS, T ≤ 600 s.As shown in Figure 11, the time taken without considering the QoS is 648 s, and only 553 s using the proposed mechanism that considers the QoS to complete the same job.
To validate the proposed idea in a real-world federated cloud environm assumed the required job is related to "dataset processing", which is resourceand requires high memory.The dataset used in this test case was the Titanic Da as a CSV file contains all passengers' data with the purpose of processing and a by pulling out different fields and calculating the average of different sub multiple mathematical operations were required on the data.We evalu performance here by just completing the job successfully on a federated cl included AWS, GCP, and Azure cloud providers, as this proves that the Pred chosen suitable executors (here, VMs) for the job by considering their co capabilities and job type, ensuring a successful job execution to minimize the p of increased execution time that might violate the assumed QoS requirement, 600 s, hence avoiding aborting the job before completion.So, here we initialized as candidates ⍬ , with hypothetical  and  values for each, 5 VMs from ea provider, and all being participants in Tron Blockchain with the help of a to Kaleido [53] to securely manage and share files in Blockchain.We measured completion time for the job and compared it with a flexible case as a baseline wh is no constraint on the timeout.
This indicates that the proposed DQL-based Predictor has optimized its se ⍬ from the 15 candidates available in ⍬ and met the required QoS,  ≤ shown in Figure 11, the time taken without considering the QoS is 648 s, and o using the proposed mechanism that considers the QoS to complete the same job As the proposed system should theoretically allow for services from third pa independent individual service providers, we can ensure that it is not possi malicious service provider (candidate executor) to join the network by ado Delegated Proof-of-Stake (DPoS) as the consensus mechanism in the Blockchai the executor trust is guaranteed here through the strict verification procedure of t As the proposed system should theoretically allow for services from third parties like independent individual service providers, we can ensure that it is not possible for a malicious service provider (candidate executor) to join the network by adopting the Delegated Proof-of-Stake (DPoS) as the consensus mechanism in the Blockchain.Hence, the executor trust is guaranteed here through the strict verification procedure of the DPoS, which ensures that the elected nodes responsible for signing blocks are doing that in an unbiased way.So, unlike other consensus mechanisms, there is no need to depend on untrusted nodes to verify a transaction before confirming that transaction using the DPoS [54].Also, every signed block will have to verify that it is generated from a trusted node.As per the voting system in DPoS, all voting is carried out in real-time, so any malicious action can be immediately detected.In the implementation of this research, we used TRON as it is a DPoS network, where Super Representatives (SRs) are responsible for validating transactions and record keeping, and in case of inefficient or unavailable SRs, voters can switch over their votes to a better node to guarantee network security.

Discussion
From the above results and evaluation, the proposed DQL-based mechanism in this system has shown a high level of accuracy in predicting the proper task scheduling in the federated cloud where the state space is a decentralized environment; it handled this dynamicity and achieved a high reward for executing jobs while satisfying the QoS requirements.Generally, the performance of hyper-heuristics depends on the nature of the problem and the current state of the solution, meaning that in addition to the solution quality and feedback mechanisms, the variability in problems (domain and complexity) highlight the adaptability and flexibility of hyper-heuristics to cater to a wide array of problems.
When we tested the Predictor without BCM (ignoring wS and V values), we found that sometimes, a set of candidates were selected as executors for specific job workloads, but also at other times, they were not selected for the same workloads under the same conditions and QoS requirements.This is because the candidates' cumulative stakes and reputation values were not evaluated before selection, and this confirms how the weight of stake and candidate reputation affects the selection criteria while the Predictor is selecting the executors.On the other hand, when we assumed a higher weight for wS, ranging from 0.05 to 0.9, the reward increased, as implied by Equation (8), which aims to maximize the reward for executors while satisfying the QoS requirements.
This proof-of-concept shows how Blockchain here enabled our proposed DQL-based system to be considered as a token-based incentive mechanism for executors to offer their various services, hence allowing us to decentralize the processing of jobs on federated clouds in a transparent way that is based on automated smart contracts, avoiding any possibility of a biased selection of candidates that is not based on their reputation or stakes.
It is worth mentioning that transaction cost in Tron is very high.We noticed that every function call to the Blockchain and smart contracts regarding any job transactions consumes too much energy and bandwidth, causing it to run out of energy and bandwidth so fast that it needs to stake more assets (TRX) for resources and Tron power to obtain more energy and bandwidth.In one case, a smart contract was called only three times and consumed the full bandwidth available, 1500B Bandwidth Points (BP), 1500/1024 KB ≈ 1.46 KB, and we had to stake (freeze) at least 3000 TRX to obtain more bandwidth, with a 3-day period for un-staking to redeem the frozen TRX.Even though the way the smart contract is built is very memory optimized, the transaction cost is still very high.However, this issue can be resolved, or at least minimized, by considering different platforms other than Tron that have an acceptable transaction cost.
Other limitations encountered during the research include the significant computational resources required for training and the difficult integration of Blockchain and DQL, which requires careful consideration of interoperability between different components.Moreover, the complexity of task scheduling, as an NP-hard problem, is compounded here in this tokenized system, as the Blockchain operations add another layer of operational complexity.Although it needs a lot of configurations, integrations, and technical tools to get this solution working in a basic form as a proof-of-concept, we can say it is promising, and it opens the possibilities of different advanced use cases, considering that further fine-tuning is required to get a better performance.

Conclusions
The proposed system of employing Deep Q-Learning with Blockchain for tokenized task scheduling aims to combine the power of DQL with the transparent and decentralized nature of Blockchain to enable a tokenized system that can solve, to some extent, the vendor lock-in issue that is related to cloud providers and the potential security considerations related to heterogeneous services from third parties like individuals or other cloud providers that offer their services based on tokens.From the discussed results of this proposed proof-of-concept, we can conclude that it is capable of offering:

•
Efficient task scheduling and resource allocation by leveraging the DQL for more optimization; such algorithms excel at learning from interactions and making sequential decisions based on the requirements of different tasks through adapting to the dynamic conditions, resulting in an optimized performance and utilization; • Tokenized incentives and rewards for independent service providers (executors) who contribute their resources or compute power to task execution to ensure fairness; • Transparent and trustworthy task execution through the utilization of the immutable Blockchain technology, which helps with verifying the execution of tasks, enhancing trust and reducing the reliance on centralized authorities and single cloud providers; • Distributed and secured task execution responsibilities among participants by leveraging Blockchain's decentralized nature.This enhances system resilience, reduces single points of failure, and improves security against malicious activities.
For future work, we will try to optimize the system for more specialized fields, such as digital forensics, Decentralized Finance (DeFi), and healthcare systems.In addition, we will try to find other Blockchain platforms with less expensive transactions that can support the future demand for Blockchain enabled AI systems.

Figure 3 .
Figure 3. Illustration of how parties are connected in the proposed system.

Figure 3 .
Figure 3. Illustration of how parties are connected in the proposed system.

Figure 5 .
Figure 5. Process diagram of the system modules.

Figure 5 .
Figure 5. Process diagram of the system modules.

Figure 6 .
Figure 6.Performance of the proposed Predictor.

Figure 6 .
Figure 6.Performance of the proposed Predictor.

Figure 6 .
Figure 6.Performance of the proposed Predictor.

Figure 10 .
Figure 10.Total consumed computational resources as timeout constraint varies.

Figure 10 .
Figure 10.Total consumed computational resources as timeout constraint varies.

Figure 11 .
Figure 11.Calculation of the total job completion time.

Figure 11 .
Figure 11.Calculation of the total job completion time.
* Computational resource allocation profile W * Task scheduling and resource allocation policy θ C Candidates set; the potential executors not yet selected θ E Executors set; the selected executors from the candidates set Systems 2024, 12, 84 6 of 21

1
Requester initiates Job request to Predictor and submit Job transaction to BCM with amount of tokens 2 for each Job transaction sequence do 3 BCM publicizes Job's parameters , , and  on the Blockchain network 4 RC groups the available candidates from FCM in a candidates set ⍬ 5 Predictor performs executor selection & evaluation and builds W * using Algorithm 2 6 Predictor sends W * to Requester and BCM 7 BCM generates and appends a block to the Blockchain using DPoS mechanism 8 Requester sends Job to the selected Executor and Blockchain 9 Executor executes Job according to W* 10 Executor announces job completion on the Blockchain network 11 Executor sends back the completed job to Requester 12 Execute smart contract and release reward tokens to Executor if QoS is satisfied 13 end Requester sends Job to the selected Executor and Blockchain 9 Executor executes Job according to W* 10 Executor announces job completion on the Blockchain network 11 Executor sends back the completed job to Requester 12 Execute smart contract and release reward tokens to Executor if QoS is satisfied 13 Algorithm 2: Predictor Algorithm 1 Input: Job submitted by Requester 2 Output: ⍬ and W * 3 Initialize: BCM, FCM, Predictor, Experience Replay, Main Q-Network, Target Q-Network 4 Obtain the (, ) value estimates

end Algorithm 2: Predictor Algorithm 1 Input:
Job submitted by Requester 2 Output: ⍬ and W * 3 Initialize: BCM, FCM, Predictor, Experience Replay, Main Q-Network, Target Q-Network 4 Obtain the (, ) value estimates 5 Store (, ) value estimates in the Experience Replay 6 Pretrain the Main Q-Network with (, ) and corresponding Q value (, )) * and  * to  * = { * ,  * } 14 end 15 Calculate the immediate reward  16 The chosen executors execute the job according to  in the action space  17 State transition observation  (next decision epoch) with the new state  18 Store this experience ( ,  ,  ,  ) into the Experience Replay 19 Pick samples randomly ( ,  ,  ,  ) from the Experience Replay 20 From the Target Q-Network, calculate target Q-Value using the equation:  =  +  max ( ,  ) 21 Use the Loss Function (θ) = [ − ( ,  ; θ)] to update Target Q-Network every C iterations 22 end 23 end

Table 3 .
A snapshot of ⍬ from the available ⍬ at one decision epoch.

Table 3 .
A snapshot of θ E from the available θ C at one decision epoch.