Algorithms · Article · Open Access
17 April 2025

Scalable Resource Provisioning Framework for Fog Computing Using LLM-Guided Q-Learning Approach

1 Department of Computer Science and Engineering, Siddaganga Institute of Technology, Tumakuru 572103, Karnataka, India
2 Department of Computer Science, University of Memphis, Memphis, TN 38152, USA
* Author to whom correspondence should be addressed.
This article belongs to the Section Algorithms for Multidisciplinary Applications

Abstract

Fog computing is a growing distributed computing paradigm adopted by industries today, as it performs real-time data analysis closer to the edge of the IoT network. It offers cloud capabilities at the edge of fog networks with improved efficiency and flexibility. As the demands of Internet of Things (IoT) devices keep varying, resource allocation policies must be modified rapidly to satisfy them. Constant fluctuation of demand leads to over- or under-provisioning of resources. The computing capability of fog nodes is small, and hence there is a need to develop resource provisioning policies that reduce delay and bandwidth consumption. In this paper, a novel large language model (LLM)-guided Q-learning framework is designed and developed. The uncertainty in the fog environment, in terms of delay incurred, bandwidth usage, and heterogeneity of fog nodes, is represented using the LLM. The reward shaping of the Q-learning agent is enriched by considering the heuristic value produced by the LLM. The experimental results show that the proposed framework performs well with respect to processing delay, energy consumption, load balancing, and service level agreement violation under finite and infinite fog computing environments. The results are further validated through the expected value analysis statistical methodology.

1. Introduction

Fog computing is an extension of cloud computing that uses edge devices to carry out computation, communication, and storage. The data, applications, and resources of the cloud are moved closer to the end users. Even the workload generated by IoT devices is decentralized to reduce the latency incurred and to increase operational efficiency. The fog computing market is growing as there is a need for low-latency solutions and real-time processing of applications. The global market size of fog computing is expected to reach USD 12,206 million by 2033. This growth is driven by the increasing number of IoT devices, which demand that data be processed and analyzed closer to the source of the applications [1,2].
The distributed nature of fog computing poses significant challenges with respect to accessibility, resource provisioning, security, privacy, load balancing, and many other factors. Fog computing supports millions of IoT requests, whose demands are heterogeneous and stringent in nature. Resource provisioning is one of the prominent challenges in a fog environment, as users place high demands on resources such as bandwidth, throughput, response time, and availability. Improper provisioning of resources degrades performance by increasing latency and causes inefficiency of services within the fog network. Hence, there is a need to formulate a proper resource management policy for a better user experience [3,4].
Q-learning is a basic form of reinforcement learning that determines the optimal action policy for a finite Markov Decision Process through repeated interaction with the environment. In a computing environment with a large state space, the learning process is slow, often requires many episodes of training, and may end up with suboptimal solutions. As the state space increases, the size of the Q-table keeps growing, which leads to high memory usage and inefficient exploitation of the state space. The success of the Q-learning model is determined by the reward function, which reflects the ability of the Q-learning agent to reach the goal state in a minimum number of training episodes. Proper reward shaping enhances the effectiveness of the Q-learning algorithm [5,6].
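As a point of reference, the tabular update underlying this process can be sketched in a few lines of Python; the state and action counts, learning rate, and discount factor below are illustrative placeholders rather than values used in this work.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative values only).
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.9          # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, reward, s_next):
    """One Bellman update: move Q(s, a) towards reward + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# Example transition: action 2 in state 3 yields reward 1.0 and leads to state 7.
q_update(3, 2, 1.0, 7)
```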
A typical large language model (LLM) can represent a world model for planning and control. It enables autonomous state exploration and directly outputs rewards and states. LLMs can be made lightweight, which makes them suitable for deployment in fog environments. The models are further optimized to reduce their size and speed up inference using the Generative Pre-trained Transformer Quantized (GPTQ) method. The use of edge accelerators such as the NVIDIA Jetson and the Tensor Processing Unit (TPU) helps in the efficient execution of LLMs on edge devices while preserving the privacy of the data. The LLM-guided Q-learning algorithm uses the LLM's action probability as a heuristic value to influence the Q function. Here, the LLM is employed to guide the Q-learning agent in reward shaping using this heuristic value. It modulates the Q-value function by implicitly inducing the desired resource provisioning policy in it. The reshaped Q function prevents over- and under-estimation of the policies and converges to an optimal solution [7,8].
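The following is a minimal sketch of this idea: an LLM-supplied action probability acts as a bonus added to the environment reward before the standard Q update. Here `llm_action_prob` is a hypothetical stand-in for the language model's output and the bonus weight is an assumed parameter, not a value from the paper.

```python
import numpy as np

alpha, gamma, beta = 0.1, 0.9, 0.5   # beta scales the LLM heuristic bonus (assumed)
Q = np.zeros((16, 4))

def llm_action_prob(state, action):
    """Hypothetical placeholder for the probability the LLM assigns to `action`
    in `state`; in practice this would come from the deployed language model."""
    return 0.25  # uniform stub

def shaped_q_update(s, a, reward, s_next):
    # Reward shaping: add a heuristic bonus proportional to the LLM's action probability.
    shaped_reward = reward + beta * llm_action_prob(s, a)
    td_target = shaped_reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

shaped_q_update(3, 2, 1.0, 7)
```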
In this paper, a novel LLM-guided Q-learning framework is designed to address the resource provisioning problem. The action-bonus form of the heuristic, i.e., maximum entropy, is generated by the LLM, which encourages exploration of the large state space of the fog environment. The heuristic value generated by the LLM provides navigation guidance to the Q-learning agent at every iteration step, which leads to the formulation of the desired policy in subsequent training iterations. Expected value analysis under finite and infinite fog environments is performed on the objective functions to arrive at the desired resource provisioning policies [9,10].
The objectives of this paper are as follows:
  • Mathematical representation of the fog computing system model and objective functions considered for evaluation purposes.
  • Design of a novel framework for resource provisioning using an LLM-guided Q-learning model.
  • Illustration of algorithms for each of the components in the LLM-guided Q-learning model.
  • Expected value analysis of the proposed framework in a finite and infinite fog environment.
  • Simulation of the proposed framework using the iFogSim 3.3 simulator by considering the ChatGPT classic model as a large language model and synthetic workload.
The remaining sections of the paper are organized as follows: Section 2 deals with the related work, Section 3 presents the system model, Section 4 discusses the proposed framework, Section 5 presents expected value analysis of the proposed framework under a finite and infinite fog environment, Section 6 discusses the results, and finally Section 7 arrives at the conclusion.

3. System Model

The system model takes as input the set of requests coming from various IoT devices, which can be executed among the fog nodes in the fog tier.
$IoT_{REQ} = \{Req_1, Req_2, Req_3, Req_4, \ldots, Req_n\}$
Every request is composed of varying resource requirements in terms of Deadline (Ddl), Computational power (Cp), Memory (Mem), Storage (St), and Bandwidth (Bw).
$Req_i = \{Ddl(Req_i), Cp(Req_i), Mem(Req_i), St(Req_i), Bw(Req_i)\}$
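As a minimal sketch, a request with these five attributes can be represented by a simple container; the field units below are assumptions for illustration, not values from the paper.

```python
from dataclasses import dataclass

# Illustrative container for the request attributes Ddl, Cp, Mem, St, and Bw.
@dataclass
class Request:
    deadline_ms: float       # Ddl(Req_i)
    cpu_mips: float          # Cp(Req_i)
    memory_mb: float         # Mem(Req_i)
    storage_mb: float        # St(Req_i)
    bandwidth_mbps: float    # Bw(Req_i)

req = Request(deadline_ms=100.0, cpu_mips=500.0, memory_mb=128.0,
              storage_mb=64.0, bandwidth_mbps=10.0)
```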
The set of fog nodes is distributed across the fog tier.
$FN = \{Fn_1, Fn_2, Fn_3, Fn_4, \ldots, Fn_n\}$
The following end Objective Functions (OFs) are set for the proposed LLM-guided Q-learning (LLM_QL) framework.
OF1: Processing Delay (PD(LLM_QL)): This is the time required to service the requests sent by the IoT devices over the fog layer.
$PD(LLM\_QL) = \mathrm{Minimize} \sum_{i \in I} \frac{1}{PR(Req_i)} \times S(Req_i)$
where $PR(Req_i)$ stands for the processing rate of requests across fog nodes (million instructions per second), $S(Req_i)$ represents the size of the requests (million instructions), and $i$ represents the index term to traverse through the number of requests. The objective of LLM_QL is to minimize the processing delay involved in servicing the requests.
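A short worked sketch of OF1, assuming illustrative request sizes and processing rates (not values from the paper):

```python
# OF1: processing delay = request size (MI) / processing rate of the assigned node (MIPS).
sizes_mi = [400.0, 250.0, 800.0]          # S(Req_i), million instructions
rates_mips = [2000.0, 1000.0, 1600.0]     # PR(Req_i), million instructions per second

pd_total = sum(s / r for s, r in zip(sizes_mi, rates_mips))
print(f"Total processing delay: {pd_total:.2f} s")   # 0.20 + 0.25 + 0.50 = 0.95 s
```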
OF2: Energy Consumption (EC(LLM_QL)): This is the amount of energy required for input request pre-processing, computation, and transmission over the fog layer.
$EC(LLM\_QL) = \mathrm{Minimize} \sum_{i \in I} PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}$
where $PC(Req_i)$ represents the power required to process the input requests, $WL(Req_i)$ is the total workload allocated, measured in Floating Point Operations (FLOPs), $CP(fn_i)$ is the computational capacity of the fog node in operations per second (OPS), and $i$ represents the index term to traverse through the number of requests.
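A worked sketch of OF2 under the same reading, with purely illustrative numbers: energy is processing power multiplied by the time the workload occupies the node (workload over capacity).

```python
# OF2: energy = PC(Req_i) * WL(Req_i) / CP(fn_i), illustrative values only.
power_w = [10.0, 15.0]            # PC(Req_i), watts
workload_flops = [2e9, 6e9]       # WL(Req_i), floating point operations
capacity_flops = [4e9, 4e9]       # CP(fn_i), operations per second

ec_joules = sum(p * wl / cp for p, wl, cp in zip(power_w, workload_flops, capacity_flops))
print(f"Energy consumption: {ec_joules:.1f} J")   # 10*0.5 + 15*1.5 = 27.5 J
```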
OF3: Load (LD(LLM_QL)): This is the measure of a large quantity of work put on fog nodes for execution.
$LD(LLM\_QL) = \mathrm{Balance} \sum_{i \in I} \frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}$
where $Mean_N(fn_i)$ represents the mean number of fog nodes that process the requests from IoT devices for their successful execution, $N(Req_i)$ is the number of requests from IoT devices which are up and running, $N(fn_i)$ is the total number of fog nodes, $CC(fn_i)$ is the computational capacity of the fog node measured in Million Instructions Per Second (MIPS), and $i$ represents the index term to traverse through the number of fog nodes.
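A small numeric sketch of OF3 under this definition, with assumed counts and capacities:

```python
# OF3: load = (Mean_N(fn_i) * N(Req_i)) / (N(fn_i) * CC(fn_i)), illustrative values only.
mean_serving_nodes = 3      # Mean_N(fn_i)
active_requests = 120       # N(Req_i)
total_nodes = 10            # N(fn_i)
capacity_mips = 2000.0      # CC(fn_i)

load = (mean_serving_nodes * active_requests) / (total_nodes * capacity_mips)
print(f"Load: {load:.3f}")  # 360 / 20000 = 0.018
```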
OF4: Service Level Agreement Violation (SLAV(LLM_QL)): This is the measure of violation that occurs when the agreed standards of service are not provided by the fog service provider.
$SLAV(LLM\_QL) = \mathrm{Minimize} \begin{cases} SLAV = Yes, & \text{if } PT(Req_i, T) > 0 \\ SLAV = No, & \text{if } PT(Req_i, T) \le 0 \end{cases}$
An SLAV occurs when the $PT(Req_i, T)$ value exceeds zero, and no SLAV occurs when $PT(Req_i, T)$ is less than or equal to zero.
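A minimal sketch of OF4, assuming (as an interpretation, not stated in the paper) that $PT(Req_i, T)$ is the amount by which the processing time exceeds the agreed deadline:

```python
# OF4: an SLA violation is flagged whenever PT(Req_i, T) is positive.
def sla_violated(processing_time_ms: float, deadline_ms: float) -> bool:
    pt_excess = processing_time_ms - deadline_ms   # PT(Req_i, T), assumed interpretation
    return pt_excess > 0

print(sla_violated(120.0, 100.0))  # True  -> SLAV = Yes
print(sla_violated(80.0, 100.0))   # False -> SLAV = No
```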

4. Proposed Work

The high-level architecture of the proposed LLM-guided Q-learning framework for task scheduling is shown in Figure 1. The architecture is divided into three compartments: IoT requests, the fog node resource pool, and the LLM-powered Q-learning agent. The resource requirements of the IoT devices include temporary storage, bandwidth, processing power, and energy. The fog nodes are an essential component of the architecture and include virtual computers, switches, gateways, and routers. These nodes are tightly coupled and provide a rich set of computing resources to the IoT devices as per their requirements. The Q-learning agent performs a representative form of learning, continuously improving its actions according to the response of the operating environment. One of the significant challenges encountered by the Q-learning agent is overestimation bias, since it approximates the action value by taking the maximum estimated value over actions. The computed Q value then exceeds the true value of one or more actions, and the maximum operator applied over the action values is inclined towards the overestimated Q value. The effectiveness of the learning is limited as it does not provide the correct heuristic value for each state-action pair. Hence, in this framework Q-learning is enriched with the heuristics of the LLM, which leads to high inference speed and also overcomes the impact of the introduced bias.
Figure 1. The high-level architecture of LLM-guided Q-learning framework for resource provisioning.
The detailed working of the LLM_QL framework is as follows. The state space of the Q-learning agent represents the set of all possible states that the agent encounters in the environment. It is an important component of the Q-learning agent, as the agent learns to formulate the optimal action depending on the prevailing state. The state space indicates all possible combinations of configurations that the environment is composed of, and each state provides a unique experience for the agent. A state typically represents the condition of the system, i.e., resource usage, task load, latency, and energy consumption. The fog nodes learn the best resource allocation policies by continuously interacting with the environment, and the Q values are updated using the Bellman equation. The agent performs actions that include allocation of resources, migration of tasks from fog nodes to the cloud, and queuing of tasks for execution at later stages. A positive reward is provided for outcomes such as low processing delay, effective load balance, and few service level violations; a negative reward is provided for outcomes such as high processing delay, frequent load imbalance, and many service level violations. Traditional Q-learning agents suffer from the curse of dimensionality and cannot update the state-action pairs for all possible combinations. However, a Q-learning agent enriched with the heuristic guidance of the LLM generalizes easily and can operate efficiently in large and continuous state space environments. A sketch of such a reward signal is given below.
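The following is a minimal sketch of the positive/negative reward structure just described; the thresholds and weights are illustrative assumptions, not values from the paper.

```python
# Reward sketch: positive for low delay, balanced load, and no SLA violation;
# negative otherwise. Thresholds and weights are assumed for illustration.
def reward(delay_ms: float, load_imbalance: float, sla_violated: bool) -> float:
    r = 0.0
    r += 1.0 if delay_ms < 200 else -1.0
    r += 1.0 if load_imbalance < 0.4 else -1.0
    r += 1.0 if not sla_violated else -2.0
    return r

print(reward(150.0, 0.3, False))   # 3.0: desirable allocation
print(reward(900.0, 0.8, True))    # -4.0: undesirable allocation
```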
The environment setup for the Q-learning agent involves the state space, action space, reward function, and transition dynamics. A pre-built environment can be used for implementation through Application Programming Interfaces (APIs) such as OpenAI Gym and Unity ML-Agents. The Q-learning process takes the IoT device requests as input and processes them using the Bellman equation to determine the optimal fog node for servicing, as sketched below.
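A minimal Gym-style environment skeleton is shown here, written without the gym dependency to stay self-contained; the state (per-node utilisation), action (chosen fog node), reward, and termination rule are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Gym-style fog provisioning environment sketch (assumed state/action/reward design).
class FogEnv:
    def __init__(self, n_nodes: int = 5):
        self.n_nodes = n_nodes
        self.util = np.zeros(n_nodes)

    def reset(self):
        self.util = np.zeros(self.n_nodes)
        return self.util.copy()

    def step(self, action: int):
        self.util[action] += 0.1                     # toy load added by the request
        imbalance = self.util.max() - self.util.min()
        reward = -imbalance                          # encourage balanced allocation
        done = bool(self.util.max() >= 1.0)          # episode ends when a node saturates
        return self.util.copy(), reward, done, {}

env = FogEnv()
obs = env.reset()
obs, r, done, info = env.step(action=2)
```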
Algorithm 1 describes the working of the LLM-guided Q-learning framework in detail. The IoT device requests are mapped onto the best-fit fog nodes by considering the action value computed by the LLM-guided Q-learning agent. The heuristic value encourages the reasoning ability of the agent in reaching the goal state. While the Q-learning agent navigates the state space, landmark positions are given the highest heuristic values, which makes the traversal easier. The agent converges rapidly to promising solutions instead of searching for the perfect solution. By performing limited sampling and placing constraints on the responses, LLMs are made capable of performing zero-shot learning in the varying computing scenarios of the fog environment. Actor-critic Q-learning algorithms usually find it difficult to derive policies through the implicit exploration process, and sample efficiency alone does not guarantee proper reward reshaping; it often ends up with the wrong heuristic value. Hence, the use of LLMs represents a promising approach to reward reshaping through autonomous state space exploration for Q-learning agents. Any possible inaccuracies in the learning steps are handled through fine-tuning of the learning steps, which enhances inference speed and minimizes the impact of hallucination.
Algorithm 1: Working of LLM-guided Q-learning framework
1: Start
2: Input: IoT device request set $IoT_{REQ} = \{Req_1, Req_2, Req_3, Req_4, \ldots, Req_n\}$, Markov Decision Process MDP, Large Language Model G, Prompt P
3: Output: Resource provisioning policies, i.e., $RPP = \{Rpp_1, Rpp_2, Rpp_3, Rpp_4, \ldots, Rpp_n\}$
4: Training phase of LLM_QL
5:    Initialize the heuristic Q buffer $MDP(G(P))$, the actor-critic $\mu_{\phi}, Q_{\theta_1}, Q_{\theta_2} = NULL$, and the target actor-critic $\mu_{\phi'}, Q_{\theta_1'}, Q_{\theta_2'} = NULL$
6:    Generate Q buffer $D_g \leftarrow \{(S_i, a_i, Q_i) \mid (S_i, a_i, Q_i) \sim MDP(G(P)),\ i = 1, 2, \ldots, n\}$, where $S_i$ = state of the agent at time step $i$, $a_i$ = action performed at time step $i$, and $Q_i$ = Q value for action $a_i$ in state $S_i$
7:    Q bootstrapping: $\theta = \theta - \alpha \nabla_{\theta} L_{Bootstrap}$
8:        $L_{Bootstrap}(\theta) = \mathbb{E}_{(S_i, a_i, Q_i) \sim D_g}\left[(Q_i - \hat{q}_{\theta}(S_i, a_i))^2\right]$
9:    For each IoT device request in the request task set $Req_i \in IoT_{REQ}$ do
10:      Sample the Q state $(S_i, a_i, r_i, S_i')$ from the fog state space, where $r_i$ = reward received for action $a_i$
11:      Compute Q buffer $D_g = D_g \cup \{(S_i, a_i, r_i, S_i')\}$
12:      Perform sampling over the computed value of the Q buffer $D_g$
13:      $\hat{a} = \mu_{\phi}(S_i) + \epsilon$, where $\epsilon = \mathrm{clip}(\mathcal{N}(0, \sigma), -c, c)$
14:      $y(S_i) = r + \gamma \min_{j=1,2} q_{\theta_j'}(S_i', \hat{a})$
15:      Update the critic: $\theta_i = \arg\min_{\theta_i} L_{major}(\theta_i)$
16:      if $t \bmod D_g == 0$ then
17:        Update the actor: $\phi = \arg\min_{\phi} L_{actor}(\phi)$
18:        Update the target networks
19:          $\theta_i' = \tau \theta_i + (1 - \tau)\theta_i'$
20:          $\phi' = \tau \phi + (1 - \tau)\phi'$
21:      End if
22:   End for
23: End Training phase of LLM_QL
24: Testing phase of LLM_QL
25:     For each testing iteration t of IoT device request in request task set R e q i I o T R E Q  do
26:    Re-compute Q buffer $D_g = D_g \cup \{(S_i, a_i, r_i, S_i')\}$, where $D_g$ represents the memory buffer
27:    Compute the updated value of the critic $\theta_i = \arg\min_{\theta_i} L_{major}(\theta_i)$
28:    Update the critic $\theta_i = \arg\min_{\theta_i} L_{major}(\theta_i)$
29:    Update the target networks
30:        $\theta_i' = \tau \theta_i + (1 - \tau)\theta_i'$
31:        $\phi' = \tau \phi + (1 - \tau)\phi'$
32:    End For
33: End Testing phase of LLM_QL
34: Output $RPP = \{Rpp_1, Rpp_2, Rpp_3, Rpp_4, \ldots, Rpp_n\}$
35: Stop
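To make the two phases of Algorithm 1 concrete, the following is a compressed, tabular Python sketch: a Q-table is first bootstrapped from an LLM-generated buffer of (state, action, Q-value) triples and then refined on sampled fog transitions. The twin critics, target networks, and policy-delay machinery of the full algorithm are omitted, and the buffer and transition sampler are random stand-ins rather than outputs of the actual language model or simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 20, 4
alpha, gamma = 0.1, 0.9

# Phase 1 -- Q bootstrapping: fit the Q-table to a buffer D_g of (state, action, Q)
# triples. Here the buffer is random stand-in data; in Algorithm 1 it would be
# produced by prompting the language model G with prompt P.
Q = np.zeros((n_states, n_actions))
llm_buffer = [(rng.integers(n_states), rng.integers(n_actions), rng.uniform(0, 1))
              for _ in range(200)]
for _ in range(50):                       # crude analogue of minimising L_Bootstrap
    for s, a, q_llm in llm_buffer:
        Q[s, a] += alpha * (q_llm - Q[s, a])

# Phase 2 -- interaction: refine Q on sampled fog transitions (S, a, r, S').
def sample_transition():
    """Stand-in for sampling from the fog state space; replace with the simulator."""
    s, a = rng.integers(n_states), rng.integers(n_actions)
    return s, a, rng.normal(), rng.integers(n_states)

for _ in range(1000):
    s, a, r, s_next = sample_transition()
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

policy = Q.argmax(axis=1)                 # provisioning decision per state
```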

5. Expected Value Analysis

The expected value analysis of the proposed LLM_QL is performed by considering the four objective functions, i.e., Processing Delay (PD(LLM_QL)), Energy Consumption (EC(LLM_QL)), Load (LD(LLM_QL)), and Service Level Agreement Violation (SLAV(LLM_QL)). Two types of fog computing scenarios are considered for analysis, i.e., a finite fog computing scenario and an infinite fog computing scenario. The performance of the proposed LLM_QL is compared with three recent existing works, i.e., Bayesian Learning (BAY_L) [12], Centralized Controller (CC) [14], and Fuzzy Q-Learning (F_QL) [15].

5.1. Finite Fog Computing Scenario

The finite fog computing scenario consists of $N$ IoT device requests $REQ = \{Req_1, Req_2, Req_3, \ldots, Req_n\}$, $K$ fog computing nodes $FN = \{fn_1, fn_2, fn_3, \ldots, fn_n\}$, and $P$ resource provisioning policies $\pi = \langle \pi_1, \pi_2, \pi_3, \ldots, \pi_p \rangle$. The probability of the outcome of the objective functions varies between low, medium, and high, $P_{OF} = \langle Low, Medium, High \rangle$.
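Before walking through the individual objective functions, the following hedged Monte Carlo sketch shows how an expected value such as EV(PD) can be estimated for a policy in the finite scenario; the request-size and processing-rate distributions are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

# Monte Carlo estimate of the expected processing delay under a sampled policy.
rng = np.random.default_rng(1)
n_requests = 10_000
sizes_mi = rng.uniform(100, 1000, n_requests)       # S(Req_i), assumed distribution
rates_mips = rng.uniform(1000, 4000, n_requests)    # PR(Req_i) under some policy pi

ev_pd = np.mean(sizes_mi / rates_mips)              # estimate of EV[PD(pi, T)], seconds
print(f"Estimated expected processing delay: {ev_pd:.3f} s")
```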
OF1: Processing Delay (PD(LLM_QL)): The expected value of the processing delay of the proposed LLM_QL, i.e., EV(PD(LLM_QL)), is influenced by the expected value of the processing rate of the requests $EV(PR(Req_i))$ and the expected value of the size of the requests $EV(S(Req_i))$.
$EV[PD_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}$
$EV[PD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}$
$EV[PD_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q}\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] = \sum_{q \in Q} Q \times dP\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi)$
PD(LLM_QL): $EV[PD_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[PD_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}$
$EV[PD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}$
$EV[PD_{BAY\_L}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
PD(BAY_L): $EV[PD_{BAY\_L}(\Pi, T)] \rightarrow High$
$EV[PD_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}$
$EV[PD_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}$
$EV[PD_{CC}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} f(x(t), r(t), P)\right] P(\pi)$
where $f(x(t), r(t), P)$ is a function of the system state $x(t)$, the desired setpoint $r(t)$, and the system parameters $P$.
PD(CC): $EV[PD_{CC}(\Pi, T)] \rightarrow Medium$
$EV[PD_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}$
$EV[PD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}$
$EV[PD_{F\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
PD(F_QL): $EV[PD_{F\_QL}(\Pi, T)] \rightarrow High$
OF2: Energy Consumption (EC(LLM_QL)): The expected value of the energy consumption of the proposed LLM_QL, i.e., EV(EC(LLM_QL)), is influenced by the expected value of the power required to process the input requests $EV(PC(Req_i))$, the workload allocated $EV(WL(Req_i))$, and the computational capacity of the fog node $EV(CP(fn_i))$.
$EV[EC_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}$
$EV[EC_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}$
$EV[EC_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q}\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] = \sum_{q \in Q} Q \times dP\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi)$
EC(LLM_QL): $EV[EC_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[EC_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}$
$EV[EC_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}$
$EV[EC_{BAY\_L}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
EC(BAY_L): $EV[EC_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[EC_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}$
$EV[EC_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}$
$EV[EC_{CC}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times f(x(t), r(t), P)\right] P(\pi)$
EC(CC): $EV[EC_{CC}(\Pi, T)] \rightarrow High$
$EV[EC_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}$
$EV[EC_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}$
$EV[EC_{F\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
EC(F_QL): $EV[EC_{F\_QL}(\Pi, T)] \rightarrow Medium$
OF3: Load (LD(LLM_QL)): The expected value of the load on the proposed LLM_QL, i.e., EV(LD(LLM_QL)), is influenced by the expected value of the request processing time $EV(PT(Req_i))$ and the number of requests generated $EV(N(Req_i, T))$.
$EV[LD_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q}\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] = \sum_{q \in Q} Q \times dP\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi)$
LD(LLM_QL): $EV[LD_{LLM\_QL}(\Pi, T)] \rightarrow Balanced$
$EV[LD_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean\_Load}{Peak\_Load}\right] P(\pi)$
LD(BAY_L): $EV[LD_{BAY\_L}(\Pi, T)] \rightarrow Partially\ Balanced$
$EV[LD_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(fn_i)}{Load(fn_i)} \times f(x(t), r(t), P)\right] P(\pi)$
LD(CC): $EV[LD_{CC}(\Pi, T)] \rightarrow Imbalanced$
$EV[LD_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(Req_i)}{Load(Req_i)} \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
LD(F_QL): $EV[LD_{F\_QL}(\Pi, T)] \rightarrow Imbalanced$
OF4: Service Level Agreement Violation (SLAV(LLM_QL)): The expected value of the service level agreement violation of the proposed LLM_QL, i.e., EV(SLAV(LLM_QL)), is influenced by the expected value of the request processing time $EV(PT(Req_i))$.
$EV[SLAV_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} PT(Req_i, T)\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q} \mathrm{Minimize}\begin{cases} SLAV = Yes, & \text{if } PT(Req_i, T) > 0 \\ SLAV = No, & \text{if } PT(Req_i, T) \le 0 \end{cases} = \sum_{q \in Q} Q \times dP\left[SLAV = No,\ PT(Req_i, T) \le 0\right] P(\pi)$
SLAV(LLM_QL): $EV[SLAV_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{q \in Q}\left[SLAV = Yes,\ PT(Req_i, T) > 0\right]$
SLAV(BAY_L): $EV[SLAV_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[SLAV_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{q \in Q}\left[SLAV = Yes,\ PT(Req_i, T) > 0\right]$
SLAV(CC): $EV[SLAV_{CC}(\Pi, T)] \rightarrow High$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{q \in Q}\left[SLAV = Yes,\ PT(Req_i, T) > 0\right]$
SLAV(F_QL): $EV[SLAV_{F\_QL}(\Pi, T)] \rightarrow High$

5.2. Infinite Fog Computing Scenario

The infinite fog computing scenario consists of an infinite number of IoT device requests $REQ = \{Req_1, Req_2, Req_3, \ldots, Req_\infty\}$, $K$ fog computing nodes $FN = \{fn_1, fn_2, fn_3, \ldots, fn_\infty\}$, and resource provisioning policies $\pi = \langle \pi_1, \pi_2, \pi_3, \ldots, \pi_\infty \rangle$.
OF1: Processing Delay (PD(LLM_QL)): The EV(PD(LLM_QL)) during the infinite scenario remains lower than the EV(PD(LLM_QL)) during the finite scenario.
$EV[PD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}$
$EV[PD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}\right]$
$EV[PD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty}\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] = \int_{-\infty}^{+\infty} Q \times dP\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi)$
PD(LLM_QL): $EV[PD_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[PD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}$
$EV[PD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}\right]$
$EV[PD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
PD(BAY_L): $EV[PD_{BAY\_L}(\Pi, T)] \rightarrow High$
$EV[PD_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}$
$EV[PD_{CC}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}\right]$
$EV[PD_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} f(x(t), r(t), P)\right] P(\pi)$
PD(CC): $EV[PD_{CC}(\Pi, T)] \rightarrow High$
$EV[PD_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}$
$EV[PD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}\right]$
$EV[PD_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
PD(F_QL): $EV[PD_{F\_QL}(\Pi, T)] \rightarrow High$
OF2: Energy Consumption (EC(LLM_QL)): The EV(EC(LLM_QL)) during the infinite scenario is found to be lower than the EV(EC(LLM_QL)) during the finite scenario.
$EV[EC_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}$
$EV[EC_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}\right]$
$EV[EC_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty}\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] = \int_{-\infty}^{+\infty} Q \times dP\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi)$
EC(LLM_QL): $EV[EC_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[EC_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}$
$EV[EC_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}\right]$
$EV[EC_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
EC(BAY_L): $EV[EC_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[EC_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}$
$EV[EC_{CC}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}\right]$
$EV[EC_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times f(x(t), r(t), P)\right] P(\pi)$
EC(CC): $EV[EC_{CC}(\Pi, T)] \rightarrow High$
$EV[EC_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}$
$EV[EC_{F\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}\right]$
$EV[EC_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
EC(F_QL): $EV[EC_{F\_QL}(\Pi, T)] \rightarrow Medium$
OF3: Load (LD(LLM_QL)): The EV(LD(LLM_QL)) during the infinite scenario is more balanced than the EV(LD(LLM_QL)) during the finite scenario.
$EV[LD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty}\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] = \int_{-\infty}^{+\infty} Q \times dP\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi)$
LD(LLM_QL): $EV[LD_{LLM\_QL}(\Pi, T)] \rightarrow Balanced$
$EV[LD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean\_Load}{Peak\_Load}\right] P(\pi)$
LD(BAY_L): $EV[LD_{BAY\_L}(\Pi, T)] \rightarrow Partially\ Balanced$
$EV[LD_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(fn_i)}{Load(fn_i)} \times f(x(t), r(t), P)\right] P(\pi)$
LD(CC): $EV[LD_{CC}(\Pi, T)] \rightarrow Imbalanced$
$EV[LD_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(Req_i)}{Load(Req_i)} \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
LD(F_QL): $EV[LD_{F\_QL}(\Pi, T)] \rightarrow Imbalanced$
OF4: Service Level Agreement Violation (SLAV(LLM_QL)): The EV(SLAV(LLM_QL)) during the infinite scenario is found to be consistently lower than the EV(SLAV(LLM_QL)) during the finite scenario.
$EV[SLAV_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} PT(Req_i, T)\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty} \mathrm{Minimize}\begin{cases} SLAV = Yes, & \text{if } PT(Req_i, T) > 0 \\ SLAV = No, & \text{if } PT(Req_i, T) \le 0 \end{cases} = \int_{-\infty}^{+\infty} Q \times dP\left[SLAV = Yes,\ PT(Req_i, T) > 0\right] \times \left[SLAV = No,\ PT(Req_i, T) \le 0\right] P(\pi)$
SLAV(LLM_QL): $EV[SLAV_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[SLAV_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \left[SLAV = Yes,\ PT(Req_i, T) > 0\right]\right] P(\pi)$
SLAV(BAY_L): $EV[SLAV_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[SLAV_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} \left[SLAV = Yes,\ PT(Req_i, T) > 0\right]\right] P(\pi)$
SLAV(CC): $EV[SLAV_{CC}(\Pi, T)] \rightarrow Medium$
$EV[SLAV_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} \left[SLAV = Yes,\ PT(Req_i, T) > 0\right]\right] P(\pi)$
SLAV(F_QL): $EV[SLAV_{F\_QL}(\Pi, T)] \rightarrow High$

6. Results and Discussion

The simulation of the proposed LLM_QL resource provisioning framework was carried out using the iFogSim 3.3 simulator toolkit [18,19], which enables the simulation and modeling of resource provisioning frameworks for fog computing environments. For experimental purposes, the ChatGPT 3.5 classic model was used as the language model that provides the heuristic form of the Q value to accelerate the exploration process of the algorithm [20,21]. The LLM setup details are provided in Table 1.
Table 1. LLM setup.
The fog environment simulation parameters and ChatGPT annotated dataset are initialized as shown in Table 2 [22].
Table 2. Simulation parameters of ChatGPT dataset.
The user application parameters are initialized as shown in Table 3.
Table 3. User application parameters setup.

6.1. Processing Delay

Figure 2 shows the graph of the number of fog nodes versus processing delay. It is observed from the graph that the processing delay of LLM_QL is consistently low (50–200 ms), as it takes the optimal Q value at each learning step through LLM guidance. The processing delay of F_QL remains moderate (450–800 ms), as the learning process slows down and requires many episodes of training to arrive at the optimal solution. In contrast, the processing delays of BAY_L and CC remain very high (800–900 ms) with the increase in the number of fog nodes, as they involve a large number of parameters and the posterior distributions are influenced by the prior distribution.
Figure 2. Number of fog nodes versus processing delay (ms).

6.2. Energy Consumption

A graph of the number of IoT requests versus energy consumption is shown in Figure 3. It is observed from the graph that the energy consumption of LLM_QL remains consistently low (150–200 J) as the number of requests increases, because the LLM-generated heuristic value is used for reward shaping, which helps in generating the desired policy. The energy consumption of BAY_L and CC is found to be moderate (800–900 J); the centralized controller makes the system vulnerable, and the presence of uncertainty is not handled in a generalized manner. The energy consumption of F_QL is found to be very high (600–1000 J) with the increasing number of requests: as the state space increases, the Q-table becomes larger and its maintenance becomes practically impossible.
Figure 3. Number of IoT requests versus energy consumption (J).

6.3. Load Imbalance

The graph of the number of IoT requests versus load imbalance is shown in Figure 4. It is observed from the graph that the imbalance is low (0.2–0.4) for LLM_QL as the number of requests increases. Because of LLM guidance, there is no need for hyperparameter tuning, and the framework can adapt quickly to different fog environment settings. The load imbalances of CC and F_QL are moderate (0.2–0.5), as the lookup table of Q states is replaced with a fuzzy system that represents the state-action pair for each Q state. In contrast, the load imbalance of BAY_L is found to be consistently very high (0.8–0.9) as the number of requests increases, because it incorporates previous beliefs, which results in a tendency to converge to suboptimal solutions.
Figure 4. Number of IoT requests versus load imbalance (0–1).

6.4. Service Level Agreement Violation

A graph of time versus SLA violation is shown in Figure 5. It is observed from the graph that the SLA violation of LLM_QL is consistently low (5–10%) over time, as it prevents the overestimation and underestimation of Q values by nullifying the effect of hallucinations. The SLA violation of CC is moderate (20–80%), as it is properly streamlined and has a well-defined hierarchy to drive policy decisions. In contrast, the SLA violations of BAY_L and F_QL are high (20–90%) over time because the rules are updated using the maximum operator, which is subject to positive bias and affects the learning process; in addition, the computation cost is higher and the accuracy of the results varies depending on the value of the random seed used.
Figure 5. Time (ms) versus SLA violation (%).

7. Conclusions

This paper presents a novel LLM-guided Q-learning framework for resource provisioning in fog computing. The uncertainty in fog computing is modeled using an LLM. The heuristic value of the LLM is used to guide the Q-learning agent. Over- and under-provisioning of resources is prevented through proper exploration of the state space using the maximum entropy solution. The expected value analysis of the framework under finite and infinite fog computing scenarios is performed. The experimental results obtained with the iFogSim 3.3 simulator are good with respect to processing delay, energy consumption, load balancing, and SLA violations. As future work, detailed analytical modeling and analysis of the framework will be carried out in terms of resource migration and placement. The framework will also be extended to cover fault tolerance, auto recovery, and cost optimization.

Author Contributions

Conceptualization, B.K. and S.G.S.; methodology, B.K.; software, S.G.S.; validation, B.K. and S.G.S.; formal analysis, B.K.; investigation, S.G.S.; resources, B.K.; data curation, B.K.; writing—original draft preparation, B.K.; writing—review and editing, S.G.S.; visualization, B.K.; supervision, S.G.S.; project administration, B.K.; funding acquisition, S.G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Srirama, S.N. A decade of research in fog computing: Relevance, challenges, and future directions. Softw. Pract. Exp. 2024, 54, 3–23. [Google Scholar] [CrossRef]
  2. Ali, S.; Alubady, R. A Survey of Fog Computing-Based Resource Allocation Approaches: Overview, Classification, and Opportunity for Research. Iraqi J. Sci. 2024, 65, 4008–4029. [Google Scholar] [CrossRef]
  3. Das, R.; Inuwa, M.M. A review on fog computing: Issues, characteristics, challenges, and potential applications. Telemat. Inform. Rep. 2023, 10, 100049. [Google Scholar] [CrossRef]
  4. Sabireen, H.; Neelanarayanan, V. A review on fog computing: Architecture, fog with IoT, algorithms and research challenges. ICT Express 2021, 7, 162–176. [Google Scholar]
  5. Clifton, J.; Laber, E. Q-learning: Theory and applications. Annu. Rev. Stat. Its Appl. 2020, 7, 279–301. [Google Scholar] [CrossRef]
  6. Hansen-Estruch, P.; Kostrikov, I.; Janner, M.; Kuba, J.G.; Levine, S. IDQL: Implicit q-learning as an actor-critic method with diffusion policies. arXiv 2023, arXiv:2304.10573. [Google Scholar]
  7. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Wen, J.R. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
  8. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Xie, X. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45. [Google Scholar] [CrossRef]
  9. Du, Y.; Watkins, O.; Wang, Z.; Colas, C.; Darrell, T.; Abbeel, P.; Andreas, J. Guiding pretraining in reinforcement learning with large language models. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: New York, NY, USA, 2023; Volume 202, pp. 8657–8677. [Google Scholar]
  10. Ma, R.; Luijkx, J.; Ajanovic, Z.; Kober, J. ExploRLLM: Guiding exploration in reinforcement learning with large language models. arXiv 2024, arXiv:2403.09583. [Google Scholar]
  11. Abu-Amssimir, N.; Al-Haj, A. A QoS-aware resource management scheme over fog computing infrastructures in IoT systems. Multimed. Tools Appl. 2023, 82, 28281–28300. [Google Scholar] [CrossRef]
  12. Etemadi, M.; Ghobaei-Arani, M.; Shahidinejad, A. Resource provisioning for IoT services in the fog computing environment: An autonomic approach. Comput. Commun. 2020, 161, 109–131. [Google Scholar] [CrossRef]
  13. Nguyen, N.D.; Phan, L.A.; Park, D.H.; Kim, S.; Kim, T. ElasticFog: Elastic resource provisioning in container-based fog computing. IEEE Access 2020, 8, 183879–183890. [Google Scholar] [CrossRef]
  14. Mseddi, A.; Jaafar, W.; Elbiaze, H.; Ajib, W. Centralized and collaborative RL-based resource allocation in virtualized dynamic fog computing. IEEE Internet Things J. 2023, 10, 14239–14253. [Google Scholar] [CrossRef]
  15. Faraji-Mehmandar, M.; Jabbehdari, S.; Javadi, H.H.S. Fuzzy Q-learning approach for autonomic resource provisioning of IoT applications in fog computing environments. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 4237–4255. [Google Scholar] [CrossRef]
  16. Etemadi, M.; Ghobaei-Arani, M.; Shahidinejad, A. A learning-based resource provisioning approach in the fog computing environment. J. Exp. Theor. Artif. Intell. 2021, 33, 1033–1056. [Google Scholar] [CrossRef]
  17. Sumona, S.T.; Hasan, S.S.; Tamzid, A.Y.; Roy, P.; Razzaque, M.A.; Mahmud, R. A Deep Q-Learning Framework for Enhanced QoE and Energy Optimization in Fog Computing. In Proceedings of the 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), Abu Dhabi, United Arab Emirates, 29 April–1 May 2024; IEEE: New York, NY, USA, 2024; pp. 669–676. [Google Scholar]
  18. Baneshi, S.; Varbanescu, A.L.; Pathania, A.; Akesson, B.; Pimentel, A. Estimating the energy consumption of applications in the computing continuum with iFogSim. In Proceedings of the International Conference on High Performance Computing, Denver, CO, USA, 12–17 November 2023; Springer Nature: Cham, Switzerland, 2023; pp. 234–249. [Google Scholar]
  19. Mahmud, R.; Pallewatta, S.; Goudarzi, M.; Buyya, R. iFogSim2: An extended iFogSim simulator for mobility, clustering, and microservice management in edge and fog computing environments. J. Syst. Softw. 2022, 190, 111351. [Google Scholar] [CrossRef]
  20. Kocon, J.; Cichecki, I.; Kaszyca, O.; Kochanek, M.; Szydło, D.; Baran, J.; Kazienko, P. ChatGPT: Jack of all trades, master of none. Inf. Fusion 2023, 99, 101861. [Google Scholar] [CrossRef]
  21. Vujinović, A.; Luburić, N.; Slivka, J.; Kovačević, A. Using ChatGPT to annotate a dataset: A case study in intelligent tutoring systems. Mach. Learn. Appl. 2024, 16, 100557. [Google Scholar] [CrossRef]
  22. Pandey, C.; Tiwari, V.; Rathore, R.S.; Jhaveri, R.H.; Roy, D.S.; Selvarajan, S. Resource-efficient synthetic data generation for performance evaluation in mobile edge computing over 5G networks. IEEE Open J. Commun. Soc. 2023, 4, 1866–1878. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
