Algorithms · Article · Open Access
17 April 2025

Scalable Resource Provisioning Framework for Fog Computing Using LLM-Guided Q-Learning Approach

1 Department of Computer Science and Engineering, Siddaganga Institute of Technology, Tumakuru 572103, Karnataka, India
2 Department of Computer Science, University of Memphis, Memphis, TN 38152, USA
* Author to whom correspondence should be addressed.
This article belongs to the Section Algorithms for Multidisciplinary Applications

Abstract

Fog computing is a growing distributed computing paradigm adopted by industries today, as it performs real-time data analysis closer to the edge of the IoT network. It offers cloud capabilities at the edge of fog networks with improved efficiency and flexibility. As the demands of Internet of Things (IoT) devices keep varying, resource allocation policies must be modified rapidly to satisfy them. Constant fluctuation of demand leads to over- or under-provisioning of resources. The computing capability of fog nodes is small, and hence there is a need to develop resource provisioning policies that reduce delay and bandwidth consumption. In this paper, a novel large language model (LLM)-guided Q-learning framework is designed and developed. The uncertainty in the fog environment, in terms of delay incurred, bandwidth usage, and heterogeneity of fog nodes, is represented using the LLM. The reward shaping of the Q-learning agent is enriched by considering the heuristic value produced by the LLM. The experimental results show that the proposed framework performs well with respect to processing delay, energy consumption, load balancing, and service level agreement violation under finite and infinite fog computing environments. The results are further validated through the expected value analysis statistical methodology.

1. Introduction

Fog computing is an extension of cloud computing that uses edge devices to carry out computation, communication, and storage. The data, applications, and resources of the cloud are moved closer to the end users. Even the workload generated by IoT devices is decentralized to reduce the latency incurred and to increase operational efficiency. The fog computing market is growing as there is a need for low-latency solutions and real-time processing of applications. The global market size of fog computing is expected to reach USD 12,206 million by 2033. This growth is driven by the increasing number of IoT devices, which demand that data be processed and analyzed closer to the source of the applications [1,2].
The distributed nature of fog computing poses significant challenges with respect to accessibility, resource provisioning, security, privacy, load balancing, and many other factors. Fog computing supports millions of IoT requests, whose demands are heterogeneous and stringent in nature. Resource provisioning is one of the prominent challenges in a fog environment, as users place high demands on resources such as bandwidth, throughput, response time, and availability. Improper provisioning of resources degrades performance by increasing latency and causes inefficiency of services within the fog network. Hence, there is a need to formulate a proper resource management policy for a better user experience [3,4].
Q-learning is a basic form of reinforcement learning that determines the optimal action policy for a finite Markov Decision Process through repeated interaction with the environment. In a computing environment with a large state space, the learning process is slow, often requires many episodes of training, and may end up with suboptimal solutions. As the state space increases, the size of the Q-table keeps growing, which leads to high memory usage and inefficient exploitation of the state space. The success of the Q-learning model is determined by the reward function, which reflects the ability of the Q-learning agent to reach the goal state in a minimum number of training episodes. Proper reward shaping enhances the effectiveness of the Q-learning algorithm [5,6].
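As a point of reference, the tabular update underlying this process can be sketched in a few lines of Python; the state and action counts, learning rate, and discount factor below are illustrative placeholders rather than values used in this work.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative values only).
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.9          # learning rate, discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, reward, s_next):
    """One Bellman update: move Q(s, a) towards reward + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

# Example transition: action 2 in state 3 yields reward 1.0 and leads to state 7.
q_update(3, 2, 1.0, 7)
```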
A typical large language model (LLM) can represent a world model for planning and control. It enables autonomous state exploration and directly outputs rewards and states. LLMs can be made lightweight, which makes them suitable for deployment in fog environments. The models are further optimized to reduce their size and speed up inference using the Generative Pre-trained Transformer Quantized (GPTQ) method. The use of edge accelerators such as the NVIDIA Jetson and the Tensor Processing Unit (TPU) helps in the efficient execution of LLMs on edge devices while preserving the privacy of the data. The LLM-guided Q-learning algorithm uses the LLM's action probability as a heuristic value to influence the Q function. Here, the LLM is employed to guide the Q-learning agent in reward shaping using this heuristic value. It modulates the Q-value function by implicitly inducing the desired resource provisioning policy in it. The reshaped Q function prevents over- and under-estimation of the policies and converges to an optimal solution [7,8].
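The following is a minimal sketch of this idea: an LLM-supplied action probability acts as a bonus added to the environment reward before the standard Q update. Here `llm_action_prob` is a hypothetical stand-in for the language model's output and the bonus weight is an assumed parameter, not a value from the paper.

```python
import numpy as np

alpha, gamma, beta = 0.1, 0.9, 0.5   # beta scales the LLM heuristic bonus (assumed)
Q = np.zeros((16, 4))

def llm_action_prob(state, action):
    """Hypothetical placeholder for the probability the LLM assigns to `action`
    in `state`; in practice this would come from the deployed language model."""
    return 0.25  # uniform stub

def shaped_q_update(s, a, reward, s_next):
    # Reward shaping: add a heuristic bonus proportional to the LLM's action probability.
    shaped_reward = reward + beta * llm_action_prob(s, a)
    td_target = shaped_reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

shaped_q_update(3, 2, 1.0, 7)
```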
In this paper, a novel LLM-guided Q-learning framework is designed to address the resource provisioning problem. The action-bonus form of the heuristic, i.e., maximum entropy, is generated by the LLM, which encourages exploration of the large state space of the fog environment. The heuristic value generated by the LLM provides navigation guidance to the Q-learning agent at every iteration step, which leads to the formulation of the desired policy in subsequent training iterations. Expected value analysis under finite and infinite fog environments is performed on the objective functions to arrive at the desired resource provisioning policies [9,10].
The objectives of this paper are as follows:
  • Mathematical representation of the fog computing system model and objective functions considered for evaluation purposes.
  • Design of a novel framework for resource provisioning using an LLM-guided Q-learning model.
  • Illustration of algorithms for each of the components in the LLM-guided Q-learning model.
  • Expected value analysis of the proposed framework in a finite and infinite fog environment.
  • Simulation of the proposed framework using the iFogSim 3.3 simulator by considering the ChatGPT classic model as a large language model and synthetic workload.
The remaining sections of the paper are organized as follows: Section 2 deals with the related work, Section 3 presents the system model, Section 4 discusses the proposed framework, Section 5 presents expected value analysis of the proposed framework under a finite and infinite fog environment, Section 6 discusses the results, and finally Section 7 arrives at the conclusion.

3. System Model

The system model takes as input the set of requests coming from various IoT devices, which can be executed among the fog nodes in the fog tier.
$IoT_{REQ} = \{Req_1, Req_2, Req_3, Req_4, \ldots, Req_n\}$
Every request is composed of varying resource requirements in terms of Deadline (Ddl), Computational power (Cp), Memory (Mem), Storage (St), and Bandwidth (Bw).
$Req_i = \{Ddl(Req_i), Cp(Req_i), Mem(Req_i), St(Req_i), Bw(Req_i)\}$
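As a minimal sketch, a request with these five attributes can be represented by a simple container; the field units below are assumptions for illustration, not values from the paper.

```python
from dataclasses import dataclass

# Illustrative container for the request attributes Ddl, Cp, Mem, St, and Bw.
@dataclass
class Request:
    deadline_ms: float       # Ddl(Req_i)
    cpu_mips: float          # Cp(Req_i)
    memory_mb: float         # Mem(Req_i)
    storage_mb: float        # St(Req_i)
    bandwidth_mbps: float    # Bw(Req_i)

req = Request(deadline_ms=100.0, cpu_mips=500.0, memory_mb=128.0,
              storage_mb=64.0, bandwidth_mbps=10.0)
```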
The set of fog nodes is distributed across the fog tier.
$FN = \{Fn_1, Fn_2, Fn_3, Fn_4, \ldots, Fn_n\}$
The following end Objective Functions (OFs) are set for the proposed LLM-guided Q-learning (LLM_QL) framework.
OF1: Processing Delay (PD(LLM_QL)): This is the time required to service the requests sent by the IoT devices over the fog layer.
$PD(LLM\_QL) = \mathrm{Minimize} \sum_{i \in I} \frac{1}{PR(Req_i)} \times S(Req_i)$
where $PR(Req_i)$ stands for the processing rate of requests across fog nodes (million instructions per second), $S(Req_i)$ represents the size of the requests (million instructions), and $i$ represents the index term to traverse through the number of requests. The objective of LLM_QL is to minimize the processing delay involved in servicing the requests.
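A short worked sketch of OF1, assuming illustrative request sizes and processing rates (not values from the paper):

```python
# OF1: processing delay = request size (MI) / processing rate of the assigned node (MIPS).
sizes_mi = [400.0, 250.0, 800.0]          # S(Req_i), million instructions
rates_mips = [2000.0, 1000.0, 1600.0]     # PR(Req_i), million instructions per second

pd_total = sum(s / r for s, r in zip(sizes_mi, rates_mips))
print(f"Total processing delay: {pd_total:.2f} s")   # 0.20 + 0.25 + 0.50 = 0.95 s
```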
OF2: Energy Consumption (EC(LLM_QL)): This is the amount of energy required for input request pre-processing, computation, and transmission over the fog layer.
$EC(LLM\_QL) = \mathrm{Minimize} \sum_{i \in I} PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}$
where $PC(Req_i)$ represents the power required to process the input requests, $WL(Req_i)$ is the total workload allocated, measured in Floating Point Operations (FLOPs), $CP(fn_i)$ is the computational capacity of the fog node in operations per second (OPS), and $i$ represents the index term to traverse through the number of requests.
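A worked sketch of OF2 under the same reading, with purely illustrative numbers: energy is processing power multiplied by the time the workload occupies the node (workload over capacity).

```python
# OF2: energy = PC(Req_i) * WL(Req_i) / CP(fn_i), illustrative values only.
power_w = [10.0, 15.0]            # PC(Req_i), watts
workload_flops = [2e9, 6e9]       # WL(Req_i), floating point operations
capacity_flops = [4e9, 4e9]       # CP(fn_i), operations per second

ec_joules = sum(p * wl / cp for p, wl, cp in zip(power_w, workload_flops, capacity_flops))
print(f"Energy consumption: {ec_joules:.1f} J")   # 10*0.5 + 15*1.5 = 27.5 J
```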
OF3: Load (LD(LLM_QL)): This is the measure of a large quantity of work put on fog nodes for execution.
$LD(LLM\_QL) = \mathrm{Balance} \sum_{i \in I} \frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}$
where $Mean_N(fn_i)$ represents the mean number of fog nodes that process the requests from IoT devices for their successful execution, $N(Req_i)$ is the number of requests from IoT devices which are up and running, $N(fn_i)$ is the total number of fog nodes, $CC(fn_i)$ is the computational capacity of the fog node measured in Million Instructions Per Second (MIPS), and $i$ represents the index term to traverse through the number of fog nodes.
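A small numeric sketch of OF3 under this definition, with assumed counts and capacities:

```python
# OF3: load = (Mean_N(fn_i) * N(Req_i)) / (N(fn_i) * CC(fn_i)), illustrative values only.
mean_serving_nodes = 3      # Mean_N(fn_i)
active_requests = 120       # N(Req_i)
total_nodes = 10            # N(fn_i)
capacity_mips = 2000.0      # CC(fn_i)

load = (mean_serving_nodes * active_requests) / (total_nodes * capacity_mips)
print(f"Load: {load:.3f}")  # 360 / 20000 = 0.018
```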
OF4: Service Level Agreement Violation (SLAV(LLM_QL)): This is the measure of violation that occurs when the agreed standards of service are not provided by the fog service provider.
$SLAV(LLM\_QL) = \mathrm{Minimize} \begin{cases} SLAV = Yes, & \text{if } PT(Req_i, T) > 0 \\ SLAV = No, & \text{if } PT(Req_i, T) \le 0 \end{cases}$
An SLAV occurs when the $PT(Req_i, T)$ value exceeds zero, and no SLAV occurs when $PT(Req_i, T)$ is less than or equal to zero.
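A minimal sketch of OF4, assuming (as an interpretation, not stated in the paper) that $PT(Req_i, T)$ is the amount by which the processing time exceeds the agreed deadline:

```python
# OF4: an SLA violation is flagged whenever PT(Req_i, T) is positive.
def sla_violated(processing_time_ms: float, deadline_ms: float) -> bool:
    pt_excess = processing_time_ms - deadline_ms   # PT(Req_i, T), assumed interpretation
    return pt_excess > 0

print(sla_violated(120.0, 100.0))  # True  -> SLAV = Yes
print(sla_violated(80.0, 100.0))   # False -> SLAV = No
```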

4. Proposed Work

The high-level architecture of the proposed LLM-guided Q-learning framework for task scheduling is shown in Figure 1. The architecture is divided into three compartments: IoT requests, the fog node resource pool, and the LLM-powered Q-learning agent. The resource requirements of the IoT devices include temporary storage, bandwidth, processing power, and energy. The fog nodes are an essential component of the architecture and include virtual computers, switches, gateways, and routers. These nodes are tightly coupled and provide a rich set of computing resources to the IoT devices as per their requirements. The Q-learning agent performs a representative form of learning, continuously improving its actions according to the response of the operating environment. One of the significant challenges encountered by the Q-learning agent is overestimation bias, since it approximates the action value by taking the maximum estimated value over actions. The computed Q value then exceeds the true value of one or more actions, and the maximum operator applied over the action values is inclined towards the overestimated Q value. The effectiveness of the learning is limited as it does not provide the correct heuristic value for each state-action pair. Hence, in this framework Q-learning is enriched with the heuristics of the LLM, which leads to high inference speed and also overcomes the impact of the introduced bias.
Figure 1. The high-level architecture of LLM-guided Q-learning framework for resource provisioning.
The detailed working of the LLM_QL framework is as follows. The state space of the Q-learning agent represents the set of all possible states that the agent encounters in the environment. It is an important component of the Q-learning agent, as the agent learns to formulate the optimal action depending on the prevailing state. The state space indicates all possible combinations of configurations that the environment is composed of, and each state provides a unique experience for the agent. A state typically represents the condition of the system, i.e., resource usage, task load, latency, and energy consumption. The fog nodes learn the best resource allocation policies by continuously interacting with the environment, and the Q values are updated using the Bellman equation. The agent performs actions that include allocation of resources, migration of tasks from fog nodes to the cloud, and queuing of tasks for execution at later stages. A positive reward is provided for outcomes such as low processing delay, effective load balance, and few service level violations; a negative reward is provided for outcomes such as high processing delay, frequent load imbalance, and many service level violations. Traditional Q-learning agents suffer from the curse of dimensionality and cannot update the state-action pairs for all possible combinations. However, a Q-learning agent enriched with the heuristic guidance of the LLM generalizes easily and can operate efficiently in large and continuous state space environments. A sketch of such a reward signal is given below.
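The following is a minimal sketch of the positive/negative reward structure just described; the thresholds and weights are illustrative assumptions, not values from the paper.

```python
# Reward sketch: positive for low delay, balanced load, and no SLA violation;
# negative otherwise. Thresholds and weights are assumed for illustration.
def reward(delay_ms: float, load_imbalance: float, sla_violated: bool) -> float:
    r = 0.0
    r += 1.0 if delay_ms < 200 else -1.0
    r += 1.0 if load_imbalance < 0.4 else -1.0
    r += 1.0 if not sla_violated else -2.0
    return r

print(reward(150.0, 0.3, False))   # 3.0: desirable allocation
print(reward(900.0, 0.8, True))    # -4.0: undesirable allocation
```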
The environment setup for the Q-learning agent involves the state space, action space, reward function, and transition dynamics. A pre-built environment can be used for implementation through Application Programming Interfaces (APIs) such as OpenAI Gym and Unity ML-Agents. The Q-learning process takes the IoT device requests as input and processes them using the Bellman equation to determine the optimal fog node for servicing, as sketched below.
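A minimal Gym-style environment skeleton is shown here, written without the gym dependency to stay self-contained; the state (per-node utilisation), action (chosen fog node), reward, and termination rule are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Gym-style fog provisioning environment sketch (assumed state/action/reward design).
class FogEnv:
    def __init__(self, n_nodes: int = 5):
        self.n_nodes = n_nodes
        self.util = np.zeros(n_nodes)

    def reset(self):
        self.util = np.zeros(self.n_nodes)
        return self.util.copy()

    def step(self, action: int):
        self.util[action] += 0.1                     # toy load added by the request
        imbalance = self.util.max() - self.util.min()
        reward = -imbalance                          # encourage balanced allocation
        done = bool(self.util.max() >= 1.0)          # episode ends when a node saturates
        return self.util.copy(), reward, done, {}

env = FogEnv()
obs = env.reset()
obs, r, done, info = env.step(action=2)
```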
Algorithm 1 describes the working of the LLM-guided Q-learning framework in detail. The IoT device requests are mapped onto the best-fit fog nodes by considering the action value computed by the LLM-guided Q-learning agent. The heuristic value encourages the reasoning ability of the agent in reaching the goal state. While the Q-learning agent navigates the state space, landmark positions are given the highest heuristic values, which makes the traversal easier. The agent converges rapidly to promising solutions instead of searching for the perfect solution. By performing limited sampling and placing constraints on the responses, LLMs are made capable of performing zero-shot learning in the varying computing scenarios of the fog environment. Actor-critic Q-learning algorithms usually find it difficult to derive policies through the implicit exploration process, and sample efficiency alone does not guarantee proper reward reshaping; it often ends up with the wrong heuristic value. Hence, the use of LLMs represents a promising approach to reward reshaping through autonomous state space exploration for Q-learning agents. Any possible inaccuracies in the learning steps are handled through fine-tuning of the learning steps, which enhances inference speed and minimizes the impact of hallucination.
Algorithm 1: Working of LLM-guided Q-learning framework
1: Start
2: Input: IoT device request set $IoT_{REQ} = \{Req_1, Req_2, Req_3, Req_4, \ldots, Req_n\}$, Markov Decision Process MDP, Large Language Model G, Prompt P
3: Output: Resource provisioning policies, i.e., $RPP = \{Rpp_1, Rpp_2, Rpp_3, Rpp_4, \ldots, Rpp_n\}$
4: Training phase of LLM_QL
5:    Initialize the heuristic Q buffer $MDP(G(P))$, the actor-critic $\mu_{\phi}, Q_{\theta_1}, Q_{\theta_2} = NULL$, and the target actor-critic $\mu_{\phi'}, Q_{\theta_1'}, Q_{\theta_2'} = NULL$
6:    Generate Q buffer $D_g \leftarrow \{(S_i, a_i, Q_i) \mid (S_i, a_i, Q_i) \sim MDP(G(P)),\ i = 1, 2, \ldots, n\}$, where $S_i$ = state of the agent at time step $i$, $a_i$ = action performed at time step $i$, and $Q_i$ = Q value for action $a_i$ in state $S_i$
7:    Q bootstrapping: $\theta = \theta - \alpha \nabla_{\theta} L_{Bootstrap}$
8:        $L_{Bootstrap}(\theta) = \mathbb{E}_{(S_i, a_i, Q_i) \sim D_g}\left[(Q_i - \hat{q}_{\theta}(S_i, a_i))^2\right]$
9:    For each IoT device request in the request task set $Req_i \in IoT_{REQ}$ do
10:      Sample the Q state $(S_i, a_i, r_i, S_i')$ from the fog state space, where $r_i$ = reward received for action $a_i$
11:      Compute Q buffer $D_g = D_g \cup \{(S_i, a_i, r_i, S_i')\}$
12:      Perform sampling over the computed value of the Q buffer $D_g$
13:      $\hat{a} = \mu_{\phi}(S_i) + \epsilon$, where $\epsilon = \mathrm{clip}(\mathcal{N}(0, \sigma), -c, c)$
14:      $y(S_i) = r + \gamma \min_{j=1,2} q_{\theta_j'}(S_i', \hat{a})$
15:      Update the critic: $\theta_i = \arg\min_{\theta_i} L_{major}(\theta_i)$
16:      if $t \bmod D_g == 0$ then
17:        Update the actor: $\phi = \arg\min_{\phi} L_{actor}(\phi)$
18:        Update the target networks
19:          $\theta_i' = \tau \theta_i + (1 - \tau)\theta_i'$
20:          $\phi' = \tau \phi + (1 - \tau)\phi'$
21:      End if
22:   End for
23: End Training phase of LLM_QL
24: Testing phase of LLM_QL
25:     For each testing iteration t of IoT device request in request task set R e q i I o T R E Q  do
26:    Re-compute Q buffer $D_g = D_g \cup \{(S_i, a_i, r_i, S_i')\}$, where $D_g$ represents the memory buffer
27:    Compute the updated value of the critic $\theta_i = \arg\min_{\theta_i} L_{major}(\theta_i)$
28:    Update the critic $\theta_i = \arg\min_{\theta_i} L_{major}(\theta_i)$
29:    Update the target networks
30:        $\theta_i' = \tau \theta_i + (1 - \tau)\theta_i'$
31:        $\phi' = \tau \phi + (1 - \tau)\phi'$
32:    End For
33: End Testing phase of LLM_QL
34: Output $RPP = \{Rpp_1, Rpp_2, Rpp_3, Rpp_4, \ldots, Rpp_n\}$
35: Stop
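To make the two phases of Algorithm 1 concrete, the following is a compressed, tabular Python sketch: a Q-table is first bootstrapped from an LLM-generated buffer of (state, action, Q-value) triples and then refined on sampled fog transitions. The twin critics, target networks, and policy-delay machinery of the full algorithm are omitted, and the buffer and transition sampler are random stand-ins rather than outputs of the actual language model or simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 20, 4
alpha, gamma = 0.1, 0.9

# Phase 1 -- Q bootstrapping: fit the Q-table to a buffer D_g of (state, action, Q)
# triples. Here the buffer is random stand-in data; in Algorithm 1 it would be
# produced by prompting the language model G with prompt P.
Q = np.zeros((n_states, n_actions))
llm_buffer = [(rng.integers(n_states), rng.integers(n_actions), rng.uniform(0, 1))
              for _ in range(200)]
for _ in range(50):                       # crude analogue of minimising L_Bootstrap
    for s, a, q_llm in llm_buffer:
        Q[s, a] += alpha * (q_llm - Q[s, a])

# Phase 2 -- interaction: refine Q on sampled fog transitions (S, a, r, S').
def sample_transition():
    """Stand-in for sampling from the fog state space; replace with the simulator."""
    s, a = rng.integers(n_states), rng.integers(n_actions)
    return s, a, rng.normal(), rng.integers(n_states)

for _ in range(1000):
    s, a, r, s_next = sample_transition()
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

policy = Q.argmax(axis=1)                 # provisioning decision per state
```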

5. Expected Value Analysis

The expected value analysis of the proposed LLM_QL is performed by considering the four objective functions, i.e., Processing Delay (PD(LLM_QL)), Energy Consumption (EC(LLM_QL)), Load (LD(LLM_QL)), and Service Level Agreement Violation (SLAV(LLM_QL)). Two types of fog computing scenarios are considered for analysis, i.e., a finite fog computing scenario and an infinite fog computing scenario. The performance of the proposed LLM_QL is compared with three recent existing works, i.e., Bayesian Learning (BAY_L) [12], Centralized Controller (CC) [14], and Fuzzy Q-Learning (F_QL) [15].

5.1. Finite Fog Computing Scenario

The finite fog computing scenario consists of $N$ IoT device requests $REQ = \{Req_1, Req_2, Req_3, \ldots, Req_n\}$, $K$ fog computing nodes $FN = \{fn_1, fn_2, fn_3, \ldots, fn_n\}$, and $P$ resource provisioning policies $\pi = \langle \pi_1, \pi_2, \pi_3, \ldots, \pi_p \rangle$. The probability of the outcome of the objective functions varies between low, medium, and high, $P_{OF} = \langle Low, Medium, High \rangle$.
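Before walking through the individual objective functions, the following hedged Monte Carlo sketch shows how an expected value such as EV(PD) can be estimated for a policy in the finite scenario; the request-size and processing-rate distributions are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

# Monte Carlo estimate of the expected processing delay under a sampled policy.
rng = np.random.default_rng(1)
n_requests = 10_000
sizes_mi = rng.uniform(100, 1000, n_requests)       # S(Req_i), assumed distribution
rates_mips = rng.uniform(1000, 4000, n_requests)    # PR(Req_i) under some policy pi

ev_pd = np.mean(sizes_mi / rates_mips)              # estimate of EV[PD(pi, T)], seconds
print(f"Estimated expected processing delay: {ev_pd:.3f} s")
```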
OF1: Processing Delay (PD(LLM_QL)): The expected value of the processing delay of the proposed LLM_QL, i.e., EV(PD(LLM_QL)), is influenced by the expected value of the processing rate of the requests $EV(PR(Req_i))$ and the expected value of the size of the requests $EV(S(Req_i))$.
$EV[PD_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}$
$EV[PD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}$
$EV[PD_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q}\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] = \sum_{q \in Q} Q \times dP\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi)$
PD(LLM_QL): $EV[PD_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[PD_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}$
$EV[PD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}$
$EV[PD_{BAY\_L}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
PD(BAY_L): $EV[PD_{BAY\_L}(\Pi, T)] \rightarrow High$
$EV[PD_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}$
$EV[PD_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}$
$EV[PD_{CC}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} f(x(t), r(t), P)\right] P(\pi)$
where $f(x(t), r(t), P)$ is a function of the system state $x(t)$, the desired setpoint $r(t)$, and the system parameters $P$.
PD(CC): $EV[PD_{CC}(\Pi, T)] \rightarrow Medium$
$EV[PD_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}$
$EV[PD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}$
$EV[PD_{F\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
PD(F_QL): $EV[PD_{F\_QL}(\Pi, T)] \rightarrow High$
OF2: Energy Consumption (EC(LLM_QL)): The expected value of the energy consumption of the proposed LLM_QL, i.e., EV(EC(LLM_QL)), is influenced by the expected value of the power required to process the input requests $EV(PC(Req_i))$, the workload allocated $EV(WL(Req_i))$, and the computational capacity of the fog node $EV(CP(fn_i))$.
$EV[EC_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}$
$EV[EC_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}$
$EV[EC_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q}\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] = \sum_{q \in Q} Q \times dP\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi)$
EC(LLM_QL): $EV[EC_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[EC_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}$
$EV[EC_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}$
$EV[EC_{BAY\_L}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
EC(BAY_L): $EV[EC_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[EC_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}$
$EV[EC_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}$
$EV[EC_{CC}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times f(x(t), r(t), P)\right] P(\pi)$
EC(CC): $EV[EC_{CC}(\Pi, T)] \rightarrow High$
$EV[EC_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}$
$EV[EC_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}$
$EV[EC_{F\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
EC(F_QL): $EV[EC_{F\_QL}(\Pi, T)] \rightarrow Medium$
OF3: Load (LD(LLM_QL)): The expected value of the load on the proposed LLM_QL, i.e., EV(LD(LLM_QL)), is influenced by the expected value of the request processing time $EV(PT(Req_i))$ and the number of requests generated $EV(N(Req_i, T))$.
$EV[LD_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q}\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] = \sum_{q \in Q} Q \times dP\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi)$
LD(LLM_QL): $EV[LD_{LLM\_QL}(\Pi, T)] \rightarrow Balanced$
$EV[LD_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean\_Load}{Peak\_Load}\right] P(\pi)$
LD(BAY_L): $EV[LD_{BAY\_L}(\Pi, T)] \rightarrow Partially\ Balanced$
$EV[LD_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(fn_i)}{Load(fn_i)} \times f(x(t), r(t), P)\right] P(\pi)$
LD(CC): $EV[LD_{CC}(\Pi, T)] \rightarrow Imbalanced$
$EV[LD_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(Req_i)}{Load(Req_i)} \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
LD(F_QL): $EV[LD_{F\_QL}(\Pi, T)] \rightarrow Imbalanced$
OF4: Service Level Agreement Violation (SLAV(LLM_QL)): The expected value of the service level agreement violation of the proposed LLM_QL, i.e., EV(SLAV(LLM_QL)), is influenced by the expected value of the request processing time $EV(PT(Req_i))$.
$EV[SLAV_{LLM\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} PT(Req_i, T)\right] P(\pi) = \sum_{d \in D} d \sum_{q \in Q} \mathrm{Minimize}\begin{cases} SLAV = Yes, & \text{if } PT(Req_i, T) > 0 \\ SLAV = No, & \text{if } PT(Req_i, T) \le 0 \end{cases} = \sum_{q \in Q} Q \times dP\left[SLAV = No,\ PT(Req_i, T) \le 0\right] P(\pi)$
SLAV(LLM_QL): $EV[SLAV_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \sum_{q \in Q}\left[SLAV = Yes,\ PT(Req_i, T) > 0\right]$
SLAV(BAY_L): $EV[SLAV_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[SLAV_{CC}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \sum_{d \in D} d \sum_{q \in Q}\left[SLAV = Yes,\ PT(Req_i, T) > 0\right]$
SLAV(CC): $EV[SLAV_{CC}(\Pi, T)] \rightarrow High$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{x}^{y} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{d \in D} d \sum_{q \in Q}\left[SLAV = Yes,\ PT(Req_i, T) > 0\right]$
SLAV(F_QL): $EV[SLAV_{F\_QL}(\Pi, T)] \rightarrow High$

5.2. Infinite Fog Computing Scenario

The infinite fog computing scenario consists of an infinite number of IoT device requests $REQ = \{Req_1, Req_2, Req_3, \ldots, Req_\infty\}$, $K$ fog computing nodes $FN = \{fn_1, fn_2, fn_3, \ldots, fn_\infty\}$, and resource provisioning policies $\pi = \langle \pi_1, \pi_2, \pi_3, \ldots, \pi_\infty \rangle$.
OF1: Processing Delay (PD(LLM_QL)): The EV(PD(LLM_QL)) during the infinite scenario remains lower than the EV(PD(LLM_QL)) during the finite scenario.
$EV[PD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}$
$EV[PD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{LLM\_QL}(a)}{|\pi|}\right]$
$EV[PD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty}\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] = \int_{-\infty}^{+\infty} Q \times dP\left[\frac{1}{PR(Req_i)} \times S(Req_i)\right] P(\pi)$
PD(LLM_QL): $EV[PD_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[PD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}$
$EV[PD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{BAY\_L}(a)}{|\pi|}\right]$
$EV[PD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
PD(BAY_L): $EV[PD_{BAY\_L}(\Pi, T)] \rightarrow High$
$EV[PD_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}$
$EV[PD_{CC}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{CC}(a)}{|\pi|}\right]$
$EV[PD_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} f(x(t), r(t), P)\right] P(\pi)$
PD(CC): $EV[PD_{CC}(\Pi, T)] \rightarrow High$
$EV[PD_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}$
$EV[PD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{PD_{F\_QL}(a)}{|\pi|}\right]$
$EV[PD_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
PD(F_QL): $EV[PD_{F\_QL}(\Pi, T)] \rightarrow High$
OF2: Energy Consumption (EC(LLM_QL)): The EV(EC(LLM_QL)) during the infinite scenario is found to be lower than the EV(EC(LLM_QL)) during the finite scenario.
$EV[EC_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}$
$EV[EC_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{LLM\_QL}(a)}{|\pi|}\right]$
$EV[EC_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty}\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] = \int_{-\infty}^{+\infty} Q \times dP\left[PC(Req_i) \times \frac{WL(Req_i)}{CP(fn_i)}\right] P(\pi)$
EC(LLM_QL): $EV[EC_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[EC_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}$
$EV[EC_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{BAY\_L}(a)}{|\pi|}\right]$
$EV[EC_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \frac{P(data \mid \theta) \times P(\theta)}{P(data)}\right] P(\pi)$
EC(BAY_L): $EV[EC_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[EC_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}$
$EV[EC_{CC}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{CC}(a)}{|\pi|}\right]$
$EV[EC_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times f(x(t), r(t), P)\right] P(\pi)$
EC(CC): $EV[EC_{CC}(\Pi, T)] \rightarrow High$
$EV[EC_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}$
$EV[EC_{F\_QL}(\pi, T)] = \sum_{d \in D} d \left[\int_{-\infty}^{0} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{EC_{F\_QL}(a)}{|\pi|}\right]$
$EV[EC_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} PC(Req_i) \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
EC(F_QL): $EV[EC_{F\_QL}(\Pi, T)] \rightarrow Medium$
OF3: Load (LD(LLM_QL)): The EV(LD(LLM_QL)) during the infinite scenario is more balanced than the EV(LD(LLM_QL)) during the finite scenario.
$EV[LD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{LLM\_QL}(a)}{|\pi|}$
$EV[LD_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty}\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] = \int_{-\infty}^{+\infty} Q \times dP\left[\frac{Mean_N(fn_i) \times N(Req_i)}{N(fn_i) \times CC(fn_i)}\right] P(\pi)$
LD(LLM_QL): $EV[LD_{LLM\_QL}(\Pi, T)] \rightarrow Balanced$
$EV[LD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{BAY\_L}(a)}{|\pi|}$
$EV[LD_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{Mean\_Load}{Peak\_Load}\right] P(\pi)$
LD(BAY_L): $EV[LD_{BAY\_L}(\Pi, T)] \rightarrow Partially\ Balanced$
$EV[LD_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{CC}(a)}{|\pi|}$
$EV[LD_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(fn_i)}{Load(fn_i)} \times f(x(t), r(t), P)\right] P(\pi)$
LD(CC): $EV[LD_{CC}(\Pi, T)] \rightarrow Imbalanced$
$EV[LD_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{LD_{F\_QL}(a)}{|\pi|}$
$EV[LD_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{I} \frac{CP(Req_i)}{Load(Req_i)} \times \left(Q(s,a) + \left(R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)\right)\right] P(\pi)$
LD(F_QL): $EV[LD_{F\_QL}(\Pi, T)] \rightarrow Imbalanced$
OF4: Service Level Agreement Violation (SLAV(LLM_QL)): The EV(SLAV(LLM_QL)) during the infinite scenario is found to be consistently lower than the EV(SLAV(LLM_QL)) during the finite scenario.
$EV[SLAV_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{LLM\_QL}(a)}{|\pi|}$
$EV[SLAV_{LLM\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} PT(Req_i, T)\right] P(\pi) = \sum_{d \in D} d \int_{-\infty}^{+\infty} \mathrm{Minimize}\begin{cases} SLAV = Yes, & \text{if } PT(Req_i, T) > 0 \\ SLAV = No, & \text{if } PT(Req_i, T) \le 0 \end{cases} = \int_{-\infty}^{+\infty} Q \times dP\left[SLAV = Yes,\ PT(Req_i, T) > 0\right] \times \left[SLAV = No,\ PT(Req_i, T) \le 0\right] P(\pi)$
SLAV(LLM_QL): $EV[SLAV_{LLM\_QL}(\Pi, T)] \rightarrow Low$
$EV[SLAV_{BAY\_L}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{BAY\_L}(a)}{|\pi|}$
$EV[SLAV_{BAY\_L}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \left[SLAV = Yes,\ PT(Req_i, T) > 0\right]\right] P(\pi)$
SLAV(BAY_L): $EV[SLAV_{BAY\_L}(\Pi, T)] \rightarrow Medium$
$EV[SLAV_{CC}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{CC}(a)}{|\pi|}$
$EV[SLAV_{CC}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} \left[SLAV = Yes,\ PT(Req_i, T) > 0\right]\right] P(\pi)$
SLAV(CC): $EV[SLAV_{CC}(\Pi, T)] \rightarrow Medium$
$EV[SLAV_{F\_QL}(\pi, T)] = \int_{-\infty}^{0} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|} + \int_{0}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \sum_{d \in D} d \int_{-\infty}^{+\infty} \sum_{a \in \pi} \frac{SLAV_{F\_QL}(a)}{|\pi|}$
$EV[SLAV_{F\_QL}(\pi, T)] = \int_{-\infty}^{+\infty} EV\left[\frac{1}{|\pi|} \times \sum_{i=1}^{n} \left[SLAV = Yes,\ PT(Req_i, T) > 0\right]\right] P(\pi)$
SLAV(F_QL): $EV[SLAV_{F\_QL}(\Pi, T)] \rightarrow High$

6. Results and Discussion

The simulation of the proposed LLM_QL resource provisioning framework was carried out using the iFogSim 3.3 simulator toolkit [18,19], which enables the simulation and modeling of resource provisioning frameworks for fog computing environments. For experimental purposes, the ChatGPT 3.5 classic model was used as the language model that provides the heuristic form of the Q value to accelerate the exploration process of the algorithm [20,21]. The LLM setup details are provided in Table 1.
Table 1. LLM setup.
The fog environment simulation parameters and ChatGPT annotated dataset are initialized as shown in Table 2 [22].
Table 2. Simulation parameters of ChatGPT dataset.
The user application parameters are initialized as shown in Table 3.
Table 3. User application parameters setup.

6.1. Processing Delay

Figure 2 shows the graph of the number of fog nodes versus processing delay. It is observed from the graph that the processing delay of LLM_QL is consistently low (50–200 ms), as it takes the optimal Q value at each learning step through LLM guidance. The processing delay of F_QL remains moderate (450–800 ms), as the learning process slows down and requires many episodes of training to arrive at the optimal solution. In contrast, the processing delays of BAY_L and CC remain very high (800–900 ms) with the increase in the number of fog nodes, as they involve a large number of parameters and the posterior distributions are influenced by the prior distribution.
Figure 2. Number of fog nodes versus processing delay (ms).

6.2. Energy Consumption

A graph of the number of IoT requests versus energy consumption is shown in Figure 3. It is observed from the graph that the energy consumption of LLM_QL remains consistently low (150–200 J) as the number of requests increases, because the LLM-generated heuristic value is used for reward shaping, which helps in generating the desired policy. The energy consumption of BAY_L and CC is found to be moderate (800–900 J); the centralized controller makes the system vulnerable, and the presence of uncertainty is not handled in a generalized manner. The energy consumption of F_QL is found to be very high (600–1000 J) with the increasing number of requests: as the state space increases, the Q-table becomes larger and its maintenance becomes practically impossible.
Figure 3. Number of IoT requests versus energy consumption (J).

6.3. Load Imbalance

The graph of the number of IoT requests versus load imbalance is shown in Figure 4. It is observed from the graph that the imbalance is low (0.2–0.4) for LLM_QL as the number of requests increases. Because of LLM guidance, there is no need for hyperparameter tuning, and the framework can adapt quickly to different fog environment settings. The load imbalances of CC and F_QL are moderate (0.2–0.5), as the lookup table of Q states is replaced with a fuzzy system that represents the state-action pair for each Q state. In contrast, the load imbalance of BAY_L is found to be consistently very high (0.8–0.9) as the number of requests increases, because it incorporates previous beliefs, which results in a tendency to converge to suboptimal solutions.
Figure 4. Number of IoT requests versus load imbalance (0–1).

6.4. Service Level Agreement Violation

A graph of time versus SLA violation is shown in Figure 5. It is observed from the graph that the SLA violation of LLM_QL is consistently low (5–10%) over time, as it prevents the overestimation and underestimation of Q values by nullifying the effect of hallucinations. The SLA violation of CC is moderate (20–80%), as it is properly streamlined and has a well-defined hierarchy to drive policy decisions. In contrast, the SLA violations of BAY_L and F_QL are high (20–90%) over time because the rules are updated using the maximum operator, which is subject to positive bias and affects the learning process; in addition, the computation cost is higher and the accuracy of the results varies depending on the value of the random seed used.
Figure 5. Time (ms) versus SLA violation (%).

7. Conclusions

This paper presents a novel LLM-guided Q-learning framework for resource provisioning in fog computing. The uncertainty in fog computing is modeled using an LLM. The heuristic value of the LLM is used to guide the Q-learning agent. Over- and under-provisioning of resources is prevented through proper exploration of the state space using the maximum entropy solution. The expected value analysis of the framework under finite and infinite fog computing scenarios is performed. The experimental results obtained with the iFogSim 3.3 simulator are good with respect to processing delay, energy consumption, load balancing, and SLA violations. As future work, detailed analytical modeling and analysis of the framework will be carried out in terms of resource migration and placement. The framework will also be extended to cover fault tolerance, auto recovery, and cost optimization.

Author Contributions

Conceptualization, B.K. and S.G.S.; methodology, B.K.; software, S.G.S.; validation, B.K. and S.G.S.; formal analysis, B.K.; investigation, S.G.S.; resources, B.K.; data curation, B.K.; writing—original draft preparation, B.K.; writing—review and editing, S.G.S.; visualization, B.K.; supervision, S.G.S.; project administration, B.K.; funding acquisition, S.G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Srirama, S.N. A decade of research in fog computing: Relevance, challenges, and future directions. Softw. Pract. Exp. 2024, 54, 3–23. [Google Scholar] [CrossRef]
  2. Ali, S.; Alubady, R. A Survey of Fog Computing-Based Resource Allocation Approaches: Overview, Classification, and Opportunity for Research. Iraqi J. Sci. 2024, 65, 4008–4029. [Google Scholar] [CrossRef]
  3. Das, R.; Inuwa, M.M. A review on fog computing: Issues, characteristics, challenges, and potential applications. Telemat. Inform. Rep. 2023, 10, 100049. [Google Scholar] [CrossRef]
  4. Sabireen, H.; Neelanarayanan, V. A review on fog computing: Architecture, fog with IoT, algorithms and research challenges. ICT Express 2021, 7, 162–176. [Google Scholar]
  5. Clifton, J.; Laber, E. Q-learning: Theory and applications. Annu. Rev. Stat. Its Appl. 2020, 7, 279–301. [Google Scholar] [CrossRef]
  6. Hansen-Estruch, P.; Kostrikov, I.; Janner, M.; Kuba, J.G.; Levine, S. IDQL: Implicit q-learning as an actor-critic method with diffusion policies. arXiv 2023, arXiv:2304.10573. [Google Scholar]
  7. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Wen, J.R. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
  8. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Xie, X. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45. [Google Scholar] [CrossRef]
  9. Du, Y.; Watkins, O.; Wang, Z.; Colas, C.; Darrell, T.; Abbeel, P.; Andreas, J. Guiding pretraining in reinforcement learning with large language models. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: New York, NY, USA, 2023; Volume 202, pp. 8657–8677. [Google Scholar]
  10. Ma, R.; Luijkx, J.; Ajanovic, Z.; Kober, J. ExploRLLM: Guiding exploration in reinforcement learning with large language models. arXiv 2024, arXiv:2403.09583. [Google Scholar]
  11. Abu-Amssimir, N.; Al-Haj, A. A QoS-aware resource management scheme over fog computing infrastructures in IoT systems. Multimed. Tools Appl. 2023, 82, 28281–28300. [Google Scholar] [CrossRef]
  12. Etemadi, M.; Ghobaei-Arani, M.; Shahidinejad, A. Resource provisioning for IoT services in the fog computing environment: An autonomic approach. Comput. Commun. 2020, 161, 109–131. [Google Scholar] [CrossRef]
  13. Nguyen, N.D.; Phan, L.A.; Park, D.H.; Kim, S.; Kim, T. ElasticFog: Elastic resource provisioning in container-based fog computing. IEEE Access 2020, 8, 183879–183890. [Google Scholar] [CrossRef]
  14. Mseddi, A.; Jaafar, W.; Elbiaze, H.; Ajib, W. Centralized and collaborative RL-based resource allocation in virtualized dynamic fog computing. IEEE Internet Things J. 2023, 10, 14239–14253. [Google Scholar] [CrossRef]
  15. Faraji-Mehmandar, M.; Jabbehdari, S.; Javadi, H.H.S. Fuzzy Q-learning approach for autonomic resource provisioning of IoT applications in fog computing environments. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 4237–4255. [Google Scholar] [CrossRef]
  16. Etemadi, M.; Ghobaei-Arani, M.; Shahidinejad, A. A learning-based resource provisioning approach in the fog computing environment. J. Exp. Theor. Artif. Intell. 2021, 33, 1033–1056. [Google Scholar] [CrossRef]
  17. Sumona, S.T.; Hasan, S.S.; Tamzid, A.Y.; Roy, P.; Razzaque, M.A.; Mahmud, R. A Deep Q-Learning Framework for Enhanced QoE and Energy Optimization in Fog Computing. In Proceedings of the 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), Abu Dhabi, United Arab Emirates, 29 April–1 May 2024; IEEE: New York, NY, USA, 2024; pp. 669–676. [Google Scholar]
  18. Baneshi, S.; Varbanescu, A.L.; Pathania, A.; Akesson, B.; Pimentel, A. Estimating the energy consumption of applications in the computing continuum with iFogSim. In Proceedings of the International Conference on High Performance Computing, Denver, CO, USA, 12–17 November 2023; Springer Nature: Cham, Switzerland, 2023; pp. 234–249. [Google Scholar]
  19. Mahmud, R.; Pallewatta, S.; Goudarzi, M.; Buyya, R. iFogSim2: An extended iFogSim simulator for mobility, clustering, and microservice management in edge and fog computing environments. J. Syst. Softw. 2022, 190, 111351. [Google Scholar] [CrossRef]
  20. Kocon, J.; Cichecki, I.; Kaszyca, O.; Kochanek, M.; Szydło, D.; Baran, J.; Kazienko, P. ChatGPT: Jack of all trades, master of none. Inf. Fusion 2023, 99, 101861. [Google Scholar] [CrossRef]
  21. Vujinović, A.; Luburić, N.; Slivka, J.; Kovačević, A. Using ChatGPT to annotate a dataset: A case study in intelligent tutoring systems. Mach. Learn. Appl. 2024, 16, 100557. [Google Scholar] [CrossRef]
  22. Pandey, C.; Tiwari, V.; Rathore, R.S.; Jhaveri, R.H.; Roy, D.S.; Selvarajan, S. Resource-efficient synthetic data generation for performance evaluation in mobile edge computing over 5G networks. IEEE Open J. Commun. Soc. 2023, 4, 1866–1878. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
