2. Related Work
Recently, with the advancement of virtualization technology, much research has focused on optimal VNF placement and traffic routing, which are typical resource management problems. We divide these techniques into traditional approaches and intelligent algorithms based on artificial intelligence.
The first class of techniques can be further divided into those that use mathematical optimization and those that use heuristics. Most research models the optimization problem mathematically according to different optimization objectives and resource constraints. Deterministic methods such as binary integer programming (BIP) [10,20,21] or mixed integer linear programming (MILP) [22] are generally used. However, since dynamic SFC placement is an NP-hard problem, these methods cannot obtain optimal solutions efficiently at large network scales.
Some research has considered different approximation strategies to reduce computation time [9,12,22,23]. For example, to meet the demand for low-latency services, Poularakis et al. [23] formulated the joint service placement and request routing problem in multicell MEC networks, aiming to minimize the load on the centralized cloud. However, they did not consider load balancing between the cellular BSs. Similarly, Yang et al. [12] investigated the problem of placing VNFs on edge and public clouds and routing the traffic among adjacent VNF pairs, which they defined as the delay-aware VNF placement and routing (DVPR) problem. They proved that the DVPR problem is NP-hard and presented a randomized rounding approximation algorithm to solve it in polynomial time. However, they did not consider the resource consumption cost of instantiating VNFs. Considering the SFC ordering constraints of all network flows, Tomassilli et al. [9] addressed the SFC placement problem with an efficient algorithm based on linear programming (LP) rounding, which achieves a logarithmic approximation factor.
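For illustration, the randomized rounding step at the heart of such approximation algorithms can be sketched as follows. This is a generic sketch, not the exact algorithm of [12] or [9]; the data layout and names are our own:

```python
import random

def randomized_round(fractional, rng=random.Random(0)):
    """Round a fractional LP placement to an integral one.

    fractional: dict mapping request -> {cloudlet: fraction}, where the
    fractions of each request sum to 1 (a solution of the LP relaxation).
    Each request is assigned to one cloudlet with probability equal to
    its fractional value, the standard randomized-rounding step.
    """
    placement = {}
    for req, fracs in fractional.items():
        cloudlets = list(fracs)
        weights = [fracs[c] for c in cloudlets]
        placement[req] = rng.choices(cloudlets, weights=weights, k=1)[0]
    return placement

# Toy fractional solution for two requests over three cloudlets.
frac = {
    "r1": {"c1": 0.6, "c2": 0.3, "c3": 0.1},
    "r2": {"c1": 0.2, "c2": 0.2, "c3": 0.6},
}
integral = randomized_round(frac)
```

In expectation, the rounded solution preserves the LP objective, which is what yields the approximation guarantees cited above.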
As mathematical programming approaches have high time complexity, most research papers employ heuristics to address VNF placement and traffic routing problems [2,10,13,24]. Such methods are generally effective and achieve good performance in terms of execution time. For example, Liu et al. [2] accounted for multiple resources and QoS constraints in NFV-enabled SDNs to achieve SFC request flow (SR) routing; they innovatively constructed a VNF-split multistage edge-weighted graph and a cost model considering the resources of nodes and links. In [24], Behravesh et al. investigated the impact of user association on SFC placement and proposed scalable heuristics that efficiently find a near-optimal solution in polynomial time. The authors of [10] formulated the problem of network throughput maximization, considering both horizontal and vertical scaling, and devised a heuristic algorithm to solve it. Nevertheless, they neglected the delay of each link in their work.
Instead of studying VNF placement and traffic routing at the same time, some works resort to a two-stage method [10,13,25], wherein VNFs are placed first and links are established afterwards. In [13], to trade off the resource consumption of servers and links, the authors proposed a two-stage VNF placement scheme in which a constrained depth-first search algorithm (CDFSA) finds all feasible paths between the source and the destination and a path-based greedy algorithm sequentially assigns VNFs with minimum resource consumption. In [25], the authors proposed a new concept, the hybrid SFC, and used a heuristic algorithm to solve dynamic SFC embedding. Unfortunately, as they adopted the shortest path or a greedy algorithm, it is difficult to guarantee network load balancing.
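The first stage of such a two-stage scheme can be illustrated with a minimal hop-bounded depth-first search. This toy version (the names and the hop-only constraint are our own simplification of the CDFSA of [13]) enumerates all simple paths within a hop budget:

```python
def feasible_paths(graph, src, dst, max_hops):
    """Enumerate all simple paths from src to dst with at most max_hops edges.

    graph: adjacency dict, e.g. {"a": ["b", "c"], ...}. The hop bound is the
    only "constraint" in this sketch; a real scheme would also prune on
    bandwidth, delay, or other link resources.
    """
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(path[:])
            return
        if len(path) - 1 >= max_hops:    # hop budget exhausted, backtrack
            return
        for nxt in graph.get(node, []):
            if nxt not in path:          # keep the path simple (no cycles)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(src, [src])
    return paths

g = {"s": ["a", "b"], "a": ["t"], "b": ["a", "t"], "t": []}
all_paths = feasible_paths(g, "s", "t", max_hops=3)
```

The second stage would then greedily assign VNFs along one of the enumerated paths, choosing placements with minimum resource consumption.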
In recent years, some researchers have applied machine learning to solve multiobjective optimization problems [18,19], such as computation offloading [26], traffic routing, and SFC orchestration. For example, Pei et al. [27] used two types of deep belief networks (DBNs) to select VNFs and chain them, training one DBN for each type of SFC request. However, this method requires a large amount of labeled data to train the DBNs, and such samples are generally difficult to obtain. To cope with this challenge, the authors of [7,28] proposed DRL-based approaches to SFC orchestration. However, the authors of [7] did not consider the resource consumption cost, and those of [28] did not optimize the traffic acceptance rate. Quang et al. [29] and Fu et al. [30] studied the problem of VNF forwarding graph (VNF-FG) embedding and proposed different DRL algorithms. Fu et al. [30] used DDQN to solve the VNF placement problem and the shortest path first protocol to achieve traffic routing. However, their method considers neither resource usage on links and nodes nor network load balancing. Some research considers only VNF placement, without addressing traffic routing [8,31,32]. For example, Xiao et al. [32] proposed a serialization and backtracking method to address the challenge of handling the large discrete action space of SFC placement.
Although DRL can quickly make optimal decisions for the current task and scenario, many DRL algorithms suffer from low sample efficiency and slow convergence each time the IoT-MEC environment changes; when confronted with a new learning environment, such DRL-based algorithms require lengthy retraining of the model. To address this challenge, meta learning has attracted the attention of many researchers and is widely used in task offloading and caching problems [33,34,35,36,37,38]. He et al. [33] proposed a general framework combining meta learning with hierarchical reinforcement learning, enabling rapid adaptive resource allocation in dynamic vehicular networks. To improve scheduling robustness, Liu et al. [34] designed a meta-gradient reinforcement learning algorithm for time-critical task scheduling. Refs. [36,38] solved the task offloading problem using dynamic MEC environment-based deep meta learning and multiple parallel DNNs; the latter considered mobile cloud computing, whereas the former did not. Jin et al. [37] transformed the offloading decision process into a sequence prediction process and proposed a custom seq2seq neural network to model the offloading policy. However, few works have applied meta-RL to the SFC placement problem in an IoT-MEC environment. Inspired by the literature mentioned above, we consider the dynamics of MEC scenarios and apply meta-RL to the dynamic SFC placement problem.
3. System Model and Problem Formulation
In this section, we begin with the system model, then define the dynamic SFC placement problem. Finally, we formulate the problem in detail. The major symbols and variables used in this paper are listed in Table 1.
3.1. Physical Network
In Figure 1, an undirected graph is utilized to represent IoT-MEC networks; its set of network nodes comprises cloudlets and access point (AP) nodes, and its set of edges comprises the links connecting pairs of nodes u and v. Due to the constrained computational resources of cloudlets, only a predetermined quantity of VNFs may be instantiated within the VMs operating in the cloudlets. The set of all VNF instances is denoted by M in our study. Additionally, each VNF type is indicated by the variable p, including FW, DPI, load balancer, HTTP, etc. The blue line is an example of an SFC: when an IoT terminal makes a request, the request must traverse an ordered chain, i.e., the SFC, which includes a source node, VNFs, and a destination node. In the network, the APs are in charge of forwarding the IoT-SRs to the cloudlets, and an SDN controller receives the IoT-SRs from the APs, i.e., the requests from the IoT terminals, and places the SFCs. Figure 1 is an example of the network.
We use the symbols listed in Table 1 to denote the CPU capability and processing delay of cloudlet c; the bandwidth capacity, packet loss rate, and delay of each link; the count of VNF instances of type p instantiated in cloudlet c; and the availability ratios of CPU for c and of bandwidth for each link.
3.2. SFC Requests
The set of IoT-SRs is denoted by R. Each IoT-SR corresponds to a 7-tuple comprising the source node and the destination node; the ordered set of requested VNFs, whose length is the total number of VNF instances of the IoT-SR; the CPU consumption of each VNF type p; the bandwidth consumption of the IoT-SR; and the maximum tolerated delay and the maximum tolerated packet loss rate of the IoT-SR. We use a binary variable to indicate whether a VNF of type p is deployed on a cloudlet node: it equals 1 if there is such a VNF instance on the cloudlet and 0 otherwise. Note that we consider the packet loss rate to be 0 when a packet traverses a VNF, since the performance of software and hardware keeps improving.
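For illustration, the 7-tuple of an IoT-SR can be modeled as a small record type; the field names below are our own stand-ins for the paper's symbols:

```python
from dataclasses import dataclass

@dataclass
class IoTSR:
    """One IoT service request; field names are illustrative only."""
    src: str            # source node
    dst: str            # destination node
    vnfs: list          # ordered VNF types to traverse (the SFC)
    cpu_demand: dict    # CPU consumption per requested VNF type
    bandwidth: float    # bandwidth consumption of the request
    max_delay: float    # maximum tolerated end-to-end delay
    max_loss: float     # maximum tolerated packet loss rate

    @property
    def length(self):
        # the "length" of the IoT-SR: number of VNF instances to place
        return len(self.vnfs)

r = IoTSR("ap1", "ap2", ["FW", "DPI", "LB"],
          {"FW": 2.0, "DPI": 4.0, "LB": 1.0}, 10.0, 50.0, 0.01)
```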
3.3. Problem Description
In IoT-MEC networks, the dynamic SFC placement problem involves finding or constructing a route path for each IoT-SR that minimizes the implementation cost, based on resource consumption, and the end-to-end delay of the IoT-SR. It is important to emphasize that routing IoT-SRs to cloudlets in close proximity to the IoT terminals can effectively reduce the end-to-end latency. Since IoT networks change dynamically and IoT-SRs arrive stochastically, dealing with the dynamic SFC placement problem, including the placement of VNFs and the selection of a traffic path, becomes more difficult. Based on NFV technology, VNFs can be flexibly and efficiently realized on VMs in the cloudlets. There are usually multiple VNF instances of the same VNF type in different cloudlets of the network (G), leading to several combinations of candidate VNF instances with different performance for each IoT-SR. To meet the end-to-end delay requirement and efficiently utilize resources, we decompose the dynamic SFC placement problem into two parts: (i) determining the placement of VNFs and (ii) selecting the optimal path for each adjacent VNF instance pair.
3.4. ILP Model
In this subsection, we formulate the dynamic SFC placement problem as an ILP model.
3.4.1. Resource Constraints
We ensure sufficient cloudlet resources to host the VNFs and sufficient link bandwidth to route the traffic of all IoT-SRs (R):
where the binary routing variable indicates whether a link is traversed by an IoT-SR; it equals 1 if the traffic of the IoT-SR traverses the physical link and 0 otherwise.
For each cloudlet, we ensure the following resource constraint:
where the corresponding parameter is the CPU consumption of VNF type p.
3.4.2. Delay Constraint
We denote the delay of IoT-SR i as follows:
where the first part is the processing delay of the VNFs of type p on the cloudlets (c), computed in Equation (5), and the second part is the propagation delay of the links.
where the corresponding coefficient indicates the processing time required by cloudlet c to handle a unit packet [39].
If an IoT-SR is admitted, its total end-to-end delay along the path cannot exceed the maximum tolerated delay. We use a binary variable to indicate whether the IoT-SR is accepted or not:
3.4.3. Packet Loss Rate Constraint
This consideration is limited to the packet loss rate on the links, excluding VNF processing. The selected path for an IoT-SR is required to satisfy the constraint on the end-to-end packet loss rate:
where the corresponding variable is the packet loss rate of the IoT-SR.
As the equation presented above is non-linear, we can linearize it by taking a logarithm on both sides.
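With illustrative symbols (a binary routing variable $y^{i}_{uv}$, per-link loss rates $\eta_{uv}$, and a tolerance $\eta^{\max}_{i}$; these names are our own, not the paper's notation), the constraint and its log-linearization read:

```latex
\prod_{(u,v)\in E} \bigl(1-\eta_{uv}\bigr)^{y^{i}_{uv}} \;\ge\; 1-\eta^{\max}_{i}
\quad\Longrightarrow\quad
\sum_{(u,v)\in E} y^{i}_{uv}\,\ln\bigl(1-\eta_{uv}\bigr) \;\ge\; \ln\bigl(1-\eta^{\max}_{i}\bigr).
```

Since the logarithms of the constants can be precomputed, the resulting constraint is linear in the binary routing variables.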
3.4.4. Other Constraints
The following equation ensures that the route path of an IoT-SR is consecutive and cannot be split:
Equation (10) ensures that each VNF instance m belongs to only one VNF type p:
3.4.5. Optimization Objectives
Finally, we optimize two objectives: the service response delay and the resource consumption cost of an IoT-SR. The resource consumption cost consists of the VNF placement cost, the resource usage cost, and the penalty for rejecting an IoT-SR.
Considering the inflation of the marginal cost of resource usage as the resource load increases, we characterize the relationship between resource utilization and cost by an exponential function, which can be defined as follows [2]:
In this equation, two constants are utilized, both of which satisfy the stated condition. It is worth noting that as their values increase, the resource usage cost also increases.
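A minimal sketch of such an exponential utilization cost follows; the constants here are placeholders, not the values used in [2]:

```python
def resource_cost(utilization, a=1.0, b=20.0):
    """Exponential cost of using a resource at a given utilization in [0, 1].

    a and b are shaping constants (placeholders): the larger they are, the
    steeper the marginal-cost inflation as the load approaches 1.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return a * (b ** utilization - 1.0)

low = resource_cost(0.1)
high = resource_cost(0.9)
```

The convexity of this curve is what steers the placement away from heavily loaded cloudlets and links: the same increment of load costs far more on a busy resource than on an idle one.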
The placement cost of a VNF of type p is formulated as follows:
The activation cost of VNFs of type p appears in this formulation. The selection of cloudlet c by an IoT-SR for placement of the required VNF is indicated by a binary variable, which equals 1 if cloudlet c is the node chosen for VNF placement and 0 otherwise.
In the event that the service request of an IoT-SR is declined, it incurs a penalty:
The indicator function is defined on the basis of set X; that is, it equals 1 if x is true and 0 otherwise.
Therefore, the total cost for routing an IoT-SR can be represented as follows:
The dynamic SFC placement problem, which aims to minimize the weighted sum of the resource consumption cost and the end-to-end delay of every IoT-SR, is expressed as Equation (15):
The two parameters are weighting coefficients reflecting the importance of minimizing the forwarding delay and the resource consumption cost of provisioning the IoT-SR, respectively; both are nonnegative and sum to 1.
Solving the aforementioned optimization problem under this series of constraints is a non-trivial task, as it has been proven to be NP-hard. Traditional approaches such as ILP solvers or heuristic algorithms require extensive iterations and computations to achieve dynamic VNF placement and traffic routing. To address this challenge, we employ an intelligent method, meta reinforcement learning and fuzzy logic-based SFC placement (MRLF-SFCP), to facilitate real-time network transitions and optimize the placement of dynamic SFCs.
4. Dynamic Placement of SFC Based on Meta Reinforcement Learning and Fuzzy Logic
In this section, we first provide an overview of the proposed solution and describe the Markov property of dynamic IoT-MEC networks. Then, we formulate the problem as an MDP model, defined as a triple of state space, action space, and reward function. Finally, we describe the MRLF-SFCP stage by stage in detail.
4.1. Overview
As IoT-MEC scenarios change rapidly and are often affected by many factors, and as traditional intelligent schemes cannot adapt to changes in the environment without retraining the neural network, we utilize meta reinforcement learning to tackle these challenges. It comprises an outer model and an inner model. The former learns the initial parameters of the inner model: when the environment changes, such as variations in cloudlet performance or link bandwidth, the parameters of the inner model's neural network can be adjusted by the outer model. The latter uses DDQN and fuzzy logic to realize an SFC placement decision model, receiving the SFC requests and the initial parameters.
Figure 2 shows an illustration of the MRLF-SFCP. One outer model adjusts the parameters of the VNF placement neural network, and another adjusts the parameters of the traffic routing neural network. To achieve optimal SFC placement, we first acquire the initial parameters of the VNF placement neural network and obtain the optimal combination of VNF instances using the placement network (DDQN1). Subsequently, a verification is conducted to ascertain whether the necessary VNF instances are already placed on the designated cloudlets; if not, they are dynamically placed onto the corresponding cloudlets. We then acquire the initial parameters of the routing network (DDQN2). Additionally, to establish connectivity between the source node, the destination node, and the cloudlets selected by DDQN1, MRLF-SFCP employs fuzzy logic for preliminary link evaluation and the DDQN algorithm to determine the optimal route. For instance, in Figure 1, four paths connecting the IoT terminal, the selected cloudlets, and the destination node need to be produced in a predefined order, so we perform fuzzy logic and DDQN four times for traffic routing. In particular, the evaluation of link quality, considering factors such as available bandwidth, link and node delays, and packet loss rate, is performed using fuzzy logic, and the evaluation result is input to DDQN2 for route selection.
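The decision flow just described can be sketched end to end as follows. Every callable here is a hypothetical stand-in (the real DDQN1/DDQN2 are trained neural networks, and the fuzzy evaluation is detailed in Section 4.3):

```python
def place_sfc(request, state, ddqn1, ddqn2, fuzzy_score, top_k=3):
    """Toy end-to-end decision flow of the MRLF-SFCP pipeline (sketch).

    ddqn1(state, request) -> ordered cloudlets hosting the request's VNFs
    fuzzy_score(link, state) -> scalar link-quality score
    ddqn2(state, segment, candidates) -> chosen link for one path segment
    """
    cloudlets = ddqn1(state, request)              # step 1: VNF placement
    waypoints = [request["src"], *cloudlets, request["dst"]]
    path = []
    for a, b in zip(waypoints, waypoints[1:]):     # steps 2-3: per-segment routing
        ranked = sorted(state["links"],
                        key=lambda l: fuzzy_score(l, state), reverse=True)
        candidates = ranked[:top_k]                # fuzzy pre-filtering
        path.append(ddqn2(state, (a, b), candidates))
    return cloudlets, path

# Hypothetical stand-ins so the sketch runs end to end.
def ddqn1_stub(state, request):
    return ["c1"]                                  # place all VNFs on cloudlet c1

def ddqn2_stub(state, segment, candidates):
    return candidates[0]                           # pick the best-ranked link

state = {"links": ["l1", "l22", "l333"]}
chosen, route = place_sfc({"src": "s", "dst": "t"}, state,
                          ddqn1_stub, ddqn2_stub,
                          fuzzy_score=lambda link, s: len(link))
```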
4.2. MDP Model
In a real network scenario, it is difficult to find an optimal routing path in a dynamic network, owing to network changes and the stochastic arrival of IoT-SRs. The next SFC placement depends only on the current state of the network. Thus, we can model the dynamic network as an MDP.
First, we depict the input of the VNF placing network, i.e., the network state, which includes the following elements:
The ratio of the remaining CPU resource in the cloudlets, a vector whose length is the number of cloudlets;
The CPU resource consumption of each VNF type requested by the IoT-SR;
The end-to-end delay requirement of the IoT-SR.
Subsequently, a vector can be utilized to characterize the network state:
The input of the SPFN is described as follows:
The ratio of the remaining bandwidth on the links, a vector whose length is the number of links in the IoT-MEC network;
Binary codes representing the source and destination nodes of the IoT-SR, encoding the node location information in binary. For instance, in a network comprising 60 nodes, with the second node as the source node and the fifteenth node as the destination node, their binary codes would be 000010 and 001111, respectively, the code length being ceil(log2 60) = 6 bits;
The bandwidth requirement of the IoT-SR;
The end-to-end delay requirement of the IoT-SR;
The maximum tolerated packet loss rate of the IoT-SR.
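The node encoding described above can be sketched as follows, assuming fixed-width codes of length ceil(log2 N) over 1-based node indices (a minimal illustration, not the paper's exact encoder):

```python
import math

def encode_node(index, num_nodes):
    """Fixed-width binary code for a node index.

    The code is ceil(log2(num_nodes)) bits wide, enough to distinguish
    every node in the network, zero-padded on the left.
    """
    width = math.ceil(math.log2(num_nodes))
    return format(index, "0{}b".format(width))

src_code = encode_node(2, 60)    # second node as the source
dst_code = encode_node(15, 60)   # fifteenth node as the destination
```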
Consequently, a vector formulation can be used to represent the network state:
Second, the definition of the action space is presented.
The action set is defined as the collection of all possible combinations of cloudlets. Specifically, in an IoT-MEC network with N cloudlets and an IoT-SR requiring traversal of K types of VNF instances, the task involves selecting K VNF instances among the N cloudlets. Once an action is chosen, we must check whether a corresponding VNF instance exists on each selected cloudlet; if not, the VNF instance is dynamically placed immediately.
Subsequently, the routing action is defined as the path employed to establish connectivity between two consecutive nodes. Since paths need to be generated in a predefined order to connect the source node, the destination node, and the nodes chosen by the VNF placing network, the agent, i.e., DDQN2, must determine the paths iteratively.
Third, the reward function of the VNF placing network is defined as follows:
The selected set of cloudlets plays a central role in the formulation of the reward function. To maintain algorithm stability, scaling factors are employed to equalize the magnitudes of the two reward terms.
Prior to computing the reward for the given state and selected action, it is necessary to assess whether the chosen cloudlets possess adequate resources to accommodate the VNF placement. If sufficient resources are available, the reward is calculated using Equation (18); otherwise, a fixed negative reward is assigned. Thus, the agent attempts to select the cloudlets with the highest remaining resource ratios, enhancing the acceptance rate of IoT-SRs.
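A minimal sketch of this reward shape follows; the scaling factor and the penalty value are placeholders, not the paper's constants, and the feasible-case formula merely stands in for Equation (18):

```python
def placement_reward(chosen, free_cpu, demand, scale=1.0, penalty=-10.0):
    """Toy reward for a VNF-placement action (stand-in for Equation (18)).

    chosen: selected cloudlets; free_cpu: remaining CPU capacity per cloudlet;
    demand: CPU demand per chosen cloudlet. An infeasible choice receives a
    fixed negative penalty; a feasible one is rewarded by the average
    remaining capacity after placement, steering the agent toward lightly
    loaded cloudlets.
    """
    if any(free_cpu[c] < demand[c] for c in chosen):
        return penalty
    avg_remaining = sum(free_cpu[c] - demand[c] for c in chosen) / len(chosen)
    return scale * avg_remaining

free = {"c1": 8.0, "c2": 1.0}
ok = placement_reward(["c1"], free, {"c1": 2.0})    # feasible placement
bad = placement_reward(["c2"], free, {"c2": 2.0})   # infeasible placement
```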
Finally, our objective is to jointly optimize the cost of resource usage and the end-to-end delay. To achieve this, the reward function of the SPFN is formulated as the negative weighted sum of the resource consumption cost and the end-to-end delay. When there is no feasible path for an IoT-SR, the reward of the SPFN remains fixed at a negative constant and can be formulated as follows:
The path set represents the collection of paths that connect the initial node, the terminal node, and the chosen cloudlets. To maintain algorithm stability, we introduce weighting factors to ensure comparable magnitudes between the two reward terms. Since the delay is small relative to the resource consumption cost on the links, its weighting factor must be the larger of the two to ensure stable training.
4.3. Fuzzy Logic
In classical Boolean logic, “false” is denoted by 0 and “true” by 1; a proposition is either true or false. In fuzzy logic, a proposition is no longer simply true or false; it can be “partially true”. Fuzzy logic uses non-numeric linguistic variables to express facts and deals with approximate data in a manner similar to human reasoning. It comprises three main procedures (see Figure 3). First, the input values are converted to fuzzy values, i.e., fuzzification: the degree to which the bandwidth of a link belongs to the large, medium, and small categories is calculated using the bandwidth membership function, and the degrees to which the delay and the packet loss rate belong to these categories are calculated with their corresponding membership functions. Then, IF/THEN rules are used to obtain the fuzzy output values. These fuzzy output values, known as the fire strength (FS), cannot be used directly to solve real problems. Therefore, defuzzification is applied to obtain crisp numerical values, and a threshold is set for the evaluation of links.
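The three procedures can be sketched with triangular membership functions, two IF/THEN rules, and a weighted-average defuzzification. The membership ranges and rules below are illustrative placeholders, not the ones used in our evaluation:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b over the support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def link_quality(bandwidth, delay):
    """Toy two-input fuzzy evaluation of a link (0 = poor, 1 = good).

    Fuzzification: membership of bandwidth/delay in 'high'/'low' sets.
    Rules (fire strength = min/max of antecedents):
      IF bandwidth high AND delay low  THEN quality good
      IF bandwidth low  OR  delay high THEN quality poor
    Defuzzification: weighted average of the rule outputs (good=1, poor=0).
    """
    bw_high = tri(bandwidth, 40, 100, 161)   # membership ranges are placeholders
    bw_low = tri(bandwidth, -1, 0, 60)
    d_low = tri(delay, -1, 0, 30)
    d_high = tri(delay, 20, 50, 81)
    fs_good = min(bw_high, d_low)
    fs_poor = max(bw_low, d_high)
    if fs_good + fs_poor == 0:
        return 0.5                           # no rule fires: neutral score
    return fs_good * 1.0 / (fs_good + fs_poor)

good = link_quality(bandwidth=90, delay=5)
poor = link_quality(bandwidth=10, delay=45)
```

In the MRLF-SFCP, such a crisp score is compared against the threshold to pre-filter links before DDQN2 makes the final routing decision.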
4.4. Outer Model
The general training process of the outer model is depicted in Figure 4. We can see that the acquisition of initial parameters includes the training of the task DNN and the training of the general DNN. The former entails step-by-step optimization training for each scenario, while the latter periodically synchronizes updates over a batch of sampled scenarios. Specifically, we randomly select some IoT-MEC scenarios. For each chosen scenario, we copy the general DNN into a task DNN and train it with a batch of trajectories randomly sampled from the experience replay. In other words, the parameter update of each task DNN starts from the inherited, globally shared parameter initialization. After the network parameters have been updated on each trajectory in the batch, the adapted policy's capability in each IoT-MEC scenario is evaluated by calculating the loss function on the query set. The calculated losses are then used to optimize the general DNN parameters. After the iteration over all selected IoT-MEC scenarios is complete, the general DNN parameters are updated to incorporate the learning experience gained from the policies of the known learning scenarios. The algorithmic process of meta learning is shown in Algorithm 1.
Algorithm 1 The general training process of the outer model
Require: Different IoT-MEC scenarios, iteration times N
Ensure: Trained general DNN model with better initial parameters used in the inner model
1: Randomly initialize the parameters of the general DNN
2: Empty the experience replay
3: for i = 1, 2, 3, …, N do
4:   Sample a batch of IoT-MEC scenarios without repetition
5:   for each IoT-MEC scenario do
6:     Create a copy of the general DNN as the task DNN
7:     Sample K trajectories from the experience replay at random
8:     Compute the adapted parameters of the task DNN with gradient descent
9:     Sample mini-batch trajectories from the experience replay as the query set
10:    Evaluate the gradient of the loss function on the query set
11:   end for
12:   Update the general DNN parameters
13: end for
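The nested loops of Algorithm 1 follow the familiar MAML pattern. The toy example below is our own illustration: it replaces the DNNs with a scalar parameter and each scenario's loss with a quadratic, but it preserves the inner adaptation step and the outer update on the post-adaptation losses:

```python
def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One meta-update of a scalar parameter theta (MAML-style sketch).

    Each task is a target t with loss L_t(x) = (x - t)^2, so dL/dx = 2(x - t).
    Inner loop: one gradient step per task, starting from the shared theta.
    Outer loop: gradient-descend theta on the sum of post-adaptation losses.
    """
    meta_grad = 0.0
    for t in tasks:
        adapted = theta - alpha * 2.0 * (theta - t)       # inner adaptation
        # d/d(theta) of (adapted - t)^2, noting adapted depends on theta
        meta_grad += 2.0 * (adapted - t) * (1.0 - 2.0 * alpha)
    return theta - beta * meta_grad                        # outer update

theta = 0.0
for _ in range(200):
    theta = maml_step(theta, tasks=[1.0, 3.0])
```

After enough meta-updates, theta settles near the point from which one inner step fits either task best (here the midpoint of the two targets), which is exactly the "better initial parameters" Algorithm 1 hands to the inner model.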
4.5. Inner Model
Figure 5 shows an illustration of the proposed DDQN-based inner model. After obtaining the optimal initial parameters of the inner model, we use them to train the inner model on the newly running IoT-MEC scenario. The inner model makes the placement and routing decisions.
In this paper, a neural network is utilized to approximate the value function. Through continuous training of the DDQN network, the distribution of the Q function can be approximated more precisely, yielding an effective SFC placement strategy. The DDQN network employs an experience replay pool to store empirical samples from each iteration and updates the network parameters by randomly sampling data from the pool, which helps to break the correlation between data. As we use the DDQN model to solve both subproblems, i.e., the VNF placing network and the SPFN, we train their neural networks with a similar method. First, the parameters of the inner model's neural network are initialized to the parameters of the trained general network. The agent interacts with the environment to generate trajectories stored in the experience memory. Then, we extract a batch of trajectories to train the inner model's neural network, aiming at fast adaptation to the current IoT-MEC scenario within a few steps. Finally, we can use the trained inner model to make the placement and routing decisions. We present the algorithmic process of meta adaptation in Algorithm 2.
Algorithm 2 Meta adaptation procedure
1: Initialize a DQN and a target network with their weights set to the meta-learned initial parameters
2: Initialize the experience replay with a finite size of M
3: for each episode do
4:   Generate experience tuples through the agent's interaction with the environment
5:   Store the experiences in the replay memory
6:   Sample a mini-batch of experiences and update the network parameters using gradient descent
7: end for
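The gradient-descent update in Algorithm 2 is driven by the double-DQN target, in which the online network selects the next action and the target network evaluates it. A tabular sketch (the names and toy Q-tables are our own):

```python
def ddqn_target(reward, next_state, online_q, target_q, gamma=0.9, done=False):
    """Double-DQN target: the online network selects the next action and the
    target network evaluates it, which reduces Q-value overestimation.

    online_q / target_q: dicts mapping state -> {action: Q-value}, standing
    in for the two neural networks.
    """
    if done:
        return reward
    best_a = max(online_q[next_state], key=online_q[next_state].get)
    return reward + gamma * target_q[next_state][best_a]

online = {"s1": {"a": 2.0, "b": 5.0}}
target = {"s1": {"a": 1.0, "b": 3.0}}
y = ddqn_target(reward=1.0, next_state="s1", online_q=online, target_q=target)
```

The gradient step then minimizes the squared difference between this target and the online network's current Q-value for the taken action.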
After training the neural network, we can use the trained model to realize dynamic SFC placement in IoT-MEC networks, which can be broken down into three stages: (1) acquire the current network state using the SDN controller and send it to the agent; use DDQN1 to select the optimal combination of VNF instances, i.e., determine the cloudlets for the IoT-SR to traverse, then dynamically place the VNF instances according to the action with the highest reward; (2) conduct a preliminary evaluation of each link by considering available bandwidth, delay, and packet loss rate using fuzzy logic; (3) choose the top k links in terms of performance and use DDQN2 to further choose the routing action with the highest reward based on the output of DDQN1 and the current state. The pseudocode of the online running algorithm is depicted in Algorithm 3.
Algorithm 3 Running algorithm of MRLF-SFCP
Require: The VNF placing network; the network for searching the SFC path (SPFN); the remaining resource ratios; the IoT-SR set R
Ensure: The routing path of each IoT-SR
1: while R is not empty do
2:   Choose an IoT-SR from R
3:   Initialize the routing path
4:   The agent interacts with the SDN controller to obtain the network status and information, serving as the input for the VNF placing network
5:   Get the trained VNF placing model based on Algorithm 2
6:   Apply the VNF placing network to the current state and obtain its output
7:   Select the action with the highest Q value among the outputs
8:   Execute the action: the VNF instances are dynamically placed according to its specifications, yielding a reward and the subsequent status; store the obtained sample in the experience replay buffer
9:   Get the trained SPFN model based on Algorithm 2
10:  Form the input of the SPFN from the ingress and egress nodes, along with the updated network state
11:  for each ingress-egress pair do
12:    Evaluate the links by fuzzy logic and choose the top k links with better performance
13:    Feed the input into the SPFN to get the output
14:    Record the chosen link into the routing path
15:    The agent observes the new network status
16:    Update the state
17:  end for
18: end while