Article

DRL-Based Fast Joint Mapping Approach for SFC Deployment

You Wu, Hefei Hu and Ziyi Zhang
School of Information and Communications Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(12), 2408; https://doi.org/10.3390/electronics14122408
Submission received: 9 May 2025 / Revised: 5 June 2025 / Accepted: 11 June 2025 / Published: 12 June 2025

Abstract

The rapid development of Network Function Virtualization (NFV) enables network operators to deliver customized end-to-end services through Service Function Chains (SFCs). However, existing two-stage deployment strategies fail to jointly optimize the placement of Virtual Network Functions (VNFs) and the routing of service traffic, resulting in inefficient resource utilization and increased service latency. This study addresses the challenge of maximizing the acceptance rate of service requests under resource constraints and latency requirements. We propose DRL-FJM, a novel dynamic SFC joint mapping orchestration algorithm based on Deep Reinforcement Learning (DRL). By holistically evaluating network resource states, the algorithm jointly optimizes node and link mapping schemes to effectively tackle the dual challenges of resource limitations and latency constraints in long-term SFC orchestration scenarios. Simulation results demonstrate that, compared with existing methods, DRL-FJM improves total traffic served by up to 42.6%, node resource utilization by 17.3%, and link resource utilization by 26.6%, while achieving a nearly 100% SFC deployment success rate. Moreover, the proposed algorithm demonstrates strong adaptability and robustness under diverse network conditions.

1. Introduction

The rapid advancement of information technology has presented significant challenges to traditional network architectures, largely due to the growing complexity and diversity of network communications. Network Function Virtualization (NFV) has emerged as a promising solution to address these challenges [1]. NFV enables the virtualization of network functions, allowing them to operate on general-purpose servers instead of relying on dedicated hardware. In the NFV paradigm, network operators often utilize Service Function Chains (SFCs) to deliver personalized network services to users [2]. An SFC consists of a series of Virtual Network Functions (VNFs) arranged in a specific sequence [3], providing flexibility in adding or removing VNFs as needed. VNFs are deployed in virtual machines hosted on physical servers, enabling efficient and flexible resource utilization, thereby maximizing the benefits for service providers.
The orchestration of SFCs has been a major focus of research, with numerous studies aiming to optimize network service deployment to minimize operational costs or reduce energy consumption [4]. However, the emergence of new applications, such as autonomous driving, AR/VR, and large-scale IoT, has significantly increased the demand for lower latency and higher quality network services. End-to-end latency constraints have thus become a critical factor in determining the quality of SFCs, presenting new challenges for their deployment.
Specifically, SFC deployment can be divided into two parts: VNF placement and traffic routing path selection. Current research typically addresses these two aspects sequentially, resulting in two distinct approaches, namely node mapping and link mapping. In the node mapping approach, optimal nodes are first selected for VNF placement, after which the VNFs are connected to form the SFC. While this method effectively utilizes computational resources across the network, it often disregards latency, which may result in selecting longer routing paths that fail to meet latency requirements and lead to bandwidth wastage. Alternatively, the link mapping approach first assigns a candidate path to the traffic flow and then deploys VNFs along that path. Although this method can achieve shorter routing paths, it may suffer from insufficient computational resources to fulfill the SFC requirements.
Since these approaches solve one aspect of the problem independently and subsequently use the result to address the other, they lack an integrated consideration of the multiple resources, elements, and constraints involved, leading to suboptimal performance in complex network environments. Thus, there is a need for a more comprehensive solution to this complex problem.
Fortunately, Deep Reinforcement Learning (DRL) has emerged as a powerful technique for solving complex decision-making problems, enabling efficient VNF embedding and traffic routing in dynamic networks. However, solving the SFC deployment problem in a single step remains challenging due to the massive combined action space, which makes agent training extremely difficult.
In this paper, we propose a Deep Reinforcement Learning-based Fast Joint Mapping (DRL-FJM) approach for SFC deployment, designed to improve the acceptance rate of SFCs while meeting latency constraints. First, we introduce a Comprehensive Resource-Aware Deployment and Routing (CRADR) method, which considers the overall impact of each SFC deployment plan to directly select an appropriate solution. We then propose a Proximal Policy Optimization (PPO)-based algorithm that leverages DRL’s ability to dynamically perceive network resources to select a long-term optimal SFC deployment plan. Our main contributions are as follows:
1.
We formulate the latency-constrained SFC deployment problem as a multi-constraint joint node-link mapping task, and model it as a long-term Markov Decision Process (MDP) with the objective of maximizing the SFC acceptance rate under resource and delay constraints;
2.
We propose DRL-FJM, a DRL-based Fast Joint Mapping Approach for SFC Deployment. We first design an evaluation metric for SFC deployment schemes to quantify the impact of network resource potential and joint mapping strategies. By directly searching this metric within the joint solution space of node and link mappings, we obtain a set of feasible end-to-end SFC deployment schemes. On this basis, we propose a method based on Proximal Policy Optimization (PPO) to dynamically capture network resource states and evaluate the long-term effects of deployment decisions, thereby selecting the optimal scheme. This joint design integrates model-driven candidate generation with learning-based global policy optimization;
3.
We conduct extensive experiments comparing DRL-FJM with existing DRL-based node mapping and link mapping algorithms. The results demonstrate that DRL-FJM achieves a higher acceptance rate under varying workloads.
The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 describes the system model and problem formulation for SFC deployment. Section 4 presents the DRL-FJM algorithm. Section 5 evaluates and analyzes the performance of DRL-FJM. Finally, Section 6 concludes the paper.

2. Related Work

In recent years, much research has focused on determining the placement of all VNFs before sequentially connecting them. In reference [5], the authors proposed a minimum residual heuristic algorithm to solve the VNF placement problem in multi-cloud environments with deployment cost constraints. In reference [6], the problem was formulated as a Markov Decision Process (MDP) to capture dynamic network state transitions, using an attention-based DRL model to jointly optimize latency and cost. A distributed reinforcement learning-based service chain energy management framework was introduced in [7], which dynamically adjusted VNF placement to reduce latency and improve energy efficiency. In reference [8], a heuristic algorithm based on complex network theory was proposed to minimize total resource usage cost while considering latency-aware and reliable VNF placement. Zeng et al. [9] developed a rule-based DRL algorithm that first generates a fractional solution for VNF deployment and then refines this solution using a rule-based algorithm to meet all constraints. In reference [10], Yue et al. focus on the VNF placement problem and propose an affinity-based heuristic mapping algorithm to achieve higher network throughput and resource utilization. Varasteh et al. [11] present an online heuristic framework that sequentially addresses the SFC deployment problem by first placing VNFs and then performing traffic routing. A Lagrangian relaxation-based aggregated cost heuristic algorithm is employed to solve the delay-constrained minimum-cost shortest-path routing subproblem. In reference [12], Qiu et al. investigate the placement of VNFs and their redundant backups in Fault-Prone Mobile Edge Cloud (FP-MEC) environments, and propose a two-stage online scheme to maximize the accepted request throughput while minimizing the acceptance cost. Fan et al. [13] proposed selecting VNF placement nodes based on global resource capacity probabilities and connecting nodes using the shortest path, with DRL used to select the final deployment plan from the generated set of solutions.
Several studies have adopted an alternative approach by determining the traffic routing path first, followed by choosing VNF placement along that path. In reference [14], due to the complexity of the SFC deployment problem, the authors proposed solving the two subproblems separately. They utilized a pointer network-based Actor–Critic algorithm for the traffic routing problem and then applied the resulting solution to the VNF placement problem. Tian et al. [15] presented a two-stage method in which a graph-based resource aggregation routing approach was used to identify candidate paths under latency constraints, followed by DRL to determine VNF placement. In reference [16], Wang et al. transform the network representation into a virtual layered graph that incorporates NFV processing delay, and then solve the problem using a traditional shortest-path algorithm. In reference [17], Pei et al. propose a novel graph-based routing algorithm to enable differentiated routing for SFC request flows, and design a relative cost metric to balance resource consumption and prevent congestion in the network. Duan et al. [18] focused on optimizing traffic routing paths by proposing an SFC path optimization strategy based on network function placement clustering, which employed a multi-head attention mechanism to optimize various types of SFC layouts.
However, the aforementioned approaches address VNF placement and traffic routing separately rather than jointly, resulting in a decoupled decision-making process. Due to the sequential nature of service chains, making placement decisions first and then routing VNFs via shortest paths can lead to inefficient detours, which increase link resource consumption and violate latency constraints. Conversely, determining traffic routing paths first may result in insufficient computing resources along the selected path to satisfy the VNF deployment requirements. In both cases, the second decision is overly constrained by the first, leading to suboptimal deployment results.
Moreover, these methods typically optimize one component while holding the other fixed, which fails to fully exploit the solution space. In contrast, our proposed DRL-FJM approach jointly considers node and link mapping combinations, evaluates multiple complete SFC deployment candidates, and employs a DRL agent to select the optimal plan by perceiving real-time network resource status. This unified framework enables global resource-awareness and dynamic adaptation to network conditions, overcoming the inherent limitations of decoupled strategies.

3. System Model and Problem Formulation

3.1. System Model

As illustrated in Figure 1, our system model consists of NFV infrastructure, VNFs, and a centralized controller. The NFV infrastructure includes multiple servers and switching nodes, where VNFs can be instantiated on servers to provide services to users. Each SFC flow must traverse a series of VNFs in a specified order, from an ingress node to an egress node, with VNF placement and traffic routing paths determined by the controller. The controller has access to network resource states and can compute the end-to-end delay for SFCs. Upon the arrival of an SFC request, the centralized controller computes a mapping plan that satisfies both resource and latency requirements, deploying the SFC in the network. Once the SFC service is completed, its occupied resources are released.

3.2. Physical Network and SFCs

The underlying physical network is typically represented as an undirected graph $G^S = (N^S, E^S)$, consisting of nodes with computing resource capacities $C = \{c_n \mid n \in N^S\}$ and edges with bandwidth capacities $B = \{b_{u,v} \mid u, v \in N^S, e_{u,v} \in E^S\}$. Here, $u, v \in N^S$ denote nodes, while $e_{u,v}$ denotes the bidirectional link between them.
All user service demands in the network can be represented as a set of Service Function Chains $G^V$, where $G_i^V \in G^V$ denotes the $i$-th SFC, which contains source and destination nodes as well as an ordered set of network function requirements, represented as a directed graph $G_i^V = (N_i^V, E_i^V)$. $N_i^V = \{src_i, v_{i,1}, v_{i,2}, \ldots, v_{i,j}, \ldots, dst_i\}$ is the set of virtual nodes, where $src_i$ and $dst_i$ denote the source and destination nodes, respectively. $V_i^v = \{v_{i,1}, v_{i,2}, \ldots, v_{i,|V_i^v|}\}$ is the ordered sequence of VNFs, with each type of VNF having its own computational resource requirement $\varphi_{i,j}^{cpu}$. Thus, an SFC request can be defined as $req = \{G_i^V, f_i, t_i^a, t_i^h, t_i^d\}$, where $t_i^a$ and $f_i$ denote the arrival time and traffic demand of the SFC, respectively. $t_i^h$ represents the SFC's lifetime, after which the SFC task ends and its occupied physical resources are released. $t_i^d$ represents the delay requirement; if the delay exceeds this constraint, the service is rejected. Therefore, it is crucial to meet resource requirements and plan paths effectively to satisfy the delay constraint.
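To make this notation concrete, the following minimal Python sketch shows one way to represent an SFC request; the class and field names are illustrative assumptions, not part of the SFCSim API used later in the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SFCRequest:
    """One SFC request req = {G_i^V, f_i, t_i^a, t_i^h, t_i^d} (illustrative field names)."""
    src: int                                   # ingress node src_i
    dst: int                                   # egress node dst_i
    vnf_cpu: List[float] = field(default_factory=list)  # ordered CPU demands phi_{i,j}^cpu
    traffic: float = 0.0                       # traffic demand f_i (Mbps)
    arrival_time: float = 0.0                  # t_i^a
    lifetime: float = 0.0                      # t_i^h: resources released when it expires
    max_delay: float = 0.0                     # t_i^d: request rejected if exceeded

# Example: a three-VNF chain from node 2 to node 15 carrying 45 Mbps
req = SFCRequest(src=2, dst=15, vnf_cpu=[1.2, 0.9, 1.5],
                 traffic=45.0, arrival_time=0.0, lifetime=20.0, max_delay=5.0)
```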

3.3. The Problem Formulation

The SFC deployment problem can be defined as mapping the directed graphs $G^V$ representing the set of SFCs onto the undirected graph $G^S$ of the underlying network while satisfying service latency and traffic requirements: $M : G^V(N^V, E^V) \mapsto G^S(N^S, E^S)$. A binary variable $x_{i,j}^{n,\iota}$ defines the embedding location of the $j$-th VNF $v_{i,j}$ in the $i$-th SFC, another binary variable $y_i^{u,v,\iota}$ represents whether the $i$-th SFC $G_i^V$ passes through link $e_{u,v}$, and $z_i$ denotes whether the SFC request $G_i^V$ is accepted, as shown in the following equations:
x_{i,j}^{n,\iota} = \begin{cases} 1, & \text{if } v_{i,j} \text{ is embedded on node } n \\ 0, & \text{otherwise} \end{cases} \quad (1)
y_i^{u,v,\iota} = \begin{cases} 1, & \text{if SFC } G_i^V \text{ traverses link } e_{u,v} \\ 0, & \text{otherwise} \end{cases} \quad (2)
z_i = \begin{cases} 1, & \text{if SFC } G_i^V \text{ is accepted} \\ 0, & \text{otherwise} \end{cases} \quad (3)
The mapping process involves two parts: node mapping and link mapping.
For node mapping, we need to find physical nodes that meet the VNF requirements for the SFC request, considering two main constraints:
Node Resource Constraint: The computational resources required by a VNF must not exceed the corresponding available resources of the physical node:
\sum_{v_{i,j} \in V_i^v} x_{i,j}^{n,\iota} \, \varphi_{i,j}^{cpu} \le \omega_n^{cpu}, \quad \forall n \in N^S \quad (4)
Node Deployment Constraint: Each VNF can only be deployed on a single physical node:
\sum_{n \in N^S} x_{i,j}^{n,\iota} = 1 \quad (5)
For link mapping, we need to find physical links between the mapped nodes, considering two main constraints:
Bandwidth Constraint: The total bandwidth mapped onto the physical link $e_{u,v}$ must not exceed its remaining bandwidth $\omega_{u,v}^{bw}$:
y_i^{u,v,\iota} \, f_i \le \omega_{u,v}^{bw}, \quad \forall e_{u,v} \in E^S \quad (6)
Path Constraint: If the SFC $G_i^V$ is accepted, the traversed physical links must be sequentially connected, and flow splitting is not allowed. We use $P_i^{\tau,\iota} = \{n_{i,1}^{\tau,\iota}, n_{i,2}^{\tau,\iota}, \ldots, n_{i,|P_i^{\tau,\iota}|}^{\tau,\iota}\}$ to denote the selected deployment path for the SFC $G_i^V$, and $I_{i,n}^{\tau,\iota}$ and $O_{i,n}^{\tau,\iota}$ to represent the number of incoming and outgoing flows at each node $n$, respectively. Thus, for each node $n \in P_i^{\tau,\iota}$, we have the following constraint:
O_{i,n}^{\tau,\iota} - I_{i,n}^{\tau,\iota} = \begin{cases} 1, & n = src_i^{\tau,\iota} \\ -1, & n = dst_i^{\tau,\iota}, \ n \in P_i^{\tau,\iota} \\ 0, & \text{otherwise} \end{cases} \quad (7)
Delay Constraint: The delay experienced by the SFC flow must not exceed its maximum tolerable delay. We primarily consider propagation delay and packet queuing delay, as expressed in the following equation:
d_{u,v}^{\tau} = d_{u,v}^{p} + d_{u,v}^{q,\tau} \quad (8)
Propagation delay $d_{u,v}^{p}$ is defined as the time required for a message to propagate through the transmission medium between nodes, proportional to the length of the medium between the endpoints. Queuing delay $d_{u,v}^{q,\tau}$ refers to the time packets spend waiting in a queue before being processed, which depends on the current link traffic. In this paper, we model queuing delay using an M/M/1 queuing model [19], where $\omega_{u,v}^{bw}$ represents the remaining bandwidth of the link $e_{u,v}$ and $d_{u,v}^{o}$ denotes the transmission delay of a single packet on the link. According to the M/M/1 model, the queuing delay of a link is calculated using the following formula:
d_{u,v}^{q,\tau} = \frac{b_{u,v} - \omega_{u,v}^{bw}}{\omega_{u,v}^{bw}} \, d_{u,v}^{o} \quad (9)
We denote $G^{\tau}$ as the set of all deployed and active SFCs at time $\tau$; thus, the delay constraint can be expressed as follows:
\sum_{j=1}^{|P_i^{\tau,\iota}|-1} y_i^{n_{i,j}^{\tau,\iota}, n_{i,j+1}^{\tau,\iota}, \iota} \, d_{n_{i,j}^{\tau,\iota}, n_{i,j+1}^{\tau,\iota}}^{\tau} \le t_i^d, \quad \forall \tau, \iota, \ G_i^{\tau,\iota} \in G^{\tau} \quad (10)
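As a concrete illustration of the delay model, the short sketch below computes the per-link delay of Equations (8) and (9) and checks the end-to-end constraint (10) along a candidate path; it is a simplified stand-alone example, not the SFCSim implementation.

```python
def link_delay(prop_delay, capacity, remaining_bw, packet_tx_delay):
    """Per-link delay: propagation delay plus M/M/1 queuing delay (Equations (8)-(9))."""
    if remaining_bw <= 0:
        return float("inf")  # saturated link
    queuing = (capacity - remaining_bw) / remaining_bw * packet_tx_delay
    return prop_delay + queuing

def path_meets_deadline(path_links, max_delay):
    """path_links: list of (prop_delay, capacity, remaining_bw, packet_tx_delay) tuples."""
    total = sum(link_delay(*link) for link in path_links)
    return total <= max_delay, total

# Two-hop path on 1000 Mbps links, 0.4 ms propagation, 0.2 ms per-packet transmission delay
links = [(0.4, 1000.0, 700.0, 0.2), (0.4, 1000.0, 550.0, 0.2)]
ok, total_ms = path_meets_deadline(links, max_delay=5.0)  # ok = True, total_ms ≈ 1.05 ms
```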
In this paper, our optimization objective is to maximize the number of successfully deployed SFCs, i.e.,
\max \sum_{G_i^V \in G^V} z_i
s.t. constraints (1)–(10),
x_{i,j}^{n,\iota} \in \{0,1\}, \quad y_i^{u,v,\iota} \in \{0,1\}, \quad \forall n \in N^S, \ e_{u,v} \in E^S
The above formulation defines a constrained optimization problem for the online SFC deployment scenario, where the objective is to maximize the number of successfully deployed SFC requests while satisfying computational, bandwidth, and delay constraints. This problem structure provides the theoretical foundation for the DRL-based orchestration method introduced in the next section.

4. Algorithm

In this section, we first propose a Comprehensive Resource-Aware Deployment and Routing Method (CRADR) to directly obtain an SFC deployment plan. We then introduce a DRL-based Fast Joint Mapping Approach for SFC Deployment (DRL-FJM). The overall workflow is illustrated in Figure 2, which will be detailed after introducing the necessary components.

4.1. Comprehensive Resource-Aware Deployment and Routing Method

A complete SFC deployment solution involves both VNF node mapping and traffic routing path mapping. To serve more SFC requests, it is crucial to ensure the efficient utilization of both node computational resources and link bandwidth resources while meeting latency requirements. This necessitates a comprehensive evaluation of both node mapping and link mapping impacts.
To achieve this, we need to ensure that the mapping scheme has adequate computational and bandwidth resources and satisfies latency constraints, while also considering the availability of neighboring resources. Specifically, during the SFC mapping process, the service capacity of a single node or link depends not only on its own capacity but also on the capacity of neighboring nodes or links. Therefore, we use $\widetilde{\omega}_u^{cpu}$ and $\widetilde{\omega}_{u,v}^{bw}$ to evaluate the resource potential of nodes and links, respectively.
\widetilde{\omega}_u^{cpu} = \omega_u^{cpu} + \frac{\sum_{v \in N^S} \chi_{u,v} \, \omega_v^{cpu}}{\sum_{v \in N^S} \chi_{u,v}} \cdot (1 - \hat{\omega}^{cpu})
\widetilde{\omega}_{u,v}^{bw} = \omega_{u,v}^{bw} + \left( \frac{\sum_{n \in N^S} \chi_{u,n} \, \omega_{u,n}^{bw}}{\sum_{n \in N^S} \chi_{u,n}} + \frac{\sum_{n \in N^S} \chi_{v,n} \, \omega_{v,n}^{bw}}{\sum_{n \in N^S} \chi_{v,n}} \right) \cdot (1 - \hat{\omega}^{bw})
where $\chi_{u,n}$ indicates whether nodes $u$ and $n$ are neighbors, $\hat{\omega}^{cpu}$ represents the average remaining resource rate of all nodes in the network, and $\hat{\omega}^{bw}$ represents the average remaining resource rate of all links in the network:
\chi_{u,n} = \begin{cases} 1, & \text{if node } u \text{ is a neighbor of node } n \\ 0, & \text{otherwise} \end{cases}
\hat{\omega}^{cpu} = \frac{1}{|N^S|} \sum_{n \in N^S} \frac{\omega_n^{cpu}}{c_n}
\hat{\omega}^{bw} = \frac{1}{|E^S|} \sum_{e_{u,v} \in E^S} \frac{\omega_{u,v}^{bw}}{b_{u,v}}
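The neighbor-aware potentials defined above can be computed directly on the substrate graph. The sketch below uses NetworkX (on which SFCSim is built), but the attribute names 'cpu', 'cap', and 'bw' are assumptions made for illustration rather than the actual SFCSim data model.

```python
import networkx as nx

def avg_rates(G):
    """Average remaining resource rates over all nodes and all links."""
    cpu = sum(G.nodes[n]["cpu"] / G.nodes[n]["cap"] for n in G.nodes) / G.number_of_nodes()
    bw = sum(G[u][v]["bw"] / G[u][v]["cap"] for u, v in G.edges) / G.number_of_edges()
    return cpu, bw

def node_potential(G, u, avg_cpu_rate):
    """Own remaining CPU plus the neighbor average, weighted by (1 - average CPU rate)."""
    nbrs = list(G.neighbors(u))
    nbr_avg = sum(G.nodes[v]["cpu"] for v in nbrs) / len(nbrs) if nbrs else 0.0
    return G.nodes[u]["cpu"] + nbr_avg * (1.0 - avg_cpu_rate)

def link_potential(G, u, v, avg_bw_rate):
    """Own remaining bandwidth plus the average bandwidth of links adjacent to both endpoints."""
    def adjacent_avg(x):
        vals = [G[x][n]["bw"] for n in G.neighbors(x)]
        return sum(vals) / len(vals) if vals else 0.0
    return G[u][v]["bw"] + (adjacent_avg(u) + adjacent_avg(v)) * (1.0 - avg_bw_rate)
```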
Next, we propose an SFC Deployment Solution Metric (DSM) to evaluate the overall impact of a deployment plan. The DSM consists of two components: the Node Mapping Metric (NMM) and the Link Mapping Metric (LMM), where NMM assesses the impact of node mapping and LMM assesses the impact of link mapping. Thus, for a mapping plan $M_i^k = \{x_{i,j}^{n,k}, y_i^{u,v,k} \mid v_{i,j} \in V_i^v, n \in N^S, e_{u,v} \in E^S\}$ of an SFC $G_i^V$, we have:
DSM = (1 - \hat{\omega}^{cpu}) \cdot NMM + (1 - \hat{\omega}^{bw}) \cdot LMM
NMM = \sum_{j=1}^{|V_i^v|} \frac{\varphi_{i,j}^{cpu}}{\sum_{n \in N^S} \omega_n^{cpu} \cdot x_{i,j}^{n,k} - \sum_{m=1}^{j-1} \sum_{n \in N^S} \varphi_{i,m}^{cpu} \cdot x_{i,j}^{n,k} \cdot x_{i,m}^{n,k}}
LMM = \sum_{e_{u,v} \in E^S} y_i^{u,v,k} \cdot \left( 1 - \frac{\omega_{u,v}^{bw}}{b_{u,v}} \right)
DSM dynamically considers changes in computational and bandwidth resources based on the network resource status. When the selected node for deployment has fewer available computational resources, the value of DSM will be higher. Similarly, a longer path or lower available bandwidth will result in a higher DSM value. Therefore, by directly searching this metric within the joint solution space of node and link mappings, we can select deployment plans with lower DSM values. This enables a more comprehensive evaluation of the mapping impact and allows us to directly determine the joint node and link mapping plan.
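To illustrate how a candidate plan is scored, the sketch below evaluates NMM, LMM, and DSM for a single candidate under assumed data structures: 'placement' lists the node chosen for each VNF in order, and 'path_links' lists the substrate links the plan traverses.

```python
def nmm(placement, vnf_cpu, node_remaining):
    """Node Mapping Metric: each VNF's CPU demand over the capacity left on its chosen node,
    discounting the demands of earlier VNFs of the same chain placed on that node."""
    score = 0.0
    for j, node in enumerate(placement):
        used_by_earlier = sum(vnf_cpu[m] for m in range(j) if placement[m] == node)
        score += vnf_cpu[j] / (node_remaining[node] - used_by_earlier)
    return score

def lmm(path_links, link_remaining, link_capacity):
    """Link Mapping Metric: one term per traversed link, larger when the link is more loaded."""
    return sum(1.0 - link_remaining[e] / link_capacity[e] for e in path_links)

def dsm(placement, vnf_cpu, node_remaining,
        path_links, link_remaining, link_capacity,
        avg_cpu_rate, avg_bw_rate):
    """Deployment Solution Metric: lower values indicate a more favorable joint mapping."""
    return ((1.0 - avg_cpu_rate) * nmm(placement, vnf_cpu, node_remaining)
            + (1.0 - avg_bw_rate) * lmm(path_links, link_remaining, link_capacity))
```

Plans with lower DSM values are preferred; the K candidate plans handed to the DRL agent described next are drawn from this joint search.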

4.2. Definition of DRL Environment

The CRADR method can identify an optimal deployment plan based on the DSM metric for the current network environment. However, due to the rapid and complex variations in network resource states, a plan based on a single fixed metric may not yield a globally optimal deployment. To address this challenge, we employ DRL, which offers strong feature-extraction and decision-making capabilities to determine a long-term optimal SFC deployment plan, maximizing resource utilization and SFC acceptance rates over time.
DRL combines reinforcement learning and deep learning, allowing an agent to learn policies in complex, high-dimensional environments. Through interaction with the environment, the DRL agent takes actions and adjusts based on reward signals, aiming to maximize the expected cumulative reward. Deep neural networks act as function approximators in DRL, helping the agent efficiently estimate state values or action values, thus overcoming the limitations of traditional reinforcement learning in large-scale state spaces. We use a Markov Decision Process (MDP) to model the sequential decision-making of SFC deployment, with three main definitions: state, action, and reward, detailed as follows:
State: In DRL, the state represents the information obtained by the agent from the environment. This includes node resource information, link resource information, current delay information of deployed SFCs, and the characteristic information of the deployment plan returned by CRADR.
Node resource information is represented as $s_i^{node}$, where $\omega_u^{cpu}/c_u$ is the remaining computational resource rate of node $u$, and $\omega_u^{bw}/b_u$ is the remaining bandwidth rate aggregated over the links adjacent to node $u$:
s_i^{node} = \left\{ \frac{\omega_u^{cpu}}{c_u}, \frac{\omega_u^{bw}}{b_u} \,\middle|\, u \in N^S \right\}
\omega_u^{bw} = \sum_{v \in N^S} \chi_{u,v} \, \omega_{u,v}^{bw}
b_u = \sum_{v \in N^S} \chi_{u,v} \, b_{u,v}
Link resource information $s_i^{link}$ includes the remaining bandwidth rate of each link:
s_i^{link} = \left\{ \frac{\omega_{u,v}^{bw}}{b_{u,v}} \,\middle|\, e_{u,v} \in E^S \right\}
Delay information $s_i^{delay}$ is composed of the maximum delay-constraint ratio of non-expired SFCs traversing a given node or link in the current network, indicating the urgency of delay at each node or link:
s_i^{delay} = \left\{ \max_{G_i^V \in G^V(t)} \max_{v_{i,j} \in V_i^v} x_{i,j}^{n} \, \frac{d_i(t)}{t_i^d}, \ \max_{G_i^V \in G^V(t)} y_i^{u,v} \, \frac{d_i(t)}{t_i^d} \,\middle|\, n \in N^S, \ e_{u,v} \in E^S \right\}
Deployment information $s_i^{solutions}$ indicates the expected usage of node computational resources and link bandwidth resources by each candidate deployment plan:
s_i^{solutions} = \left\{ \frac{\sum_{v_{i,j} \in V_i^v} x_{i,j}^{n,k} \, \varphi_{i,j}^{cpu}}{c_n}, \ \frac{y_i^{u,v,k} \, f_i}{b_{u,v}} \,\middle|\, n \in N^S, \ e_{u,v} \in E^S, \ M_i^k \in M_i \right\}
Therefore, the final state information $s_i$ can be represented as:
s_i = \left\{ s_i^{node}, s_i^{link}, s_i^{delay}, s_i^{solutions} \right\}
These elements are chosen to ensure that the agent has a complete and dynamic view of both the current network conditions and the implications of each possible action. This design enables the agent to effectively evaluate trade-offs between resource usage, path selection, and latency compliance, thus aligning with the goal of maximizing SFC acceptance under multiple constraints.
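One possible way to assemble the state from the four components above is sketched here; the exact ordering and normalization are not specified in the paper, so this layout is only an assumption of how a flat PPO input vector could be built.

```python
import numpy as np

def build_state(node_cpu_rate, node_bw_rate, link_bw_rate,
                node_delay_urgency, link_delay_urgency, candidate_usage):
    """Concatenate s_i^node, s_i^link, s_i^delay, and s_i^solutions into one flat vector.

    node_cpu_rate, node_bw_rate: per-node remaining resource rates (length |N^S|)
    link_bw_rate:                per-link remaining bandwidth rates (length |E^S|)
    node/link_delay_urgency:     max d_i(t)/t_i^d over active SFCs on each node/link
    candidate_usage:             K x (|N^S| + |E^S|) expected resource usage per candidate
    """
    s_node = np.stack([node_cpu_rate, node_bw_rate], axis=1).ravel()
    s_link = np.asarray(link_bw_rate, dtype=float)
    s_delay = np.concatenate([node_delay_urgency, link_delay_urgency])
    s_solutions = np.asarray(candidate_usage, dtype=float).ravel()
    return np.concatenate([s_node, s_link, s_delay, s_solutions]).astype(np.float32)
```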
Action: DRL-FJM aims to perceive the network load status and SFC deployment information and to select one SFC deployment plan among the $K$ candidates. Thus, the decision space is defined as $A = \{0, 1, \ldots, K-1\}$.
Reward: The DRL agent improves its performance by continually receiving rewards from the environment. The reward function reflects the extent to which the current action meets the objectives and is used to evaluate the action taken. Our goal is to maximize the acceptance of SFC requests; thus, the reward is defined as $r_i = z_i$, where $z_i$ is the deployment result defined in Equation (3). This design directly reflects the optimization objective, as the accumulated reward corresponds to the total number of successful deployments.

4.3. DRL-Based Fast Joint Mapping Approach for SFC Deployment

We use a Proximal Policy Optimization (PPO)-based DRL algorithm to make deployment plan decisions for SFCs [20]. Proximal Policy Optimization is a policy-gradient-based deep reinforcement learning algorithm that is widely used to solve complex decision optimization problems. PPO introduces a clipping mechanism to control the change in policies during updates effectively, ensuring stability and efficiency in policy updates. PPO employs an Actor–Critic architecture, which includes two core neural networks: an Actor network and a Critic network. The Actor network is responsible for generating a probability distribution of actions, while the Critic network estimates the value of the current state.
The Actor network generates a probability distribution $\pi_\theta(a_t \mid s_t)$ over actions $a_t$ based on the current state $s_t$. Its primary goal is to update the policy by maximizing a clipped objective function, ensuring that the change between the new and old policies is not too large. The objective function for updating the policy network is defined as:
L^{CLIP}(\theta) = \mathbb{E}_t \left[ \min \left( r_t(\theta) \hat{A}_t, \ \mathrm{clip}\left( r_t(\theta), 1 - \epsilon, 1 + \epsilon \right) \hat{A}_t \right) \right]
where $r_t(\theta)$ is the probability ratio between the new and old policies, and $\hat{A}_t$ is the advantage function, which measures the superiority of taking a specific action in the current state:
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
\hat{A}_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t)
PPO introduces clipping to control the magnitude of policy updates, limiting the ratio $r_t(\theta)$ to the range $[1 - \epsilon, 1 + \epsilon]$ to prevent instability due to drastic changes in the policy. By maximizing $L^{CLIP}(\theta)$, the Actor network gradually improves the policy, converging to a better policy over multiple iterations.
The Critic network estimates the value of the current state $s_t$ as $V_\phi(s_t)$ to assist the Actor network in making better action choices. The Critic network updates its parameters by minimizing the Mean Squared Error (MSE) of the value estimation, enhancing the accuracy of state-value prediction. Its optimization objective function is:
L^{VF}(\phi) = \mathbb{E}_t \left[ \left( V_\phi(s_t) - \hat{R}_t \right)^2 \right]
where $\hat{R}_t$ represents the target return, given by:
\hat{R}_t = r_t + \gamma V_\phi(s_{t+1})
By minimizing the MSE between the state value $V_\phi(s_t)$ and the target return $\hat{R}_t$, the Critic network continuously corrects its value estimation, providing more accurate value references for policy optimization.
The overall optimization goal of PPO is to optimize both the Actor network and the Critic network while incorporating an entropy regularization term to enhance exploration and prevent the policy from prematurely converging to a local optimum. The overall loss function of PPO can be expressed as:
L^{PPO}(\theta, \phi) = L^{CLIP}(\theta) - c_1 L^{VF}(\phi) + c_2 H(\pi_\theta)
where $H(\pi_\theta)$ is the entropy regularization term, used to increase the randomness of the policy, defined as:
H(\pi_\theta) = - \sum_a \pi_\theta(a \mid s) \log \pi_\theta(a \mid s)
In the above equation, $c_1$ and $c_2$ are weighting coefficients for the value loss and the entropy regularization term, respectively. By optimizing this overall loss function, PPO adjusts both the Actor and Critic networks simultaneously, ensuring that policy optimization and value estimation improve in tandem and striking a balance between exploration and exploitation.
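For concreteness, the combined PPO objective above can be written in PyTorch roughly as follows. This is a generic sketch of the clipped surrogate with value and entropy terms, not the authors' released code; the tensor names and the single-optimizer formulation are assumptions.

```python
import torch
import torch.nn.functional as F

def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns,
             entropy, clip_eps=0.2, c1=0.5, c2=0.01):
    """Scalar loss to minimize: -L^CLIP + c1 * L^VF - c2 * H(pi_theta)."""
    ratio = torch.exp(new_log_probs - old_log_probs)              # r_t(theta)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()                 # negative clipped surrogate
    value_loss = F.mse_loss(values, returns)                      # (V_phi(s_t) - R_t)^2
    return policy_loss + c1 * value_loss - c2 * entropy.mean()
```

Minimizing this scalar ascends $L^{CLIP}$ for the Actor while descending the value error for the Critic, which matches the joint update described in Algorithm 1 below.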
Based on the above design, we propose the DRL-FJM algorithm (as shown in Algorithm 1); Figure 2 illustrates its workflow. First, at the beginning of each time step, we release the resources occupied by expired SFCs in the network. Then, we use CRADR to generate a set of candidate deployment plans $M_i = \{M_i^1, M_i^2, \ldots, M_i^K\}$ based on network resource and delay information (steps 7–8 in Algorithm 1). The PPO agent then selects one plan according to the current state, the selected plan is executed, the new state and immediate reward are observed, and the transition is recorded.
During the network parameter update phase, we jointly optimize the networks through gradient ascent for the Actor network and gradient descent for the Critic network (steps 16–19 in Algorithm 1), simultaneously improving the policy performance and the accuracy of value prediction in each epoch.
The computational complexity of DRL-FJM is primarily determined by two components: the candidate deployment generation and the PPO-based decision step. The CRADR module generates K candidate deployment schemes in polynomial time relative to the size of the substrate network. The PPO model then selects the final deployment from these K candidates with a forward pass through the neural network. Therefore, the approach remains efficient and applicable in real-time deployment scenarios.
Algorithm 1 DRL-FJM Algorithm
Require: Learning rate α, remaining bandwidth and computational resources, SFC requests, maximum steps per epoch
Ensure: Actor network parameters θ, Critic network parameters ϕ
 1: Initialize policy network θ and value network ϕ
 2: Initialize an empty replay buffer D
 3: for episode = 1 to MaxEpisodes do
 4:     Reset environment
 5:     for t = 1 to MaxTimeSteps do
 6:         Release expired SFCs and receive request req_i
 7:         Use CRADR to generate candidate deployment plans M_i
 8:         Obtain resource state and compute current state s_t
 9:         Input s_t to policy network to obtain π_θ(a_t | s_t)
10:         Sample action a_t ∼ π_θ(a_t | s_t)
11:         Execute action a_t and deploy in simulator
12:         Observe r_t, s_{t+1}, and completion flag
13:         Store transition (s_t, a_t, r_t, s_{t+1}) in D
14:         if t mod K = 0 then
15:             Compute Â_t and R̂_t for each transition in D
16:             Update parameters:
17:                 θ ← θ + α_A ∇_θ L^PPO(θ, ϕ)
18:                 ϕ ← ϕ + α_C ∇_ϕ L^PPO(θ, ϕ)
19:             Update old policy θ_old ← θ
20:             Clear buffer D
21:         end if
22:     end for
23: end for
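The per-time-step interaction of Algorithm 1 (steps 6–13) can be summarized in code as follows; 'env', 'agent', and 'buffer' are placeholder objects standing in for the SFCSim environment, the PPO networks, and the trajectory buffer, so the method names are assumptions made for illustration.

```python
def run_time_step(env, agent, buffer, t, K):
    """One environment step: CRADR generates candidates, the PPO agent picks one."""
    env.release_expired_sfcs()                      # free resources of finished SFCs
    req = env.next_request()                        # incoming SFC request req_i
    candidates = env.cradr_candidates(req, K)       # K joint node-link mapping plans
    state = env.observe(candidates)                 # s_t from resource, delay and plan info
    action, log_prob = agent.act(state)             # index of the chosen plan in {0, ..., K-1}
    reward, next_state = env.deploy(candidates[action], req)  # r_t = 1 if accepted, else 0
    buffer.store(state, action, log_prob, reward, next_state)
    if (t + 1) % K == 0:                            # periodic PPO update (steps 14-20)
        agent.update(buffer)
        buffer.clear()
```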

5. Performance Evaluation

In this section, we evaluate the performance of our proposed DRL-FJM algorithm. All simulations were conducted on a computer equipped with a 12th Gen Intel® Core™ i9-12900H CPU @ 2.50 GHz and an NVIDIA GeForce RTX 3070 GPU.

5.1. Simulation Environment

The simulation experiments were conducted using SFCSim [21], an open-source service function chain orchestration simulation tool, which is built on the NetworkX network simulation library. The simulation employed a cellular network topology, as shown in Figure 3. The cellular network contains 19 nodes and 42 links, with each node capable of establishing links with its neighboring nodes. Each network node is accompanied by an edge computing node with 80 CPU cores. The link bandwidth between nodes is 1 Gbps, the propagation delay is 0.4 ms, and the transmission delay for a single packet is 0.2 ms.
The network provides eight types of VNFs. The CPU resource coefficient for processing each Mbps of traffic by each VNF is uniformly distributed between 0.02 and 0.04 cores. Additionally, we assume that SFC requests arrive sequentially following a Poisson process with an average arrival rate λ = 2. The source and destination nodes of user service chains are randomly generated in the network topology, and the traffic demand follows a uniform distribution U(40 Mbps, 50 Mbps). The number of VNFs that each flow needs to traverse ranges between 2 and 5.
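The traffic model above can be reproduced with a short NumPy generator; this is only an illustrative sketch matching the stated distributions, not the actual SFCSim configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_requests(num_requests, num_nodes, num_vnf_types=8, lam=2.0):
    """Poisson arrivals (rate lam per slot), uniform 40-50 Mbps demand, chains of 2-5 VNFs."""
    arrival_times = np.cumsum(rng.exponential(1.0 / lam, size=num_requests))
    requests = []
    for t in arrival_times:
        src, dst = rng.choice(num_nodes, size=2, replace=False)
        chain = rng.integers(0, num_vnf_types, size=rng.integers(2, 6))  # 2 to 5 VNFs
        requests.append({"t": float(t), "src": int(src), "dst": int(dst),
                         "vnfs": chain.tolist(), "traffic": float(rng.uniform(40.0, 50.0))})
    return requests
```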
Furthermore, DRL is implemented using PyTorch 2.2.1, and the relevant parameter settings are shown in Table 1. The Actor and Critic networks are built using three fully connected layers with ReLU activation functions. The learning rates for the Actor and Critic networks were selected empirically: a higher learning rate was used for the Actor network to facilitate efficient policy exploration, while a lower rate for the Critic network ensured stable and accurate value estimation. This configuration aligns with common settings in Actor–Critic reinforcement learning methods. In the testing phase, we evaluate the performance using 2000 SFC requests.

5.2. Compared Algorithms

To validate the effectiveness of DRL-FJM, we compare it with the following four baselines:
  • PPO-Deployment: A node mapping algorithm that uses a PPO-based DRL model to select suitable VNF deployment nodes and then connects the VNFs using the shortest path.
  • PPO-Routing: A link mapping scheme that uses a PPO-based DRL model to select appropriate traffic routing paths, followed by selecting nodes with the highest remaining computational resources along the path to deploy VNFs.
  • CRADR-Random: Randomly selects one deployment plan from the K candidates generated by the CRADR algorithm.
  • CRADR-Greedy: Selects the optimal deployment plan computed by the CRADR algorithm for deployment.

5.3. Performance Comparison

As shown in Figure 4, we evaluate the performance of the DRL-FJM algorithm under balanced network resource conditions. Figure 4a shows the acceptance ratio of SFC requests as they arrive sequentially. When the number of SFC arrivals reaches 2000, CRADR-Random and CRADR-Greedy outperform the DRL-based node and link mapping algorithms, indicating that the CRADR approach effectively accounts for both computational and bandwidth resources and thus produces more reasonable end-to-end SFC deployment solutions. Furthermore, the DRL-FJM algorithm dynamically perceives resource and latency conditions in the network and selects better SFC deployment solutions, achieving an acceptance rate close to 100%, which is superior to CRADR-Greedy (89%) and CRADR-Random (85%). This high acceptance ratio is not an isolated result: as shown in Figure 5, Figure 6, Figure 7 and Figure 8, DRL-FJM consistently exhibits superior adaptability and resource awareness under varying constraints and traffic patterns. These results jointly confirm that the acceptance performance observed in Figure 4 reflects a robust and generalizable policy rather than case-specific overfitting, and they demonstrate the effectiveness of the neural network structure and of reinforcement learning's feature-extraction and decision-making capabilities in DRL-FJM.
Figure 4b shows the total traffic served by different algorithms over time slots. It can be seen that DRL-FJM can serve more SFC requests simultaneously under the same network resources, with an average service traffic 42.6% higher than that of the PPO-Deployment algorithm and 30.7% higher than that of the PPO-Routing algorithm.
Figure 4c,d show the average utilization of node computational resources and link bandwidth resources, respectively, as SFC requests continue to arrive. As can be seen, while the PPO-Deployment algorithm takes node computing capabilities into account, it can cause detours, resulting in bandwidth wastage. The PPO-Routing algorithm, while effectively considering path length, leads to rapid depletion of resources on some nodes and links, resulting in lower overall resource utilization. Our DRL-FJM algorithm comprehensively perceives global network resources to intelligently decide deployment plans, ensuring efficient utilization of both node and bandwidth resources.
Figure 5 and Figure 6 evaluate the performance of the algorithms under limited computing resources and limited bandwidth scenarios, respectively. In Figure 5, when computing resources are scarce and bandwidth resources are abundant, the deployment success rates of CRADR-Random, CRADR-Greedy, and DRL-FJM are 3.7%, 7.4%, and 10.5% higher, respectively, compared to the node and link mapping algorithms. Regarding resource utilization, PPO-Routing and PPO-Deployment have similar acceptance rates but use less bandwidth, demonstrating some advantage in link resource utilization. DRL-FJM selects paths that better satisfy computational resource requirements, improving SFC acceptance rates under scarce computing resources. In Figure 6, when computing resources are sufficient but bandwidth resources are limited, node-first mapping schemes perform poorly. DRL-FJM achieves higher overall resource utilization and SFC acceptance rates.
We also examine the impact of different average SFC arrival rates and flow durations, as shown in Figure 7 and Figure 8. As the arrival rate increases, the SFC acceptance ratio gradually decreases. It is noteworthy that increasing the average SFC arrival rate and average flow duration has a similar effect on the acceptance ratio. This is because, assuming sufficient computing and bandwidth resources, Little’s Law indicates that the expected number of SFCs present in the network at each time slot is proportional to the product of the average arrival rate and average flow duration. Thus, increasing these parameters results in a greater number of pending SFCs, leading to resource shortages and a decrease in new SFC acceptance. Our DRL-FJM algorithm outperforms other algorithms under different SFC arrival rates and flow durations, demonstrating superior performance and robustness.

6. Conclusions

This paper focuses on the challenges of SFC online deployment, proposing a deep reinforcement learning-based algorithm, called DRL-FJM. The algorithm is designed to maximize the acceptance rate of SFCs while satisfying network resource and latency constraints. First, it searches the evaluation metric for SFC deployment schemes within the joint solution space of node and link mappings to generate multiple end-to-end joint deployment candidates. Based on this, a PPO-based DRL model is employed to select the final deployment plan. By holistically evaluating the network resource state, the algorithm jointly optimizes node and link mappings, effectively capturing the multi-resource and multi-constraint characteristics of the network and enabling more efficient intelligent deployment. Extensive experiments demonstrate that the proposed DRL-FJM algorithm significantly outperforms existing methods, achieving up to 42.6% higher total network service traffic, 17.3% higher node resource utilization, and 26.6% higher link bandwidth utilization. It also reaches a near-100% SFC deployment success rate and demonstrates strong adaptability and robustness across different network resource environments.
For future work, we plan to further assess the algorithm’s performance in ultra-large-scale networks and enhance its speed and efficiency by employing deep reinforcement learning to prune the joint mapping space. Our objective is to achieve faster and more effective deployment solutions. Additionally, we aim to explore dynamic SFC migration and readjustment strategies, thereby extending the applicability of the proposed method to broader business scenarios.

Author Contributions

Conceptualization, Y.W., H.H. and Z.Z.; Formal analysis, Y.W. and H.H.; Writing—review and editing, Y.W.; Supervision, H.H. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61821001.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NFV   Network function virtualization
SFC   Service function chain
VNF   Virtual network function
DRL   Deep reinforcement learning

References

1. Qi, D.; Shen, S.; Wang, G. Virtualized Network Function Consolidation Based on Multiple Status Characteristics. IEEE Access 2019, 7, 59665–59679.
2. Bhamare, D.; Jain, R.; Samaka, M.; Erbad, A. A Survey on Service Function Chaining. J. Netw. Comput. Appl. 2016, 75, 138–155.
3. Liu, Y.; Lu, H.; Li, X.; Zhao, D. An Approach for Service Function Chain Reconfiguration in Network Function Virtualization Architectures. IEEE Access 2019, 7, 147224–147237.
4. Shakarami, A.; Shakarami, H.; Ghobaei-Arani, M.; Nikougoftar, E.; Faraji-Mehmandar, M. Resource Provisioning in Edge/Fog Computing: A Comprehensive and Systematic Review. J. Syst. Archit. 2022, 122, 102362.
5. Bhamare, D.; Jain, R.; Samaka, M.; Vaszkun, G.; Erbad, A. Multi-cloud Distribution of Virtual Functions and Dynamic Service Deployment: Open ADN Perspective. In Proceedings of the 2015 IEEE International Conference on Cloud Engineering, Washington, DC, USA, 9–13 March 2015; pp. 299–304.
6. He, N.; Yang, S.; Li, F.; Trajanovski, S.; Kuipers, F.A.; Fu, X. A-DDPG: Attention Mechanism-Based Deep Reinforcement Learning for NFV. In Proceedings of the 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), Tokyo, Japan, 25–28 June 2021; pp. 1–10.
7. Dalgkitsis, A.; Garrido, L.A.; Rezazadeh, F.; Chergui, H.; Ramantas, K.; Vardakas, J.S. SCHE2MA: Scalable, Energy-Aware, Multidomain Orchestration for Beyond-5G URLLC Services. IEEE Trans. Intell. Transp. Syst. 2022, 24, 7653–7663.
8. Thiruvasagam, P.K.; Chakraborty, A.; Mathew, A.; Murthy, C.S.R. Reliable Placement of Service Function Chains and Virtual Monitoring Functions With Minimal Cost in Softwarized 5G Networks. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1491–1507.
9. Zeng, Y.; Qu, Z.; Guo, S.; Tang, B.; Ye, B.; Li, J. RuleDRL: Reliability-Aware SFC Provisioning with Bounded Approximations in Dynamic Environments. IEEE Trans. Serv. Comput. 2023, 16, 3651–3664.
10. Yue, Y.; Cheng, B.; Wang, M.; Li, B.; Liu, X.; Chen, J. Throughput Optimization and Delay Guarantee VNF Placement for Mapping SFC Requests in NFV-Enabled Networks. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4247–4262.
11. Varasteh, A.; Madiwalar, B.; Van Bemten, A.; Kellerer, W.; Mas-Machuca, C. Holu: Power-Aware and Delay-Constrained VNF Placement and Chaining. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1524–1539.
12. Qiu, Y.; Liang, J.; Leung, V.C.M.; Wu, X.; Deng, X. Online Reliability-Enhanced Virtual Network Services Provisioning in Fault-Prone Mobile Edge Cloud. IEEE Trans. Wirel. Commun. 2022, 21, 7299–7313.
13. Fan, Q.; Pan, P.; Li, X.; Wang, S.; Li, J.; Wen, J. DRL-D: Revenue-Aware Online Service Function Chain Deployment via Deep Reinforcement Learning. IEEE Trans. Netw. Serv. Manag. 2022, 19, 4531–4545.
14. He, N.; Yang, S.; Li, F.; Trajanovski, S.; Zhu, L.; Wang, Y.; Fu, X. Leveraging Deep Reinforcement Learning with Attention Mechanism for Virtual Network Function Placement and Routing. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 1186–1201.
15. Tian, A.; Feng, B.; Huang, Y.; Zhou, H.; Yu, S.; Zhang, H. DRL-Based Two-Stage SFC Deployment Approach Under Latency Constraints. In Proceedings of the IEEE INFOCOM 2024-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 20–23 May 2024; pp. 1–6.
16. Wang, Y.; Huang, C.K.; Shen, S.H.; Chiu, G.M. Adaptive Placement and Routing for Service Function Chains with Service Deadlines. IEEE Trans. Netw. Serv. Manag. 2021, 18, 3021–3036.
17. Pei, J.; Hong, P.; Xue, K.; Li, D. Resource Aware Routing for Service Function Chains in SDN and NFV-Enabled Network. IEEE Trans. Serv. Comput. 2018, 14, 985–997.
18. Duan, Y.; Lu, Y.; Shen, S.; Yu, S.; Zhang, P.; Zhang, W.; Igorevich, K.K. NFLCS: An Service Function Chain Path Optimization Strategy Based on Network-Functional Layout Clustering. IEEE Trans. Veh. Technol. 2023, 72, 10813–10825.
19. Pei, J.; Hong, P.; Xue, K.; Li, D. Efficiently Embedding Service Function Chains With Dynamic Virtual Network Function Placement in Geo-Distributed Cloud System. IEEE Trans. Parallel Distrib. Syst. 2018, 30, 2179–2192.
20. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
21. Xu, L.; Hu, H.; Liu, Y. SFCSim: A Network Function Virtualization Resource Allocation Simulation Platform. Clust. Comput. 2023, 26, 423–436.
Figure 1. System framework of SFC deployment.
Figure 2. The framework of DRL-FJM.
Figure 3. Schematic of the cellular network.
Figure 4. Performance comparison under balanced workload conditions: (a) Acceptance Ratio; (b) Total Traffic Served; (c) Node Utilization; (d) Edge Utilization.
Figure 5. Performance comparison of algorithms under sufficient bandwidth and limited computing resources: (a) Acceptance Ratio; (b) Total Traffic Served; (c) Node Utilization.
Figure 6. Performance comparison of algorithms under sufficient computing resources and limited bandwidth: (a) Acceptance Ratio; (b) Total Traffic Served; (c) Node Utilization.
Figure 7. Comparison of acceptance ratios across different SFC average arrival rates.
Figure 8. Comparison of acceptance ratios across different average flow durations.
Table 1. Parameters of the DRL model.
Parameter   Value       Description
α_A         1 × 10^-4   The learning rate of θ
α_C         3 × 10^-4   The learning rate of ϕ
γ           0.995       The discount factor
c_1         0.5         The value function coefficient
c_2         0.01        The entropy coefficient
U           256         The number of units in fully connected layers
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
