Article

Load-Balanced Dynamic SFC Migration Based on Resource Demand Prediction

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(24), 8046; https://doi.org/10.3390/s24248046
Submission received: 16 November 2024 / Revised: 8 December 2024 / Accepted: 16 December 2024 / Published: 17 December 2024
(This article belongs to the Section Sensor Networks)

Abstract

In network function virtualization, the resource demands of network services change with network traffic, and SFC migration has emerged as an effective technique for preserving quality of service. However, one important problem not addressed in prior studies is how to manage network load while maintaining service-level agreements under time-varying resource demands. We therefore propose the Resource Predictive Load Balancing SFC Migration (RP-LBM) algorithm. RP-LBM uses a CNN-AT-LSTM model to predict VNF resource demands in advance, eliminating the delays associated with reactive dynamic migration and determining the optimal migration timing. It leverages the PPO algorithm’s ability to perceive complex environments to develop SFC migration strategies that keep the network load balanced, while also reducing the number of subsequent migrations and minimizing the service interruption rate. Simulation results show that the service interruption rate of RP-LBM is on average 27.3% lower than that of the passive migration method. The PPO-based migration algorithm also requires fewer SFC migrations and yields lower service interruption rates than the DQN algorithm, ensuring service continuity at low migration cost.

1. Introduction

In traditional networks, deploying or updating the physical network requires specialized hardware to provide new services. This leads to high purchasing and maintenance costs. Additionally, the system cannot be easily customized or adjusted based on demand. Therefore, Network Function Virtualization (NFV) was developed. NFV aims to implement the functions of traditional network devices through software. It decouples network functions from dedicated hardware and runs them as software on general-purpose servers or in virtualized environments. This approach can significantly reduce operators’ costs and improve the flexibility and resilience of network service deployment. Network operators can use Service Function Chains (SFCs) to provide customized network services to users. SFC connects Virtualized Network Functions (VNFs). It not only supports the fine-grained and elastic delivery of services in the network but also allows for the modification of service functions and the migration of loads.
In NFV, operators use Service Function Chains to provide customized network services [1]. Network efficiency is closely tied to the mapping of Virtualized Network Functions and the routing of SFCs. At the same time, as network slices operate, the traffic arriving at each specific service fluctuates over time. This can create a mismatch between SFC resource demand and server resource availability, degrading quality of service (QoS) and resource utilization [2]. When a VNF layout and resource allocation strategy can no longer meet current network demands, the NFV Orchestrator (NFVO) reconfigures the SFC through vertical scaling, horizontal scaling, or dynamic migration. Horizontal scaling makes full use of fragmented resources, but only when the server has enough available resources. Dynamic migration is a flexible solution under overload conditions, but it incurs overhead of its own.
Due to the dynamic changes in traffic, edge computing networks often experience overloaded nodes. This leads to increased computation delays and sometimes service interruptions. To maintain normal SFC services, SFC migration must be performed. Additionally, network load balancing should be ensured. Network load balancing allows the network system to adapt more easily to dynamic changes in traffic [3]. By introducing load balancing for time-varying network traffic, the network becomes more robust after VNF migration and promotes long-term resource optimization.
Based on the above background, we focus on the problem of SFC dynamic migration for network load balancing in edge computing environments where resources are scarce and highly dynamic. We use machine learning algorithms to predict VNF resource demands, which helps us anticipate changes in network node resource allocation. With the prediction results, we proactively determine the appropriate time for migration and identify the VNFs to migrate from overloaded nodes. We also use deep reinforcement learning algorithms to better understand complex environments and select target nodes for migration. This approach ensures network load balancing, reduces the number of future migrations, and minimizes the impact of unnecessary migrations.
In view of the above problems, the main contributions of this paper are as follows:
  • Considering the real SFC request scenario, we use a time-varying traffic dataset. We apply a CNN-AT-LSTM model to predict the short-term resource demands of VNFs. Using these predictions, we can proactively migrate VNFs in advance within a certain time frame.
  • Considering that dynamic changes in traffic may lead to frequent VNF migration and uneven use of network resources, we introduce a load-balancing model to maintain network stability.
  • In response to the multi-dimensionality and complexity of the VNF migration remapping problem caused by the dynamic network environment, we propose a DRL algorithm based on Proximal Policy Optimization (PPO) to enhance the effectiveness of VNF migration decisions.

2. Related Work

Within the framework of Network Function Virtualization, researchers from around the world have conducted many studies on the dynamic migration of Service Function Chains and have achieved significant results. One major challenge in dynamic migration is how to allocate resources for SFCs as their resource demands change and the duration that SFCs remain in the network is uncertain. Recent research typically takes one or more factors related to migration costs as optimization objectives and proposes corresponding migration decision algorithms.
For instance, the authors of [4] proposed a migration algorithm based on Deep Q-Networks (DQNs). Their goal was to minimize end-to-end latency and address service interruptions caused by overloaded nodes, link failures, and VNF instance failures. The authors of [5] introduced a framework that uses Convolutional Neural Networks (CNNs) and Artificial Neural Networks (ANNs) to determine the optimal migration paths for VNFs. Their aim was to minimize migration time and cost. This solution analyzes network conditions, workload patterns, and resource availability to achieve dynamic and efficient VNF relocation, which significantly improves network performance and reliability. Additionally, the authors of [6] presented a scalable cluster-based VNF migration algorithm. This algorithm is designed to minimize the total embedding cost while meeting the latency requirements between VNFs. It performs better than existing methods in terms of embedding costs and has a much shorter execution time compared to brute-force search methods.
However, the methods mentioned above only trigger migrations after changes occur in network traffic or after network node or link failures. These are forms of reactive migration and cannot proactively handle sudden changes in resource demand. Currently, to solve reactive migration, researchers are trying to predict resource demands in advance. In [7], the authors proposed a CAT-LSTM model to predict the resource demands of VNFs using SFC data. They improved accuracy by using aspect embedding and attention mechanisms. Similarly, the authors of [8] introduced a Graph Neural Network (GNN) model based on the graph features. This model accurately predicts VNF resource demands by identifying multidimensional dependencies within SFC graph structures. Although studies in [7,8] have made progress in prediction, they do not deeply explore how to effectively use these predictions to optimize resource allocation.
Moreover, existing research primarily focuses on optimizing objectives such as reducing latency and migration costs. However, merely minimizing energy consumption or overhead may lead to uneven utilization of network resources, and future dynamic traffic changes could result in frequent VNF migrations. Additionally, in edge computing networks where resources are relatively scarce and highly dynamic, greater emphasis should be placed on the distribution and utilization efficiency of resources to ensure network load balancing.
Current studies have demonstrated the advantages of ensuring network load balancing in enhancing network performance. For example, the authors of [9] proposed a load-balanced Virtual Network Embedding algorithm based on deep reinforcement learning for regional satellite networks. The algorithm divides the satellite network into multiple mapping regions and deploys Virtual Network Requests only in the region with the lowest load. It simultaneously considers topology dynamics and resource loads, effectively improving acceptance rates while also enhancing resource utilization. Similarly, the authors of [10] introduced a task scheduling algorithm considering load balancing issues in cloud computing environments. The proposed BCSV algorithm achieves better results in terms of completion time, load balancing, and resource utilization compared to previous algorithms, even in heterogeneous environments. This improvement enhances the overall performance of cloud computing networks.
In summary, we use high-precision predictions to anticipate SFC resource demands in advance and determine the appropriate timing for migration. We introduce a load-balancing model to ensure efficient use of resources, reduce the frequency of VNF migrations in the network, and avoid potential issues caused by uneven resource usage. Additionally, we design a deep reinforcement learning algorithm to proactively make migration decisions for SFCs on overloaded nodes, preventing the degradation or even interruption of network service quality.

3. System Model and Problem Description

In this section, we define the network model, describe the SFC and VNF, and then propose the VNF migration problem and the network load balancing model.

3.1. Network Model

We consider an NFV network architecture with three layers as depicted in [11]. As shown in Figure 1, physical nodes are connected by links in the physical layer. VNFs are instantiated on physical nodes that supply the necessary resources in the VNF layer. The various NF types that comprise the SFC are deployed on matching VNF types at the application layer.
The physical network is modeled as a fully connected undirected graph $G = (V^S, E^S)$, where $V^S$ is the set of underlying nodes and $E^S$ is the set of physical links between nodes. The underlying nodes offer several resource types to support VNFs, such as CPU, memory, and cores. Here, $C_i^S$ denotes the maximum CPU resource capacity of the $i$-th node $v_i^s \in V^S$, where one unit of CPU resource corresponds to the resources required to process one data packet. The memory resource capacity of the $i$-th node is $M_i^S$, expressed in available megabytes. The underlying link directly connecting nodes $v_i^s$ and $v_j^s$ is denoted $e_{ij}^S \in E^S$, with physical link bandwidth $BW_{ij}^S$ and link delay $L_{ij}^S$.
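To make the notation concrete, the graph $G = (V^S, E^S)$ with node capacities $C_i^S$, $M_i^S$ and link attributes $BW_{ij}^S$, $L_{ij}^S$ can be held in a small data structure. A minimal sketch (class and field names are our own, not from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalNode:
    """Underlying node v_i^s with CPU capacity C_i^S and memory capacity M_i^S."""
    node_id: int
    cpu_capacity: float    # C_i^S, in packet-processing units
    mem_capacity: float    # M_i^S, in megabytes

@dataclass
class PhysicalLink:
    """Underlying link e_ij^S with bandwidth BW_ij^S and delay L_ij^S."""
    endpoints: tuple       # (i, j); the graph is undirected
    bandwidth: float       # BW_ij^S
    delay: float           # L_ij^S

@dataclass
class PhysicalNetwork:
    """Fully connected undirected graph G = (V^S, E^S)."""
    nodes: dict = field(default_factory=dict)   # id -> PhysicalNode
    links: dict = field(default_factory=dict)   # frozenset({i, j}) -> PhysicalLink

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_link(self, link):
        self.links[frozenset(link.endpoints)] = link

# Example: two nodes joined by one link
g = PhysicalNetwork()
g.add_node(PhysicalNode(0, cpu_capacity=100.0, mem_capacity=4096.0))
g.add_node(PhysicalNode(1, cpu_capacity=80.0, mem_capacity=2048.0))
g.add_link(PhysicalLink((0, 1), bandwidth=1000.0, delay=2.0))
```

Indexing links by `frozenset` keeps lookups direction-independent, matching the undirected-graph assumption.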

3.2. SFC and VNF

The VNF type set is denoted $F$. The CPU and memory resources consumed by a server to instantiate a VNF of type $f \in F$, denoted $v^{vnf}$, are $c^f$ and $m^f$, respectively. A physical node can instantiate one or more types of VNFs, and each VNF type can support the deployment of multiple virtual network function instances.
The SFC set is denoted $S$, and each request $sfc_j \in S$ can be written as $sfc_j = (v_{j,in}^{s}, v_{j,out}^{s}, V_j^{NF}, E_j^{NF}, l_j)$, where $v_{j,in}^{s}$ and $v_{j,out}^{s}$ are the entrance and exit network elements, respectively; $V_j^{NF}$ is an ordered set of requested NFs, i.e., the sequence of VNF types the traffic passes through; $E_j^{NF}$ is the set of logical links connecting the VNFs between the entrance and exit network element nodes; and $l_j$ is the permissible end-to-end delay of the request. The CPU and memory resource demands of the $u$-th NF $v_{j,u}^{nf} \in V_j^{NF}$ of $sfc_j$ are $c_{j,u}^{nf}$ and $m_{j,u}^{nf}$, respectively. The traffic segment $(v_{j,u}^{nf}, v_{j,v}^{nf})$ of $sfc_j$ is denoted by the logical link $e_{j,uv}^{nf} \in E_j^{NF}$, whose bandwidth demand is $bw_{j,uv}^{nf}$.
The decision variable $t_{j,u}^{f} \in \{0,1\}$ specifies the type of each NF in the SFC. If the $u$-th virtual node $v_{j,u}^{nf} \in V_j^{NF}$ of $sfc_j$ is of type $f$, then $t_{j,u}^{f}$ equals 1:

$$t_{j,u}^{f} = \begin{cases} 1, & \text{if } v_{j,u}^{nf} \text{ is of type } f \in F \\ 0, & \text{otherwise} \end{cases}$$
It is stipulated that the entrance and exit network element nodes have no type; that is, $t_{j,u}^{f}$ is always 0 for them. The decision variable $\eta_{j,u}^{i,t} \in \{0,1\}$ equals 1 when the $u$-th virtual node $v_{j,u}^{nf} \in V_j^{NF}$ of $sfc_j$ is deployed on the physical node $v_i^s \in V^S$ during period $t$:

$$\eta_{j,u}^{i,t} = \begin{cases} 1, & \text{if } v_{j,u}^{nf} \text{ is mapped to } v_i^s \text{ during } t \\ 0, & \text{otherwise} \end{cases}$$
When the decision variable $\tau_{j,uv}^{pq,t} \in \{0,1\}$ equals 1, it indicates that the traffic segment $e_{j,uv}^{nf} \in E_j^{NF}$ of $sfc_j$ flows through the physical link $e_{pq}^{s} \in E^S$ during period $t$:

$$\tau_{j,uv}^{pq,t} = \begin{cases} 1, & \text{if } e_{j,uv}^{nf} \text{ is mapped to } e_{pq}^{s} \text{ during } t \\ 0, & \text{otherwise} \end{cases}$$
If an NF of type $f \in F$ is mapped to $v_i^s \in V^S$, then a VNF of type $f$ must also be instantiated on node $v_i^s$. We define the state variable $\beta_{f}^{i,t} \in \{0,1\}$, which equals 1 if a VNF of type $f \in F$ exists on the physical node $v_i^s \in V^S$:

$$\beta_{f}^{i,t} = \begin{cases} 1, & \text{if a VNF of type } f \text{ is assigned on } v_i^s \\ 0, & \text{otherwise} \end{cases}$$

$$\beta_{f}^{i,t} = 1, \quad \text{if } \sum_{sfc_j \in S} \sum_{v_{j,u}^{nf} \in V_j^{NF}} \eta_{j,u}^{i,t} \, t_{j,u}^{f} \ge 1, \quad \forall v_i^s \in V^S,\ f \in F,\ t \in T$$
Due to dynamic traffic changes, the relationship between the inflow bandwidth and the resource demand of $v_{j,v}^{nf} \in V_j^{NF}$ at time $t$ can be expressed as follows:

$$c_{j,v}^{nf}(t) = bw_{j,uv}^{nf}(t) \sum_{f \in F} t_{j,v}^{f} \, \mathrm{coeff}_{c}^{f}, \quad \forall t \in T$$

$$m_{j,v}^{nf}(t) = bw_{j,uv}^{nf}(t) \sum_{f \in F} t_{j,v}^{f} \, \mathrm{coeff}_{m}^{f}, \quad \forall t \in T$$

where $\mathrm{coeff}_{c}^{f}$ and $\mathrm{coeff}_{m}^{f}$ are the CPU and memory resource coefficients, respectively, of a VNF of type $f \in F$. The data flow may be amplified or reduced when passing through each NF (firewalls and intrusion prevention, for example, reduce traffic), causing the required bandwidth to change. The relationship between the inflow bandwidth of an NF and the required bandwidth of the logical link connected to it is

$$bw_{j,uv}^{nf}(t) = bw_{j,ku}^{nf}(t) \sum_{f \in F} t_{j,u}^{f} \, \mathrm{ratio}^{f}, \quad \forall t \in T$$

where $v_{j,k}^{nf}, v_{j,u}^{nf}, v_{j,v}^{nf} \in V_j^{NF}$ are consecutive ordered NFs, and $\mathrm{ratio}^{f}$ is the bandwidth scaling factor of a VNF of type $f \in F$, here the type of the traversed NF $v_{j,u}^{nf}$.
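As a worked instance of the demand and bandwidth-scaling relations above, the sketch below computes an NF's CPU/memory demand from its inflow bandwidth and propagates the bandwidth to the next segment. All coefficient and ratio values are invented for illustration:

```python
def nf_resource_demand(inflow_bw, coeff_cpu, coeff_mem):
    """c(t) = bw(t) * coeff_c^f and m(t) = bw(t) * coeff_m^f for an NF of type f."""
    return inflow_bw * coeff_cpu, inflow_bw * coeff_mem

def outflow_bandwidth(inflow_bw, ratio):
    """Bandwidth on the next segment after the NF scales traffic by ratio^f."""
    return inflow_bw * ratio

# A firewall-like NF that drops 20% of its traffic (ratio = 0.8)
bw_in = 100.0                                   # Mbps entering the NF
cpu, mem = nf_resource_demand(bw_in, coeff_cpu=0.5, coeff_mem=0.2)
bw_out = outflow_bandwidth(bw_in, ratio=0.8)
```

Chaining `outflow_bandwidth` along the ordered NFs of an SFC reproduces the per-segment bandwidth demands used in the link constraints later.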
To describe the migration process, we define variables related to server status and VNF migration. When $\xi_{j,u}^{t} \in \{0,1\}$ equals 1, the virtual node $v_{j,u}^{nf}$ must perform a migration in period $t$.
We also define variables related to resource prediction. Resource data can describe CPU, disk, or memory usage, or other system metrics such as the number of processes and OS load. For simplicity, this paper considers predicting CPU and memory data only. $c_{j,u}^{nf}(t+n)$ $(n = 1, 2, \ldots)$ denotes the predicted CPU resource demand of $v_{j,u}^{nf}$ in period $t+n$, and $m_{j,u}^{nf}(t+n)$ $(n = 1, 2, \ldots)$ denotes the predicted memory resource demand of $v_{j,u}^{nf}$ in period $t+n$.
$thr_{over}$ denotes the overload threshold of the underlying servers.
The above variables satisfy the following constraints. Each virtual network function of a service function chain must be deployed on exactly one physical node:

$$\sum_{v_i^s \in V^S} \eta_{j,u}^{i,t} = 1, \quad \forall sfc_j \in S,\ v_{j,u}^{nf} \in V_j^{NF},\ t \in T$$
Each virtual link must be mapped onto at least one physical link, since a virtual link may span multiple nodes:

$$\sum_{e_{pq}^s \in E^S} \tau_{j,uv}^{pq,t} \ge 1, \quad \forall sfc_j \in S,\ e_{j,uv}^{nf} \in E_j^{NF},\ t \in T$$
The CPU and memory resource constraints of the underlying network nodes are, respectively:

$$\sum_{f \in F} \beta_{f}^{i,t} c^{f} + \sum_{sfc_j \in S} \sum_{v_{j,u}^{nf} \in V_j^{NF}} \eta_{j,u}^{i,t} \, c_{j,u}^{nf}(t) \le C_i^S, \quad \forall v_i^s \in V^S,\ t \in T$$

$$\sum_{f \in F} \beta_{f}^{i,t} m^{f} + \sum_{sfc_j \in S} \sum_{v_{j,u}^{nf} \in V_j^{NF}} \eta_{j,u}^{i,t} \, m_{j,u}^{nf}(t) \le M_i^S, \quad \forall v_i^s \in V^S,\ t \in T$$
The underlying link bandwidth constraint is:

$$\sum_{sfc_j \in S} \sum_{e_{j,uv}^{nf} \in E_j^{NF}} \tau_{j,uv}^{pq,t} \, bw_{j,uv}^{nf}(t) \le BW_{pq}^S, \quad \forall e_{pq}^s \in E^S,\ t \in T$$
The service function chain delay limit is:

$$\sum_{e_{j,uv}^{nf} \in E_j^{NF}} \sum_{e_{pq}^s \in E^S} \tau_{j,uv}^{pq,t} L_{pq}^S \le l_j, \quad \forall sfc_j \in S,\ t \in T$$
Every pair of connected VNFs $(v_{j,u}^{nf}, v_{j,v}^{nf})$ satisfies traffic conservation. Given a logical link $e_{j,uv}^{nf} \in E_j^{NF}$ and an underlying node $v_p^s \in V^S$,

$$\sum_{v_q^s \in \Omega^{+}(v_p^s)} \tau_{j,uv}^{qp,t} - \sum_{v_q^s \in \Omega^{-}(v_p^s)} \tau_{j,uv}^{pq,t} = \begin{cases} \eta_{j,v}^{p,t} - \eta_{j,u}^{p,t}, & v_{j,u}^{nf}, v_{j,v}^{nf} \in V_j^{NF} \\ 1, & v_{j,v}^{nf} \in \{v_{j,out}^{s}\},\ v_{j,out}^{s} = v_p^s,\ v_{j,u}^{nf} \in V_j^{NF} \\ -1, & v_{j,v}^{nf} \in \{v_{j,in}^{s}\},\ v_{j,in}^{s} = v_p^s,\ v_{j,u}^{nf} \in V_j^{NF} \\ 0, & \text{otherwise} \end{cases}$$

where $\Omega^{+}(v_p^s)$ and $\Omega^{-}(v_p^s)$ denote the upstream and downstream node sets of $v_p^s$, respectively.
The migration of a virtual node $v_{j,u}^{nf} \in V_j^{NF}$ is captured by the following constraint: if the underlying node hosting $v_{j,u}^{nf}$ differs before and after the traffic change, then $v_{j,u}^{nf}$ is migrated:

$$\xi_{j,u}^{t} \ge \left| \eta_{j,u}^{i,t} - \eta_{j,u}^{i,t+1} \right|, \quad \forall v_{j,u}^{nf} \in V_j^{NF},\ sfc_j \in S,\ v_i^s \in V^S,\ t \in T$$
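The node capacity constraints above translate directly into a feasibility check that a placement routine could run before accepting a mapping. A minimal sketch (the function and argument names are ours; the paper only states the inequalities):

```python
def node_feasible(cpu_capacity, mem_capacity, instantiation_costs, placed_demands):
    """Check the per-node CPU/memory constraints for one physical node.

    instantiation_costs: list of (c_f, m_f) pairs, one per VNF type
                         instantiated on this node (the beta_f terms).
    placed_demands:      list of (c_nf(t), m_nf(t)) pairs, one per NF
                         mapped to this node at time t (the eta terms).
    """
    cpu_used = sum(c for c, _ in instantiation_costs) + sum(c for c, _ in placed_demands)
    mem_used = sum(m for _, m in instantiation_costs) + sum(m for _, m in placed_demands)
    return cpu_used <= cpu_capacity and mem_used <= mem_capacity

# One firewall-type VNF instance (cost 10 CPU / 128 MB) hosting two NFs:
# cpu_used = 10 + 30 + 40 = 80 <= 100, mem_used = 128 + 200 + 300 = 628 <= 1024
ok = node_feasible(cpu_capacity=100, mem_capacity=1024,
                   instantiation_costs=[(10, 128)],
                   placed_demands=[(30, 200), (40, 300)])
```

Analogous checks apply per link for the bandwidth constraint and per chain for the delay limit.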

3.3. Load Balancing Model

The network system can more readily adjust to the dynamic fluctuations in network traffic because of network load balancing [3]. Load balancing is implemented for time-varying network traffic in order to improve network resilience following VNF migration and encourage long-term resource optimization.
The system’s efficiency in terms of load balancing is assessed using the change in resources. The average resource usage rate of physical nodes is directly correlated with the variance of system resources [12]. The load on physical node m may change as a result of VNF j being moved from physical node n to physical node m during the VNF migration process.
At this time, the load on physical node m can be represented as follows:
$$L_m^{cpu}(t) = \sum_{j \in N_i^v} x_m^{i,j} \cdot C_{i,j}^{v}(t)$$

$$L_m^{mem}(t) = \sum_{j \in N_i^v} x_m^{i,j} \cdot M_{i,j}^{v}(t)$$
where $L_m^{cpu}(t)$ and $L_m^{mem}(t)$ denote the CPU and memory resource loads of physical node $m$, respectively. The resource variation on a single physical node $m$ is not enough to determine overall load balancing; instead, the system's load-balancing capability is assessed through variations in system-wide resources. The mean CPU and memory load values at time $t$ are therefore:

$$L_{mean}^{cpu}(t) = \frac{\sum_{m \in N^P} L_m^{cpu}(t)/C_m}{|N^P|}$$

$$L_{mean}^{mem}(t) = \frac{\sum_{m \in N^P} L_m^{mem}(t)/M_m}{|N^P|}$$
where $C_m$ and $M_m$ denote physical node $m$'s CPU and memory resource capacities, respectively. Using the same normalized loads, the network-wide variations in CPU and memory resources are:

$$L_{var}^{cpu}(t) = \frac{\sum_{m \in N^P} \left[ L_m^{cpu}(t)/C_m - L_{mean}^{cpu}(t) \right]^2}{|N^P|}$$

$$L_{var}^{mem}(t) = \frac{\sum_{m \in N^P} \left[ L_m^{mem}(t)/M_m - L_{mean}^{mem}(t) \right]^2}{|N^P|}$$
We employ the percentage of resource consumption as the measurement unit when assessing a network system’s load balancing to make sure that storage and processing resources are quantified in the same unit. Consequently, the weighted sum of the variations of computing and storage resources can be used to determine the load-balancing capability L total ( t ) of the network system, which is expressed as follows:
$$L_{total}(t) = \omega_1 L_{var}^{cpu}(t) + \omega_2 L_{var}^{mem}(t)$$
where the weight factors $\omega_1$ and $\omega_2$ capture the impact of computing and storage resources, respectively, on network load balancing. In this paper, both resources are assumed to affect the network equally, with $\omega_1 = \omega_2 = 0.5$.
Thus, the system's optimization goal is to minimize the variation in computing and storage resources:

$$\min L_{total}(t)$$
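The load-balancing objective can be evaluated in a few lines. The sketch below follows the formulas with per-node loads expressed as utilization fractions, matching the percentage-based measurement described above (function names are ours):

```python
def load_balance_metric(cpu_loads, mem_loads, cpu_caps, mem_caps, w1=0.5, w2=0.5):
    """L_total(t) = w1 * L_var_cpu(t) + w2 * L_var_mem(t)."""
    def variance(loads, caps):
        util = [l / c for l, c in zip(loads, caps)]            # L_m / C_m per node
        mean = sum(util) / len(util)                           # L_mean
        return sum((u - mean) ** 2 for u in util) / len(util)  # L_var
    return w1 * variance(cpu_loads, cpu_caps) + w2 * variance(mem_loads, mem_caps)

# Balanced network: every node at 50% CPU and 50% memory utilization
balanced = load_balance_metric([50, 25], [512, 256], [100, 50], [1024, 512])

# Skewed network: one node saturated, the other idle
skewed = load_balance_metric([100, 0], [1024, 0], [100, 50], [1024, 512])
```

A perfectly balanced network yields $L_{total} = 0$; any skew makes the variance, and hence the objective, strictly positive.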

4. Algorithm Design

Liu et al. [13] demonstrated that the VNF migration problem is NP-hard: the large scale of the network, the diversity of network service flows, and the variability of the network environment make the VNF placement decision space high-dimensional and difficult to search.
Therefore, this section proposes a Resource Predictive Load Balancing SFC Migration Algorithm (RP-LBM) to address the aforementioned issue. The overall architecture of the algorithm is shown in Figure 2. The resource monitoring module obtains the dynamic resource utilization of the physical network. The Long Short-Term Memory (LSTM) model uses historical resource demand data to predict the NF resource demand at a future time. The resource prediction result is input into the migration decision module. For the SFC deployed on the node that is expected to be overloaded, the Proximal Policy Optimization algorithm is used to obtain the migration strategy. This strategy includes determining the migration path and VNF embedding nodes, ensuring that the service is not interrupted, and minimizing resource variance.
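The control loop just described — predict, detect forecast overload, migrate via the learned policy — can be sketched as follows. This is a toy stand-in, not the paper's implementation: a plain dictionary replaces the CNN-AT-LSTM predictor and a greedy least-loaded rule replaces the PPO policy:

```python
def rp_lbm_step(node_caps, placement, predict, choose_target, thr_over=0.8):
    """One RP-LBM control interval (illustrative sketch).

    node_caps:     {node: CPU capacity}
    placement:     {node: {vnf_name: current demand}}
    predict:       vnf_name -> forecast demand for t+1 (stand-in for CNN-AT-LSTM)
    choose_target: (src, pred, node_caps) -> target node (stand-in for PPO policy)
    """
    # 1. Predict next-interval demand and the resulting per-node utilization.
    pred = {n: {v: predict(v) for v in vnfs} for n, vnfs in placement.items()}
    util = {n: sum(pred[n].values()) / node_caps[n] for n in node_caps}

    # 2. Proactively migrate VNFs off nodes forecast to exceed the threshold,
    #    moving the most demanding VNF first until the node is safe again.
    for n in (n for n, u in util.items() if u > thr_over):
        while pred[n] and sum(pred[n].values()) / node_caps[n] > thr_over:
            vnf = max(pred[n], key=pred[n].get)
            target = choose_target(n, pred, node_caps)
            placement[target][vnf] = placement[n].pop(vnf)
            pred[target][vnf] = pred[n].pop(vnf)
    return placement

# Toy run: node "a" is predicted to overload, so "fw" moves to "b" in advance.
caps = {"a": 100, "b": 100}
place = {"a": {"fw": 50, "nat": 40}, "b": {}}
forecast = {"fw": 90, "nat": 60}      # invented stand-in for predictor output
to_least_loaded = lambda src, pred, c: min(
    (n for n in c if n != src), key=lambda n: sum(pred[n].values()) / c[n])
result = rp_lbm_step(caps, place, forecast.get, to_least_loaded)
```

The key point the sketch preserves is the ordering: migration is triggered by *predicted* utilization, before the overload actually materializes.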

4.1. LSTM-Based Resource Demand Prediction Algorithm

Traffic variations in network environments are typically highly real-time and dynamic. Short-term forecasting can promptly reflect and quickly capture these changes. Deep learning models possess rapid learning and adaptation capabilities, enabling them to accurately complete prediction tasks and capture real-time traffic fluctuations within a short time frame. Therefore, we have adopted an integrated algorithm that combines Convolutional Neural Networks, Long Short-Term Memory networks, and attention mechanisms to accurately predict the resource demands of VNF [14].
First, the CNN module is used to extract key features from the historical data of VNF resource usage. CNNs excel at identifying local patterns and spatiotemporal features in data, and are able to generate high-quality feature representations. The convolutional layer scans the VNF resource usage time series data with multiple convolutional kernels. This captures different types of local patterns, such as peaks and regular changes in resource usage. As a result, it generates a rich set of feature maps that clearly show the underlying structure of VNF resource usage.
Next, the features extracted by the CNN are passed to the LSTM network. LSTM has powerful sequence processing capabilities and can capture the time dependencies and long-term patterns in VNF resource usage, helping the model learn the connections between different time steps. By maintaining hidden states and memory cells, the LSTM processes the feature sequences and captures how resource demands change over time, identifying trends and cyclical fluctuations in resource use.
To further improve the model’s prediction capability, we integrate an attention mechanism into the LSTM network. The attention mechanism can assign different weights to each time step. This highlights the most important historical information for the current prediction. Figure 3 shows a flowchart of how the attention mechanism is integrated with the LSTM network. The workflow of the attention mechanism is as follows:
At each time step $t$, the attention mechanism computes attention weights $\alpha_{t,i}$ from the current LSTM hidden state $h_t$ and the hidden states $h_i$ of all previous time steps $i$. The weights are produced by a scoring function that measures the relevance of each hidden state $h_i$ to the prediction at time $t$:
$$e_{t,i} = \mathrm{score}(h_t, h_i) = h_t^{\top} W_a h_i$$

$$\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k=1}^{T} \exp(e_{t,k})}$$
where $W_a$ is a learnable weight matrix, and the softmax function ensures that the attention weights $\alpha_{t,i}$ sum to 1 over all $i$.
Then, the attention weights $\alpha_{t,i}$ are used to compute a weighted sum of the hidden states, yielding the context vector $c_t$:

$$c_t = \sum_{i} \alpha_{t,i} h_i$$
Finally, the context vector $c_t$ is concatenated with the current hidden state $h_t$, and the final resource demand forecast $\hat{y}_t$ is generated through a fully connected layer:

$$\hat{y}_t = \mathrm{softmax}\left( W_c [c_t; h_t] + b_c \right)$$
where $W_c$ and $b_c$ are the weight matrix and bias term of the fully connected layer, respectively.
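The scoring, softmax, and context-vector steps can be sketched in plain Python. This illustrates only the bilinear score $h_t^\top W_a h_i$ and the weighted sum; the concatenation with $h_t$ and the final fully connected layer are omitted (all names are ours):

```python
import math

def attention_context(h_t, hs, W_a):
    """Bilinear attention: e_{t,i} = h_t^T W_a h_i, softmax into weights
    alpha_{t,i}, then the context vector c_t = sum_i alpha_{t,i} * h_i."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = [dot(h_t, matvec(W_a, h_i)) for h_i in hs]       # e_{t,i}
    mx = max(scores)                                          # numerical stability
    exps = [math.exp(s - mx) for s in scores]
    alphas = [e / sum(exps) for e in exps]                    # softmax, sums to 1
    context = [sum(a * h_i[d] for a, h_i in zip(alphas, hs))  # c_t
               for d in range(len(h_t))]
    return alphas, context

# Tiny example: 2-dim hidden states, identity W_a (score reduces to dot product)
h_t = [1.0, 0.0]
hs = [[1.0, 0.0], [0.0, 1.0]]
W_a = [[1.0, 0.0], [0.0, 1.0]]
alphas, c_t = attention_context(h_t, hs, W_a)
```

With the identity matrix, the hidden state most aligned with $h_t$ receives the larger weight, which is exactly the "highlight the most relevant history" behavior described above.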
Through this hybrid model combining CNN, LSTM, and attention mechanisms, the paper achieves precise predictions of future VNF resource demands, providing a reliable basis for SFC migration.

4.2. PPO-Based SFC Migration Algorithm

In the study of VNF migration problems, traditional heuristic algorithms are widely used in practical scenarios due to their simplicity and low computational complexity. However, in high-dimensional and complex decision spaces, these algorithms tend to get stuck in local optima. This limits the overall effectiveness of migration strategy optimization.
Deep reinforcement learning (DRL) algorithms, with their self-learning capabilities and extensive data analysis power, can explore a more comprehensive strategy space in high-dimensional decision environments. This allows DRL algorithms to find superior migration strategies compared to heuristic algorithms in complex and dynamic network environments. Although training deep neural networks requires significant computational resources and time, they offer notable advantages in decision accuracy and migration strategy optimization for complex networks involving numerous network nodes, links, and service functions [4]. Additionally, the use of experience replay and parallel training techniques can improve training efficiency and reduce resource consumption.
Therefore, we propose a DRL algorithm based on Proximal Policy Optimization to solve the VNF migration problem. This algorithm converts the optimization objectives into a Markov Decision Process (MDP) and then uses the PPO algorithm to solve the MDP model, obtaining a near-optimal solution.

4.2.1. MDP Model

As part of the dynamic optimization, we aim to reduce the variance of network resources after VNF migration under time-varying traffic. Representing the state space, action space, state transition probability, and reward function by the four-tuple $(S, A, P, R)$, the optimization objective is transformed into a Discrete-Time Markov Decision Process (DTMDP).
The state space $S$ describes the network at time $t$: $s(t) = \{\psi(t), \xi(t), C(t), M(t)\} \in S$, where $\psi(t)$ and $\xi(t)$ denote the state spaces of the physical nodes and physical links, respectively:

$$\psi(t) = \{\varphi_1(t), \varphi_2(t), \ldots, \varphi_n(t)\}, \quad n \in N_p$$

$$\xi(t) = \{S_{1,1}(t), S_{1,2}(t), \ldots, S_{n,m}(t)\}, \quad n, m \in N_p,\ l_{nm} \in L_p$$
$C(t)$ and $M(t)$ represent the state spaces of the VNFs' CPU and memory resource requirements, respectively. These demand data reflect the predicted resource needs of the VNFs. The two state spaces can be written as follows:

$$C(t) = \{C_{1,1}^{v}(t), C_{1,2}^{v}(t), \ldots, C_{i,j}^{v}(t)\}, \quad i \in F,\ j \in N_i^v$$

$$M(t) = \{M_{1,1}^{v}(t), M_{1,2}^{v}(t), \ldots, M_{i,j}^{v}(t)\}, \quad i \in F,\ j \in N_i^v$$
The action space $A$ is the set of mapping actions the VNFs can perform:

$$a(t) = \{a_{1,1}(t), a_{1,2}(t), \ldots, a_{i,j}(t)\}, \quad i \in F,\ j \in N_i^v,\ a(t) \in A$$

where $a_{i,j}(t)$ is the mapping action of the $j$-th VNF on SFC $i$, covering the choice of the physical node mapping variable $x_n^{i,j}$ and the virtual link mapping variable $y_{n,m}^{j,k}$.
The transition probability $P$, written $p(s(t+1) \mid s(t), a(t))$, is the probability of moving to the next state $s(t+1)$ after taking action $a(t) \in A$ in the current state $s(t) \in S$.
The network forms a mapping strategy $\pi$ when it performs action $a(t)$ in state $s(t)$. Following the optimization objective, an instantaneous reward $r(t)$ is obtained if the mapping strategy $\pi$ satisfies the system constraints; since the goal is to minimize the load variance, the reward is its negative:

$$r(t) = -L_{total}(t)$$
The network uses the value function $V_\pi$ to assess the quality of the current VNF mapping strategy $\pi$ at that instant. $V_\pi$ can be written as:

$$V_\pi(s(t), a(t)) = \mathbb{E}[R(t) \mid s(t), a(t)] = \mathbb{E}[r(t) + \gamma_v r(t+1) + \cdots \mid s(t), a(t)] = \mathbb{E}[r(t) + \gamma_v V_\pi(s(t+1), a(t+1)) \mid s(t), a(t)]$$
where the discount factor $\gamma_v \in [0, 1]$ weighs future rewards against present ones. The optimal strategy for VNF migration can be expressed as:

$$\pi^{*} = \arg\max_{a} V_\pi(s, a), \quad \forall s, a$$

4.2.2. Proximal Policy Optimization Algorithm

Proximal Policy Optimization is a widely used policy optimization algorithm in reinforcement learning. PPO aims to stabilize the training process and reduce fluctuations and uncertainties during policy updates by limiting the difference between new and old policies [15]. PPO is mainly implemented through the following key steps.
First, the algorithm interacts with the environment using the current policy to collect a batch of trajectory data, including states $s_t$, actions $a_t$, rewards $r_t$, and next states $s_{t+1}$. To improve sample efficiency, data are usually collected in multiple parallel environments simultaneously. These data are used for subsequent updates of the policy and value function.
Next, the Generalized Advantage Estimation (GAE) method is used to estimate the advantage function $\hat{A}_t$, which measures how much better a particular action is than the policy's average:

$$\hat{A}_t = \delta_t + (\gamma\lambda)\delta_{t+1} + \cdots + (\gamma\lambda)^{T-t+1}\delta_{T-1}$$

where $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$, $\gamma$ is the discount factor, and $\lambda$ is the GAE decay parameter.
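Because the sum above satisfies the recursion $\hat{A}_t = \delta_t + \gamma\lambda\,\hat{A}_{t+1}$, GAE can be computed in one backward pass over a trajectory. A minimal sketch (hyperparameter values are illustrative):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    values must have len(rewards) + 1 entries: the extra final entry is the
    bootstrap value estimate for the state after the last step.
    """
    deltas = [r + gamma * values[t + 1] - values[t]   # delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)
              for t, r in enumerate(rewards)]
    advantages = []
    acc = 0.0
    for d in reversed(deltas):                        # backward recursion
        acc = d + gamma * lam * acc                   # A_t = delta_t + gamma*lam * A_{t+1}
        advantages.append(acc)
    return advantages[::-1]

# With gamma = lam = 1 and zero value estimates, A_t is just the sum of future rewards
adv = gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
```

In practice the resulting advantages are usually normalized per batch before being plugged into the clipped objective.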
Next, for each sample, the probabilities of selecting action $a_t$ in state $s_t$ under the old and new policies, $\pi_{\theta_{old}}(a_t \mid s_t)$ and $\pi_\theta(a_t \mid s_t)$, are computed. The probability ratio $r(\theta)$ indicates the relative change between the new and old policies: a ratio greater than 1 means the new policy is more inclined to choose $a_t$ in this state, while a ratio less than 1 means the opposite:

$$r(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}$$
To limit the extent of policy updates and avoid r ( θ ) being too large or too small, PPO introduces a clipping operation. The clipped objective function is defined as follows:
$$L^{\text{CLIP}}(\theta) = \mathbb{E}_t\left[ \min\left( r_t(\theta)\hat{A}_t, \ \text{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t \right) \right]$$
where $\epsilon$ is a small positive number, typically between 0.1 and 0.3. The clipping operation constrains $r_t(\theta)$ within the range $[1-\epsilon, 1+\epsilon]$, preventing excessive policy updates that lead to unstable training.
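A minimal sketch of the clipped surrogate, assuming the probability ratios and advantage estimates have already been computed (function and argument names are illustrative, and the result is negated so it can be minimized by gradient descent):

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Negated clipped surrogate objective L^CLIP over a batch."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        unclipped = r * a
        # Constrain the ratio to [1 - eps, 1 + eps] before weighting.
        clipped = min(max(r, 1.0 - eps), 1.0 + eps) * a
        # PPO takes the pessimistic (minimum) of the two terms.
        total += min(unclipped, clipped)
    # Maximizing L^CLIP == minimizing its negation.
    return -total / len(ratios)
```

Note that when the advantage is negative, the `min` keeps the *unclipped* term if it is worse, so the penalty for moving the ratio in the wrong direction is never clipped away.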
In addition, PPO uses mean squared error to minimize the difference between the value function and the target value:
$$L^{\text{VF}}(\theta) = \mathbb{E}_t\left[ \left( V_{\theta}(s_t) - V_t^{\text{target}} \right)^2 \right]$$
The value function $V_{\theta}(s_t)$ estimates the expected return after executing the policy from state $s_t$, while the target value $V_t^{\text{target}}$ estimates the actual return, which can be calculated using Monte Carlo or temporal-difference methods.
To encourage exploration and prevent premature convergence to local optima, an entropy regularization term is added. The entropy regularization term rewards policies with higher entropy, i.e., more random policies, which helps ensure adequate exploration during the early stages of training:
$$S[\pi_{\theta}](s_t) = -\sum_{a} \pi_{\theta}(a \mid s_t) \log \pi_{\theta}(a \mid s_t)$$
Combining these components, the total loss function L ( θ ) is defined as follows:
$$L(\theta) = \mathbb{E}_t\left[ L^{\text{CLIP}}(\theta) - c_1 L^{\text{VF}}(\theta) + c_2 S[\pi_{\theta}](s_t) \right]$$
where c 1 and c 2 are weights used to balance the importance of policy loss, value function loss, and the entropy regularization term.
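Putting the pieces together, here is a hedged sketch of the entropy term and the combined objective; the coefficient defaults `c1 = 0.5` and `c2 = 0.01` are common choices in PPO implementations, not values taken from this paper:

```python
import math

def entropy(probs):
    """S[pi](s) = -sum_a pi(a|s) * log pi(a|s) for one state's action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ppo_total_objective(l_clip, l_vf, s, c1=0.5, c2=0.01):
    """L(theta) = L^CLIP - c1 * L^VF + c2 * S (a quantity to maximize)."""
    return l_clip - c1 * l_vf + c2 * s
```

A uniform distribution maximizes the entropy bonus (e.g., `entropy([0.5, 0.5])` equals `log 2`), which is exactly why the term pushes early-training policies toward exploration.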
To optimize the total loss function L ( θ ) , PPO typically uses stochastic gradient descent algorithms (such as the Adam optimizer) for parameter updates.
First, based on the current batch of sample data, the gradients of the total loss function $L(\theta)$ with respect to the policy parameters and value function parameters, $\nabla_{\theta_a} L(\theta)$ and $\nabla_{\theta_c} L(\theta)$, are calculated. The Adam optimizer then updates the parameters using the computed gradient information:
$$\theta \leftarrow \theta - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where $\alpha$ is the learning rate; $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first- and second-moment estimates of the gradient, respectively; and $\epsilon$ is a small constant to prevent division by zero.
The policy parameters $\theta_a$ are updated to maximize the policy reward by minimizing the negated policy terms of the loss function:
$$\theta_a \leftarrow \theta_a - \alpha \cdot \nabla_{\theta_a}\left( -L^{\text{CLIP}}(\theta) - c_2 S[\pi_{\theta}] \right)$$
The value function parameters $\theta_c$ are updated to minimize the value function loss $L^{\text{VF}}(\theta)$:
$$\theta_c \leftarrow \theta_c - \alpha \cdot \nabla_{\theta_c} L^{\text{VF}}(\theta)$$
Through these steps, PPO effectively balances exploration and exploitation during policy optimization, stabilizes the training process, and enhances sample efficiency.

4.3. Resource Predictive Load Balancing SFC Migration Algorithm

The pseudocode for the RP-LBM algorithm is shown in Algorithm 1. RP-LBM takes the predicted future resource demands and the physical network graph as inputs and outputs the SFC mapping strategy π (Lines 1–2). First, it calculates the resource utilization η_R of each physical node from the prediction results (Line 3). If η_R exceeds the set threshold thr_over (Line 4), it selects an SFC to migrate on that node (Line 5) and initializes the policy parameters (θ_c, θ_a), the maximum number of iterations K_max, and other relevant parameters (Line 6). In the main loop (Lines 7–19), it selects a mapping action a(t) from the policy π(a_n(t) | s_n(t), θ_a), executes the action to obtain an immediate reward r(t), and transitions to a new state s(t+1), while calculating the advantage function A(s_n(t), a_n(t)). If the constraints are not satisfied, it sets the immediate reward r(t) = 1/ε and re-selects the action. It then computes the clipped surrogate objective of Proximal Policy Optimization, updates the policy parameters θ_a using gradient ascent, and updates the value function parameters θ_c using gradient descent. If the resource utilization does not exceed the threshold, no action is taken (Line 20).
Algorithm 1 Resource Predictive Load Balancing SFC Migration Algorithm (RP-LBM)
1:  Input: Prediction results c^{nf}_{j,u}(t+n), m^{nf}_{j,u}(t+n) (n = 1, 2, …); physical network graph G = (V_s, E_s); SFC network graph G^{nf}_i = (V^{nf}_i, E^{nf}_i)
2:  Output: SFC mapping strategy π
3:  Calculate the resource utilization η_R of each physical node from the prediction results
4:  if η_R ≥ thr_over then
5:    Select an SFC to migrate on that node
6:    Initialize (θ_c, θ_a), K_max, M, (ε_c, ε_a)
7:    for episode = 1, …, M do
8:      Select mapping action a(t) from the policy π(a_n(t) | s_n(t), θ_a)
9:      if constraints are satisfied then
10:       Execute the action a(t), obtain the instantaneous
11:       reward r(t), and transition to state s(t+1)
12:       Obtain the advantage function A(s_n(t), a_n(t))
13:     else
14:       Set the instantaneous reward r(t) = 1/ε and re-select the action a(t) from the policy network
15:     end if
16:     Compute the clipped surrogate objective for PPO
17:     Update the policy parameters θ_a using gradient ascent
18:     Update the value function parameters θ_c using gradient descent
19:   end for
20: end if
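The threshold-triggered outer loop of Algorithm 1 can be sketched as follows; `select_sfc` and `migrate_with_ppo` are hypothetical stand-ins for the SFC-selection step (Line 5) and the PPO-driven remapping (Lines 7–19), not functions from the paper:

```python
def rp_lbm_step(predicted_demand, capacities, thr_over, select_sfc, migrate_with_ppo):
    """One decision epoch of RP-LBM (illustrative sketch).

    predicted_demand: dict node -> predicted resource usage at t+n
    capacities:       dict node -> node capacity
    select_sfc:       callable(node) -> SFC hosted on that node
    migrate_with_ppo: callable(sfc)  -> migration decision
    """
    migrations = []
    for node, capacity in capacities.items():
        utilization = predicted_demand[node] / capacity
        if utilization >= thr_over:              # predicted overload (Line 4)
            sfc = select_sfc(node)               # pick an SFC to move (Line 5)
            migrations.append(migrate_with_ppo(sfc))
    # Nodes under the threshold are left untouched (Line 20).
    return migrations
```

The key design point is that the check runs on *predicted* demand, so a migration plan exists before the overload materializes, avoiding the reaction delay of passive migration.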

5. Performance Evaluation

5.1. Simulation Setup

To evaluate the effectiveness of the proposed algorithm, we conducted simulations on the SFC simulation platform SFCSim [16] in a Python environment and tested the program on a cellular network topology.
Figure 4 depicts the physical network, which consists of 42 links and 19 physical nodes. The servers' CPU and memory resources are uniformly distributed in (250, 300) MIPS and (600, 1000) GB, respectively [17]. Link delays are uniformly distributed in (1, 4) ms, and the link bandwidth capacity is set to 500 Mbps. The node overload threshold is 0.7.
We consider eight types of VNFs. Each SFC selects a random subset of VNFs, and the length of each SFC is drawn uniformly from two to five. The entrance and exit nodes are chosen at random from V_S. The bandwidth factor ratios of the different VNF types are chosen from {0.5, 1, 1.5}.
The CPU coefficient coeff_c^f and memory coefficient coeff_m^f of the different VNF types are uniformly distributed in (0.1, 0.2). The maximum delay of an SFC is 30 ms. The network carries 20 paths over a 24-hour period, with traffic drawn from the Clearwater VNF Dataset [18].
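The SFC generation described above can be sketched as follows; this is an illustrative reconstruction of the setup, not the SFCSim code, and all names are our own:

```python
import random

VNF_TYPES = list(range(8))          # eight VNF types
BW_FACTORS = [0.5, 1.0, 1.5]        # bandwidth factor ratios

def random_sfc(nodes, rng=random):
    """Generate one random SFC per the simulation setup (illustrative)."""
    length = rng.randint(2, 5)                    # SFC length in [2, 5]
    vnfs = rng.sample(VNF_TYPES, length)          # random subset of VNF types
    ingress, egress = rng.sample(nodes, 2)        # random, distinct entry/exit nodes
    coeffs = {v: (rng.uniform(0.1, 0.2),          # CPU coefficient coeff_c^f
                  rng.uniform(0.1, 0.2))          # memory coefficient coeff_m^f
              for v in vnfs}
    return {"vnfs": vnfs, "ingress": ingress, "egress": egress,
            "coeffs": coeffs, "bw_factor": rng.choice(BW_FACTORS),
            "max_delay_ms": 30}
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.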

5.2. Simulation Result and Analysis

In this paper, the following metrics are used to evaluate algorithm performance under three resource conditions (sufficient, moderate, and scarce): node resource variance, the total number of network migrations, the average variance of network node resources, and the service interruption rate, defined as the ratio of interruption time to the SFC lifecycle.
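The two headline metrics are straightforward to compute; this is a minimal sketch using our own function names:

```python
def node_resource_variance(utilizations):
    """Population variance of per-node resource utilization (load-balance metric)."""
    mean = sum(utilizations) / len(utilizations)
    return sum((u - mean) ** 2 for u in utilizations) / len(utilizations)

def interruption_rate(interrupt_time, lifecycle):
    """Service interruption rate: interruption time over the SFC lifecycle."""
    return interrupt_time / lifecycle
```

A variance near zero means the load is spread evenly across nodes, which is exactly the state the reward functions in Section 5.2 steer the migration policies toward.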
First, we compared the performance of three algorithms in predicting the resource demand of each network function: LSTM, CNN-AT-LSTM, and LSTM–encoder–decoder. The model training results are shown in Table 1. The CNN-AT-LSTM model achieves the smallest mean square error (MSE) and root mean square error (RMSE) between the predicted and actual values, indicating that it has the highest prediction accuracy.
To intuitively demonstrate the predictive capabilities of the models, we chose LSTM and CNN-AT-LSTM as examples. We compared the CPU demand predictions for a single VNF with the actual values. The detailed comparison results are presented in Figure 5, which includes annotations for “Level Shift” and “Change Point.” During times when the data experienced significant level shifts, CNN-AT-LSTM quickly adjusted its predictions and closely followed the changes in the actual values. In contrast, the LSTM model showed some delay. At data change points, the prediction errors of CNN-AT-LSTM were much smaller than those of the LSTM model. This highlights CNN-AT-LSTM’s ability to capture rapid dynamic changes more effectively. The results indicate that the attention mechanism in CNN-AT-LSTM allows it to focus on important time steps within the input sequence. This is especially useful during pattern shifts and data change points. As a result, CNN-AT-LSTM improves overall prediction accuracy and response speed. In summary, CNN-AT-LSTM offers higher prediction precision and reliability when dealing with the dynamic fluctuations of network resource demands. It is better equipped to handle severe and sudden variations in resource requirements compared to the LSTM model.
The resource demand prediction results serve as input for the RP-LBM. RP-LBM can identify potential overload nodes in advance and develop corresponding migration strategies. Figure 6 and Figure 7 illustrate the changes in network resource variance under three different resource states: (a) sufficient resources, (b) moderate resources, and (c) scarce resources. The figure also compares the average node resource variance under these different conditions. In each scenario, we use different deep reinforcement learning algorithms, namely DQN and PPO, to create migration strategies. We compare these strategies with situations where no pre-migration is performed. The reward functions of the DQN and PPO algorithms are based on the network system’s load-balancing capability. This design causes the algorithms to prefer migration strategies that minimize network resource variance. Consequently, both algorithms maintain a lower variance in network resource usage and support the network’s load-balancing state. In contrast, the no-migration strategy uses the shortest path algorithm to deploy SFC but does not optimize network node resource usage. This approach results in unbalanced network resource utilization and fails to manage resource allocation effectively. As a result, the overall performance and stability of the system may be negatively affected.
We also compare the total number of SFC migrations and the SFC service interruption rates under the different algorithms, as shown in Figure 8 and Figure 9. The analysis shows that, across all three resource states, the RP-LBM-PPO algorithm consistently results in fewer SFC migrations than the RP-LBM-DQN algorithm. Additionally, when resources are sufficient or moderate, the PPO algorithm performs similarly to the DQN algorithm in terms of service interruption rate, indicating that PPO can maintain a low service interruption rate with less migration overhead. Figure 5 shows that resource demand exhibits periodic fluctuations and sudden data surges. Throughout these traffic variations, the RP-LBM algorithm consistently keeps the service interruption rate low, demonstrating its capability to preserve service quality during traffic bursts. By rapidly adjusting resource allocation, the algorithm effectively maintains service continuity. These results further validate the superior performance of the RP-LBM-PPO algorithm in dynamic environments.
In contrast, the strategy without pre-migration can only respond passively to changes in resource demand. Because it cannot develop a migration plan in advance, it results in the highest service interruption rate. This finding highlights the importance of predicting resource demand and planning ahead to improve network stability and reduce the risk of interruptions. Figure 7 further shows that when the network resource variance is small, resource utilization is more balanced. Balanced resource utilization directly impacts the overall performance and efficiency of the network. The PPO and DQN algorithms maintain low network resource variance through effective load-balancing strategies, achieving more balanced resource allocation and reducing the service interruption rate.
However, when resources are scarce, the number of migrations using the DQN algorithm increases significantly. This leads to a sharp rise in the service interruption rate, which is 7.6% higher than that of the PPO algorithm. In a resource-scarce environment, the DQN algorithm tends to frequently adjust network configurations to balance the load. These excessive migration operations increase the system’s overhead and complexity, resulting in a higher service interruption rate. The PPO algorithm, on the other hand, uses a more stable policy optimization method. This approach allows the PPO algorithm to adjust policies more effectively under resource constraints, reducing unnecessary migration times. The policy update mechanism of PPO ensures the continuity and stability of network services with only small policy changes. Therefore, in situations where resources are limited, the PPO algorithm demonstrates greater efficiency and stability. It reduces the number of migrations while maintaining a low service interruption rate. This performance indicates that the PPO algorithm has better robustness and adaptability in complex network environments.

6. Conclusions

6.1. Summary of the Performance

To address service QoS degradation or interruptions caused by dynamic changes in resource demand in edge computing environments, we conducted an in-depth study of the dynamic SFC migration problem under scarce and constantly changing resources, focusing on optimizing network performance through load balancing. We used the CNN-AT-LSTM machine learning model to predict the resource demands of VNFs. This approach allows us to anticipate changes in network node resource allocation and proactively develop migration strategies based on these predictions. By determining which VNFs to migrate from overloaded nodes, we ensure efficient resource management. We integrated the Proximal Policy Optimization deep reinforcement learning algorithm into our prediction model. This combination enables the system to detect changes in complex environments and select appropriate target nodes for migration. It also achieves network load balance, reduces the number of subsequent migrations, and minimizes the impact of unnecessary migrations.
The experimental results demonstrate that the RP-LBM-PPO algorithm outperforms the RP-LBM-DQN algorithm in both the number of migrations and the service interruption rate. When resources are sufficient or moderate, the PPO algorithm maintains a low service interruption rate with minimal migration overhead. However, in resource-scarce conditions, the DQN algorithm significantly increases the number of migrations and the service interruption rate due to frequent adjustments. In contrast, the PPO algorithm effectively reduces these negative impacts through stable policy optimization. These findings indicate that the PPO algorithm offers better efficiency and stability, making it more robust and adaptable in complex network environments.

6.2. Potential Future Research Directions

We assume that the lifecycle of SFC is infinite and do not consider how dynamic arrivals affect network load, migration operations, and service interruption rates. Additionally, the single-objective optimization algorithm we used focuses solely on minimizing the variance of system computation and storage resources to balance the network load. Although reducing the number of migrations and controlling service interruption rates have significantly improved the efficiency and stability of network resource management, we have not specifically modeled or deeply analyzed the time costs and resource overhead of migration operations. Future research should address SFCs with dynamic lifecycles and develop detailed models for migration costs and energy consumption [19]. Furthermore, it should explore multi-objective optimization methods to achieve a comprehensive optimization of resource utilization efficiency, service quality, and energy consumption.

Author Contributions

T.S., H.H., and S.Z. conceived of the idea, designed and performed the evaluation, analyzed the results, drafted the initial manuscript, and revised the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 61821001.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the informative comments and suggestions of the reviewers, which have improved both the content and presentation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bhamare, D.; Jain, R.; Samaka, M.; Erbad, A. A Survey on Service Function Chaining. J. Netw. Comput. Appl. 2016, 75, 138–155. [Google Scholar] [CrossRef]
  2. Qu, K.; Zhuang, W.; Shen, X.; Li, X.; Rao, J. Dynamic Resource Scaling for VNF over Nonstationary Traffic: A Learning Approach. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 648–662. [Google Scholar] [CrossRef]
  3. Guo, Z.; Xu, Y.; Liu, Y.F.; Liu, S.; Chao, H.J.; Zhang, Z.-L.; Xia, Y. AggreFlow: Achieving Power Efficiency, Load Balancing, and Quality of Service in Data Center Networks. IEEE/ACM Trans. Netw. 2020, 29, 17–33. [Google Scholar] [CrossRef]
  4. Liu, H.; Chen, J.; Chen, J.; Cheng, X.; Guo, K.; Qin, Y. A Deep Q-Learning Based VNF Migration Strategy for Elastic Control in SDN/NFV Network. In Proceedings of the 2021 International Conference on Wireless Communications and Smart Grid (ICWCSG), Hangzhou, China, 26–28 November 2021; pp. 217–223. [Google Scholar]
  5. Tanuboddi, B.R.; Gad, G.; Fadlullah, Z.M.; Fouda, M.M. Optimizing VNF Migration in B5G Core Networks: A Machine Learning Approach. In Proceedings of the 2024 International Conference on Smart Applications, Communications and Networking (SmartNets), Harrisonburg, VA, USA, 28–30 May 2024; pp. 1–5. [Google Scholar]
  6. Afrasiabi, S.N.; Ebrahimzadeh, A.; Promwongsa, N.; Mouradian, C.; Li, W.; Recse, Á.; Szabó, R.; Glitho, R.H. Cost-Efficient Cluster Migration of VNFs for Service Function Chain Embedding. IEEE Trans. Netw. Serv. Manag. 2024, 21, 979–993. [Google Scholar] [CrossRef]
  7. Kim, H.G.; Lee, D.Y.; Jeong, S.Y.; Choi, H.; Yoo, J.H.; Hong, J.W. Machine Learning-Based Method for Prediction of Virtual Network Function Resource Demands. In Proceedings of the 2019 IEEE Conference on Network Softwarization (NetSoft), Paris, France, 24–28 June 2019; pp. 405–413. [Google Scholar]
  8. Bellili, A.; Kara, N. A Graphical Deep Learning Technique-Based VNF Dependencies for Multi Resource Requirements Prediction in Virtualized Environments. Computing 2024, 106, 449–473. [Google Scholar] [CrossRef]
  9. Zhu, R.; Li, G.; Zhang, Y.; Fang, Z.; Wang, J. Load-Balanced Virtual Network Embedding Based on Deep Reinforcement Learning for 6G Regional Satellite Networks. IEEE Trans. Veh. Technol. 2023, 72, 14631–14644. [Google Scholar] [CrossRef]
  10. Chiang, M.L.; Hsieh, H.C.; Cheng, Y.H.; Lin, W.L.; Zeng, B.H. Improvement of Tasks Scheduling Algorithm Based on Load Balancing Candidate Method under Cloud Computing Environment. Expert Syst. Appl. 2023, 212, 118714. [Google Scholar] [CrossRef]
  11. Wen, T.; Yu, H.; Sun, G.; Liu, L. Network function consolidation in service function chaining orchestration. In Proceedings of the IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 22–27 May 2016; pp. 1–6. [Google Scholar]
  12. Li, B.; Cheng, B.; Liu, X.; Wang, M.; Yue, Y.; Chen, J. Joint Resource Optimization and Delay-Aware Virtual Network Function Migration in Data Center Networks. IEEE Trans. Netw. Serv. Manag. 2021, 18, 2960–2974. [Google Scholar] [CrossRef]
  13. Liu, Y.; Lu, Y.; Li, X.; Yao, Z.; Zhao, D. On Dynamic Service Function Chain Reconfiguration in IoT Networks. IEEE Internet Things J. 2020, 7, 10969–10984. [Google Scholar] [CrossRef]
  14. Wan, A.; Chang, Q.; Khalil, A.B.; He, J. Short-Term Power Load Forecasting for Combined Heat and Power Using CNN-LSTM Enhanced by Attention Mechanism. Energy 2023, 282, 128274. [Google Scholar] [CrossRef]
  15. Wang, Z.; Yan, H.; Wei, C.; Wang, J.; Bo, S.; Xiao, M. Research on Autonomous Driving Decision-Making Strategies Based Deep Reinforcement Learning. In Proceedings of the 4th International Conference on Internet of Things and Machine Learning, Nanchang, China, 9–11 August 2024. [Google Scholar]
  16. SFCSim Simulation Platform. Available online: https://pypi.org/project/sfcsim/ (accessed on 5 January 2021).
  17. Tang, L.; He, L.; Tan, Q.; Chen, Q. Virtual Network Function Migration Optimization Algorithm Based on Deep Deterministic Policy Gradient. J. Electron. Inf. Technol. 2021, 43, 404–411. [Google Scholar]
  18. VNFDataset: Virtual IP Multimedia IP System. Available online: https://www.kaggle.com/imenbenyahia/clearwatervnf-virtual-ip-multimedia-ip-system (accessed on 30 January 2021).
  19. Xu, L. Research on Key Technologies of Service Function Chain Orchestration Optimization for Mobile Scenarios. Ph.D. Dissertation, Beijing University of Posts and Telecommunications, Beijing, China, 2023. [Google Scholar]
Figure 1. SFC mapping architecture.
Figure 2. Overall framework of the RP-LBM.
Figure 3. An LSTM model with attention.
Figure 4. Cellular network topology.
Figure 5. Comparison of model-predicted values and actual values.
Figure 6. Comparison of network node resource variance when different algorithms are used under different resource conditions: (a) when resources are sufficient; (b) when resources are moderate; (c) when resources are scarce.
Figure 7. Comparison of average node resource variance under different resource conditions.
Figure 8. Comparison of total number of SFC migrations between RP-LBM-PPO and RP-LBM-DQN under different resource conditions.
Figure 9. Comparison of service interruption rate between RP-LBM-PPO and RP-LBM-DQN under different resource conditions.
Table 1. Comparison of average results on all VNFs in SFCs between LSTM, CNN-AT-LSTM, and LSTM–encoder–decoder.

| Metric | LSTM   | CNN-AT-LSTM | LSTM–Encoder–Decoder |
|--------|--------|-------------|----------------------|
| MSE    | 21.973 | 15.579      | 19.653               |
| RMSE   | 4.688  | 3.947       | 4.433                |

Share and Cite

Sun, T.; Hu, H.; Zhang, S. Load-Balanced Dynamic SFC Migration Based on Resource Demand Prediction. Sensors 2024, 24, 8046. https://doi.org/10.3390/s24248046
