3.1. System Model
3.1.1. Digital Twin-Assisted Network Architecture
This manuscript focuses on a software-defined network (SDN) and NFV-based network that contains access, edge, and core networks. Access and core networks may belong to one network resource provider's domain or span multiple network resource providers' domains. Each domain has a centralized SDN controller, which can obtain complete knowledge of the domain's network topology, resource capacity, resource demands, and so on [17]. The network architecture is described in Figure 1. It is assumed that the overall multi-domain network is coordinated by a global orchestrator SDN controller, which obtains a global view of the multi-domain network and knowledge of the multi-domain resource status by collecting information from each domain's centralized SDN controller [18,19]. The network edge has multiple types of end nodes, including hosts, data center servers, mobile phones, and other devices. The network service demands of these nodes and devices are aggregated and submitted to various network service providers, who, in turn, make resource usage requests to network resource providers in the form of a service function chain (SFC). Network service providers use the resources of the network resource providers to deliver services to users, and different service providers may offer the same or different services. Network traffic flows are transmitted to users' devices and data centers via the core network. In the whole physical network, physical nodes are assumed to be divided into five categories: (1) NFV-enabled nodes, (2) transport nodes, (3) digital twin nodes, (4) access nodes, and (5) end nodes. NFV-enabled nodes host and process the VNFs required by services and forward traffic flows, transport nodes can only forward traffic flows, digital twin nodes collect and synchronize the data of VNFs, access nodes forward traffic flows in the access network, and end nodes request or provide a service [10].

Due to privacy concerns, network service providers are unwilling to share information about network traffic, and network resource providers are likewise reluctant to share information about the usage of the underlying network resources. Therefore, network service providers may operate their own digital twin nodes, which collect dynamic network traffic information, while network resource providers operate their own digital twin nodes, which monitor VNFs in real time and prevent service failures caused by VNF operation failures. In this way, the digital twin node of each network service provider holds that provider's network traffic information, and the digital twin node of the network resource provider holds information about network resource usage.

The network traffic of each SFC varies dynamically. To minimize the need for SFC reallocation (VNF migration), resources are usually allocated according to the higher resource demands, which results in the overuse of network resources; even so, resource reallocation cannot be avoided completely. To alleviate this issue, recent approaches suggest dividing the day into multiple time slices, predicting the network traffic peak of each time slice before it begins, and proactively reallocating resources. Considering the data privacy of each provider, a new framework can be applied in which each network service provider independently trains its own network traffic prediction model and shares only the local model training results with the network resource provider. The global model calculates the costs and benefits of the underlying physical network based on mappings and migrations, and these values are used to train the global model. During the prediction phase, network traffic is predicted based on the physical network resources of the network resource provider. We utilize the SDN controller to manage network data flows [20]. It is assumed that the SDN controller can gather information on the VNFs through the network resource provider's digital twin nodes [10]. SDN controllers collect information from core network nodes (transport nodes, NFV-enabled nodes, and digital twin nodes) and access network nodes (access nodes), such as locations, resource usage, and SFC mappings, in order to improve the quality of service (QoS) of SFCs. This framework can physically isolate private information while enabling cooperation.
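To make the per-time-slice workflow concrete, the sketch below outlines one possible control loop under this architecture. It is a minimal sketch under our own assumptions: the callables (local predictors, a global predictor, and a proactive reallocation hook) are illustrative placeholders rather than interfaces defined in this manuscript.

```python
# A minimal sketch of the assumed per-time-slice workflow: each service
# provider predicts its own traffic locally, only the local results are shared
# with the resource provider's global model, and SFCs are proactively
# reallocated before the slice begins. All callables are illustrative
# placeholders, not interfaces defined in the paper.
from typing import Callable, List


def run_time_slice(
    slice_idx: int,
    local_predictors: List[Callable[[int], float]],   # one per service provider
    global_predictor: Callable[[int, List[float]], float],
    proactive_reallocate: Callable[[float], None],
) -> float:
    # 1. Local prediction inside each provider's digital twin (raw traffic stays private).
    local_results = [predict(slice_idx) for predict in local_predictors]
    # 2. The resource provider's global model aggregates only the shared results.
    predicted_peak = global_predictor(slice_idx, local_results)
    # 3. Proactive SFC reallocation (VNF migration) before the slice starts.
    proactive_reallocate(predicted_peak)
    return predicted_peak


if __name__ == "__main__":
    # Toy usage with constant predictors and a no-op reallocation.
    peak = run_time_slice(
        slice_idx=0,
        local_predictors=[lambda t: 2.0, lambda t: 3.5],
        global_predictor=lambda t, parts: sum(parts),
        proactive_reallocate=lambda p: None,
    )
    print(peak)  # 5.5
```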
3.1.2. Physical Network
The physical network is represented by an undirected graph $G=(V,E)$, where $V$ and $E$ stand for the node set and the link set, respectively. $V$ consists of the transport node set $V_T$, the NFV-enabled node set $V_N$, the access node set $V_A$, the end node set $V_E$, and the digital twin node set $V_D$, i.e., $V = V_T \cup V_N \cup V_A \cup V_E \cup V_D$, and these subsets are pairwise disjoint. Transport nodes have forwarding devices and only forward traffic flows to other nodes, NFV-enabled nodes are data centers (DCs) that can not only forward flows but also carry VNF instances, access nodes forward traffic flows between the core network and the edge network, end nodes are terminal nodes that belong to users or service providers, while digital twin nodes construct DTs of the physical network and cannot host VNFs. Digital twin nodes that belong to network resource providers are represented by $V_D^{r}$, and digital twin nodes that belong to network service providers are represented by $V_D^{s}$. Each NFV-enabled node is endowed with certain computing, processing, and storage resources, and each digital twin node is endowed with certain computing and storage resources. We use $C_n$ to denote the computing capacity of the NFV-enabled node $n \in V_N$, $C_d$ to denote the computing capacity of the digital twin node $d \in V_D^{r}$, and $C_{d'}$ to denote the computing capacity of the digital twin node $d' \in V_D^{s}$. When $n \in V_A$ or $n \in V_T$, the computing capacity $C_n$ of the access node or transport node $n$ is 0. Each NFV-enabled node and digital twin node is endowed with a certain storage resource. We use $S_n$ to denote the storage capacity of the NFV-enabled node $n$, $S_d$ to denote the storage capacity of the digital twin node $d$, and $S_{d'}$ to denote the storage capacity of the digital twin node $d'$. Each NFV-enabled node is endowed with a certain processing resource, and we use $P_n$ to denote the processing capacity of the NFV-enabled node $n$. We use $Nb(n)$, $Nb(d)$, and $Nb(d')$ to represent the neighbor node sets of the node $n$, the node $d$, and the node $d'$, respectively. Each link $e_{uv} \in E$ ($u, v \in V$) has a bandwidth capacity $B_{uv}$ and a link propagation latency $L_{uv}$.
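A possible in-memory encoding of this graph model is sketched below, assuming the networkx library; the attribute names (kind, cpu, storage, processing, bandwidth, latency, owner) and all numeric values are our own illustrative choices, not the paper's notation.

```python
# Sketch of the physical network G = (V, E) with the five node categories and
# per-node/per-link capacities. Attribute names are our own, not the paper's.
import networkx as nx

G = nx.Graph()

# Nodes: kind in {"nfv", "transport", "access", "end", "twin"}.
G.add_node("n1", kind="nfv", cpu=100.0, storage=500.0, processing=80.0)
G.add_node("t1", kind="transport", cpu=0.0, storage=0.0, processing=0.0)
G.add_node("a1", kind="access", cpu=0.0, storage=0.0, processing=0.0)
G.add_node("u1", kind="end")
G.add_node("d1", kind="twin", owner="resource_provider", cpu=60.0, storage=300.0)
G.add_node("d2", kind="twin", owner="service_provider", cpu=40.0, storage=200.0)

# Links: each physical link carries a bandwidth capacity and a propagation latency.
G.add_edge("n1", "t1", bandwidth=10.0, latency=2.0)   # illustrative units
G.add_edge("t1", "a1", bandwidth=10.0, latency=1.0)
G.add_edge("a1", "u1", bandwidth=1.0, latency=0.5)
G.add_edge("n1", "d1", bandwidth=5.0, latency=1.0)
G.add_edge("u1", "d2", bandwidth=1.0, latency=0.5)

# Neighbor set of a node, as used in the model.
print(sorted(G.neighbors("n1")))  # ['d1', 't1']
```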
3.1.3. Scenario
Different physical nodes may have different constraints on the allowable VNF types and the allowable number of VNF instances. Each physical node can instantiate multiple VNFs, and each VNF can have multiple instances on the physical node. We use the binary variable $\alpha_{f,n}$ to represent whether the VNF $f$ can be instantiated on the physical node $n$, and $m_{f,n}$ to represent the maximum number of VNF $f$'s instances that can be instantiated on node $n$. Each instance on a physical node consumes CPU resources, and the basic computing resource requirement of an instance of VNF $f$ is represented by $c^{b}_{f}$. Each instance on a physical node also consumes storage resources, and the basic storage resource requirement of an instance of VNF $f$ is represented by $s^{b}_{f}$. Each instance instantiated on a physical node has a processing capacity, and the total processing capacity of the instances on a physical node cannot exceed the processing capacity of that node. The processing capacity of the physical node $n$ is represented by $P_n$, and the processing capacity of each instance of VNF $f$ on the physical node is represented by $p_{f,n}$. Additionally, some VNFs can be shared among multiple SFCs on the same physical node, while others cannot. Therefore, the set of VNF types $F$ consists of the shareable set $F^{sh}$ and the non-shareable set $F^{ns}$: an instance of a VNF in $F^{sh}$ can be shared among multiple different SFCs, whereas an instance of a VNF in $F^{ns}$ cannot be shared among different SFCs.
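The following sketch illustrates how the instantiation constraints above (type permission, maximum instance count, basic CPU/storage requirements, and the node processing budget) could be checked; the dictionary layout and every name in it are assumptions made for illustration.

```python
# Sketch of the instantiation constraints described above, with illustrative
# data structures (dicts) and names that are not from the paper.

def can_instantiate(vnf, node, state):
    """Check whether one more instance of `vnf` fits on `node`."""
    allowed = state["allowed"].get((vnf, node), False)            # VNF type permitted?
    current = state["instances"].get((vnf, node), 0)
    under_limit = current < state["max_instances"].get((vnf, node), 0)
    cpu_ok = state["cpu_free"][node] >= state["cpu_per_instance"][vnf]
    storage_ok = state["storage_free"][node] >= state["storage_per_instance"][vnf]
    # Total processing capacity of all instances must stay within the node's budget.
    processing_ok = (
        state["processing_used"][node] + state["processing_per_instance"][vnf]
        <= state["processing_capacity"][node]
    )
    return allowed and under_limit and cpu_ok and storage_ok and processing_ok


state = {
    "allowed": {("firewall", "n1"): True},
    "instances": {("firewall", "n1"): 1},
    "max_instances": {("firewall", "n1"): 2},
    "cpu_free": {"n1": 40.0},
    "storage_free": {"n1": 200.0},
    "processing_used": {"n1": 50.0},
    "processing_capacity": {"n1": 80.0},
    "cpu_per_instance": {"firewall": 10.0},
    "storage_per_instance": {"firewall": 20.0},
    "processing_per_instance": {"firewall": 20.0},
}
print(can_instantiate("firewall", "n1", state))  # True
```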
3.1.4. Service Function Chain Request
There is a set of service function chain requests (SFC requests) $R$ in the physical network and a set of network service providers $Q$. These SFCs belong to different network service providers. We assume that there are $|R|$ SFCs and that they belong to $|Q|$ network service providers. The SFC requests can be represented as $R=\{r_1, r_2, \ldots, r_{|R|}\}$, and the network service providers can be represented as $Q=\{q_1, q_2, \ldots, q_{|Q|}\}$. We use the binary variable $\beta_{r,q}$ to represent whether SFC request $r$ belongs to network service provider $q$. An individual SFC request can be represented by an ordered sequence of VNFs, the delay requirement, the packet size, and the resource requirements, i.e., $r=\{F_r, D_r, Z_r, RQ_r\}$. In the SFC request $r$, $F_r=\{f_r^1, f_r^2, \ldots, f_r^{|F_r|}\}$ represents the VNFs of the SFC request, $f_r^k$ represents the $k$th VNF of the SFC, and $|F_r|$ represents the number of VNFs in the SFC request $r$. $D_r$ represents the end-to-end latency requirement of the SFC request $r$, and $Z_r$ represents the packet size of the SFC request $r$. $RQ_r$ represents the resource requirements of the SFC request $r$, $b_r$ represents the bandwidth requirement of the SFC request $r$, $RQ_{f_r^k}$ represents the resource requirements of the VNF $f_r^k$, $c_{f_r^k}$ represents the computing requirement of the VNF $f_r^k$, $s_{f_r^k}$ represents the storage requirement of the VNF $f_r^k$, and $p_{f_r^k}$ represents the processing requirement of VNF $f_r^k$.
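An SFC request as defined above can be encoded, for example, as a small data class; the field names below are our own shorthand for the ordered VNF sequence, latency requirement, packet size, and resource requirements, not the paper's symbols.

```python
# Illustrative encoding of an SFC request as defined above; field names are
# our own, not the paper's symbols.
from dataclasses import dataclass
from typing import List


@dataclass
class VNFRequirement:
    vnf_type: str        # e.g. "firewall", "nat"
    cpu: float           # computing requirement
    storage: float       # storage requirement
    processing: float    # processing requirement


@dataclass
class SFCRequest:
    provider: str                      # owning network service provider
    vnfs: List[VNFRequirement]         # ordered VNF sequence f^1, ..., f^K
    latency_req: float                 # end-to-end latency requirement
    packet_size: float                 # packet size
    bandwidth_req: float               # bandwidth requirement of the chain

    @property
    def length(self) -> int:
        return len(self.vnfs)


r1 = SFCRequest(
    provider="q1",
    vnfs=[VNFRequirement("firewall", 10.0, 20.0, 15.0),
          VNFRequirement("nat", 5.0, 10.0, 10.0)],
    latency_req=20.0,
    packet_size=1500.0,
    bandwidth_req=2.0,
)
print(r1.length)  # 2
```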
3.1.5. Digital Twin
Digital twin nodes fall into two categories: one belongs to network service providers and collects user requests, service network traffic, and other information; the other belongs to network resource providers and collects the mapping and migration status of the underlying physical network as well as its resource usage. Digital twin nodes that belong to network service providers are connected to end nodes, while digital twin nodes that belong to network resource providers are connected to transport nodes, NFV-enabled nodes, and access nodes. We use $d \in V_D^{r}$ to represent a digital twin node that belongs to a network resource provider. We use the binary variable $y^{d}_{f,n}$ to represent whether VNF $f$ is mapped to the node $n$ that is associated with digital twin node $d$, and $\hat{y}^{d}_{f,n}$ to represent the same relationship in the previous time slice. The packet for collecting or updating the data of VNF $f$ between the NFV-enabled node $n$ and the digital twin node $d$ traverses a physical path. If the physical link $e_{uv}$ belongs to this physical path, we set $z^{d}_{f,n,uv}=1$; otherwise, we set $z^{d}_{f,n,uv}=0$. Constructing the DT of nodes (such as NFV-enabled nodes, access nodes, and transport nodes) consumes CPU, storage, and bandwidth resources. $c^{DT}_{f,n}$ represents the CPU resources needed to construct the DT of VNF $f$ in the digital twin node $d$ for the NFV-enabled node $n$, $c^{DT}_{t}$ represents the CPU resources needed to construct the DT in the digital twin node $d$ for the transport node $t$, and $c^{DT}_{a}$ represents the CPU resources needed to construct the DT in the digital twin node $d$ for the access node $a$. $s^{DT}_{f,n}$ represents the storage resources needed to construct and maintain the DT of VNF $f$ in the digital twin node $d$ for the NFV-enabled node $n$, $s^{DT}_{a}$ represents the storage resources needed to construct and maintain the DT in the digital twin node $d$ for the access node $a$, and $s^{DT}_{t}$ represents the storage resources needed to construct and maintain the DT in the digital twin node $d$ for the transport node $t$. $b^{DT}_{f,n}$ represents the bandwidth resources needed to transmit the DT data of VNF $f$ between the physical node $n$ and the digital twin node $d$, $b^{DT}_{a}$ represents the bandwidth resources needed to transmit the DT data between the access node $a$ and the digital twin node $d$, and $b^{DT}_{t}$ represents the bandwidth resources needed to transmit the DT data between the transport node $t$ and the digital twin node $d$. We use $d' \in V_D^{s}$ to represent a digital twin node that belongs to a network service provider, and the binary variable $\gamma_{d',q}$ to represent whether the digital twin node $d'$ belongs to the network service provider $q$. Constructing the DT of end nodes (such as user nodes, service nodes, and IoT devices) also consumes CPU, storage, and bandwidth resources. $c^{DT}_{u}$ represents the CPU resources needed to construct the DT in the digital twin node $d'$ for the end node $u$, $s^{DT}_{u}$ represents the storage resources needed to construct and maintain the DT in the digital twin node $d'$ for the end node $u$, and $b^{DT}_{u}$ represents the bandwidth resources needed to transmit the DT data between the end node $u$ and the digital twin node $d'$.
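The bookkeeping implied by these DT construction costs can be illustrated with a short sketch that sums the CPU, storage, and synchronization bandwidth a twin node needs for the physical nodes and VNFs it monitors; all names and numbers are illustrative assumptions.

```python
# Sketch of the resource bookkeeping for a digital twin node: summing the CPU,
# storage, and synchronization bandwidth needed to construct DTs for the nodes
# it monitors. Numbers and field names are illustrative only.

def twin_node_load(dt_entries):
    """dt_entries: list of dicts, one per monitored physical node / VNF."""
    cpu = sum(e["cpu"] for e in dt_entries)
    storage = sum(e["storage"] for e in dt_entries)
    bandwidth = sum(e["bandwidth"] for e in dt_entries)
    return cpu, storage, bandwidth


def fits_on_twin(dt_entries, cpu_cap, storage_cap):
    cpu, storage, _ = twin_node_load(dt_entries)
    return cpu <= cpu_cap and storage <= storage_cap


# A resource provider's twin node monitoring one VNF, one transport node, and
# one access node.
entries = [
    {"target": ("vnf", "firewall", "n1"), "cpu": 4.0, "storage": 8.0, "bandwidth": 0.5},
    {"target": ("transport", "t1"), "cpu": 1.0, "storage": 2.0, "bandwidth": 0.25},
    {"target": ("access", "a1"), "cpu": 1.0, "storage": 2.0, "bandwidth": 0.25},
]
print(twin_node_load(entries))             # (6.0, 12.0, 1.0)
print(fits_on_twin(entries, 60.0, 300.0))  # True
```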
3.1.6. Problem Objectives
Given the physical network and the SFC requests, the problem is to predict network traffic for migrating SFC requests while achieving the following goals; a compact scalarized reading of these goals is sketched after the list.
(1) Maximize the Acceptance Ratio of SFC Requests: A higher acceptance ratio of SFC requests brings higher revenue from network service providers.
(2) Minimize the Cost of Migration: Under the premise of maximizing the acceptance ratio of SFC requests, minimize the migration cost, mainly the latency of migration.
(3) Minimize Energy Consumption: Under the premise of maximizing the acceptance ratio of SFC requests, minimize the energy consumption, mainly including the energy consumption of NFV-enabled nodes.
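Read together, goals (1)-(3) admit a compact scalarized form. The following snippet is only an illustrative reading: the weights $\omega_1,\omega_2,\omega_3$ and the symbols $AR$, $C_{\mathrm{mig}}$, $E$ are our own shorthand and do not appear in this manuscript; choosing $\omega_1$ much larger than $\omega_2$ and $\omega_3$ approximates the stated priority of the acceptance ratio.

```latex
% Illustrative scalarization of goals (1)-(3); weights and symbols are assumptions.
\begin{equation*}
  \max \;\; \omega_1 \, AR \;-\; \omega_2 \, C_{\mathrm{mig}} \;-\; \omega_3 \, E ,
  \qquad \omega_1 \gg \omega_2, \omega_3 > 0,
\end{equation*}
% AR: acceptance ratio of SFC requests; C_mig: migration cost (mainly migration
% latency); E: energy consumption of NFV-enabled nodes.
```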
3.3. Markov Decision Process
The knowledge layer is the core of the framework, where the DRL module of network resource providers solves the whole network traffic prediction problem. In this section, the network traffic prediction problem is modeled as a Markov Decision Process.
(1) States: The states include the state of the physical network and the state of the network traffic. The state of the physical network is stored in a network status database of the store collector that belongs to the network resource provider, and the state of the complete network traffic is stored in a history database of the store collector that belongs to the network service provider. They are separately described below.
State of Physical Network: We define this state as a composite of the nodes, instances, and links where SFC requests are embedded, together with the available resources of nodes, instances, and links. This state includes seven kinds of record matrices: the number record matrix of VNF instances $N$, the placement record matrix of VNF instances $PL$, the mapping path record matrix of virtual links $MP$, the record matrix of residual processing capacity $RP$, the record vector of residual CPU capacity $RC$, the record vector of residual storage capacity $RS$, and the symmetric record matrix of residual bandwidth capacity $RB$. In $N$, each element $N_{f,n}$ is a single value, where $N_{f,n}$ indicates the current number of VNF $f$'s ($f \in F$) instances in the NFV-enabled node $n$. In $PL$, if $f \in F^{ns}$, each element $PL_{f,n,j}$ is a single value, where $PL_{f,n,j}$ denotes the current SFC request of VNF $f$'s ($f \in F^{ns}$) deployment in instance $j$. If $f \in F^{sh}$, each element $PL_{f,n,j}$ is an $|R|$-dimensional vector, where $PL_{f,n,j}[r]=1$ indicates that the VNF $f$ ($f \in F^{sh}$) of SFC $r$ ($r \in R$) is deployed in instance $j$; otherwise, the value of $PL_{f,n,j}[r]$ is 0. In $MP$, each element $MP_{r}^{k}$ is a physical path. If VNF $f_r^k$ and VNF $f_r^{k+1}$ are embedded in the same physical node, then $MP_{r}^{k}$ is null. In $RP$, each element $RP_{f,n,j}$ is a single value, where $RP_{f,n,j}$ denotes the current residual processing capacity of VNF $f$'s ($f \in F$) instance $j$ in the NFV-enabled node $n$. In $RC$, each element $RC_n$ is the current residual CPU computing capacity of the NFV-enabled node $n$. In $RS$, each element $RS_n$ is the current residual storage capacity of the NFV-enabled node $n$. In $RB$, each element $RB_{uv}$ is the current residual bandwidth capacity of the physical link $e_{uv}$. If the link $e_{uv}$ does not exist, the element $RB_{uv}$ equals 0.
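As an illustration of how such a state could be held in memory, the sketch below uses fixed orderings of VNF types and nodes and plain numpy arrays for the regular matrices, with dictionaries for the ragged placement and path records; the shapes and names are our assumptions, not the paper's exact encoding.

```python
# Sketch of the physical-network state as arrays, assuming a fixed ordering of
# VNF types, NFV-enabled nodes, and physical nodes.
import numpy as np

vnf_types = ["firewall", "nat", "ids"]          # VNF type set
nfv_nodes = ["n1", "n2"]                        # NFV-enabled nodes
all_nodes = ["n1", "n2", "t1", "a1"]            # all physical nodes

F, Nn, Vn = len(vnf_types), len(nfv_nodes), len(all_nodes)

N_mat = np.zeros((F, Nn), dtype=int)       # number of instances of VNF f on node n
RC = np.zeros(Nn)                          # residual CPU of each NFV-enabled node
RS = np.zeros(Nn)                          # residual storage of each NFV-enabled node
RB = np.zeros((Vn, Vn))                    # residual bandwidth; 0 where no link exists

# Example updates: one firewall instance on n1, residual capacities, one link.
N_mat[vnf_types.index("firewall"), nfv_nodes.index("n1")] = 1
RC[:] = [90.0, 100.0]
RS[:] = [480.0, 500.0]
i, j = all_nodes.index("n1"), all_nodes.index("t1")
RB[i, j] = RB[j, i] = 8.0                  # keep the matrix symmetric

# Placement records and mapping paths are ragged, so dictionaries are a natural fit:
placement = {("firewall", "n1", 0): {"r1"}}       # instance 0 serves SFC r1 (shareable)
paths = {("r1", 0, 1): ["n1", "t1", "n2"]}        # physical path between consecutive VNFs
print(N_mat.sum(), RB[i, j])                      # 1 8.0
```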
State of Network Traffic: Each network service provider has the complete network traffic information of its own services, which is stored in the history database of its store collector. During the training process, records can be extracted from this history database, and each record is specified by a time slice. The record in the history database can be defined as follows:
$$rec_{d,t} = \left(H_{d,t},\ x_{d,t},\ x_{d,t-1}\right). \qquad (45)$$
In Equation (45), $H_{d,t}$ indicates the latest history network traffic in a given time period for the specified time slice $(d,t)$, where $d$ indicates the date of the specified time slice and $t$ indicates the serial number of the time slice in date $d$. $x_{d,t}$ represents the actual network traffic for time slice $(d,t)$, and $x_{d,t-1}$ represents the actual network traffic for time slice $(d,t-1)$, which is the previous time slice of $(d,t)$.
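A minimal sketch of extracting one such record from a provider's history database is given below; the nested-dictionary storage layout, the window length, and the function name are assumptions for illustration only.

```python
# Sketch of pulling one training record from a provider's history database for
# time slice (d, t), following the structure of Equation (45). The storage
# layout (a dict keyed by date, holding per-slice traffic values) is assumed.

def get_record(history, d, t, window=4):
    """Return (H_{d,t}, x_{d,t}, x_{d,t-1}) for date d and time slice t."""
    day = history[d]
    past = day[max(0, t - window):t]          # latest history traffic before slice t
    return past, day[t], day[t - 1]


history = {"2024-05-01": [3.1, 2.8, 4.0, 3.7, 5.2, 4.9]}
H, x_t, x_prev = get_record(history, "2024-05-01", 5)
print(H, x_t, x_prev)   # [2.8, 4.0, 3.7, 5.2] 4.9 5.2
```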
(2) Action: The action is the predicted value of network traffic for the specified time slice. We combine low-rank adaptation and deep reinforcement learning to solve the network traffic prediction problem. Prediction operations are performed in the actor network of the global model and the neural network of the local model.
The input of the local model is the network traffic $x^{q}_{i}$ owned by each network service provider $q$, and the output is the prediction value of network traffic $\hat{x}^{q}_{i}$. This process can be expressed as follows:
$$\hat{x}^{q}_{i} = f_{q}\left(x^{q}_{i};\ \theta_{q}\right),$$
where $f_q$ denotes the neural network of the local model of network service provider $q$ and $\theta_q$ denotes its parameters.
The network resource provider knows the combined traffic from SFCs of the same type; however, it does not know the individual parts. Therefore, the global model uses the total SFC traffic as its input, while the local models use their own components as the input. In the actor network of the global model, the input is the actual total network traffic $X_{i}$ from the SFCs of the same type, and the output is the new prediction value of the total network traffic $\hat{X}_{i}$ for SFC type $i$. This process can be expressed as follows:
$$\hat{X}_{i} = \pi\left(X_{i};\ \theta^{\pi}\right),$$
where $\pi$ denotes the actor network of the global model and $\theta^{\pi}$ denotes its parameters.
In the critic network of the global model, the main task is to evaluate the prediction performance. The input of the critic network is the prediction value of network traffic $\hat{X}_{i}$ from the global model, the actual total network traffic $X_{i}$, and the physical network status $s$. The output is the evaluation value. This process can be expressed by the following equation:
$$Q_{v} = Q\left(X_{i},\ s,\ \hat{X}_{i};\ \theta^{Q}\right),$$
where $Q$ denotes the critic network of the global model, $\theta^{Q}$ denotes its parameters, and $Q_v$ is the evaluation value.
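To make the three networks concrete, the sketch below shows one possible PyTorch realization of the local predictor, the global actor, and the global critic; the layer sizes and names are illustrative assumptions, and the LoRA adaptation mentioned in the text is omitted for brevity.

```python
# Minimal sketch of the three networks described above (local predictor, global
# actor, global critic). Layer sizes and all names are illustrative assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn


class LocalPredictor(nn.Module):
    """Per-provider model: history traffic of provider q -> predicted traffic."""
    def __init__(self, history_len: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(history_len, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, history):            # history: (batch, history_len)
        return self.net(history)           # (batch, 1) predicted traffic


class GlobalActor(nn.Module):
    """Actor: total traffic of one SFC type -> new total prediction (the action)."""
    def __init__(self, history_len: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(history_len, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, total_history):
        return self.net(total_history)


class GlobalCritic(nn.Module):
    """Critic: (total traffic, physical network status, predicted action) -> value."""
    def __init__(self, history_len: int, status_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_len + status_dim + 1, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, total_history, status, action):
        return self.net(torch.cat([total_history, status, action], dim=-1))


# Toy forward pass with random tensors.
hist, status = torch.randn(2, 8), torch.randn(2, 16)
local, actor, critic = LocalPredictor(8), GlobalActor(8), GlobalCritic(8, 16)
local_pred = local(hist)              # a provider's own prediction
action = actor(hist)                  # predicted total traffic for an SFC type
value = critic(hist, status, action)  # evaluation of the prediction
print(action.shape, value.shape)      # torch.Size([2, 1]) torch.Size([2, 1])
```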
(3) Reward: In our proposed framework, the reward represents the network performance, and it needs to be considered from four aspects: migration cost, energy consumption, request acceptance revenue, and request failure penalty.
When the network traffic prediction value is greater than or equal to the actual peak value, there is no passive migration within the time slice; migration is only actively initiated at the beginning of the time slice. In this case, the migration cost is defined as the sum of the migration delays during active migration; the energy consumption is the network power consumption after the active migration; the request acceptance revenue is the total amount of resources of the SFC requests accepted after the active migration; and the request failure penalty is the total amount of resources of the SFC requests that fail to map after the active migration. Thus, the network reward is
$$Rw_{1} = Rev_{1} - D^{mig}_{1} - E_{1} - Pen_{1},$$
where $Rev_{1}$, $D^{mig}_{1}$, $E_{1}$, and $Pen_{1}$ denote the request acceptance revenue, the active migration delay, the energy consumption, and the request failure penalty defined above.
When the network traffic prediction value is less than the actual peak value, passive migration occurs within the time slice, so both active migration and passive migration take place. The migration cost is the sum of the migration delays during active migration and passive migration; the energy consumption is the network power consumption under the actual peak value of network traffic; the request acceptance revenue is the total amount of resources of the SFC requests accepted after passive migration; and the request failure penalty is the total amount of resources of the SFC requests that fail to map after passive migration. Thus, the network reward in this case is
$$Rw_{2} = Rev_{2} - D^{mig}_{2} - E_{2} - Pen_{2},$$
where $D^{mig}_{2}$ includes both the active and the passive migration delays.
The actual reward value, however, must be calculated on a case-by-case basis. The expressions for $Rw_{1}$ and $Rw_{2}$ have the same form; the sole distinction between them lies in the computation of the passive migration latency, which is performed in the same manner as described in Section 3.2.3 (2).
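The two reward cases can be summarized in a short sketch; the plain linear combination and the variable names below reflect our reading of the four reward aspects and are assumptions rather than the paper's exact formula.

```python
# Sketch of the two reward cases described above. The linear combination and
# the variable names are our reading of the text, not the paper's exact formula.

def reward(prediction, actual_peak, active, passive=None):
    """active/passive: dicts with migration_delay, energy, revenue, penalty."""
    if prediction >= actual_peak:
        # Case 1: only proactive (active) migration at the start of the slice.
        terms = active
        migration_delay = active["migration_delay"]
    else:
        # Case 2: under-prediction triggers passive migration inside the slice.
        terms = passive
        migration_delay = active["migration_delay"] + passive["migration_delay"]
    return terms["revenue"] - migration_delay - terms["energy"] - terms["penalty"]


active = {"migration_delay": 2.0, "energy": 5.0, "revenue": 20.0, "penalty": 1.0}
passive = {"migration_delay": 3.0, "energy": 7.0, "revenue": 18.0, "penalty": 2.0}
print(reward(prediction=10.0, actual_peak=8.0, active=active))                   # 12.0
print(reward(prediction=6.0, actual_peak=8.0, active=active, passive=passive))   # 4.0
```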
Specifically, we formulate network traffic prediction as an optimization problem that balances migration latency, operation cost, and acceptance ratio to find the optimal predicted value. To achieve this objective, we first define and solve the VNF migration problem in order to evaluate the effect of the network traffic prediction. Second, we apply the LoRA and DDPG models to protect the data privacy of all providers. With the aforementioned MDP model, we characterize the system state, action, and reward. Finally, we design an efficient network traffic prediction algorithm that predicts the optimal value of network traffic. The details of the algorithms are described in Section 4.