Optimizing Traffic Engineering for Resilient Services in NFV-Based Connected Autonomous Vehicles

The massive amount of data generated daily by the sensors on connected autonomous vehicles (CAVs) can cause significant performance problems in data processing and transfer. Network Function Virtualization (NFV) is a promising approach to improving the performance of a CAV system. In an NFV framework, Virtual Network Function (VNF) instances can be placed in edge and cloud servers and connected together to enable a flexible CAV service with low latency. However, protecting a service function chain composed of several VNFs from a failure is challenging in an NFV-based CAV system (VCAV). We propose an integer linear programming (ILP) model and two approximation algorithms for resilient services to minimize the service disruption cost in a VCAV system when a failure occurs. The ILP model, referred to as TERO, allows us to obtain the optimal solution for traffic engineering, including the VNF placement and routing for resilient services with regard to dynamic routing. Our proposed algorithms based on heuristics (i.e., TERH) and reinforcement learning (i.e., TERA) provide an approximation solution for resilient services in a large-scale VCAV system. Evaluation results with real datasets and generated network topologies show that TERH and TERA can provide a solution close to the optimal result. The results also suggest that TERA should be used in a highly dynamic VCAV system.


Introduction
Recently, emerging Internet of Things (IoT) applications, such as connected autonomous vehicles (CAVs), smart homes, mobile augmented reality, and smart agriculture, have become increasingly popular [1]. A CAV system relies upon computer vision using a series of video cameras, radars, and Light Detection and Ranging (LIDAR) sensors that allow the car to perceive the world around it. The system processes a massive amount of data collected from sensors to provide its application services composed of several application functions, including video capturing, sensor fusion, object tracking, localization, path planning, and control components. For example, a video camera on an autonomous car could generate hundreds of gigabytes in an hour of driving for a 720p video. A critical issue for a CAV system is how to transfer and process the massive amount of data generated daily in a timely fashion.
Network Function Virtualization (NFV) has been raised as a promising approach to tackling this issue [2]. A virtualized network function (VNF), including a traditional network function and general computation task, can be deployed as an instantiable software component running in a commercial off-the-shelf server. A VNF instance can be placed in edge devices to enable application services with low latency. Several VNFs on different edge and cloud devices can be connected as a service function chain (SFC) to enable realtime and flexible services. While an NFV-based CAV system, referred to as VCAV, is able to provide a flexible CAV service with low latency, it is challenging to protect a service from any system failure.
The issue of resilient services has been discussed in a specification published by the European Telecommunications Standards Institute (ETSI) [3]. The main challenge of providing a resilient service in a VCAV system is to optimize the placement and routing of VNFs in response to a failure in an NFV infrastructure (NFVI). Previous studies have considered various techniques to address several aspects of the resilient service problem [4][5][6][7][8][9]. However, none of the previous approaches can be applied to a VCAV system due to the high dynamics of VCAV data traffic. In addition, most previous work assumes a fixed mapping of routing paths onto the NFVI and implicit paths between the VNFs in an SFC, which is not practical. Our work aims to optimize traffic engineering, including the VNF placement and routing for resilient services in a VCAV system, to minimize the service disruption cost, considering the dynamics of routing paths and service function chaining.
The main contributions of this paper are three-fold:
• We propose an integer linear programming (ILP) model for the resilient service problem, referred to as TERO. The TERO model provides the optimal VNF placement and routing for a set of service demands when a failure occurs in a VCAV system.
• We develop a heuristic algorithm (i.e., TERH) and a Reinforcement Learning (RL) based algorithm (i.e., TERA) to find an approximation solution for the resilient service problem in an extensive network. The approximation solution provided by TERH and TERA is close to the optimal solution. In comparison with TERH, TERA can achieve a similar cost with significantly reduced time in a dynamic failure scenario.
• We validate our proposed models and algorithms on real datasets and generated network topologies. The evaluation results suggest that TERA should be used to minimize the service disruption cost of a VCAV system given the high dynamics of data traffic.
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the system and states the optimization problem of traffic engineering for resilient services in a VCAV system. Section 4 presents the TERO model that provides the optimal traffic engineering for resilient services, including VNF placement and routing when a failure occurs in a VCAV system. Section 5 describes the TERH and TERA algorithms, based on heuristics and reinforcement learning, which find an approximation solution for the resilient service problem. We present the evaluation of our proposed model and algorithms in Section 6. Finally, the conclusion is presented in Section 7.

Related Work
Network Function Virtualization (NFV) has been raised as a potential approach to a flexible and efficient solution for processing a massive volume of data in an IoT system. For the evolution of an IoT system with NFV, we refer the readers to [10]. The reliability of services on the Internet and in an IoT system is a crucial problem that has been widely studied (e.g., [11][12][13][14][15][16]). However, existing solutions are not suitable for an NFV-based IoT system, where functional modules can be deployed in different data centers and connected together to create a flexible service.
The challenge of developing a solution for resilient services in an NFV-based IoT system is to find the optimal resource allocation for data routing and processing when a failure occurs in a distributed system. So far, a few studies have considered the design of a resilient NFV-based IoT system [17][18][19][20]. Huang et al. devised a proactive fail-over mechanism based on failure prediction to enhance the resilience of NFV services deployed in a distributed edge network [17]. Ergenc et al. analyzed the complexity and boundaries of the problem and developed heuristics to increase the fault tolerance of an IoT network in the presence of node and link failures [18]. Bakhshi et al. proposed a mathematical model for an SDN-based fault-tolerant architecture in an IoT environment [19]. Sanabria et al. used machine learning techniques to provide prediction and alert capabilities for telemedicine applications [20]. They proposed a hybrid Edge/Cloud architecture for training the deep learning prediction model. Several optimization models have been proposed for resilient services in Mobile Edge Computing [21]. However, these solutions do not consider service function chaining, which is a key feature of NFV. In addition, they do not address dynamic routing paths, which are an essential feature of a CAV system. Some studies have discussed an efficient design for data processing and routing in a VCAV system (e.g., [22][23][24]). However, a solution to the traffic engineering problem for resilient services in a VCAV system has not been provided.
This paper offers a new ILP formulation for a traffic engineering solution, including the VNF placement and routing when node or link failures occur in a VCAV system. Moreover, we take into account the dynamic routing path at the request time. We also propose two algorithms based on heuristics and reinforcement learning to provide an approximation solution in a large-scale VCAV system.

System Description
In a VCAV system, system and safety functions are deployed locally in an autonomous vehicle. Light workload functions such as planning and sensor fusion can be run in edge nodes. Some functions that require a heavy computational task and process a massive volume of data collected from many cars can be implemented in the cloud layer (Figure 1). These functions can be connected in an ordered list to create an application service. An example of an SFC in a VCAV system is sensor fusion, world model, behavior generation, planning, and vehicle control. A VCAV system allocates its resources in the edge and cloud layers for a set of service demands required by vehicles. When a failure occurs, system resources are rapidly reallocated to maintain application services supplied to vehicles. A CAV system based on NFV includes three main elements: the NFV infrastructure (NFVI), the VNFs, and the management and orchestration of NFVs (MANO). NFVI consists of the shared and virtualized resources of physical networking, computing, and storage. A VNF can be any functional module of a VCAV system, e.g., sensor fusion, world model, and planning. The MANO element handles all automatic processes for loading and managing VNFs. A traffic engineering solution for resilient services can be incorporated into the NFV architecture as a part of MANO.
We represent a VCAV system as a directed graph $G = (V, E)$, where $V$ and $E$ denote the sets of physical nodes and links. We define $r^n_v$ to be node $v$'s resource capacity and $r^l_e$ to be link $e$'s bandwidth capacity. The beginning node and ending node of link $e$ are denoted by $i_e$ and $j_e$. The node's processing resource considered in this work is the number of CPU cores. We can use similar formulas of the processing resource in the model to include additional types of resources (e.g., memory, storage). We represent different network topologies by setting the parameters of links and nodes in the model. We denote by $F$ the set of VNF types. $\eta_u$ is the number of cores required by VNF type $u \in F$ to process a traffic volume. The routing delay $\beta_v$ is the time duration needed by node $v$ to route an amount of traffic. The processing delay $\mu_{vu}$ is the time duration required to provide VNF type $u$ at node $v$. We denote by $w = (w_e)$ the weight vector of the NFVI, where $w_e$ is an integer representing link $e$'s weight. We define $\lambda^n_v$ to be the failure state of node $v$ and $\lambda^l_e$ to be the failure state of link $e$. $\lambda^n_v = 0$ if node $v$ fails, otherwise $\lambda^n_v = 1$. $\lambda^l_e = 0$ if link $e$ fails, otherwise $\lambda^l_e = 1$. A link failure can be caused by hardware problems, software issues (e.g., too many connections, configuration changes, denial of service attacks), or the mobility of vehicles.
We define $\Omega = \{S_i\}$ to be the set of all system-supported SFCs. An SFC is denoted by $S_i = (u_{i1}, \ldots, u_{ij}, \ldots, u_{in})$, where $u_{ij}$ is the jth VNF of SFC $S_i$. The service demand set is denoted by $\Gamma = \{d\}$. The parameter set of service demand $d \in \Gamma$ includes the arrival node $s_d$, departure node $t_d$, SFC $S_d \in \Omega$, SFC delay $\alpha_d$, and bandwidth volume $b_d$. An arrival node is an NFV node that provides an entry of a service demand into a VCAV system. A departure node is an NFV node at which the demand traffic leaves a VCAV system. A middle node is an NFV node between an arrival node and a departure node on an SFC path realizing a service demand. An NFV node either provides a VNF instance or routes traffic of a service demand.
When a failure happens, a VCAV system needs to modify some paths of service demands and the VNF placement on these paths to meet the requirements of the service demands and avoid overloading some nodes. The process is referred to as the traffic engineering problem for resilient services. Optimizing VNF placement and routing could significantly impact the cost efficiency and performance of a VCAV system. The problem is stated as follows:
Problem 1 (Traffic Engineering for Resilient Services (TER)). Given a VCAV system $G$, find a traffic engineering solution for fulfilling a service demand set $\Gamma$, in order to minimize the system interruption when failures occur, under constraints on service function chaining and the restriction rule of routing reallocation.
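The inputs defined above can be held in a few plain records. The following is a minimal sketch, not part of the original model: the class names, field names, and the node names `edge1` and `cloud1` are hypothetical, and the capacities are taken from the evaluation settings only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Link:
    src: str           # beginning node i_e
    dst: str           # ending node j_e
    bandwidth: float   # bandwidth capacity r^l_e (e.g., in Gbps)
    weight: int = 1    # link weight w_e


@dataclass(frozen=True)
class Demand:
    arrival: str            # arrival node s_d
    departure: str          # departure node t_d
    sfc: Tuple[str, ...]    # ordered VNF types of S_d
    max_delay: float        # SFC delay bound alpha_d
    bandwidth: float        # bandwidth volume b_d


@dataclass
class VcavSystem:
    node_cores: Dict[str, int]                               # r^n_v
    links: List[Link] = field(default_factory=list)
    vnf_cores: Dict[str, int] = field(default_factory=dict)  # eta_u

    def out_links(self, node: str) -> List[Link]:
        """Links whose beginning node i_e is `node`."""
        return [e for e in self.links if e.src == node]


# Hypothetical two-node system: one edge node and one cloud node.
system = VcavSystem(
    node_cores={"edge1": 50, "cloud1": 200},
    links=[Link("edge1", "cloud1", bandwidth=80.0, weight=2)],
    vnf_cores={"sensor_fusion": 1, "planning": 2},
)
demand = Demand("edge1", "cloud1", ("sensor_fusion", "planning"), 30.0, 2.0)
```

Keeping links directed (separate `src` and `dst`) mirrors the directed graph $G$ used in the model.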

Optimization Model for Resilient Services
We propose an optimization model based on ILP to find the optimal result of the TER problem. The model is referred to as TERO. The main variables of TERO are as follows:
• $x_2 = (x_{2ed})$ is the routing solution satisfying the service demand set when a failure occurs in a VCAV system. If demand $d$ uses link $e$, $x_{2ed} = 1$, otherwise $x_{2ed} = 0$.
• $y_2 = (y_{2vdi})$ is the VNF placement solution in the failure state. If node $v$ provides the ith VNF of demand $d$, $y_{2vdi} = 1$, otherwise $y_{2vdi} = 0$.
We summarize the main mathematical notations of TERO in Table 1.

Input parameters
$G = (V, E)$: A directed graph representing a VCAV system, where $V$ and $E$ denote the sets of physical nodes and physical links, respectively.
$\eta_u$: The number of CPU cores required by VNF type $u \in F$ to process a volume of data traffic.
$y_1 = (y_{1vdi})$: The current VNF placement solution. If node $v$ provides the ith VNF of SFC $S_d$, $y_{1vdi} = 1$, otherwise $y_{1vdi} = 0$.
$x_1 = (x_{1ed})$: The current routing solution. If demand $d$ uses link $e$, $x_{1ed} = 1$, otherwise $x_{1ed} = 0$.
$w = (w_e)$: The weight vector of the NFVI, where $w_e$ is link $e$'s weight.

Output variables
$x_2 = (x_{2ed})$: The routing solution for satisfying demands in the failure state. If demand $d$ uses link $e$, $x_{2ed} = 1$, otherwise $x_{2ed} = 0$.
$y_2 = (y_{2vdi})$: The VNF placement solution in the failure state. If node $v$ provides the ith VNF of SFC $S_d$, $y_{2vdi} = 1$, otherwise $y_{2vdi} = 0$.

Auxiliary variables
$y^\sigma_{2vdi}$: If a node between $s_d$ and $v$ on the path realizing demand $d$ provides the ith VNF of demand $d$ (i.e., $u_{di}$), $y^\sigma_{2vdi} = 1$, otherwise $y^\sigma_{2vdi} = 0$.
$\bar{y}^\sigma_{2edi}$: If link $e$ is on the path realizing demand $d$, and a node between $s_d$ and $i_e$ on the path provides $u_{di}$, $\bar{y}^\sigma_{2edi} = 1$, otherwise $\bar{y}^\sigma_{2edi} = 0$.

Service Function Chaining Routing
The four conditions of service function chaining routing in a VCAV system are as follows: the flow balance, function provision, function chain, and delay constraints. The flow balance condition guarantees that the flow traffic of a service demand is conserved along its path. The function provision condition assures that the VCAV system provides all VNFs of a service demand. The function chain condition guarantees that all VNFs of a service demand are connected in sequence. The delay constraint assures the fulfillment of the end-to-end delay of an SFC.
We define $l_{v_1 v_2}$ to be the length of the path from node $v_1$ to node $v_2$. Let $\theta$ be a large number. The flow balance condition is given by Equations (1)-(4). Equation (1) guarantees that there is one entering flow and one leaving flow at a middle node on the path of a service demand. Equation (2) assures that there is one leaving flow at the departure node of a service demand. Equation (3) assures that there is one entering flow at the arrival node of a service demand. Equation (4) guarantees that there are no cycles in a service demand path.
The function provision condition is given by Equations (5) and (6), where Equation (6) is
$$y_{2vdi} \le \sum_{e:\, i_e = v \text{ or } j_e = v} x_{2ed}, \quad \forall v, \forall d, \forall i. \qquad (6)$$
Equation (5) assures that the VCAV system provides all VNFs required by a service demand. Equation (6) ensures that the VCAV system only selects a node on the path of demand $d$ to allocate a VNF for the demand.
To represent the function chain condition, we add two additional binary variables $y^\sigma_{2vdi}$ and $\bar{y}^\sigma_{2edi}$. If a node between $s_d$ and $v$ on the path realizing demand $d$ provides the ith VNF of demand $d$ (i.e., $u_{di}$), $y^\sigma_{2vdi} = 1$, otherwise $y^\sigma_{2vdi} = 0$. If link $e$ is on the path realizing demand $d$, and a node between $s_d$ and $i_e$ on the path provides $u_{di}$, $\bar{y}^\sigma_{2edi} = 1$, otherwise $\bar{y}^\sigma_{2edi} = 0$. The constraint is given by Equations (7)-(10), which include
$$\bar{y}^\sigma_{2edi} \le x_{2ed}, \quad \forall e, \forall d, \forall i,$$
$$\bar{y}^\sigma_{2edi} \le y^\sigma_{2 i_e d i}, \quad \forall e, \forall d, \forall i.$$
Equation (7) guarantees that node $v$ supplies demand $d$ with $u_{di}$ if and only if $u_{d(i-1)}$ is fulfilled by either node $v$ or a preceding node that belongs to demand $d$'s path. Equation (8) guarantees that $y^\sigma_{2vdi} = 1$ if and only if $u_{di}$ is delivered by a node between $s_d$ and $v$ and the node belongs to the path realizing demand $d$. Note that we have the sum of $\bar{y}^\sigma_{2edi}$ on the right-hand side of Equation (8) because there might be several incoming links of node $v$. Equations (9) and (10) assure that $\bar{y}^\sigma_{2edi} = 1$ if and only if link $e$ belongs to the path realizing demand $d$, and VNF $u_{di}$ is deployed at either $i_e$ or a preceding node that belongs to demand $d$'s path. The SFC delay represents the sum of the routing delay and VNF processing delay at every node that belongs to the demand path. We express the delay condition in Equation (11).
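The function provision condition can be checked on a candidate solution outside the ILP. The following is a minimal sketch under assumptions that are not in the original: a path is given as an ordered list of node names, a placement maps each 0-based VNF index to exactly one node, and the function name `provision_ok` is hypothetical.

```python
def provision_ok(path_nodes, placement, sfc_len):
    """Check the two provision requirements on one demand.

    path_nodes: ordered list of nodes on the demand path.
    placement:  dict mapping VNF index i (0-based) -> hosting node.
    sfc_len:    number of VNFs in the demand's SFC.
    """
    on_path = set(path_nodes)
    for i in range(sfc_len):
        # Every VNF of the demand must be provided by some node ...
        if i not in placement:
            return False
        # ... and that node must lie on the demand's path.
        if placement[i] not in on_path:
            return False
    return True
```

Using a dictionary keyed by VNF index means each VNF has at most one host by construction, so only presence and path membership remain to be checked.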
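The function chain and delay conditions can likewise be evaluated for a given path and placement. This is a rough sketch, assuming that the path has no repeated nodes, that the routing delay is counted once per node on the path, and that the processing delay is counted once per hosted VNF; the function names and the dictionary layouts are hypothetical.

```python
def chain_in_order(path_nodes, placement):
    """True if VNF i is hosted at or before the host of VNF i+1 on the path."""
    position = {v: k for k, v in enumerate(path_nodes)}
    hosts = [position[placement[i]] for i in sorted(placement)]
    return all(a <= b for a, b in zip(hosts, hosts[1:]))


def sfc_delay(path_nodes, placement, sfc, routing_delay, processing_delay):
    """Sum of routing delays (beta_v) and VNF processing delays (mu_vu).

    routing_delay:    dict node -> beta_v
    processing_delay: dict (node, vnf_type) -> mu_vu
    """
    total = sum(routing_delay[v] for v in path_nodes)
    total += sum(processing_delay[(placement[i], sfc[i])]
                 for i in range(len(sfc)))
    return total


def delay_ok(path_nodes, placement, sfc, routing_delay, processing_delay,
             max_delay):
    """Delay constraint: the end-to-end delay must not exceed alpha_d."""
    return sfc_delay(path_nodes, placement, sfc,
                     routing_delay, processing_delay) <= max_delay
```

Two consecutive VNFs may share a node, which is why the order check uses `<=` rather than `<`.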

Restriction Rule in Flow Reallocation
First, the demand traffic cannot be routed through a failed node or link. The condition is given by Equations (12) and (13). Equation (12) guarantees that the total traffic of all demands passing through a link cannot surpass its bandwidth capacity. Equation (13) guarantees that the number of cores that a node allocates to the VNFs of all demands cannot surpass the node capacity. Note that when a node or link fails, the system loses all capacity of that node or link.
Second, the resource allocation for a service demand without failures on its path should not be changed. We introduce three binary variables $\lambda_e$, $\phi_{ed}$, and $\phi^\sigma_d$. $\lambda_e = 1$ if and only if a failure occurs on link $e$, at node $i_e$, or at node $j_e$. $\phi_{ed} = 0$ if and only if $\lambda_e = 1$ and link $e$ is on the path realizing demand $d$.
$$\lambda^l_e \le \lambda_e, \quad \forall e, \qquad (14)$$
$$\lambda^n_{i_e} \le \lambda_e, \quad \forall e, \qquad (15)$$
$$\lambda^n_{j_e} \le \lambda_e, \quad \forall e, \qquad (16)$$
$$\lambda_e \le \lambda^l_e + \lambda^n_{i_e} + \lambda^n_{j_e}, \quad \forall e. \qquad (17)$$
Equations (14)-(17) guarantee that $\lambda_e = 1$ if and only if we have either $\lambda^l_e = 1$, $\lambda^n_{i_e} = 1$, or $\lambda^n_{j_e} = 1$, and $\lambda_e = 0$ if and only if we have $\lambda^l_e = 0$, $\lambda^n_{i_e} = 0$, and $\lambda^n_{j_e} = 0$. Equation (18) guarantees that the routing solution for demand $d$ does not change if there are no failures on its path. Equation (19) ensures that $\phi^\sigma_d = 0$ if and only if $\phi_{ed} = 0$ for one of the links along the path used by demand $d$. Equations (20) and (21) ensure that $\phi_{ed} = 0$ if and only if $\lambda_e = 1$ and $x_{1ed} = 1$.
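The two capacity conditions can be sketched as a feasibility check on a candidate solution. The code below rests on assumptions that the text does not state explicitly: the cores used by a VNF are taken to be $\eta_u$ times the demand bandwidth $b_d$, failures are passed as explicit sets of failed links and nodes rather than as indicator variables, and all names are hypothetical.

```python
from collections import defaultdict


def within_capacity(routes, placements, demands, link_bw, node_cores,
                    vnf_cores, failed_links=(), failed_nodes=()):
    """Check link bandwidth and node core limits in a failure state.

    routes:     dict demand -> list of links (u, v) on its path
    placements: dict demand -> {VNF index: hosting node}
    demands:    dict demand -> (bandwidth b_d, tuple of VNF types)
    """
    link_load = defaultdict(float)
    core_load = defaultdict(float)
    for d, (bandwidth, sfc) in demands.items():
        for e in routes[d]:
            link_load[e] += bandwidth
        for i, v in placements[d].items():
            # Assumption: cores scale linearly with the traffic volume.
            core_load[v] += vnf_cores[sfc[i]] * bandwidth

    for e, load in link_load.items():
        # A failed link, or a link touching a failed node, has no capacity.
        down = (e in failed_links or e[0] in failed_nodes
                or e[1] in failed_nodes)
        if load > (0.0 if down else link_bw[e]):
            return False
    for v, load in core_load.items():
        if load > (0.0 if v in failed_nodes else node_cores[v]):
            return False
    return True
```

Setting the capacity of a failed element to zero reflects the note that a failed node or link loses all of its capacity.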
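The logic behind the three indicator variables can be illustrated without a solver. The sketch below follows the convention of this subsection, in which a value of 1 for the link indicator marks a link affected by a failure; the function names and the representation of failures as sets are assumptions made for illustration.

```python
def affected_link_flags(links, failed_links, failed_nodes):
    """lambda_e = 1 iff link e, its beginning node, or its ending node failed."""
    return {
        e: int(e in failed_links or e[0] in failed_nodes or e[1] in failed_nodes)
        for e in links
    }


def untouched_demands(current_routes, lam):
    """phi^sigma_d = 1 iff no link on the current path of demand d is affected.

    Demands with a value of 1 must keep their current routing and placement.
    """
    return {
        d: int(all(lam[e] == 0 for e in path))
        for d, path in current_routes.items()
    }
```

Only the demands flagged with 0 are candidates for rerouting, which keeps the reallocation local to the failure.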

Objective Function
Our objective is to minimize the service disruption cost. The service disruption cost of a service demand is the cost of moving its VNF state and data to a new node. It is proportional to the time required before all services are provided normally again. Its unit of measurement is a derived unit of time. We denote by $\gamma^{vv'}_{u_{di}}$ the cost of moving the ith VNF of demand $d$ from $v$ to $v'$. Let $\rho_{vv'}$ be the cost of the minimum-weight path from $v$ to $v'$. $\kappa_u$ is the size of the state and data of a VNF type $u$.
The service disruption cost of a VNF instance is given by Equation (22). We add an additional variable $z^{vv'}_{di}$ to compute the service disruption cost in a VCAV system when a failure occurs. In a failure state, if the ith VNF of SFC $S_d$ is moved from node $v$ to node $v'$, $z^{vv'}_{di} = 1$, otherwise $z^{vv'}_{di} = 0$. Let $y_1 = (y_{1vdi})$ be the current VNF placement solution. If node $v$ provides the ith VNF of SFC $S_d$, $y_{1vdi} = 1$, otherwise $y_{1vdi} = 0$. The constraints on the value of $z^{vv'}_{di}$ are given by:
$$z^{vv'}_{di} \le y_{1vdi}, \quad \forall v, \forall v', \forall d, \forall i, \qquad (23)$$
$$z^{vv'}_{di} \le y_{2v'di}, \quad \forall v, \forall v', \forall d, \forall i, \qquad (24)$$
$$z^{vv'}_{di} \ge y_{1vdi} + y_{2v'di} - 1, \quad \forall v, \forall v', \forall d, \forall i. \qquad (25)$$
Equations (23)-(25) guarantee that $z^{vv'}_{di} = 1$ if and only if we have $y_{1vdi} = 1$ and $y_{2v'di} = 1$, otherwise $z^{vv'}_{di} = 0$.
The service disruption cost of a resource allocation solution when a failure happens is given by Equation (26).
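The cost of a reallocation can be illustrated numerically. The sketch below makes an assumption that should be read as such: the per-VNF moving cost is taken to be the state size $\kappa_u$ multiplied by the minimum-weight path cost $\rho_{vv'}$, which is one form consistent with the definitions above but is not confirmed by the text. The function names are hypothetical, and the path cost is computed with a standard Dijkstra search over the link weights.

```python
import heapq


def min_weight_costs(links, source):
    """Dijkstra: links maps (u, v) -> weight w_e; returns node -> path cost."""
    adjacency = {}
    for (u, v), w in links.items():
        adjacency.setdefault(u, []).append((v, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adjacency.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def disruption_cost(old, new, sfcs, state_size, links):
    """Sum the assumed moving cost kappa_u * rho_{vv'} over all moved VNFs.

    old, new:   dict demand -> {VNF index: node} (y_1 and y_2 placements)
    sfcs:       dict demand -> tuple of VNF types
    state_size: dict VNF type -> kappa_u
    """
    total = 0
    for d, placement in old.items():
        for i, v in placement.items():
            v_new = new[d][i]
            if v_new != v:
                rho = min_weight_costs(links, v)[v_new]
                total += state_size[sfcs[d][i]] * rho
    return total
```

A VNF that stays on its node contributes nothing, so an unchanged placement has zero disruption cost. The lookup raises `KeyError` if the new node is unreachable from the old one, which a fuller implementation would need to handle.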

ILP Model for Resilient Services
The TER problem is to find a traffic engineering solution for minimizing a cost function of service disruption when a failure occurs in a VCAV system. The TERO model provides the optimal VNF routing and placement for the TER problem in a failure state. The formulation of the TERO model includes the objective function given by Equation (26) and the constraints given by Equations (1)-(25).

Approximation Algorithms
In the previous section, we proposed the TERO model to obtain the optimal solution for traffic engineering in a VCAV system when a failure occurs. An ILP solver is not able to handle a scenario with hundreds of nodes and thousands of demands, since the number of variables in TERO reaches billions in such a large scenario. Hence, we propose two algorithms based on a heuristic approach and reinforcement learning to find an approximation solution for the TER problem in a large-scale VCAV system. The two algorithms use the same input parameters as the TER problem, which are presented in Table 1.

Heuristic Algorithm
We propose a heuristic algorithm, namely TERH, based on Simulated Annealing (SA). In TERH, we develop the structure of the resource allocation solution and the function of neighborhood selection for the TER problem. SA is a heuristic technique that approximates the global optimum of an optimization problem [25]. The search method accepts a worse solution with a certain probability in order to escape a local optimum.
We represent a resource allocation solution for a service demand set in a VCAV system as a list of tuples, which is denoted by $O_m = ((d, i, v) : d \in \Gamma, i \in S_d, v \in V)$. Each tuple indicates that node $v$ provides the ith VNF of demand $d$. The details of the TERH algorithm are presented in Algorithm 1.

Algorithm 1 TERH
    ...
 6: while $T \ge T_n$ do
 7:   for $n \leftarrow 1$ to $\varphi$ do
 8:     repeat
 9:       Compute $x_2$ and constraints in a failure scenario
          ...
13:     until $O'_m$ is feasible
14:     Compute $y_2$ from $O_m$
15:     Compute $y'_2$ from $O'_m$
16:     if $U(y'_2, y_1) < U(y_2, y_1)$ then
17:       if $U(y'_2, y_1) < U(y^*_2, y_1)$ then
            ...
19:       end if
          ...
23:     else
24:       $\varepsilon \leftarrow$ a random number between 0 and 1
          ...
26:       if $\exp(-\Delta/T) > \varepsilon$ then
27:         ...

The algorithm contains two main loops. The outer loop is controlled by the temperature parameter $T$, the start temperature parameter $T_0$, the stop temperature parameter $T_n$, and the cooling function $C(T)$. For each $T$, the algorithm runs an inner loop that uses a neighborhood function to move from the current solution to another. $T$ is updated to $C(T)$ after one iteration of the outer loop. The algorithm completes its solution search when $T$ is smaller than $T_n$.
We define $\varphi$ to be the number of iterations of the inner loop. Let $O_m$ be an initial solution. We use the most common cooling function $C(T) = \tau T$, for some parameter $\tau$ from the interval $(0, 1)$. The initial temperature is the maximal cost difference between any two neighbor solutions. The end temperature is typically close to zero.
In the neighborhood selection (i.e., lines 8-12), we define the Replace$(d, i, v, v', O_m)$ operator that substitutes node $v'$ for node $v$. We use the Replace operator for a random tuple $(d, i, v) \in O_m$ and a random target node $v'$ repeatedly until we find a feasible solution.
In the inner loop, if the objective value of a neighborhood solution is less than that of the current solution, the iteration continues with the neighborhood solution, as TERH is moving towards a better solution (i.e., lines 16-22). Otherwise, TERH accepts the neighborhood solution with a certain probability in order to escape a local optimum (i.e., lines 24-28). The acceptance probability decreases with $T$ for a given value of $\Delta$. Hence, an uphill move becomes less likely in each successive inner loop. After $\varphi$ iterations, the inner loop finishes its solution search. After the temperature is decreased, the inner loop is started again. The approximation quality of TERH's solution can be controlled by adjusting the number of iterations $\varphi$ and the cooling function $C(T)$.
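The search procedure described above can be sketched as a generic simulated annealing loop with a Replace-style neighborhood. This is an illustrative sketch rather than the authors' implementation: the solution is stored as a dictionary from $(d, i)$ to a node, the cost and feasibility functions are supplied by the caller, and the `max_tries` guard and all names are hypothetical additions.

```python
import math
import random


def anneal(initial, nodes, cost, feasible, t0, t_end, tau, phi,
           rng=None, max_tries=1000):
    """Simulated annealing over placements {(demand, vnf_index): node}."""
    rng = rng or random.Random(0)
    current = dict(initial)
    best = dict(current)
    t = t0
    while t >= t_end:                      # outer loop over temperatures
        for _ in range(phi):               # inner loop at a fixed temperature
            # Neighborhood selection: replace the node of one random tuple
            # by a random target node until the result is feasible.
            candidate = None
            for _ in range(max_tries):
                key = rng.choice(sorted(current))
                trial = dict(current)
                trial[key] = rng.choice(nodes)
                if feasible(trial):
                    candidate = trial
                    break
            if candidate is None:
                continue                   # no feasible neighbor found
            delta = cost(candidate) - cost(current)
            if delta < 0:
                # Downhill move: always accept and track the best solution.
                current = candidate
                if cost(current) < cost(best):
                    best = dict(current)
            elif math.exp(-delta / t) > rng.random():
                # Uphill (or equal) move: accept with probability exp(-delta/T).
                current = candidate
        t *= tau                           # cooling function C(T) = tau * T
    return best
```

A call such as `anneal(initial, nodes, cost, feasible, t0=1.0, t_end=0.1, tau=0.5, phi=50)` runs four temperature levels of 50 neighbor selections each. Returning the best solution seen, rather than the last one, guards against ending on an uphill move.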

Reinforcement Learning Based Approximation Algorithm
We propose a Soft Actor-Critic (SAC) based approximation algorithm, called TERA, to solve the TER problem in a large-scale VCAV system. SAC is a variant of actor-critic methods for reinforcement learning. It aims to maximize expected rewards and entropy in a large-scale continuous action space [26]. While earning as many rewards as possible, it attempts to take actions as randomly as possible. This encourages the search process to explore the environment, which accelerates training and decreases the probability of going back to a visited action.
The mathematical formulation of SAC is a Markov decision process with a set of parameters including the state space $M$, action space $A$, probability density $p$, and reward function $r$. The probability density $p : M \times M \times A \to [0, \infty)$ is the probability density of the next state $m_{t+1} \in M$ given the current state $m_t \in M$ and action $a_t \in A$. The reward $r : M \times A \to [r_{min}, r_{max}]$ is the environment reward of a state transition. SAC seeks a policy $\omega(a_t | m_t)$ that maximizes the learning objective. The learning objective is the expected sum of rewards and the policy's entropy. SAC uses the hyperparameter $\lambda$, namely the temperature, to adjust the balance between the reward and the entropy in the learning objective. Let $h$ be the entropy function with regard to the policy $\omega$, and let $\rho_\omega$ denote the distribution of state-action pairs induced by $\omega$. The learning objective of SAC is as follows:
$$J(\omega) = \sum_{t} \mathbb{E}_{(m_t, a_t) \sim \rho_\omega} \left[ r(m_t, a_t) + \lambda \, h(\omega(\cdot \mid m_t)) \right].$$
The primary step of developing a solution based on SAC is to formulate the three key parameters: the state space, action space, and reward function. In TERA, the state space should represent how a set of demands is satisfied when a failure happens. Hence, we formulate a state $m_t$ as a list of tuples $(d, u, v)$. An element of $m_t$ shows that node $v$ provides VNF $u$ of service demand $d$ in a failure scenario. An action in TERA makes a movement between states, representing a possible resource allocation solution.
We represent an action by $a_t = (v_1, v_2, \ldots, v_{|m_t|})$, where $v_i \in V$ is the resource allocation for the ith tuple of the state $m_t$. As TERA optimizes the learning policy to maximize the learning objective, we use the objective function $U_a = -U$ to compute the reward of a solution. Hence, we can evaluate the solution's cost efficiency and the learning policy produced by TERA for minimizing the service disruption cost.
We present the main steps of TERA in Algorithm 2. The actor network returns a resource allocation action according to an input state. The NFV environment runs the action to move to a new state. The critic network uses the new state, its reward, and the previous state to compute the advantage of the new state, which is used to update the weights of the actor and critic networks. In effect, the critic network serves as the actor's loss function. We implement the actor and critic networks as neural networks. We will discuss some details of selecting their parameters in Section 6.1.
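The state, action, and reward definitions can be made concrete with a small environment step. The following is a minimal sketch under assumptions: a state is a list of $(d, u, v)$ tuples, an action supplies one target node per tuple, and the cost function passed in stands for the service disruption cost $U$; the function names are hypothetical and no learning is performed here.

```python
def apply_action(state, action):
    """Move every (demand, vnf, node) tuple to the node chosen by the action."""
    if len(action) != len(state):
        raise ValueError("an action needs one target node per state tuple")
    return [(d, u, v_new) for (d, u, _), v_new in zip(state, action)]


def reward(cost_fn, old_state, new_state):
    """Reward U_a = -U: a lower disruption cost gives a higher reward."""
    return -cost_fn(old_state, new_state)


def moved_count(old_state, new_state):
    """Toy stand-in for U: the number of VNFs that changed node."""
    return sum(a[2] != b[2] for a, b in zip(old_state, new_state))
```

Negating the cost lets a reward-maximizing learner minimize the disruption cost without changing the learning algorithm.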

Evaluation
We evaluate the service disruption cost and computation time of our proposed solution approaches for traffic engineering in a failure scenario of a VCAV system. We used the optimal solution obtained by TERO as a baseline solution for evaluating the approximation solution achieved by TERH and TERA.

Scenarios and Parameters Setting
Our objective is to evaluate the performance of TERO, TERH and TERA with respect to the service disruption cost and computation time when we consider various network topologies. The three main evaluation questions are as follows: What is the gap between the optimal results and approximation solutions? How do different solution approaches respond to the dynamics of failure scenarios? Can TERH and TERA efficiently provide a VNF placement and routing solution in a large-scale scenario when a failure occurs? We use eight topologies in our evaluation. Note that it is the diversity and size of topologies that affect the answer to our questions rather than a specific topology. The first topology, referred to as Abilene, is the US backbone network composed of 12 nodes and 15 links, described in the Abilene dataset [27]. The second topology, namely Geant, is the Europe backbone network of 22 nodes and 36 links, presented in the Geant dataset [28]. The other topologies are synthetic topologies based on random graph generation algorithms, including the Barabási-Albert (BA), Waxman (WA), Erdős-Rényi (ER) models [29]. We create a small topology composed of 50 nodes and a large topology composed of 200 nodes for each random graph generation algorithm. The random graph generation tool is FNSS [30]. A BA topology is created with four nodes at first. A new node is added by connecting to four preceding nodes. The link density probability used to create a WA topology is 0.9. The edge generation probability used to create an ER topology is 0.2. We denote the small and large BA topologies by BA1 and BA2, the small and large WA topologies by WA1 and WA2, and the small and large ER topologies by ER1 and ER2. In a failure scenario, we randomly generate one node and link failure in a network topology.
We randomly create 15 demands in the Abilene and Geant topologies and 100 service demands in the BA, WA, and ER topologies. The arrival and departure nodes of a service demand are randomly selected. The SFC delay is varied between one and thirty milliseconds. The range of the bandwidth demand is between 1 Gbps and 5 Gbps. We consider four types of VNFs. The number of CPU cores demanded by a VNF type for one volume of traffic is varied between one and two cores. The SFC of a service demand is randomly selected from the four VNF types. We assign a bandwidth value of 80 Gbps to the capacity of all links. The edge and cloud nodes are randomly selected. The cloud node capacity is 200 cores. The edge node capacity is 50 cores. At a node, the processing delay of a VNF and the routing delay for a traffic unit are randomly generated between 10 and 100 microseconds. The value of the link weight is varied between 1 and 3.
We now look at how to choose hyperparameters for the implementation of our proposed algorithms. In TERA, the temperature hyperparameter is automatically configured as described in [31]. We chose two layers for the actor and critic networks because we did not obtain a better policy when the number of layers increases beyond two. After running TERA with a varying number of neurons, we chose 32 neurons for each layer of the actor and critic networks because the policy did not significantly improve when we used a bigger value.
In TERH, the value of the end temperature is 0.1. For each temperature, the number of neighbor selections is $\varphi = 100$. For comparison purposes, we select the parameter $\tau$ of the cooling function so that the iteration number of TERH is similar to that of TERA. The parameter $\tau$ is computed from $T_0$, $T_n$, $\varphi$, and the iteration number $\varphi_a$ of TERA, where $\varphi_a = 8000$ since TERA can obtain a steady policy after eight thousand iterations. We used an x86 computer in our evaluation. Its hardware configuration is a four-core 2.60 GHz Intel processor with 8 GB memory and an NVIDIA GeForce GTX 850M card. We solved TERO in CPLEX [32]. We implemented TERH in Java and TERA in Python with TensorFlow [33].
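One way to obtain such a cooling parameter follows from the geometric cooling function $C(T) = \tau T$: after $k$ outer iterations the temperature is $T_0 \tau^k$, and matching a total of $\varphi_a$ neighbor selections with $\varphi$ selections per temperature gives $k = \varphi_a / \varphi$. The sketch below solves $T_0 \tau^k = T_n$ for $\tau$. It is a derivation under these assumptions and not necessarily the exact formula used in the evaluation; the start temperature of 10.0 is a hypothetical value.

```python
def cooling_rate(t0, t_end, phi, total_iterations):
    """Return tau such that t0 * tau**k == t_end with k = total_iterations / phi."""
    outer_iterations = total_iterations / phi
    return (t_end / t0) ** (1.0 / outer_iterations)


# phi = 100 and phi_a = 8000 come from the text; t0 = 10.0 is hypothetical.
tau = cooling_rate(t0=10.0, t_end=0.1, phi=100, total_iterations=8000)
```

Because of floating-point rounding at the stopping test, an actual run may perform one outer iteration more or fewer than $\varphi_a / \varphi$.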

Evaluation Results
First, we compare the performance of different solution approaches when a failure scenario is fixed. In a fixed failure scenario, we compute the service disruption cost and computation time in only one failure scenario. We consider limited-size scenarios, including the Abilene, Geant, BA1, WA1, and ER1 topologies, to compare approximation solutions with optimal results. Figure 2a shows that TERO is better than TERH and TERA in terms of the service disruption cost, but the difference is marginal. We also observe that TERH and TERA can achieve similar service disruption costs after 8000 iterations. In Figure 2b, the computation time of TERA and TERH is higher than that of TERO, and the computation time of TERA is slightly higher than that of TERH.
Second, we compare the performance of different solution approaches when the failure scenario changes. Specifically, we consider 8000 failure scenarios in our evaluation. We compute the service disruption cost and computation time in each failure scenario and plot their average value. Figure 3a shows that TERO, TERH, and TERA achieve similar service disruption costs. In Figure 3b, we use a base 10 logarithmic scale for the y-axis and a linear scale for the x-axis to illustrate the variation in the computation time of TERO, TERH, and TERA. The figure shows that the computation time of TERA is significantly smaller than that of TERO and TERH. This is because TERA can reuse the policy learned from previous data, while TERO and TERH are required to solve the TER problem for each individual failure scenario.
Finally, we evaluate the TERH and TERA performance in a large-scale VCAV system when the failure scenario changes. Figure 4 plots the service disruption cost and computation time for the BA2, WA2, and ER2 topologies with 200 nodes. In such large-scale topologies, CPLEX cannot solve the TERO model to find the optimal solution. In Figure 4b, we use a base 10 logarithmic scale for the y-axis and a linear scale for the x-axis to plot the computation time. We observe that TERA is significantly faster than TERH. The service disruption cost of TERH is slightly smaller than that of TERA, but the difference is negligible. This suggests that we should use TERA to protect service demands from a failure in a real-time VCAV system.

Conclusions
We studied the optimization problem of traffic engineering for resilient services in a VCAV system. We proposed an ILP model (i.e., TERO) to find the optimal VNF placement and routing when a node or link failure occurs. The model captures essential features of NFV such as service function chaining, the restriction rule of resource reallocation, and the exact placement and routing solution for the service demand set. We developed the TERH and TERA approximation algorithms based on heuristics and reinforcement learning to provide an efficient traffic engineering solution for resilient services in a large-scale VCAV system. The evaluation results show that TERO, TERH, and TERA can protect service demands from node and link failures. The approximation results provided by TERH and TERA are very close to the optimal results. The results also suggest that a network service provider should consider TERA to provide resilient services in a real-time VCAV system. Possible directions for extending our work comprise the consideration of various network technologies supporting a VCAV system, an evaluation of other network topologies and performance metrics, or an optimization model of a resilient service with a federation of several VCAV providers as in [9,34].