Proposal and Investigation of an Artificial Intelligence (AI)-Based Cloud Resource Allocation Algorithm in Network Function Virtualization Architectures

Abstract: The long time needed to reconfigure cloud resources in Network Function Virtualization (NFV) network environments has led to the proposal of solutions in which resource allocation is based on prediction. All of them rely on predicting the traffic or the needed resources by minimizing symmetric loss functions such as the Mean Squared Error. When the inevitable prediction errors occur, these methodologies cannot weigh positive and negative prediction errors differently, even though the two affect the total network cost in different ways. If the predicted traffic is higher than the real one, the network operator pays an over-allocation cost, referred to as over-provisioning cost; conversely, if it is lower, a Quality of Service (QoS) degradation cost, referred to as under-provisioning cost, is due to compensate the users for the resource under-allocation. In this paper we propose and investigate a resource allocation strategy based on a Long Short Term Memory (LSTM) algorithm whose training minimizes an asymmetric cost function that weighs positive and negative prediction errors differently, according to the corresponding over-provisioning and under-provisioning costs. In a typical traffic and network scenario, the proposed solution allows for a cost saving of 30% with respect to a solution based on a symmetric cost function.


Introduction
Network Function Virtualization (NFV) [1] technology allows the implementation of software middleboxes that run on virtual machines and are located in data centers, referred to as Network Function Virtualization Infrastructure Points of Presence (NFVI-PoPs). In the last few years the problem of resource reconfiguration in NFV environments has been widely investigated [2][3][4][5][6]. Several studies have focused on reactive techniques, in which the network is reconfigured as soon as traffic changes occur [7][8][9][10]. Reactive techniques are widely regarded as ineffective because of the high variability of the traffic and the long time needed to allocate cloud resources. For this reason proactive reconfiguration techniques have been proposed, in which the traffic or the amount of needed resources is predicted [11]. Traffic and cloud resource prediction methodologies have recently been used in NFV environments for cloud and bandwidth resource allocation purposes. Both traditional and innovative prediction methodologies [12] have been proposed. For instance, Long Short Term Memory-based prediction techniques have been shown to be very effective in allocating resources. All of these techniques are based on the minimization of symmetric cost functions, such as the Mean Squared Error (MSE), that weigh positive and negative prediction errors equally. However, the sign of the error can affect the cost increase due to prediction errors in different ways. For instance, when the Quality of Service (QoS) degradation cost due to traffic loss prevails over the cloud resource allocation cost, an algorithm that overestimates the offered traffic is preferable; conversely, underestimating the traffic is preferable in the opposite case, when the cloud allocation cost is higher than the QoS degradation one.
To the best of our knowledge, only Bega et al. [13] propose a solution for mobile network resource orchestration in which the different values of the over-provisioning and under-provisioning costs are taken into account. They propose DeepCog, a framework for allocating resources to network slices in a 5G mobile environment. It is based on a deep learning technique in which the cost function assigns a cost that rises with the amount of over-allocated resources and a constant penalty, independent of the amount of lost traffic, when a QoS degradation occurs.
In this paper we propose a prediction technique, which, aware of the fact that traffic cannot be accurately predicted, tries to overestimate or underestimate traffic in relation to the values of over-provisioning and under-provisioning costs. This objective is achieved by minimizing an asymmetric cost function characterized by a parameter that takes into account the over-provisioning and under-provisioning costs. The principle of the proposed solution can be applied to any prediction technique and in this work it is applied to one that predicts traffic with a Long Short Term Memory (LSTM) prediction methodology.
The main contributions of the manuscript are the following:
• The study and the investigation of a prediction-based resource allocation algorithm for NFV network environments;
• The study and the investigation of an LSTM-based traffic prediction algorithm with an asymmetric loss function that predicts the traffic values according to the over-provisioning and under-provisioning costs;
• A performance comparison of the proposed solution to other solutions proposed in the literature and based on predictions that minimize symmetric loss functions.
The paper is organized as follows. We describe the state of the art, the problem statement and the prediction-based reconfiguration algorithm in Section 2. The traffic forecasting technique based on an asymmetric loss function is illustrated in Section 3. The numerical results, reported in Section 4, show the effectiveness of the proposed technique with respect to traditional MSE-based forecasting techniques in an NFV network environment.

State-of-the-Art
In NFV networks, a network service is composed of a set of Network Functions (NFs) connected in a specific order. As these NFs are implemented as VNFs, the VNF Forwarding Graph (VNF-FG) provides the logical connectivity between them. In other words, a VNF-FG defines the possible sequences of VNFs that the packets traverse between two endpoints, to realize an end-to-end service. The variability of traffic and services required leads to the need to define algorithms for the reconfiguration of cloud and bandwidth resources. The typical variations that could happen are as follows [14]:

• The instantiation of new Virtual Network Function-Forwarding Graphs (VNF-FGs);
• The extension or the reduction of already instantiated VNF-FGs with the addition or the removal of VNFs;
• The variation of the required bandwidth of the current VNF-FGs.
The reconfiguration of NFV networks is based on various techniques of which the most important are those of:

• Increasing and decreasing the cloud resources (CPU, memory, disk, etc.) assigned to the Virtual Network Function Instances (VNFIs) that support the VNFs; both horizontal and vertical scaling techniques can be applied: the former are based on the increase (scaling out) and decrease (scaling in) of the number of Virtual Machines assigned to each VNFI; the latter are based on the increase (scaling up) or decrease (scaling down) of the cloud resources assigned to the single Virtual Machine in which the VNFI executes the VNFs.
• Migrating VNFIs to other servers or even other NFVI-PoPs with the application of the above-mentioned scaling techniques.
Reactive reconfiguration procedures are not adequate because of the long time required to reallocate cloud resources [11]. For this reason algorithms based on predictions have recently been proposed. There are two categories of solutions: the first is based on traffic prediction [15], the second on the prediction of the resources to be allocated [12].
Some state-of-the-art solutions are reported in Table 1. They are compared in terms of prediction type (traffic-based or resource-based), prediction methodology and characteristic of the loss function to be minimized. Among the solutions based on traffic prediction, Li et al. [15] propose a Deep Learning (DL) framework based on Long Short Term Memory recurrent neural networks [16] to predict the VNF-FG requests in an NFV network with NFVI-PoPs interconnected by an Elastic Optical Network; the arrival and holding times, the bandwidth, the originating and terminating nodes and the type of the Service Function Chains (SFCs) are predicted. Among the solutions based on the prediction of the resources to be allocated, Farahnakian et al. [17] propose regression algorithms for estimating memory and processing consumption in cloud data centers; the proposed solutions are based on Linear Regression [18] and K-Nearest Neighbor Regression (K-NNR) [19] methods, which are known to determine the prediction by minimizing symmetric error functions.
Unfortunately, there are random components that are not predictable and that lead to an unavoidable prediction error. Such an error leads to higher operational costs. For example, if the predicted traffic is higher than the real traffic, the resources will be over-sized and this will lead to an over-provisioning cost; in the opposite case fewer resources will be allocated, which will lead to a QoS degradation and to an under-provisioning cost given by the compensation due to the user. To our knowledge, only in [13] is traffic prediction performed by taking into account the over-provisioning and under-provisioning costs. The authors use convolutional neural networks for the prediction, model only the data centers and consider under-provisioning costs independent of the traffic loss. Conversely, we propose a solution in which: (i) both the NFVI and the transport infrastructure are modeled; (ii) the prediction is performed by an LSTM recurrent neural network with an asymmetric loss function in which the cost depends on the amount of lost traffic.

Table 1. Comparison between our proposal and the main related works.

| Work | Prediction Type | Prediction Methodology | Minimized Loss Function |
|---|---|---|---|
| Schneider et al. [12] | Resource | Linear Regression, Support Vector Machine | Symmetric |
| Li et al. [15] | Traffic | LSTM | Symmetric |
| Farahnakian et al. [17] | Resource | Linear Regression, K-Nearest Neighbor Regression | Symmetric |
| Bega et al. [13] | Traffic | Convolutional Neural Networks | Asymmetric |
| Our proposal | Traffic | LSTM | Asymmetric |
Preliminary results on the advantages of traffic prediction with an asymmetric loss function were reported in [20] for the case in which the prediction is based on Seasonal Autoregressive Integrated Moving Average (SARIMA) processes. In this manuscript we extend the proposed solution to the case of LSTM-based predictions. The following contributions are added in this manuscript:

• An innovative prediction algorithm based on an LSTM recurrent neural network with an asymmetric cusp loss function is proposed;
• The performance of electrical networks is investigated, whereas resource allocation for optical NFV networks is investigated in [20];
• Extensive numerical results are reported in which the operational costs of an NFV network are evaluated with resource allocation based on symmetric and asymmetric LSTM, respectively;
• New results are presented with respect to [20]: resource allocation is no longer limited to a single prediction step, since the new approach allows for multi-step prediction and resource allocation.

Under-Provisioning and Over-Provisioning Costs in Prediction-Based NFV Reconfiguration Algorithms
We show a simple scenario of one VNFI activated in the NFVI-PoP of Figure 1a. Processing resources, represented by black rectangles, are allocated to the VNFI. In a dynamic traffic scenario, the cloud resources have to be reallocated to the VNFI according to the current traffic conditions. Figure 1b shows the cloud resource reconfiguration in the case of a traffic increase. To handle this increase, the cloud resources allocated to the VNFI are increased by applying a vertical scaling technique, which increases the processing capacity by the amount represented by the grey rectangle in Figure 1b. Reactive reconfiguration approaches are not suited to NFV environments, especially because of the long time needed to reconfigure the cloud resources [15]. For this reason traffic prediction is needed to allocate the cloud resources in advance. Unfortunately the traffic cannot be predicted exactly, and the prediction error may lead to resource over- or under-provisioning with a consequent increase in the operational network cost.
Over provisioning occurs when the predicted traffic is higher than the real one, so that more cloud resources than needed are allocated. An example is illustrated in Figure 1c, where the additional cloud resources are represented by violet rectangles. The allocation of unnecessary resources leads to a cost increase.
Under provisioning occurs when the predicted traffic is lower than the real one, so that fewer resources than needed are allocated, as illustrated in Figure 1d, where the missing resources are represented by crossed rectangles. Under provisioning leads to QoS degradation, because part of the traffic is inevitably lost for lack of resources. This results in a cost increase for the service provider, who has to compensate the user for the lost traffic.
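The two error cases above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `provisioning_costs` and the per-unit cost arguments `c_over` and `c_under` are hypothetical, and both costs are assumed to grow linearly with the over-allocated or lost traffic.

```python
def provisioning_costs(real, predicted, c_over, c_under):
    """Split the extra cost caused by prediction errors into its two components.

    real, predicted: sequences of traffic values, one per time interval.
    c_over, c_under: assumed per-unit costs of over-allocated and lost traffic.
    """
    if len(real) != len(predicted):
        raise ValueError("real and predicted must have the same length")
    over_cost = 0.0
    under_cost = 0.0
    for r, p in zip(real, predicted):
        if p > r:
            # Over provisioning: resources allocated for traffic that never arrives.
            over_cost += c_over * (p - r)
        elif p < r:
            # Under provisioning: traffic lost because resources are missing.
            under_cost += c_under * (r - p)
    return over_cost, under_cost


# Made-up values: one over-provisioned and one under-provisioned interval.
over, under = provisioning_costs([10.0, 10.0], [12.0, 9.0], c_over=1.0, c_under=5.0)
```

With a lost-traffic cost larger than the over-allocation cost, as in this made-up example, a small under-estimate can cost more than a larger over-estimate.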
The proposed resource allocation procedures are based on two algorithms:
• A reconfiguration algorithm: it uses the predicted traffic values to reconfigure bandwidth and cloud resources, migrate VNFIs, etc.;
• A traffic prediction algorithm: it uses LSTM-based advanced prediction mechanisms to predict the traffic values.
We briefly discuss how the NFV architecture proposed by the European Telecommunications Standards Institute (ETSI) [21,22] may be extended to support the proposed resource allocation procedure. The main extensions are the following:

Prediction-Based NFV Reconfiguration Algorithm
We illustrate a general NFV reconfiguration algorithm for the case where reconfiguration is necessary because of VNF-FG traffic variations. The sets, parameters and variables are defined in Tables 2 and 3. In this paper we assume that the VNF-FGs are linear graphs in which each link is characterized by the same bandwidth value. We consider $N$ VNF-FGs whose bandwidth can vary over time, and we denote by $b_j(i)$ ($i = 1, \dots, N$; $j = 1, 2, \dots$) the bandwidth of the $i$-th VNF-FG in the $j$-th Time Interval (TI). The TI duration is denoted by $T_s$.
To describe the algorithm we introduce the VNFI graph, whose nodes correspond to the instantiated VNFIs and whose edges correspond to the virtual links interconnecting them. The VNFI graph coincides with a VNF-FG when VNFIs are not shared between VNF-FGs. In the case of VNFI sharing, it instead describes the whole set of VNFIs instantiated to support the VNF-FGs, together with their interconnections.

Table 2. Sets and parameters.

| Sets and Parameters | Definition |
|---|---|
| $\mathbf{b}_{n,h}$ | Vector of the $h$ bandwidth values $b_{n+j}$ ($j = 1, \dots, h$) for the generic VNF-FG |

Table 3. Variables.

| Variables | Definition |
|---|---|
| $\hat{f}_e^{(n+j)}$ | Predicted bandwidth of the link $e \in L$ in the $(n+j)$-th TI |
| $\hat{\mathbf{b}}_{n,h}$ | Vector of the $h$ predicted bandwidth values $\hat{b}_{n+j}$ ($j = 1, \dots, h$) for the generic VNF-FG |
| $e_{n+j}$ | Bandwidth prediction error for a generic VNF-FG in the $(n+j)$-th TI |

The NFV reconfiguration algorithm aims to determine an embedding $\Gamma(\bar{G}, G)$ of the VNFI graph $G = (V, L)$ into the physical graph $\bar{G} = (\bar{V}, \bar{L})$ by determining: (i) in which NFVI-PoP each VNFI is executed; (ii) the cloud (processing) resources to be assigned to the VNFIs; (iii) on which network paths each logical link is routed. When the traffic varies over time, cloud and bandwidth reconfigurations are needed to reduce the costs. Several reconfiguration techniques have been proposed. For instance, the solution proposed in [8] leverages the following techniques: (i) migration of VNFIs towards the lowest-cost NFVI-PoPs; (ii) vertical scaling of the cloud resources by increasing or decreasing the number of cores allocated to the VNFIs. Applying these techniques requires changing the embedding of the VNFI graph $G = (V, L)$ into the physical graph $\bar{G} = (\bar{V}, \bar{L})$, depending on the processing capacities $f_v^{(j)}$ requested by the nodes $v \in V$ and on the bandwidths $f_e^{(j)}$ requested by the links $e \in L$ of the VNFI graph in the $j$-th TI ($j = 1, 2, \dots$). The processing capacity $f_v^{(j)}$ and the bandwidth $f_e^{(j)}$ are given by the sum of the bandwidths of the VNF-FGs that share the node $v \in V$ and the link $e \in L$, respectively. Hence the processing capacities and the link bandwidths depend on the offered VNF-FG bandwidths and are therefore not known a priori. We report a reconfiguration solution based on the prediction of the offered VNF-FG bandwidths.
The algorithm can be easily extended to the case in which the values of $f_v^{(j)}$ ($j = 1, 2, \dots$; $v \in V$) and $f_e^{(j)}$ ($j = 1, 2, \dots$; $e \in L$) are directly predicted.
Because it is not possible to determine the traffic exactly, we propose a solution that underestimates or overestimates the traffic according to the values of the resource allocation and QoS degradation costs.
The main steps performed by the framework for cloud and bandwidth resource provisioning are illustrated in Algorithm 1. The inputs are the physical graph $\bar{G} = (\bar{V}, \bar{L})$, the VNF-FG bandwidths $b_j(i)$ ($i = 1, \dots, N$; $j = 1, \dots, n$) known up to TI $n$, and the VNFI graph $G = (V, L)$. In step 2 a multi-step-ahead prediction of the VNF-FG bandwidths is performed by predicting the next $h$ values $\hat{b}_{n+j}(i)$ ($i = 1, \dots, N$; $j = 1, \dots, h$). This allows step 3 to evaluate the estimates $\hat{f}_e^{(n+j)}$ of the link bandwidths and $\hat{f}_v^{(n+j)}$ of the node processing capacities of the VNFI graph in the TIs $n+1, \dots, n+h$. From these estimates, step 4 applies the cloud and bandwidth resource reconfiguration algorithms to determine $h$ new embeddings $\Gamma_{n+j}(\bar{G}, G)$ ($j = 1, \dots, h$) to be applied in the TIs $n+1, \dots, n+h$. We apply the reconfiguration algorithms proposed in [8], referred to as Least Cloud Resource and Bandwidth (LCBC) and Deployment Costs Aware (DCA). The new embeddings are evaluated starting from the current embedding $\Gamma_c(\bar{G}, G)$, which is the one applied in TI $n$. Finally the framework returns the evaluated embeddings $\Gamma_{n+j}(\bar{G}, G)$ ($j = 1, \dots, h$).

Algorithm 1.
1: Input: $\bar{G} = (\bar{V}, \bar{L})$, $G = (V, L)$, $b_j(i)$ ($i = 1, \dots, N$; $j = 1, \dots, n$), $\Gamma_c(\bar{G}, G)$
2: Predict the VNF-FG bandwidths $\hat{b}_{n+j}(i)$ ($i = 1, \dots, N$; $j = 1, \dots, h$)
3: Evaluate the estimates $\hat{f}_e^{(n+j)}$ and $\hat{f}_v^{(n+j)}$ of the links $e \in L$ and nodes $v \in V$ of the VNFI graph in the TIs $n+1, \dots, n+h$
4: Reconfigure the bandwidth and the cloud resources by applying the algorithms LCBC/DCA [8] and evaluating the embeddings $\Gamma_{n+j}(\bar{G}, G)$ ($j = 1, \dots, h$) in the TIs $n+1, \dots, n+h$
5: Output: $\Gamma_{n+j}(\bar{G}, G)$ ($j = 1, \dots, h$)
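The flow of the framework can be sketched in Python as follows. This is only an illustration under stated assumptions: all function and parameter names are hypothetical, the default predictor is a naive placeholder rather than the LSTM of the next section, `reconfigure` stands in for LCBC/DCA [8] (not reproduced here), and each embedding is assumed to be derived from the one of the previous TI.

```python
def predict_persistence(history, h):
    """Placeholder predictor (an assumption): repeat the last observed value h times."""
    return [history[-1]] * h


def aggregate(predicted, sharing):
    """Sum, for each TI, the predicted bandwidths of the VNF-FGs sharing an element.

    predicted: dict VNF-FG id -> list of h predicted bandwidths.
    sharing: dict element (VNFI-graph node or link) -> list of VNF-FG ids using it.
    """
    h = len(next(iter(predicted.values())))
    return {
        elem: [sum(predicted[i][j] for i in fgs) for j in range(h)]
        for elem, fgs in sharing.items()
    }


def provisioning_framework(histories, h, node_sharing, link_sharing,
                           reconfigure, current_embedding,
                           predictor=predict_persistence):
    """Predict, aggregate, reconfigure and return one embedding per future TI."""
    # Step 2: multi-step-ahead prediction of each VNF-FG bandwidth.
    predicted = {i: predictor(hist, h) for i, hist in histories.items()}
    # Step 3: estimated node processing capacities and link bandwidths.
    f_v = aggregate(predicted, node_sharing)
    f_e = aggregate(predicted, link_sharing)
    # Step 4: embeddings for TIs n+1..n+h (chaining them is an assumption).
    embeddings = []
    embedding = current_embedding
    for j in range(h):
        embedding = reconfigure(
            embedding,
            {v: vals[j] for v, vals in f_v.items()},
            {e: vals[j] for e, vals in f_e.items()},
        )
        embeddings.append(embedding)
    # Step 5: output.
    return embeddings
```

Passing the predictor and the reconfiguration routine as arguments keeps the sketch independent of the specific prediction and embedding algorithms.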

LSTM Prediction Algorithm
The version of the LSTM prediction framework unfolded over $L$ stages is illustrated in Figure 2 and consists of the following two layers:
• The LSTM prediction layer: it performs the time series prediction and stores the internal states; we consider the case of a single layer composed of $L$ LSTM Cell Blocks (LCBs), referred to as $\mathrm{LCB}_j$ ($j = n-L+1, \dots, n$);
• The feed-forward network layer: from the output of the LSTM layer, it evaluates the $h$-step-ahead predicted bandwidth values $\hat{b}_{n+j}$ ($j = 1, \dots, h$) stored in the vector $\hat{\mathbf{b}}_{n,h}$.
The VNF-FG bandwidth predictions are performed by the LSTM layer, whose inputs are the VNF-FG bandwidth values $b_j$ ($j = n-L+1, \dots, n$). Its output $h_n$ is processed by a feed-forward neural network, which provides the vector $\hat{\mathbf{b}}_{n,h}$ of predicted VNF-FG bandwidth values.
In the LSTM layer the state variable $s_j$ ($j = n-L+1, \dots, n$) is also updated. In the LSTM Cell Block $\mathrm{LCB}_j$, the state variable $s_j$ in the $j$-th TI depends on the following variables: (i) the VNF-FG bandwidth value $b_j$; (ii) the output $h_{j-1}$ in the $(j-1)$-th TI; (iii) the state variable $s_{j-1}$ in the $(j-1)$-th TI.
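For reference, the dependencies listed above match the standard LSTM cell formulation, which can be written as follows. The text does not state which LSTM variant is used, so this is an assumption; the gate symbols $i_j$, $\phi_j$, $o_j$, the weights $W$, $U$ and the biases $\beta$ are introduced here only for illustration.

```latex
\begin{aligned}
i_j &= \sigma\left(W_i\, b_j + U_i\, h_{j-1} + \beta_i\right) && \text{(input gate)}\\
\phi_j &= \sigma\left(W_\phi\, b_j + U_\phi\, h_{j-1} + \beta_\phi\right) && \text{(forget gate)}\\
o_j &= \sigma\left(W_o\, b_j + U_o\, h_{j-1} + \beta_o\right) && \text{(output gate)}\\
\tilde{s}_j &= \tanh\left(W_s\, b_j + U_s\, h_{j-1} + \beta_s\right) && \text{(candidate state)}\\
s_j &= \phi_j \odot s_{j-1} + i_j \odot \tilde{s}_j\\
h_j &= o_j \odot \tanh\left(s_j\right)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ the element-wise product; the state $s_j$ depends on $b_j$, $h_{j-1}$ and $s_{j-1}$, as stated in the text.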
The operation of a single LCB is well known, because LSTM has been applied in many fields (handwriting recognition, speech recognition, power consumption prediction, etc.) [16]. However, the training of the LSTM recurrent neural network is usually performed by minimizing a symmetric loss function. Likewise, all of the prediction-based resource allocation algorithms in NFV environments aim at exactly forecasting either the traffic [15] or the resources [12] to be allocated. They are based on the minimization of symmetric cost functions of the errors $e_{n+j} = b_{n+j} - \hat{b}_{n+j}$ ($j = 1, \dots, h$), where $b_{n+j}$ is the real VNF-FG bandwidth value in the $(n+j)$-th TI. Examples of these functions are the Mean Squared Error (MSE) and the Mean Absolute Error (MAE). Symmetric cost functions weigh positive and negative errors equally. Conversely, being aware that an exact traffic prediction is not possible, our objective is to err on the side that is more convenient according to the cloud resource allocation and the QoS degradation costs. For this reason we consider asymmetric cost functions and, because of its simplicity, we choose a cusp-shaped linear loss function, represented in Figure 3, whose slopes depend on the resource allocation cost $C_{RA}$ and on the QoS degradation cost $C_{QoS}$, both expressed in $ per Gbit. As reported in Figure 3, the training process minimizes the Asymmetric Mean Absolute Error $\mathrm{AMAE}_{n,h}$, which is expressed in terms of the indicator function $I(x)$, with $I(x) = 1$ for $x > 0$ and $I(x) = 0$ for $x < 0$.
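One form of the loss that is consistent with this description is $\mathrm{AMAE}_{n,h} = \frac{1}{h}\sum_{j=1}^{h}\left[C_{QoS}\, e_{n+j}\, I(e_{n+j}) - C_{RA}\, e_{n+j}\, I(-e_{n+j})\right]$. This is a reconstruction rather than the authors' exact expression: it assumes that the two slopes of the cusp are exactly $C_{QoS}$ for under-estimated traffic ($e_{n+j} > 0$) and $C_{RA}$ for over-estimated traffic ($e_{n+j} < 0$). The Python sketch below evaluates this assumed form and shows how the asymmetry shifts the best prediction upward; the function names are hypothetical.

```python
def amae(real, predicted, c_ra, c_qos):
    """Asymmetric mean absolute error under the assumed cusp-shaped linear loss."""
    if len(real) != len(predicted) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for b, b_hat in zip(real, predicted):
        e = b - b_hat  # e > 0: traffic under-estimated, e < 0: over-estimated
        total += c_qos * e if e > 0 else -c_ra * e
    return total / len(real)


def best_constant_prediction(real, c_ra, c_qos):
    """Observed value that, used as a constant prediction, minimizes amae."""
    candidates = sorted(set(real))
    return min(candidates, key=lambda c: amae(real, [c] * len(real), c_ra, c_qos))


# Made-up traffic samples: equal costs versus a QoS cost nine times larger.
traffic = [1.0, 2.0, 3.0, 4.0, 5.0]
symmetric = best_constant_prediction(traffic, c_ra=1.0, c_qos=1.0)
asymmetric = best_constant_prediction(traffic, c_ra=1.0, c_qos=9.0)
```

With equal costs the best constant prediction is the median of the samples, while a larger $C_{QoS}$ moves it toward the upper values. For linear asymmetric losses of this kind the minimizer is the $C_{QoS}/(C_{QoS} + C_{RA})$ quantile of the traffic distribution, which is why training with such a loss tends to over-estimate the traffic when QoS degradation is the more expensive error.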

Numerical Results
We will evaluate the effectiveness of the asymmetric cost function-based LSTM forecasting model in predicting the requested VNF-FG bandwidth when both the cloud resource allocation and QoS degradation costs are considered. The LSTM forecasting technique will be applied in a real scenario to evaluate the operation cost of an NFV network and compare it to the one achieved when an MSE traditional forecasting technique is applied.
We provide some results for the Deutsche Telekom (DT) network reported in Figure 4. The network is composed of 14 switches and 24 links. The main input parameters, their description and the range of their values are reported in Table 4. The cloud resources are placed in $N_{NP} = |V_{NP}| = 4$ NFVI-PoPs located in the cities of Hannover, Leipzig, Frankfurt and Nuremberg. Each NFVI-PoP is equipped with $N_v = 48$ cores. The core costs of the NFVI-PoPs are randomly chosen among the values $c^{core}_{NP_i} = \varsigma^{i} C_0$ ($i = 0, \dots, N_{NP} - 1$) [8,23], where $C_0$ is a normalization cost and the parameter $\varsigma$ characterizes the unbalancing of the core costs among the NFVI-PoPs: the higher $\varsigma$, the more unbalanced the costs. For $\varsigma = 1$ all the costs are equal (balanced case). We carry out the analysis with the average core cost $c^{core}_{av}$ fixed to 1 $/h. The knowledge of $c^{core}_{av}$ leads to the normalization value $C_0 = N_{NP}\, c^{core}_{av}\, \frac{1 - \varsigma}{1 - \varsigma^{N_{NP}}}$. In the following we provide the results for $\varsigma = 1.4$.
We assume that the link bandwidth $B_L$ equals 30 Gbps. We consider four Service Functions (SFs): Firewall (FW), Intrusion Detection System (IDS), Network Address Translator (NAT) and Proxy. VNFIs can be instantiated in the NFVI-PoPs to support these SFs. They are supported by software modules with maximum processing capacities of 900, 600, 900 and 600 Mbps and with 4, 8, 2 and 4 allocated cores, respectively. We consider linear VNF-FGs of the same type, each composed of one FW, one IDS, one NAT and one Proxy.
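The normalization can be checked numerically with a short Python sketch. The function name `core_costs` is hypothetical, and the sketch only computes the set of cost values; the random assignment of these values to the NFVI-PoPs is not modeled.

```python
def core_costs(n_pops, avg_cost, unbalance):
    """Core costs c_i = unbalance**i * C0 (i = 0..n_pops-1) with a given average."""
    if unbalance == 1.0:
        # Balanced case: the closed form is undefined, all costs are equal.
        return [avg_cost] * n_pops
    c0 = n_pops * avg_cost * (1.0 - unbalance) / (1.0 - unbalance ** n_pops)
    return [c0 * unbalance ** i for i in range(n_pops)]


# Scenario of the text: 4 NFVI-PoPs, average core cost 1 $/h, unbalancing 1.4.
costs = core_costs(n_pops=4, avg_cost=1.0, unbalance=1.4)
average = sum(costs) / len(costs)
```

For these values the normalization cost is about 0.563 $/h and the four core costs average to 1 $/h, as required.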
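As an illustration of how these module parameters translate into core requirements, the Python sketch below computes the cores needed by one VNF-FG for a given bandwidth. It assumes that capacity is added by replicating whole software modules; this is an assumption made for the example and not necessarily the allocation rule of LCBC/DCA [8]. The names are hypothetical.

```python
import math

# Values from the text: maximum processing capacity (Mbps) and cores per software module.
SERVICE_FUNCTIONS = {
    "FW": (900, 4),
    "IDS": (600, 8),
    "NAT": (900, 2),
    "Proxy": (600, 4),
}


def cores_for_vnf_fg(bandwidth_mbps):
    """Cores needed per SF, assuming whole module instances are replicated."""
    cores = {}
    for name, (capacity_mbps, cores_per_module) in SERVICE_FUNCTIONS.items():
        modules = math.ceil(bandwidth_mbps / capacity_mbps)
        cores[name] = modules * cores_per_module
    return cores


# A 1 Gbps VNF-FG needs two modules of each SF under this assumption.
example = cores_for_vnf_fg(1000)
total_cores = sum(example.values())
```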
The choice of the core costs and processing capacities leads to a cloud resource allocation cost C RA = 0.025 $/Gb for the VNF-FG considered.
We assume that one VNF-FG is established for each tuple of access nodes reported in Figure 4. As VNF-FG bandwidth values, we consider the real traffic values measured at hourly intervals and reported in [24]. These values are used to forecast the future traffic values according to the procedure illustrated in Section 3. We evaluate the operation cost of the NFV network reported in Figure 4 when the resource allocation is based on predicted rather than real VNF-FG bandwidth values. In particular we have evaluated the cost for the period from 21 June 2004 to 25 June 2004 [24] by predicting the VNF-FG bandwidth values requested between all of the tuples of access nodes of Figure 4. Because real traffic is not available for the Deutsche Telekom network, we have used the traffic values available in [24] for other networks of similar size. The predicted values are evaluated by applying the proposed traffic forecasting algorithm to the real requested VNF-FG bandwidth values from 31 May 2004 to 20 June 2004 [24], which are used for the LSTM training. To reduce the training times we have considered an LSTM network with the following parameters [16]: (i) the number $N_{nr}$ of neurons equals 8; (ii) the look-back parameter $L$ equals 24; (iii) the batch size $N_{sz}$ equals 24; (iv) the total number $N_{ep}$ of epochs has been fixed to 20, that is, the training data are processed 20 times during the LSTM training.
We assume that the cloud and bandwidth resource allocation is performed by executing two algorithms proposed by the authors in a previous paper [8]. These algorithms, referred to as Least Cloud Resource and Bandwidth (LCBC) and Deployment Costs Aware (DCA) [8], have been shown to perform well in allocating resources for NFV networks and to allow for an operation cost optimization when the real traffic data are known.
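The role of the parameter $L$ and of the prediction step $h$ in preparing the training data can be illustrated with the following Python sketch. It is an assumption about how the training samples may be built, since the text does not describe this step; the function name `make_windows` and the placeholder traffic values are hypothetical.

```python
def make_windows(series, look_back, horizon):
    """Build (input, target) pairs for multi-step-ahead forecasting.

    Each input holds `look_back` consecutive samples and each target holds the
    `horizon` samples that immediately follow them.
    """
    if look_back < 1 or horizon < 1:
        raise ValueError("look_back and horizon must be positive")
    inputs, targets = [], []
    for start in range(len(series) - look_back - horizon + 1):
        split = start + look_back
        inputs.append(list(series[start:split]))
        targets.append(list(series[split:split + horizon]))
    return inputs, targets


# Three weeks of hourly samples (placeholder values), L = 24 and h = 12.
hourly = [float(t % 24) for t in range(21 * 24)]
x, y = make_windows(hourly, look_back=24, horizon=12)
```

Each pair gives the network the last 24 hourly values as input and the following $h$ values as training target.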
We report the cost in Figure 5 when the parameter w, defined as the ratio of the Resource Allocation cost C RA to the QoS degradation C QoS , is varied from 0.0001 to 1 and the resource allocation is performed in the cases in which the MSE and Asymmetric (ASYM) LSTM prediction techniques are used. Notice that because C RA is fixed to 0.025 $/Gb, the variation of w is obtained by varying C QoS from 250 to 0.025 $/Gb. In particular we study a case of interest in which the QoS degradation cost C QoS is higher than or equal to the cloud resource allocation cost C RA . The results of Figure 5 have been achieved in the case of prediction step h equal to 1, 12 and 24.
From the results reported in Figure 5 we can make the following remarks:
• The proposed forecasting solution based on the asymmetric cost function allows for total costs lower than or equal to those of the MSE-based forecasting solution; the total costs of the two solutions are equal only for w = 1, that is, when the over-provisioning and under-provisioning costs are equal; for example, for w = 0.04 and h = 12 the total costs of the MSE and ASYM solutions are 134 $ and 96 $ respectively, a 28% cost advantage for the proposed asymmetric LSTM prediction solution;
• The better total cost of the asymmetric prediction solution for w lower than 1 is due to the fact that it reduces the resource under-provisioning periods and, consequently, the costs due to QoS degradation.
To justify the results we also report in Figure 6 the predicted VNF-FG bandwidth values from 21 June 2004 to 25 June 2004 [24] for the traffic offered between two access nodes of the Deutsche Telekom network. In particular we compare the real traffic with the MSE and ASYM LSTM predictions for w equal to 0.1 and prediction steps equal to 1, 12 and 24 in Figure 6a-c, respectively. From these figures we can remark that the MSE predictions are very close to the real time series values, but they do not achieve the goal of over-estimating the time series values that the higher QoS degradation costs call for; conversely, the ASYM LSTM predictions operate as intended by over-estimating the traffic values. These results are confirmed in Figure 7, where we report the total cost as a function of the prediction step h for values of w varying from 0.01 to 1.

Conclusions
We have developed a traffic forecasting algorithm for the allocation of resources in NFV environments that weighs the over-provisioning and under-provisioning costs differently. The proposed solution is derived from the classical LSTM prediction algorithm and is based on minimizing an asymmetric cost function of the prediction error. Applying the proposed prediction technique to an NFV network scenario with four interconnected NFVI-PoPs has led to cost advantages of 40% compared to prediction techniques based on minimizing symmetric cost functions of the prediction error.