# A Q-Learning-Based Approach for Deploying Dynamic Service Function Chains


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Problem Description

#### 3.1. Research Motivation

#### 3.2. Network Model

#### 3.3. Request Model

#### 3.4. Dynamic SFC Deployment

## 4. Q-Learning Framework Hybrid Module Algorithm

#### 4.1. Preliminaries

#### 4.2. Reinforcement Learning Module

#### 4.2.1. Original Q-Learning Training Algorithm

**Algorithm 1.** Original Q-learning training algorithm

1. initialize the $Q$ matrix with all zero elements;
2. initialize the $R$ matrix;
3. initialize ${h}_{min}$ and ${h}_{max}$;
4. $h$ = 0;
5. While True do
6. randomly generate $v1\in V$; // as the $now\_node$
7. randomly generate $v3\in V$; // as the $end\_node$
8. For $h<{h}_{max}$ do
9. randomly generate $u\in \left[0,1\right]$;
10. If $u\le 0.8$ then
11. choose the $v2\in {V}_{v1}$ with the maximum recommended value from $Q$;
12. End If
13. If $u>0.8$ then
14. randomly choose a $v2\in {V}_{v1}$;
15. End If
16. $v1=v2$;
17. If $v2=v3$ then
18. write the link to the $Q$ matrix using Equation (12);
19. Break;
20. End If
21. $h$++;
22. If $h={h}_{max}$ then
23. Break;
24. End If
25. End For
26. If the $Q$ matrix has converged then
27. Break; // return the usable $Q$ matrix
28. End If
29. End While
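The training loop in Algorithm 1 is ε-greedy tabular Q-learning. With probability 0.8 the agent follows the highest recommended value in $Q$, and otherwise it explores a random neighbour. The Python sketch below shows that loop on a topology stored as an adjacency dictionary. It simplifies the paper's method and is not the authors' implementation. The hop-count subscripts of the 5-dimensional $Q$ matrix are dropped, and the reward (100 for the link that reaches the end node, 0 otherwise) is a hypothetical choice. Equation (12) is not reproduced here. Instead, the update is applied at every step using the $0.8\left(r+\max Q\right)$ form that appears in Algorithm 3. The names `train_q` and `greedy_path` are illustrative.

```python
import random


def train_q(adj, h_max=6, episodes=20000, exploit_prob=0.8, gamma=0.8, reward=100.0, seed=0):
    """Epsilon-greedy tabular Q-learning over a topology given as an adjacency dict.

    q[end][now][nxt] is the recommended value of moving from `now` to `nxt`
    when the destination is `end` (the hop-count subscripts of Table 1 are dropped).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    # Every (end, now, next) entry starts at zero, as in step 1 of Algorithm 1.
    q = {e: {n: {m: 0.0 for m in adj[n]} for n in nodes} for e in nodes}
    for _ in range(episodes):
        now, end = rng.choice(nodes), rng.choice(nodes)
        if now == end:
            continue
        for _hop in range(h_max):
            row = q[end][now]
            if rng.random() <= exploit_prob:
                nxt = max(row, key=row.get)      # exploit: highest recommended value
            else:
                nxt = rng.choice(adj[now])       # explore: random neighbour
            r = reward if nxt == end else 0.0    # hypothetical reward: only reaching the end pays
            future = 0.0 if nxt == end else max(q[end][nxt].values())
            row[nxt] = gamma * (r + future)      # same form as the update in Algorithm 3
            now = nxt
            if now == end:
                break
    return q


def greedy_path(q, start, end, h_max=6):
    """Follow the highest-valued action from start until end (or the hop limit) is reached."""
    path = [start]
    while path[-1] != end and len(path) <= h_max:
        row = q[end][path[-1]]
        path.append(max(row, key=row.get))
    return path
```

Every value is a discounted copy of the end-node reward, so nodes closer to the destination hold larger values. Once the relevant entries have converged, the greedy walk therefore follows a shortest-hop route. For example, on a five-node graph where node 4 is three hops from node 0, `greedy_path(q, 0, 4)` should return a three-hop route after training.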

#### 4.2.2. Optimized Q-Learning Training Algorithm

**Algorithm 2.** Optimized Q-learning training algorithm

1. initialize the $Q$ matrix with all zero elements;
2. initialize the $R$ matrix;
3. initialize ${h}_{min}$ and ${h}_{max}$;
4. $h$ = 0;
5. For each node $v\in V$ do // as the $end\_node$
6. $chain$ = [$v$];
7. Find_way($Q$, $R$, $G$, ${h}_{min}$, ${h}_{max}$, $h$, $chain$);
8. End For

**Algorithm 3.** Find_way($Q$, $R$, $G$, ${h}_{min}$, ${h}_{max}$, $h$, $chain$)

1. $v0$ = $chain$[0];
2. $h$ = $h$ + 1;
3. $chain\_tmp$ = $chain$;
4. While $h\le {h}_{max}$ do
5. For each node $v2\in {V}_{v0}$ do
6. If $v2$ is not in $chain\_tmp$ then
7. $chain\_tmp$ = $v2$ + $chain\_tmp$;
8. Find_way($Q$, $R$, $G$, ${h}_{min}$, ${h}_{max}$, $h$, $chain$);
9. If $h\ge {h}_{min}$ then
10. For $i$ in $chain\_tmp$ do
11. write the link to the $Q$ matrix:
12. $Q\left({s}_{i},{a}_{i}\right)=0.8\left(r+\max_{{a}_{i}^{\prime}}Q\left({s}_{i}^{\prime},{a}_{i}^{\prime}\right)\right)$;
13. End For
14. End If
15. End If
16. End For
17. End While
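Algorithms 2 and 3 replace random episodes with an exhaustive walk. Every node serves as the end node in turn, and Find_way grows loop-free chains backwards from it, writing links to $Q$ once a chain has at least ${h}_{min}$ hops. The Python sketch below expresses that idea as a depth-first enumeration bounded by ${h}_{min}$ and ${h}_{max}$. It assumes an undirected topology stored as an adjacency dictionary. `write_q` is a hypothetical write-back: for a link that leaves $j$ hops to go after it is taken, it stores $100 \cdot 0.8^{j+1}$. This is the value that the update $Q(s_i,a_i)=0.8\left(r+\max Q\right)$ reaches with a reward of 100 on the final link. When several enumerated paths share a link, `write_q` keeps the largest value. All function names are illustrative.

```python
def find_paths(adj, end, h_min, h_max):
    """Enumerate loop-free paths ending at `end` with h_min <= hops <= h_max.

    As in Find_way, a chain is grown backwards from the end node by prepending
    neighbours of its first element, so chain[0] is always the newest node.
    """
    results = []

    def grow(chain):
        hops = len(chain) - 1
        if hops >= h_min:
            results.append(list(chain))
        if hops == h_max:
            return
        for v2 in adj[chain[0]]:
            if v2 not in chain:            # skip nodes already on the chain (no loops)
                grow([v2] + chain)

    grow([end])
    return results


def write_q(paths, reward=100.0, gamma=0.8):
    """Hypothetical write-back: keep, per (end, node, next) link, the best discounted value."""
    q = {}
    for path in paths:
        end, hops = path[-1], len(path) - 1
        for i in range(hops):
            remaining = hops - i - 1       # hops still needed after taking this link
            key = (end, path[i], path[i + 1])
            q[key] = max(q.get(key, 0.0), reward * gamma ** (remaining + 1))
    return q


def train_exhaustive(adj, h_min, h_max):
    """Analogue of Algorithm 2: treat every node as the end node in turn."""
    paths = [p for v in adj for p in find_paths(adj, v, h_min, h_max)]
    return write_q(paths)
```

The number of loop-free paths can grow exponentially with ${h}_{max}$ on dense graphs, which is why both hop bounds matter for the cost of this exhaustive training.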

#### 4.2.3. Complexity Analysis of Original and Optimized Q-Learning Training Algorithm

#### 4.2.4. Q-Learning Decision Algorithm

**Algorithm 4.** Q-learning decision-making process

1. read the trained matrix $Q$;
2. read the user request list $RE$;
3. For every $re$ in $RE$ do
4. select several alternative paths $PA$ from $Q$;
5. For every $pa$ in $PA$ do
6. If $pa$ can deploy the required VNFs then
7. add $pa$ to the candidate list $CA$;
8. End If
9. End For
10. If the candidate list $CA$ is empty then
11. the deployment of this $re$ fails;
12. continue;
13. End If
14. send the candidate list $CA$ to the load balancing module;
15. End For
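Step 6 of Algorithm 4 only states that a path must be able to deploy the required VNFs. The Python sketch below makes one possible reading concrete, under assumptions that do not come from the paper:

- each node has a residual CPU budget;
- each VNF needs a fixed amount of CPU;
- the VNFs must appear in chain order along the path, and a node may host several consecutive VNFs;
- every link must have enough residual bandwidth for the request.

`can_deploy` and `build_candidates` are hypothetical names.

```python
def can_deploy(path, vnf_demands, cpu, bw, bw_demand):
    """Return the hosting node of each VNF (in chain order), or None if the path cannot carry the SFC.

    cpu: residual CPU per node; bw: residual bandwidth per undirected link, keyed by frozenset({a, b}).
    """
    for a, b in zip(path, path[1:]):
        if bw[frozenset((a, b))] < bw_demand:   # every link must carry the request's bandwidth
            return None
    placement, i = [], 0
    for node in path:
        free = cpu[node]
        # place as many consecutive VNFs as fit here before moving downstream
        while i < len(vnf_demands) and vnf_demands[i] <= free:
            free -= vnf_demands[i]
            placement.append(node)
            i += 1
    return placement if i == len(vnf_demands) else None


def build_candidates(pa, vnf_demands, cpu, bw, bw_demand):
    """Steps 5-9 of Algorithm 4: keep only the paths that can deploy the required VNFs."""
    return [p for p in pa if can_deploy(p, vnf_demands, cpu, bw, bw_demand) is not None]
```

Under this model, filling each node with as many consecutive VNFs as fit before moving downstream decides feasibility correctly. By induction along the path, the greedy placement always covers at least as long a prefix of the chain as any feasible placement.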

#### 4.3. Load Balancing Module

**Algorithm 5.** The load balancing scoring process

1. read the information from $G$;
2. read the candidate list $CA$;
3. For every $pa$ in $CA$ do
4. calculate the score of $pa$ using Equation (15);
5. End For
6. take the path with the highest score from the candidate list $CA$;
7. record the start time ${t}_{start-re}$ and the end time ${t}_{end-re}$;
8. add $re$ to the online SFC list $ONL$;
9. update the residual resources in the topology;
10. If any $re$ in $ONL$ reaches ${t}_{end-re}$ then
11. return its resources to the topology;
12. End If
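Equation (15) is not reproduced in this section, so the scoring function below is a hypothetical stand-in. It scores a path by its bottleneck, meaning the smallest ratio of residual to total CPU among its nodes. Algorithm 5 then keeps the highest-scoring candidate, which steers new SFCs away from heavily loaded nodes. The online list $ONL$ is kept as a min-heap ordered by ${t}_{end-re}$, so that expired SFCs return their resources in time order. `score`, `select_path`, and `OnlineList` are illustrative names, and only CPU is modelled.

```python
import heapq


def score(path, cpu, cpu_cap):
    """Hypothetical stand-in for Equation (15): the bottleneck residual-CPU ratio on the path."""
    return min(cpu[n] / cpu_cap[n] for n in path)


def select_path(ca, cpu, cpu_cap):
    """Step 6 of Algorithm 5: take the candidate with the highest score."""
    return max(ca, key=lambda p: score(p, cpu, cpu_cap))


class OnlineList:
    """ONL: online SFCs ordered by end time, so expired ones release resources first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so allocation dicts are never compared

    def admit(self, t_end, alloc, cpu):
        for node, amount in alloc.items():   # reserve resources in the topology
            cpu[node] -= amount
        heapq.heappush(self._heap, (t_end, self._seq, alloc))
        self._seq += 1

    def release_expired(self, now, cpu):
        released = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, alloc = heapq.heappop(self._heap)
            for node, amount in alloc.items():  # give the resources back
                cpu[node] += amount
            released += 1
        return released

    def __len__(self):
        return len(self._heap)
```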

## 5. Performance Evaluation and Discussion

#### 5.1. Simulation Environment

#### 5.2. Performance Metrics

**Request acceptance ratio:** The request acceptance ratio $A$ is the number of incoming service requests that have been successfully deployed on the network divided by the total number of incoming requests.

**Average service provider profit:** This value is the total profit earned by the service provider after processing the incoming service requests. The average service provider profit $K$ can be calculated as follows:

**Calculation time per request:** This value reflects the decision time required before each SFC is deployed. The calculation time per request $C$ is expressed as follows:
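The request acceptance ratio $A$ and the calculation time per request $C$ can be collected in one pass over the request list. The Python sketch below assumes a hypothetical `decide` callable that returns the chosen path, or `None` when a request is rejected. It takes $C$ to be the mean decision time per request. That is an assumption, because the defining equation is not reproduced here. The profit $K$ is omitted because its revenue and cost model is not given in this section.

```python
import time


def evaluate(requests, decide):
    """Return (A, C) for a batch of requests.

    decide(req) is a hypothetical callable returning the chosen path, or None if rejected.
    A = accepted / total; C is taken here as the mean decision time in seconds (assumption).
    """
    if not requests:
        raise ValueError("no requests to evaluate")
    accepted, total_time = 0, 0.0
    for req in requests:
        start = time.perf_counter()
        result = decide(req)
        total_time += time.perf_counter() - start
        if result is not None:
            accepted += 1
    return accepted / len(requests), total_time / len(requests)
```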

#### 5.3. Simulation Results and Analysis

#### 5.3.1. Performance Comparison in a Dynamic Network

#### 5.3.2. Effects of the Use Ratio $\lambda $

#### 5.3.3. Comparison of Training Time

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 1.** SFC deployment and revocation: (**a**) service function chain (SFC) deployment; (**b**) SFC revocation.

**Figure 12.** Comparison of the operation time between the original and optimized Q-learning training algorithms.

**Table 1.** Parameters and variables in the Q-learning Framework Hybrid Module (QLFHM) algorithm.

Parameters and Variables | Definition |
---|---|
$G$ | Information about the network topology |
$V$ | List of nodes in the network topology |
${V}_{i}$ | Nodes adjacent to node $i$ |
${h}_{max}$ | Maximum number of hops allowed by the model |
${h}_{min}$ | Minimum number of hops allowed by the model |
$Q$ | 5-dimensional matrix with subscripts {$now\_h$, $now\_node$, $action\_node$, $end\_node$, $h$} that stores the recommended action values |
$R$ | 5-dimensional matrix with subscripts {$now\_h$, $now\_node$, $action\_node$, $end\_node$, $h$} that stores the action reward values |
$h$ | Number of hops in the current state |
$RE$ | User request list, including the start and end nodes, VNF requirements, arrival time, and online duration of each request |
$PA$ | List of paths that match the start and end nodes |
$CA$ | List of paths that match the start and end nodes and satisfy the VNF requirements |
$ONL$ | Online SFC list |

Use Ratio | ${\lambda}_{low}$ | ${\lambda}_{balanced}$ | ${\lambda}_{high}$ |
---|---|---|---|
$x1$ | 0.79 | 0.6 | 0.4 |
$x2$ | 100 | 100 | 100 |
$x3$ | 7 | 7 | 7 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sun, J.; Huang, G.; Sun, G.; Yu, H.; Sangaiah, A.K.; Chang, V.
A Q-Learning-Based Approach for Deploying Dynamic Service Function Chains. *Symmetry* **2018**, *10*, 646.
https://doi.org/10.3390/sym10110646


**Chicago/Turabian Style**

Sun, Jian, Guanhua Huang, Gang Sun, Hongfang Yu, Arun Kumar Sangaiah, and Victor Chang.
2018. "A Q-Learning-Based Approach for Deploying Dynamic Service Function Chains" *Symmetry* 10, no. 11: 646.
https://doi.org/10.3390/sym10110646