A Deep Reinforcement Learning-Based Approach for Bandwidth-Aware Service Function Chaining
Abstract
1. Introduction
2. Related Work
3. The Proposed DRL-BSFC
3.1. Physical Network Architecture
3.2. Service Function Chaining
3.3. Reward Functions
3.4. The Modified A3C Algorithm
4. Performance Evaluation
4.1. Simulation Settings
4.2. Results and Discussion
4.2.1. Scenario I
4.2.2. Scenario II
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References





| Notation | Description |
|---|---|
| Gp | the physical network topology |
| Np | the set of physical nodes in Gp (np in Np) |
| Lp | the set of physical links in Gp (lp in Lp) |
| K | the number of resource types on each np |
|  | the list of the remaining resources of K types on np |
|  | the list of the maximum resources of K types on np |
|  | the remaining bandwidth of link lp |
|  | the maximum bandwidth of link lp |
| Gv | the virtual topology of SFC v |
| Nv | the set of VNFs in Gv (nv in Nv) |
| Lv | the set of virtual links in Gv (lv in Lv) |
|  | the list of the resource requests of K types on nv |
|  | the bandwidth request of lv |
|  | 1 or 0, depending on whether nv is mapped to np |
|  | 1 or 0, depending on whether lv is mapped to lp |
|  | the placement bandwidth for the j-th VNF of SFC v |
|  | the bandwidth cost for successfully deploying SFC v |
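
The notation above suggests the usual feasibility conditions for mapping an SFC: a VNF nv can be placed on np only if each of its K resource requests fits within the remaining resources of np, and a virtual link lv can be routed over a physical path only if every link lp on that path still has at least the bandwidth request of lv available. The following sketch illustrates those checks under stated assumptions: the class and method names (`PhysicalNetwork`, `can_host`, `path_fits`, `route_vlink`, `link_key`) are hypothetical, physical links are treated as undirected, and counting consumed bandwidth as the request times the hop count is a common convention assumed here, not necessarily the paper's definition of bandwidth cost.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Link = Tuple[str, str]


def link_key(u: str, v: str) -> Link:
    """Canonical key for an undirected physical link l_p (assumption: links are undirected)."""
    return (u, v) if u <= v else (v, u)


@dataclass
class PhysicalNetwork:
    """Hypothetical container for G_p; names are illustrative, not the paper's."""
    node_remaining: Dict[str, List[float]]  # remaining amount of each of the K resource types per n_p
    link_remaining: Dict[Link, float]        # remaining bandwidth per l_p

    def can_host(self, n_p: str, request: Sequence[float]) -> bool:
        """True if all K resource requests of a VNF n_v fit on n_p."""
        remaining = self.node_remaining[n_p]
        if len(request) != len(remaining):
            raise ValueError("request must list exactly K resource amounts")
        return all(req <= rem for req, rem in zip(request, remaining))

    def place_vnf(self, n_p: str, request: Sequence[float]) -> None:
        """Map n_v onto n_p (mapping indicator = 1) by deducting its K requests."""
        if not self.can_host(n_p, request):
            raise ValueError(f"node {n_p} cannot host this request")
        self.node_remaining[n_p] = [
            rem - req for rem, req in zip(self.node_remaining[n_p], request)
        ]

    def path_fits(self, path: Sequence[str], bw: float) -> bool:
        """True if every hop of the physical path has at least bw remaining."""
        return all(
            self.link_remaining[link_key(u, v)] >= bw for u, v in zip(path, path[1:])
        )

    def route_vlink(self, path: Sequence[str], bw: float) -> float:
        """Map l_v onto every hop of path; return bw times the hop count (assumed cost)."""
        if not self.path_fits(path, bw):
            raise ValueError("path lacks the requested bandwidth")
        hops = list(zip(path, path[1:]))
        for u, v in hops:
            self.link_remaining[link_key(u, v)] -= bw
        return bw * len(hops)


net = PhysicalNetwork(
    node_remaining={"A": [50, 60, 70], "B": [80, 80, 80], "C": [90, 50, 60]},
    link_remaining={link_key("A", "B"): 60, link_key("B", "C"): 40},
)
print(net.can_host("A", [30, 30, 30]))       # True
print(net.path_fits(["A", "B", "C"], 50))    # False: B-C has only 40 left
print(net.route_vlink(["A", "B", "C"], 30))  # 60 (30 units on each of 2 hops)
```

Reserving bandwidth on every hop is what makes longer paths more expensive under this convention, which is one reason a bandwidth-aware placement would favor VNF locations that keep consecutive functions close together.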

| Parameter | Value |
|---|---|
| Topology model | Waxman |
| The number of nodes | 100 |
| α (parameter governing how link probability decays with distance) | 0.5 |
| β (connection density) | 0.2 |
| The number of links | 500 |
| CPU resource | 50~100 units |
| RAM resource | 50~100 units |
| ROM resource | 50~100 units |
| Bandwidth resource | 50~100 units |
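
The physical-topology settings can be approximated with a Waxman generator. In the standard Waxman model, nodes are scattered over a plane and each pair (u, v) is linked with probability β·exp(−d(u, v)/(αL)), where d is the Euclidean distance and L is the largest distance between any two nodes; β scales the overall link density, and a larger α makes long links relatively more likely. The sketch below assumes nodes placed uniformly in the unit square, integer resource and bandwidth draws in [50, 100], and three node resource types (CPU, RAM, ROM); the function name `waxman_substrate` is hypothetical. Because edges are drawn at random, it does not force exactly the 500 links listed in the table, nor does it guarantee a connected graph; how the authors arrived at that link count is not stated here.

```python
import itertools
import math
import random
from typing import Dict, List, Tuple


def waxman_substrate(
    n: int = 100,
    alpha: float = 0.5,
    beta: float = 0.2,
    lo: int = 50,
    hi: int = 100,
    k: int = 3,
    seed: int = 0,
) -> Tuple[Dict[int, List[int]], Dict[Tuple[int, int], int]]:
    """Sketch of a Waxman substrate: nodes in the unit square, each pair (u, v)
    linked with probability beta * exp(-d(u, v) / (alpha * L)), L = max pairwise distance."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = list(itertools.combinations(range(n), 2))
    dist = {(u, v): math.dist(pos[u], pos[v]) for u, v in pairs}
    max_d = max(dist.values())
    edges = [
        p for p in pairs if rng.random() < beta * math.exp(-dist[p] / (alpha * max_d))
    ]
    # k resource types per node (assumed CPU, RAM, ROM), integers drawn in [lo, hi]
    node_res = {i: [rng.randint(lo, hi) for _ in range(k)] for i in range(n)}
    # bandwidth per link, drawn the same way
    link_bw = {e: rng.randint(lo, hi) for e in edges}
    return node_res, link_bw


node_res, link_bw = waxman_substrate()
print(len(node_res), "nodes,", len(link_bw), "links")
```

Fixing the seed makes the substrate reproducible across runs, which matters when comparing schemes such as DRL-BSFC and DRL-SFCP on the same physical network.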

| Parameter | Value | Description |
|---|---|---|
|  | 4 | the number of worker agents |
|  | 0.001 | the unit price of resources |
|  | 0.001 | the unit price of bandwidth |
|  | 0.00025 | the learning rate of θn |
|  | 0.0005 | the learning rate of ωn |
|  | 0.95 | the discount factor of the TD error |
|  | 0.125 | the reward coefficient |
|  | 64 | the batch size |
| Ugcn, Uemb, Uenc, Udec | 64 | the number of GCN layers, embedding layers, encoder hidden states, and decoder hidden states |
| wp | 4 | the weight of the reciprocal of the placement bandwidth |
| ws | 50 | the weight of the reciprocal of the bandwidth cost |
| wf | −0.5 | the weight of the penalty for deployment failure |
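
The reward weights and the discount factor can be tied together in a short sketch. The exact reward function is defined in Section 3.3 and is not reproduced in this text, so the form below is an assumption inferred only from the parameter descriptions: each VNF placement earns wp times the reciprocal of its placement bandwidth, a successful deployment earns ws times the reciprocal of the bandwidth cost of SFC v, and a failed deployment receives the penalty wf. The reward coefficient (0.125) is not modeled because its role is not described here. The returns use the n-step bootstrapped form that A3C workers typically compute, G_t = r_t + γG_{t+1}, with advantage G_t − V(s_t); with a single step and bootstrap V(s_{t+1}), this reduces to the TD error r_t + γV(s_{t+1}) − V(s_t). All function names are hypothetical.

```python
from typing import List, Tuple

W_P, W_S, W_F = 4.0, 50.0, -0.5  # wp, ws, wf from the parameter table
GAMMA = 0.95                      # discount factor from the parameter table


def placement_reward(placement_bw: float) -> float:
    """Assumed per-VNF reward: wp times the reciprocal of the placement bandwidth."""
    return W_P / placement_bw


def terminal_reward(success: bool, bandwidth_cost: float = 0.0) -> float:
    """Assumed end-of-chain reward: ws / bandwidth cost on success, penalty wf on failure."""
    if not success:
        return W_F
    if bandwidth_cost <= 0:
        raise ValueError("a deployed SFC must have a positive bandwidth cost")
    return W_S / bandwidth_cost


def n_step_advantages(
    rewards: List[float], values: List[float], bootstrap: float, gamma: float = GAMMA
) -> Tuple[List[float], List[float]]:
    """n-step returns G_t = r_t + gamma * G_{t+1} (starting from the bootstrap value),
    and advantages G_t - V(s_t) that weight the actor's policy-gradient update."""
    returns: List[float] = []
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns, [g_t - v for g_t, v in zip(returns, values)]


# Hypothetical episode: two placement steps, then a step that completes the deployment.
rewards = [placement_reward(40.0), placement_reward(50.0), terminal_reward(True, 250.0)]
returns, advs = n_step_advantages(rewards, values=[0.2, 0.2, 0.2], bootstrap=0.0)
print([round(x, 4) for x in returns])
```

Using reciprocals means smaller placement bandwidth and smaller bandwidth cost both yield larger rewards, so under this assumed form the agent is pushed toward bandwidth-efficient chains, while wf discourages rejected requests.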

| Scheme | Ar | Cb (v) | RLb | Trevenue | Tcost | Trc |
|---|---|---|---|---|---|---|
| DRL-BSFC | 0.8581 | 233.598 | 0.3798 | 418,512 | 617,393 | 0.678 |
| DRL-SFCP | 0.8598 | 268.821 | 0.3212 | 414,965 | 641,731 | 0.646 |

| Scheme | Ar | Cb (v) | RLb | Trevenue | Tcost | Trc |
|---|---|---|---|---|---|---|
| DRL-BSFC | 0.6872 | 238.100 | 0.2838 | 317,952 | 476,640 | 0.667 |
| DRL-SFCP | 0.6483 | 299.542 | 0.2540 | 295,653 | 481,258 | 0.614 |

| Scheme | Ar | Cb (v) | RLb | Trevenue | Tcost | Trc |
|---|---|---|---|---|---|---|
| DRL-BSFC | 0.5306 | 273.746 | 0.2762 | 236,472 | 371,941 | 0.635 |
| DRL-SFCP | 0.5136 | 281.489 | 0.2486 | 225,165 | 360,068 | 0.625 |
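
The three results tables share the same columns but carry no captions in this text, so they are identified below only by position. A short script can check two things in them: the reported Trc values agree with Trevenue/Tcost to within 0.001 in every row (some appear rounded, others truncated), which suggests Trc is the revenue-to-cost ratio; and DRL-BSFC's relative reduction in the "Cb (v)" column versus DRL-SFCP can be computed per table. The dictionary layout and the `summarize` helper are illustrative only.

```python
# (Trevenue, Tcost, reported Trc, Cb(v)) transcribed from the three results tables,
# keyed by the order in which the tables appear.
RESULTS = {
    1: {"DRL-BSFC": (418_512, 617_393, 0.678, 233.598),
        "DRL-SFCP": (414_965, 641_731, 0.646, 268.821)},
    2: {"DRL-BSFC": (317_952, 476_640, 0.667, 238.100),
        "DRL-SFCP": (295_653, 481_258, 0.614, 299.542)},
    3: {"DRL-BSFC": (236_472, 371_941, 0.635, 273.746),
        "DRL-SFCP": (225_165, 360_068, 0.625, 281.489)},
}


def summarize(results):
    """Per table: largest |Trevenue/Tcost - Trc| gap and DRL-BSFC's relative Cb(v) reduction."""
    summary = {}
    for table, rows in results.items():
        gap = max(abs(rev / cost - trc) for rev, cost, trc, _ in rows.values())
        bsfc_cb = rows["DRL-BSFC"][3]
        sfcp_cb = rows["DRL-SFCP"][3]
        summary[table] = (gap, (sfcp_cb - bsfc_cb) / sfcp_cb)
    return summary


for table, (gap, reduction) in summarize(RESULTS).items():
    print(f"table {table}: max Trc gap {gap:.4f}, Cb(v) reduction {reduction:.1%}")
```

Run on the transcribed values, the reductions come out at roughly 13%, 20.5%, and 2.8% for the first, second, and third tables respectively.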
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wu, Y.-J.; Hwang, S.-H.; Hwang, W.-S.; Cheng, M.-H. A Deep Reinforcement Learning-Based Approach for Bandwidth-Aware Service Function Chaining. Electronics 2026, 15, 227. https://doi.org/10.3390/electronics15010227

