Intelligent Service Chain Orchestration and Resource Allocation in End–Edge Collaborative IIoT Using Multi-Agent Proximal Policy Optimization
Abstract
1. Introduction
- IIoT services are heterogeneous, ranging from ultra-low-latency to compute-intensive tasks, which complicates offloading and resource allocation. In a cooperative network of edge gateways and IIoT devices, each service function chain (SFC) is decomposed into virtual network functions (VNFs) that are offloaded to nodes for flexible orchestration, while scheduling must balance energy consumption, end-to-end delay, and QoS requirements. A multi-node model is therefore established to minimize the total system cost by jointly optimizing VNF deployment decisions and computation resource allocation.
- To address dynamic service demands and large-scale node collaboration in IIoT, this paper proposes the SFC Orchestration and Resource Allocation-based Multi-Agent Proximal Policy Optimization (SORA-MAPPO) algorithm. SORA-MAPPO integrates SFC orchestration and resource allocation decisions into a unified reinforcement learning framework, models edge gateways and IIoT devices uniformly as independent, cooperative agents, and follows the centralized training with decentralized execution (CTDE) paradigm to learn the optimal policy for complex service processing.
- A multi-edge-gateway-assisted IIoT simulation platform is constructed. Multi-scenario comparative experiments and hyperparameter tuning validate the effectiveness and robustness of SORA-MAPPO in complex environments and provide a comprehensive performance evaluation benchmark for SFC orchestration and resource allocation in IIoT scenarios.
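The CTDE idea named in the contributions can be illustrated with a minimal sketch. It assumes the standard PPO clipped surrogate objective and a critic that receives the concatenation of all agents' local observations during training; the exact loss, clipping value, and observation layout used by SORA-MAPPO are not shown in this excerpt, so the function names and the default `clip_eps=0.2` below are hypothetical.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized) for one agent.

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    clip_eps is an assumed value, not taken from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|o) / pi_old(a|o)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)

def centralized_critic_input(local_observations):
    """Assumed CTDE layout: the critic sees every agent's observation joined
    into one flat vector during training, while each actor only uses its own
    observation at execution time."""
    return [x for obs in local_observations for x in obs]
```

With equal old and new log-probabilities the ratio is 1 and the objective reduces to the mean advantage; when the ratio leaves the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective, the clipped term caps the update.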
2. System Model and Problem Formulation
2.1. Network Model
2.2. SFC Model
2.3. Communication Model
2.4. Computational Model
2.5. Problem Formulation
3. SORA-MAPPO Algorithm
3.1. DEC-POMDP Formulation
3.2. MAPPO Algorithm Framework
3.3. Computational Complexity Analysis
Algorithm 1 Training phase of SORA-MAPPO
4. Simulation Results
4.1. Simulation Setup
4.2. Experimental Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Notation | Description |
|---|---|
| | Set of edge gateways |
| | Set of IIoT devices |
| | Set of all computing nodes |
| | Number of edge gateways |
| | Number of IIoT devices |
| | Computing capacity of node i (CPU cycles per time slot) |
| | CPU processing rate of node i (cycles/s) |
| | Maximum storage capacity of node i |
| | The n-th SFC request |
| | Ordered VNF sequence of SFC n |
| | The k-th VNF of SFC request n |
| | Computational complexity vector of VNF |
| | Computational complexity of VNF (CPU cycles/bit) |
| | Storage requirement vector of VNF |
| | Storage requirement of VNF (including image and state data) |
| | Data packet length vector |
| | Initial data packet length of SFC request n (bits) |
| | Data packet length after processing by VNF (bits) |
| | Maximum tolerable end-to-end delay of SFC n (ms) |
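The quantities in the notation table suggest how a per-chain computation delay could be assembled. The sketch below is only an illustration under stated assumptions: the class `SFCRequest`, its field names, and the function `computation_delay` are hypothetical, and the delay is taken as complexity (cycles/bit) times input length (bits) divided by the hosting node's CPU rate (cycles/s). Transmission and VNF instantiation delays, which the paper's full model presumably includes, are left out.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SFCRequest:
    """Hypothetical container mirroring the notation table."""
    complexity: List[float]   # cycles/bit of each VNF, in chain order
    storage: List[float]      # storage requirement of each VNF
    packet_bits: List[float]  # [0] = initial length, [k] = length after VNF k
    max_delay: float          # tolerable delay, same time unit as the result

def computation_delay(req: SFCRequest, placement: List[int],
                      cpu_rate: Dict[int, float]) -> float:
    """Sum of per-VNF processing times in seconds.

    placement[k] is the id of the node hosting the k-th VNF and
    cpu_rate[node] its CPU processing rate in cycles/s.
    """
    total = 0.0
    for k, cycles_per_bit in enumerate(req.complexity):
        input_bits = req.packet_bits[k]  # data entering the k-th VNF
        total += cycles_per_bit * input_bits / cpu_rate[placement[k]]
    return total
```

For example, a two-VNF chain with complexities 100 and 50 cycles/bit, packet lengths 8000 and 4000 bits at the two inputs, and both VNFs on a node running at 10^6 cycles/s gives 0.8 s + 0.2 s = 1.0 s of processing time.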
| Parameter | Description | Value |
|---|---|---|
| | Number of edge gateways | 4 |
| | Number of IIoT devices | 6 |
| | Gateway-to-gateway channel bandwidth | 20 MHz |
| | Gateway-to-device channel bandwidth | 10 MHz |
| | Device-to-device channel bandwidth | 5 MHz |
| | Transmission power | 0.1 W |
| | Noise power | W |
| | Gateway computational capacity (per time slot) | cycles |
| | Device computational capacity (per time slot) | cycles |
| | Gateway CPU processing rate | cycles/s |
| | Device CPU processing rate | cycles/s |
| | Gateway maximum storage capacity | 50 GB |
| | Device maximum storage capacity | 8 GB |
| | Gateway energy coefficient | |
| | Device energy coefficient | |
| | Number of SFC requests | 3 |
| | VNF sequence length | [3, 4] |
| | Initial data size | KB |
| | VNF computational complexity | cycles/bit |
| | VNF storage requirement | MB |
| | VNF instantiation time | [0.001, 0.005] s |
| | Max end-to-end delay (delay-sensitive) | [20, 40] s |
| | Max end-to-end delay (computation-intensive) | [40, 60] s |
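For readers who want to reproduce the setup, the parameter table can be captured as a small configuration object. The sketch below copies only the values visible in the table; entries whose numbers did not survive in this excerpt are set to `None` rather than guessed. All key names and both helper functions are hypothetical, and drawing chain lengths uniformly from the listed range is an assumption, since the sampling rule is not stated here.

```python
import random

# Values copied from the simulation-parameter table. None marks entries whose
# numeric value is not recoverable from this excerpt (only the unit is shown).
SIM_PARAMS = {
    "num_gateways": 4,
    "num_devices": 6,
    "bandwidth_hz": {
        "gateway_gateway": 20e6,
        "gateway_device": 10e6,
        "device_device": 5e6,
    },
    "tx_power_w": 0.1,
    "noise_power_w": None,
    "gateway_capacity_cycles_per_slot": None,
    "device_capacity_cycles_per_slot": None,
    "gateway_cpu_rate_cycles_per_s": None,
    "device_cpu_rate_cycles_per_s": None,
    "gateway_storage_gb": 50,
    "device_storage_gb": 8,
    "gateway_energy_coeff": None,
    "device_energy_coeff": None,
    "num_sfc_requests": 3,
    "vnf_sequence_length": (3, 4),
    "initial_data_kb": None,
    "vnf_complexity_cycles_per_bit": None,
    "vnf_storage_mb": None,
    "vnf_instantiation_time_s": (0.001, 0.005),
    "max_delay_s": {
        "delay_sensitive": (20, 40),
        "computation_intensive": (40, 60),
    },
}

def missing_parameters(params, prefix=""):
    """Return dotted names of all entries that still need a value."""
    missing = []
    for key, value in params.items():
        name = prefix + key
        if isinstance(value, dict):
            missing.extend(missing_parameters(value, name + "."))
        elif value is None:
            missing.append(name)
    return missing

def sample_chain_lengths(params, rng):
    """Draw one VNF sequence length per SFC request (uniform: an assumption)."""
    lo, hi = params["vnf_sequence_length"]
    return [rng.randint(lo, hi) for _ in range(params["num_sfc_requests"])]
```

Calling `missing_parameters(SIM_PARAMS)` before running an experiment lists the entries that must be filled in from the published article.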
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhao, T.; Tian, B.; Wang, L.; Ma, W.; Wei, B. Intelligent Service Chain Orchestration and Resource Allocation in End–Edge Collaborative IIoT Using Multi-Agent Proximal Policy Optimization. Sensors 2026, 26, 3583. https://doi.org/10.3390/s26113583
