Open Access Article
RL-PMO: A Reinforcement Learning-Based Optimization Algorithm for Parallel SFC Migration
by Hefei Hu, Zining Liu and Fan Wu *
School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Sensors 2026, 26(1), 242; https://doi.org/10.3390/s26010242
Submission received: 10 November 2025 / Revised: 23 December 2025 / Accepted: 25 December 2025 / Published: 30 December 2025
Abstract
In edge networks, hardware failures and resource pressure may disrupt Service Function Chains (SFCs) deployed on the failed node, making it necessary to efficiently migrate multiple Virtual Network Functions (VNFs) under limited resources. To address these challenges, this paper proposes an offline reinforcement learning-based parallel migration optimization algorithm (RL-PMO) to enable parallel migration of multiple VNFs. The method follows a two-stage framework: in the first stage, improved heuristic algorithms are used to generate high-quality migration trajectories and construct a multi-scenario dataset; in the second stage, the Decision Mamba model is employed to train the policy network. With its selective modeling capability for structured sequences, Decision Mamba can capture the dependencies between VNFs and underlying resources. Combined with a twin-critic architecture and CQL regularization, the model effectively mitigates distribution shift and Q-value overestimation. The simulation results show that RL-PMO maintains approximately a 95% migration success rate across different load conditions and improves performance by about 13% under low and medium loads and up to 17% under high loads compared with typical offline RL algorithms such as IQL. Overall, RL-PMO provides an efficient, reliable, and resource-aware solution for SFC migration in node failure scenarios.
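The abstract names two standard offline-RL ingredients, a twin-critic target and a CQL regularizer, used here to curb Q-value overestimation on out-of-distribution actions. The snippet below is a minimal illustrative sketch of those two ideas in numpy, not the authors' implementation; the function names and the scalar, single-state setting are our own simplifications.

```python
import numpy as np

def cql_penalty(q_values, data_action_idx):
    """Conservative Q-learning regularizer for a single state:
    log-sum-exp of the Q-values over all actions minus the Q-value
    of the action actually taken in the dataset. Minimizing this
    gap pushes down Q-values of actions the dataset never took,
    which mitigates distribution shift in offline training."""
    # Subtract the max for numerical stability before exponentiating.
    m = np.max(q_values)
    logsumexp = m + np.log(np.sum(np.exp(q_values - m)))
    return logsumexp - q_values[data_action_idx]

def twin_critic_target(q1_next, q2_next, reward, gamma=0.99):
    """Clipped double-Q (twin-critic) Bellman target: using the
    minimum of two independent critics' next-state estimates
    counteracts the overestimation bias of a single critic."""
    return reward + gamma * min(q1_next, q2_next)
```

The penalty is always non-negative (log-sum-exp upper-bounds any single entry), so adding it to the Bellman loss can only lower the learned values of unseen actions, never raise them.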
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.