Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning
Abstract
1. Introduction
- We construct a heterogeneous Discrete Time-Varying Graph (DTVG) model. Unlike traditional static graphs, this model integrates the spatiotemporal dynamics of communication links with the specific attributes of heterogeneous hardware (e.g., CPU/GPU capacities), transforming the continuous dynamic scheduling problem into a path optimization problem on a discrete graph.
- We propose a spatiotemporal feature extraction mechanism based on Long Short-Term Memory (LSTM). By capturing historical load trends and topological changes in the DTVG, this mechanism provides an accurate state representation for the decision-making agents, enhancing their perception of environmental dynamics.
- We design a QMIX-based collaborative scheduling algorithm, formulating the resource allocation problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). A novel reward function balances task migration costs, heterogeneous resource matching, and QoS satisfaction. Extensive simulations demonstrate that our method outperforms the PSO, ACO, and MADDPG baselines in terms of average latency, migration count, and resource utilization.
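
To make the path-optimization view of the DTVG concrete, the following is a minimal sketch, assuming a single multi-stage task whose stages are placed one per snapshot, with hypothetical delays (`OFFLOAD`, `COMPUTE`) and a hypothetical fixed migration cost. It builds a time-expanded graph in which offloading edges enter snapshot 0 and stay or migration edges link consecutive snapshots, then solves it with Dijkstra's algorithm. This is an illustrative stand-in for the graph transformation, not the authors' scheduler, which learns placements with QMIX.

```python
import heapq
from collections import defaultdict

# Hypothetical toy instance: one task with T stages, one stage per DTVG snapshot.
T = 3
NODES = ("cpu0", "gpu0")
OFFLOAD = {"cpu0": 2.0, "gpu0": 5.0}   # offloading-edge delay into snapshot 0 (ms)
COMPUTE = [                             # processing delay per snapshot (ms)
    {"cpu0": 10.0, "gpu0": 4.0},
    {"cpu0": 3.0, "gpu0": 9.0},
    {"cpu0": 8.0, "gpu0": 2.0},
]


def build_dtvg(migration_cost):
    """Time-expanded adjacency list; vertex (node, t) = task hosted on node at snapshot t."""
    graph = defaultdict(list)
    for n in NODES:
        # Offloading edge: user -> node in snapshot 0, plus the first stage's compute.
        graph["user"].append(((n, 0), OFFLOAD[n] + COMPUTE[0][n]))
        # Any node in the last snapshot reaches the sink at zero cost.
        graph[(n, T - 1)].append(("done", 0.0))
    for t in range(T - 1):
        for n in NODES:
            for m in NODES:
                # Stay edge (m == n) or migration edge (m != n) into snapshot t + 1.
                extra = 0.0 if m == n else migration_cost
                graph[(n, t)].append(((m, t + 1), COMPUTE[t + 1][m] + extra))
    return graph


def shortest_schedule(graph, source="user", sink="done"):
    """Dijkstra over the time-expanded graph; returns (total delay, placement per snapshot)."""
    dist, prev = {source: 0.0}, {}
    heap, counter = [(0.0, 0, source)], 1  # counter breaks ties without comparing vertices
    while heap:
        d, _, u = heapq.heappop(heap)
        if u == sink:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, counter, v))
                counter += 1
    # Walk back from the sink, keeping the node name of every (node, t) vertex.
    placement, v = [], sink
    while v in prev:
        v = prev[v]
        if isinstance(v, tuple):
            placement.append(v[0])
    return dist[sink], placement[::-1]


cost, placement = shortest_schedule(build_dtvg(migration_cost=4.0))
print(cost, placement)  # 20.0 ['gpu0', 'gpu0', 'gpu0']
```

With `migration_cost=0.0` the cheapest path instead hops from gpu0 to cpu0 and back to gpu0 for a total of 14.0 ms, which shows how migration-edge weights trade raw latency against service continuity.
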
2. Related Work
2.1. Computing Power Network and Heterogeneous Resource Scheduling
2.2. Time-Varying Graph Models and Temporal Feature Extraction
2.3. Reinforcement Learning with Partial Observability
3. System Model and Problem Formulation
3.1. Heterogeneous Computing Network Architecture
3.1.1. Heterogeneous Node Model
3.1.2. Task Diversity Model
3.2. Weighted Discrete Time-Varying Graph
3.2.1. Offloading Edge
3.2.2. Migration Edge
3.3. Communication and Computation Energy Model
3.4. Problem Formulation
4. LSTM-Enhanced QMIX Framework for Integrated Scheduling
4.1. Spatiotemporal Heterogeneous Feature Extraction
4.1.1. Heterogeneous State Representation
4.1.2. LSTM-Based Temporal Encoding
4.2. Dec-POMDP Formulation
4.2.1. Global State
4.2.2. Local Observation
- (1) Task Profile: The intrinsic requirement vector of the task, highlighting in particular its workload type and latency deadline.
- (2) Node Context: The heterogeneous attribute vector of each candidate node, augmented by its spatiotemporal load embedding extracted by the LSTM module.
- (3) Link Conditions: The dynamic weights of the reachable Offloading Edges and potential Migration Edges within the current DTVG snapshot.
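
The three observation components can be assembled as in the following minimal sketch. It assumes scalar per-node load histories, hypothetical feature names (`workload_type`, `deadline_ms`, `cpu_ghz`, `gpu_tflops`, `load_history`) and an untrained single-layer LSTM with random weights written in plain Python, so it only illustrates the data flow: the LSTM's final hidden state serves as the load embedding appended to each candidate node's attributes, followed by the offloading- and migration-edge weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM over a scalar load series (random, untrained weights)."""

    def __init__(self, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight row per gate unit over z = [x_t, h_{t-1}..., 1]; the last
        # column acts as the bias. Row blocks: input, forget, candidate, output.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + 2)]
                  for _ in range(4 * hidden_size)]

    def encode(self, series):
        """Run the LSTM over the series and return the final hidden state."""
        H = self.hidden_size
        h, c = [0.0] * H, [0.0] * H
        for x in series:
            z = [x] + h + [1.0]
            pre = [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]
            i = [sigmoid(v) for v in pre[:H]]              # input gate
            f = [sigmoid(v) for v in pre[H:2 * H]]         # forget gate
            g = [math.tanh(v) for v in pre[2 * H:3 * H]]   # candidate cell
            o = [sigmoid(v) for v in pre[3 * H:]]          # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def local_observation(task, nodes, offload_w, migrate_w, lstm):
    """Concatenate task profile, node context and link conditions into one vector."""
    obs = [task["workload_type"], task["deadline_ms"]]       # (1) task profile
    for node in nodes:                                       # (2) node context
        obs += [node["cpu_ghz"], node["gpu_tflops"]]
        obs += lstm.encode(node["load_history"])             # LSTM load embedding
    obs += list(offload_w) + list(migrate_w)                 # (3) link conditions
    return obs


lstm = TinyLSTM(hidden_size=4)
task = {"workload_type": 1.0, "deadline_ms": 50.0}
nodes = [
    {"cpu_ghz": 3.2, "gpu_tflops": 0.0, "load_history": [0.4, 0.5, 0.7]},
    {"cpu_ghz": 2.4, "gpu_tflops": 12.0, "load_history": [0.9, 0.6, 0.3]},
]
obs = local_observation(task, nodes, offload_w=[4.0, 6.5], migrate_w=[0.0, 3.0], lstm=lstm)
print(len(obs))  # 18
```

In practice the raw features would be normalized before entering the agent network; they are left unscaled here to keep the layout readable.
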
4.2.3. Reward Function Design
- (1) Architectural Affinity: This term corresponds to the processing efficiency factor between the task's workload type and the node's hardware architecture; when a task is assigned to a node, the term takes the value of that factor. This mechanism explicitly incentivizes aligning workload types with hardware architectures, for example mapping AI inference tasks to GPU-accelerated nodes, which have a high efficiency factor, rather than to mismatched resources such as CPU-based nodes, which have a low one.
- (2) SLA Adherence: This term corresponds to the QoS-satisfaction component of the system objective (Equation (6a)) and quantifies how well the task's QoS requirements are met. A higher value signifies a larger safety margin for meeting the completion deadline.
- (3) Migration Penalty: This term reflects the migration overhead and is designed to penalize excessive service handovers, thereby promoting service continuity.
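
A minimal sketch of how the three terms might be combined into a scalar per-agent reward follows. The weighted-sum form, the weight values in `RewardWeights`, the `EFFICIENCY` table and the normalized-slack formula for the SLA term are all assumptions for illustration; they are not the paper's exact expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical trade-off weights, illustrative only.
    affinity: float = 1.0
    sla: float = 1.0
    migration: float = 0.5


# Hypothetical processing-efficiency factors for (workload type, architecture).
EFFICIENCY = {
    ("ai_inference", "gpu"): 1.0,
    ("ai_inference", "cpu"): 0.3,
    ("general", "gpu"): 0.6,
    ("general", "cpu"): 0.9,
}


def reward(workload, arch, completion_ms, deadline_ms, migrated, w=RewardWeights()):
    """Weighted sum of affinity and SLA terms minus a migration penalty (assumed form)."""
    r_aff = EFFICIENCY[(workload, arch)]
    # Assumed SLA term: normalized deadline slack, floored at -1 for large misses.
    r_sla = max((deadline_ms - completion_ms) / deadline_ms, -1.0)
    r_mig = 1.0 if migrated else 0.0
    return w.affinity * r_aff + w.sla * r_sla - w.migration * r_mig


gpu = reward("ai_inference", "gpu", completion_ms=30.0, deadline_ms=50.0, migrated=False)
cpu = reward("ai_inference", "cpu", completion_ms=45.0, deadline_ms=50.0, migrated=True)
print(gpu > cpu)  # True
```

The matched, on-time, non-migrated GPU placement scores 1.4, while the mismatched CPU placement that also migrated scores about -0.1, which is the ordering the three terms are meant to enforce.
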
4.3. QMIX-Based Collaborative Scheduling Algorithm
5. Simulation and Analysis
5.1. Experimental Setup
5.2. Performance Evaluation
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest





| Parameter | Value |
|---|---|
| GRU Hidden Dimension | 64 |
| Mixing Network Hidden Dimension | 32 |
| Exploration Rate (ε) | Annealed 1.0 → 0.05 |
| Discount Factor (γ) | 0.99 |
| Replay Buffer Size | 5000 Episodes |
| Batch Size | 32 |
| Target Update Frequency | 200 Episodes |
| Learning Rate | |

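
For readers unfamiliar with QMIX, the sketch below shows the monotonic mixing network in plain Python, using the mixing hidden dimension of 32 listed in the table. The hypernetwork layout (single linear hypernetworks for the first-layer weights and bias and the second-layer weights, a two-layer ReLU hypernetwork for the final bias, ELU inside the mixer) follows the original QMIX design; the weights are random and untrained, the global state is hypothetical, the GRU agent networks are omitted, and the code is not the authors' implementation.

```python
import math
import random


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, bias, x):
    """Dense layer y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def rand_layer(rng, n_out, n_in):
    """Random (untrained) weights with zero bias."""
    return ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)


class QMixer:
    """QMIX mixing network whose weights are produced by hypernetworks of the global state."""

    def __init__(self, n_agents, state_dim, embed_dim=32, seed=0):
        rng = random.Random(seed)
        self.n, self.e = n_agents, embed_dim
        self.hyper_w1 = rand_layer(rng, n_agents * embed_dim, state_dim)
        self.hyper_b1 = rand_layer(rng, embed_dim, state_dim)
        self.hyper_w2 = rand_layer(rng, embed_dim, state_dim)
        self.hyper_b2 = (rand_layer(rng, embed_dim, state_dim), rand_layer(rng, 1, embed_dim))

    def __call__(self, agent_qs, state):
        # abs() keeps the mixing weights non-negative, so Q_tot is monotonic in each Q_a.
        w1 = [abs(v) for v in linear(*self.hyper_w1, state)]  # flattened n_agents x embed_dim
        b1 = linear(*self.hyper_b1, state)
        w2 = [abs(v) for v in linear(*self.hyper_w2, state)]
        # Final bias comes from a two-layer hypernetwork with a ReLU in between.
        hidden_b2 = [max(0.0, v) for v in linear(*self.hyper_b2[0], state)]
        b2 = linear(*self.hyper_b2[1], hidden_b2)[0]
        hidden = [elu(sum(agent_qs[a] * w1[a * self.e + k] for a in range(self.n)) + b1[k])
                  for k in range(self.e)]
        return sum(h * w for h, w in zip(hidden, w2)) + b2


mixer = QMixer(n_agents=3, state_dim=5)
state = [0.2, -0.1, 0.5, 0.0, 0.3]        # hypothetical global state
base = mixer([1.0, 0.5, -0.2], state)
bumped = mixer([1.0, 1.5, -0.2], state)   # raise agent 1's Q-value
print(bumped >= base)  # True
```

Because every mixing weight is non-negative and ELU is increasing, each agent's greedy local action is consistent with maximizing the joint value, which is what allows decentralized execution after centralized training.
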
| Method | Avg. Latency (ms) | Migration Count | Resource Util. (%) |
|---|---|---|---|
| PSO | 45.2 ± 2.4 | 124 ± 6 | 68.5 ± 2.1 |
| ACO | 41.8 ± 1.9 | 98 ± 4 | 72.1 ± 1.7 |
| MADDPG | 35.4 ± 1.1 | 83 ± 3 | 79.8 ± 1.2 |
| Proposed (QMIX) | 32.5 ± 0.7 | 76 ± 2 | 84.3 ± 0.6 |
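
The relative gains implied by the table can be checked with a few lines of arithmetic on the reported means. Standard deviations are ignored, so this says nothing about statistical significance; `BASELINES`, `PROPOSED` and `improvements` are names introduced only for this sketch.

```python
# Mean values from the results table: (latency ms, migrations, utilization %).
BASELINES = {
    "PSO": (45.2, 124, 68.5),
    "ACO": (41.8, 98, 72.1),
    "MADDPG": (35.4, 83, 79.8),
}
PROPOSED = (32.5, 76, 84.3)


def improvements(baseline, proposed=PROPOSED):
    """Return (% latency reduction, % migration reduction, utilization gain in points)."""
    lat, mig, util = baseline
    return (
        100.0 * (lat - proposed[0]) / lat,
        100.0 * (mig - proposed[1]) / mig,
        proposed[2] - util,
    )


for name, base in BASELINES.items():
    lat_red, mig_red, util_gain = improvements(base)
    print(f"vs {name}: latency -{lat_red:.1f}%, migrations -{mig_red:.1f}%, "
          f"utilization +{util_gain:.1f} pp")
```

Against MADDPG, the strongest baseline, this gives roughly an 8.2% latency reduction, 8.4% fewer migrations, and a 4.5-percentage-point gain in resource utilization.
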

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yuan, J.; Zhang, X.; Gong, K. Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning. Future Internet 2026, 18, 168. https://doi.org/10.3390/fi18030168
Yuan J, Zhang X, Gong K. Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning. Future Internet. 2026; 18(3):168. https://doi.org/10.3390/fi18030168
Chicago/Turabian Style: Yuan, Jinshan, Xuncai Zhang, and Kexin Gong. 2026. "Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning" Future Internet 18, no. 3: 168. https://doi.org/10.3390/fi18030168
APA Style: Yuan, J., Zhang, X., & Gong, K. (2026). Heterogeneous Computing Resources Scheduling Based on Time-Varying Graphs and Multi-Agent Reinforcement Learning. Future Internet, 18(3), 168. https://doi.org/10.3390/fi18030168

