Coded Caching Optimization in Dual Time-Scale Wireless Networks: An Advantage Actor–Critic Learning Approach
Abstract
1. Introduction
- Under a dual-time-scale network model, we formulate the dynamic coded caching optimization problem to jointly consider long-term content demand changes and short-term MU movement patterns, thereby providing a more realistic and tractable basis for designing efficient caching strategies.
- We model the coded caching optimization problem as a Markov decision process and design an A2C-based coded caching algorithm subject to a caching capacity constraint. By incorporating an advantage function, the proposed algorithm reduces the variance of the policy-gradient estimate and accelerates convergence, which leads to more efficient caching decisions.
- Through extensive simulations, we evaluate the convergence of the proposed A2C-based caching algorithm and compare its performance with several benchmark caching schemes. The results demonstrate that our algorithm significantly outperforms baseline caching schemes.
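The variance-reduction claim in the second contribution can be illustrated with a small exact calculation. The sketch below is not the authors' model: it assumes a softmax policy over three hypothetical caching actions with made-up negative delay costs, and compares the variance of the single-sample policy-gradient estimate with and without the state value as a baseline. The function name `score_function_variance` and all numbers are assumptions for illustration.

```python
import numpy as np

def score_function_variance(probs, rewards, baseline):
    """Exact total variance of the single-sample estimate
    (reward - baseline) * grad log pi(a), for a softmax policy over logits."""
    probs = np.asarray(probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    n = len(probs)
    # Gradient of log pi(a) w.r.t. the logits is one_hot(a) - probs (row a).
    grads = np.eye(n) - probs
    samples = (rewards - baseline)[:, None] * grads
    mean = probs @ samples                 # expected gradient estimate
    second_moment = probs @ (samples ** 2)
    return float(np.sum(second_moment - mean ** 2))

# Hypothetical policy and rewards (negative delay costs), for illustration only.
probs = np.array([0.5, 0.3, 0.2])
rewards = np.array([-10.0, -12.0, -11.0])
value = float(probs @ rewards)             # state value used as the baseline

var_plain = score_function_variance(probs, rewards, baseline=0.0)
var_adv = score_function_variance(probs, rewards, baseline=value)
print(f"variance without baseline: {var_plain:.3f}")
print(f"variance with advantage:   {var_adv:.3f}")
```

Subtracting the baseline leaves the expected gradient unchanged, because the score function has zero mean under the policy. Only the spread of the estimate shrinks, which is the usual argument for advantage-based updates.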
2. Related Work
3. System Models
3.1. Network Model
3.2. Dual Time-Scale Model
3.3. Content Caching Model
3.4. Average Delay Cost Model
3.5. Problem Formulation
4. Coded Caching Algorithm Design
4.1. Markov Decision Process
4.2. Advantage Actor–Critic-Based Coded Caching Algorithm
| Algorithm 1 Advantage Actor–Critic (A2C)-based Coded Caching Algorithm |
| Initialization: Randomly initialize the actor network parameters and the critic network parameters; set the caching state space, the two learning rates, and the reward discount factor. |
| Loop for each episode |
| End Loop |
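The loop body of Algorithm 1 is not reproduced here, so the following is only a generic one-step advantage actor–critic update in the style of Sutton and Barto, not the authors' implementation. It assumes tabular actor logits and critic values instead of neural networks, and a hypothetical toy environment in which the state is the index of the currently popular content and the reward is a delay cost of -1 when a different content is cached. The names `TabularA2C` and `train_toy` and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

class TabularA2C:
    """One-step advantage actor-critic with tabular parameters (illustrative)."""

    def __init__(self, n_states, n_actions, actor_lr=0.05, critic_lr=0.1,
                 gamma=0.9, seed=0):
        self.theta = np.zeros((n_states, n_actions))  # actor parameters (logits)
        self.v = np.zeros(n_states)                   # critic parameters (state values)
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)

    def policy(self, state):
        return softmax(self.theta[state])

    def act(self, state):
        n_actions = self.theta.shape[1]
        return int(self.rng.choice(n_actions, p=self.policy(state)))

    def update(self, state, action, reward, next_state):
        # The TD error serves as the advantage estimate.
        advantage = reward + self.gamma * self.v[next_state] - self.v[state]
        # Gradient of log pi(action | state) w.r.t. the logits: one_hot - probs.
        grad_log_pi = -self.policy(state)
        grad_log_pi[action] += 1.0
        self.v[state] += self.critic_lr * advantage                   # critic step
        self.theta[state] += self.actor_lr * advantage * grad_log_pi  # actor step
        return advantage

def train_toy(n_contents=3, steps=6000, seed=0):
    """Hypothetical environment: cache the content that is currently popular."""
    agent = TabularA2C(n_contents, n_contents, seed=seed)
    env_rng = np.random.default_rng(seed + 1)
    state = 0
    for _ in range(steps):
        action = agent.act(state)
        reward = 0.0 if action == state else -1.0   # delay cost on a cache miss
        next_state = int(env_rng.integers(n_contents))
        agent.update(state, action, reward, next_state)
        state = next_state
    return agent

agent = train_toy()
for s in range(3):
    print(s, np.round(agent.policy(s), 3))
```

In the paper's setting the actor and critic would be neural networks and the action would be a vector of cached proportions under the capacity constraint. The tabular form is used here only to keep the update rule visible.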
4.3. Convergence and Complexity Analysis
5. Performance Evaluation
5.1. Experimental Setups
5.2. Benchmark Caching Schemes
- (1) A2C-based Uncoded Caching Algorithm: When content popularity is unknown within each large time-scale slot, caching decisions are made with the A2C algorithm framework, and each content is either fully cached or not cached at all.
- (2) Proximal Policy Optimization (PPO)-based Coded Caching Algorithm: This caching algorithm is also based on an actor–critic model and uses exactly the same problem formulation as our proposed method. The key difference is that PPO trains its actor network with a clipped surrogate objective function to optimize the caching policy.
- (3) Informed Greedy Caching (IGC) Algorithm: Assuming the exact content requests of MUs are known a priori for the upcoming large time-scale slot, SBSs jointly determine a caching solution by greedily maximizing the amount of requested content using coded caching. Although not globally optimal, IGC provides a strong, information-assisted benchmark that approximates a practical upper bound on performance.
- (4) Most Popular Caching (MPC) Algorithm [33]: At the start of each large time-scale slot, the macro base station makes caching decisions based on the historical popularity of the content. That is, each SBS n caches contents in descending order of popularity until its caching capacity (in bits) is reached.
- (5) Random Caching (RC) Algorithm: In each large time-scale slot, each SBS n caches randomly selected contents up to its caching capacity (in bits).
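The two simplest benchmarks, MPC and RC, can be sketched as a single fill rule applied to different content orderings. This is a minimal illustration under stated assumptions, not the authors' code. It assumes all contents have the same size, that the last content may be cached partially so that the capacity is filled exactly, and that the decision is returned as a cached proportion per content. The function names, the popularity vector and the sizes are hypothetical.

```python
import numpy as np

def fill_cache(order, capacity_bits, content_bits, n_contents):
    """Cache contents in the given order until the capacity is exhausted.
    Returns the cached proportion of each content, between 0 and 1."""
    proportions = np.zeros(n_contents)
    remaining = float(capacity_bits)
    for f in order:
        if remaining <= 0:
            break
        cached = min(float(content_bits), remaining)
        proportions[f] = cached / content_bits   # assumption: partial last content
        remaining -= cached
    return proportions

def most_popular_caching(popularity, capacity_bits, content_bits):
    """MPC: visit contents in descending order of historical popularity."""
    popularity = np.asarray(popularity, dtype=float)
    order = np.argsort(-popularity, kind="stable")
    return fill_cache(order, capacity_bits, content_bits, len(popularity))

def random_caching(n_contents, capacity_bits, content_bits, rng):
    """RC: visit contents in a uniformly random order."""
    order = rng.permutation(n_contents)
    return fill_cache(order, capacity_bits, content_bits, n_contents)

# Hypothetical example: three contents of 10 bits each, 25 bits of capacity.
popularity = [0.1, 0.5, 0.4]
mpc = most_popular_caching(popularity, capacity_bits=25, content_bits=10)
rc = random_caching(3, capacity_bits=25, content_bits=10,
                    rng=np.random.default_rng(0))
print("MPC proportions:", mpc)
print("RC proportions: ", rc)
```

If contents must be cached whole, as in the uncoded benchmark, the `min` in `fill_cache` would be replaced by a check that skips any content larger than the remaining capacity.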
5.3. Results and Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| MUs | Mobile users |
| QoS | Quality of Service |
| SBSs | Small cell base stations |
| MBSs | Macro base stations |
| A2C | Advantage actor–critic |
| MDS | Maximum-distance separable |
| DRL | Deep reinforcement learning |
| TD | Temporal-difference |
| PPO | Proximal policy optimization |
| IGC | Informed greedy caching |
| MPC | Most popular caching |
| RC | Random caching |
References
- Poularakis, K.; Tassiulas, L. Code, cache and deliver on the move: A novel caching paradigm in hyper-dense small-cell networks. IEEE Trans. Mob. Comput. 2016, 16, 675–687.
- Yu, J.; Zhai, C.; Dai, H.; Zheng, L.; Li, Y. Cooperative edge-caching based transmission with minimum effective delay in heterogeneous cellular networks. Comput. Commun. 2024, 228, 107928.
- Wu, Q.; Wang, W.; Fan, P.; Fan, Q.; Zhu, H.; Letaief, K. Cooperative edge caching based on elastic federated and multi-agent deep reinforcement learning in next-generation networks. IEEE Trans. Netw. Serv. Manag. 2024, 21, 4179–4196.
- Dong, Y.; Guo, S.; Wang, Q.; Yu, S.; Yang, Y. Content caching-enhanced computation offloading in mobile edge service networks. IEEE Trans. Veh. Technol. 2022, 71, 872–886.
- Lin, N.; Wang, Y.; Zhang, E.; Wang, S.; Al-Dubai, A.; Zhao, L. User preferences-based proactive content caching with characteristics differentiation in HetNets. IEEE Trans. Sustain. Comput. 2025, 10, 333–344.
- Mehrizi, S.; Chatterjee, S.; Chatzinotas, S.; Ottersten, B. Online spatiotemporal popularity learning via variational Bayes for cooperative caching. IEEE Trans. Commun. 2020, 68, 7068–7082.
- Hu, Z.; Fang, C.; Wang, Z.; Tseng, S.; Dong, M. Many-objective optimization-based content popularity prediction for cache-assisted cloud–edge–end collaborative IoT networks. IEEE Internet Things J. 2024, 11, 1190–1200.
- Jiang, W.; Feng, G.; Qin, S.; Yum, T.; Cao, G. Multi-agent reinforcement learning for efficient content caching in mobile D2D networks. IEEE Trans. Wirel. Commun. 2019, 18, 1610–1622.
- Zhang, W.; Zhang, G.; Mao, S. Deep-reinforcement-learning-based joint caching and resources allocation for cooperative MEC. IEEE Internet Things J. 2024, 11, 12203–12215.
- Xue, Z.; Liu, C.; Liao, C.; Han, G.; Sheng, Z. Joint service caching and computation offloading scheme based on deep reinforcement learning in vehicular edge computing systems. IEEE Trans. Veh. Technol. 2023, 72, 6709–6722.
- Maddah-Ali, M.; Niesen, U. Fundamental limits of caching. IEEE Trans. Inf. Theory 2014, 60, 2856–2867.
- Wang, R.; Peng, X.; Zhang, J.; Letaief, K. Mobility-aware caching for content-centric wireless networks: Modeling and methodology. IEEE Commun. Mag. 2016, 54, 77–83.
- Choi, Y.; Lim, Y. Deep reinforcement learning for edge caching with mobility prediction in vehicular networks. Sensors 2023, 23, 1732.
- Jiang, W.; Feng, G.; Qin, S. Optimal cooperative content caching and delivery policy for heterogeneous cellular networks. IEEE Trans. Mob. Comput. 2017, 16, 1382–1393.
- Xiao, Z.; Shu, J.; Jiang, H.; Lui, J.; Min, G.; Liu, J. Multi-objective parallel task offloading and content caching in D2D-aided MEC networks. IEEE Trans. Mob. Comput. 2023, 22, 6599–6615.
- Ren, J.; Guo, C. A game theoretic approach for D2D assisted uncoded caching in IoT networks. Future Internet 2025, 17, 423.
- Hu, C.; Zeng, J. A service-oriented optimization framework for edge caching with revenue maximization and QoS guarantees. IEEE Trans. Serv. Comput. 2025, 18, 2559–2573.
- Xu, X.; Tao, M. Modeling, analysis, and optimization of coded caching in small-cell networks. IEEE Trans. Commun. 2017, 65, 3415–3428.
- Jiang, Y.; Wang, B.; Zheng, F.; Bennis, M.; You, X. Joint MDS codes and weighted graph-based coded caching in fog radio access networks. IEEE Trans. Wirel. Commun. 2022, 21, 6789–6802.
- Yin, F.; Liu, Q.; Liu, D.; Zhang, Y.; Jin, L.; Li, S. Joint coded caching and resource allocation for multimedia service in space-air-ground integrated networks. IEEE Trans. Commun. 2024, 72, 6839–6853.
- Zhang, Z.; St-Hilaire, M.; Wei, X.; Dong, H.; Saddik, A. How to cache important contents for multi-modal service in dynamic networks: A DRL-based caching scheme. IEEE Trans. Multimed. 2024, 26, 7372–7385.
- Wu, X.; Li, J.; Xiao, M.; Ching, P.; Poor, H. Multi-agent reinforcement learning for cooperative coded caching via homotopy optimization. IEEE Trans. Wirel. Commun. 2021, 20, 5258–5272.
- Gu, S.; Sun, X.; Yang, Z.; Huang, T.; Xiang, W.; Yu, K. Energy-aware coded caching strategy design with resource optimization for satellite-UAV-vehicle-integrated networks. IEEE Internet Things J. 2022, 9, 5799–5811.
- Tian, B.; Wang, L.; Chang, Z.; Xu, L.; Fei, A. Multi-agent DRL-based coded caching and resource allocation in UAV-assisted networks. IEEE Trans. Wirel. Commun. 2025.
- Cao, T.; Zhang, N.; Wang, X.; Huang, J. Mobility-aware cooperative caching in vehicular edge computing based on federated distillation and deep reinforcement learning. IEEE Trans. Netw. Sci. Eng. 2025, 12, 4416–4432.
- Liu, R.; Wang, J.; Zhang, B. High definition map for automated driving: Overview and analysis. J. Navig. 2020, 73, 324–341.
- Luo, J.; Song, J.; Zheng, F.; Gao, L.; Wang, T. User-centric UAV deployment and content placement in cache-enabled multi-UAV networks. IEEE Trans. Veh. Technol. 2022, 71, 5656–5660.
- Sutton, R.; Barto, A. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
- Yang, Z.; Liu, Y.; Chen, Y.; Jiao, L. Learning automata based Q-learning for content placement in cooperative caching. IEEE Trans. Commun. 2020, 68, 3667–3680.
- Xiao, H.; Zhang, X.; Hu, Z.; Zheng, M.; Liang, Y. A collaborative cache allocation strategy for performance and link cost in mobile edge computing. J. Supercomput. 2024, 80, 22885–22912.
- Tang, C.; Ding, Y.; Xiao, S.; Huang, Z.; Wu, H. Collaborative service caching, task offloading, and resource allocation in caching-assisted mobile edge computing. IEEE Trans. Serv. Comput. 2025, 18, 1966–1981.
- Konda, V.; Tsitsiklis, J. On actor-critic algorithms. SIAM J. Control Optim. 2003, 42, 1143–1166.
- Pappas, N.; Chen, Z.; Dimitriou, I. Throughput and delay analysis of wireless caching helper systems with random availability. IEEE Access 2018, 6, 9667–9678.









| Notation | Description |
|---|---|
| | Set of all base stations |
| | Set of MUs |
| | Set of contents |
| | Number of base stations traversed by MU k |
| | Set of large time-scale slots |
| | The i-th small time-scale slot within the t-th large time-scale slot |
| | Duration of a small time-scale slot |
| | Caching capacity of SBS n |
| B | Size of each content |
| M | Total number of small time-scale slots in each large time-scale slot |
| | Transmission rate between MU k and base station n |
| | Unit transmission delay cost between MU k and base station n |
| | Proportion of content f cached by base station n in the t-th large time-scale slot |
| Algorithm | Training Time (Seconds) | Energy Consumption (Wh) | Convergence Iterations | Average Reward |
|---|---|---|---|---|
| A2C-based Uncoded Caching | 76.88 | 1.5374 | 300+ | −615,189.06 |
| A2C-based Coded Caching | 100.78 | 2.0909 | 200+ | −308,052.9 |
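The trade-off in the table can be made explicit with a few lines of arithmetic on the reported values. The sketch below only restates the table entries. The dictionary layout and the helper `relative_change` are illustrative, and it assumes that a less negative average reward is better.

```python
# Values copied from the table above (training time in seconds, energy in Wh).
results = {
    "uncoded": {"train_s": 76.88, "energy_wh": 1.5374, "avg_reward": -615189.06},
    "coded": {"train_s": 100.78, "energy_wh": 2.0909, "avg_reward": -308052.9},
}

def relative_change(new, old):
    """Change of `new` relative to the magnitude of `old`."""
    return (new - old) / abs(old)

reward_gain = relative_change(results["coded"]["avg_reward"],
                              results["uncoded"]["avg_reward"])
time_overhead = relative_change(results["coded"]["train_s"],
                                results["uncoded"]["train_s"])
energy_overhead = relative_change(results["coded"]["energy_wh"],
                                  results["uncoded"]["energy_wh"])

print(f"average reward improvement: {reward_gain:.1%}")
print(f"training time overhead:     {time_overhead:.1%}")
print(f"energy overhead:            {energy_overhead:.1%}")
```

On these numbers, the coded variant roughly halves the magnitude of the negative average reward, at the price of roughly 31% more training time and 36% more energy.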
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ren, J.; Guo, C. Coded Caching Optimization in Dual Time-Scale Wireless Networks: An Advantage Actor–Critic Learning Approach. Appl. Sci. 2025, 15, 11915. https://doi.org/10.3390/app152211915

