Q-Learning for Resource-Aware and Adaptive Routing in Trusted-Relay QKD Network
Abstract
1. Introduction
- Quantum key generation is fundamentally constrained by physical factors such as quantum bit error rate (QBER) and channel attenuation, leading to substantial heterogeneity in link performance. Links with insufficient availability are unable to sustain frequent transmissions, which can cause local key pool exhaustion and disrupt services.
- Key demand is tightly coupled with dynamic traffic patterns and often exhibits unpredictable and bursty behavior. During traffic surges or directional load concentrations, key resources on specific paths can be rapidly exhausted. In addition, imbalanced path selection and uneven request distribution may cause localized link overuse, leading to resource bottlenecks and reduced transmission efficiency.
- Existing routing strategies, such as shortest-path-first and maximum residual key, rely on static or instantaneous network states without modeling key pool dynamics. This limits their adaptability to resource fluctuations, leading to more frequent key transmission failures and degraded network performance. Routing in QKD networks must therefore adapt to dynamic conditions, integrating key generation, consumption, and control feedback to ensure efficient and reliable key delivery.
1. We propose a resource-aware key scheduling framework for trusted-relay QKD networks that integrates real-time link state monitoring, online Q-Learning-based adaptive routing, and multidimensional path feasibility verification to ensure dynamic congestion avoidance and stable key distribution under time-varying traffic and network conditions.
2. We construct a discrete-time model to characterize key dynamics, in which the normalized occupancy ratio is uniformly discretized into states and the action space is defined by adjacent neighbor sets. A composite reward function, integrating occupancy deviation, consumption penalty, and generation incentive, enables adaptive balancing between network load and key resource replenishment.
3. Simulation results demonstrate that the proposed method enhances trusted-relay QKD network performance by improving transmission efficiency, optimizing resource utilization, and mitigating congestion, which keeps the network stable under high-load conditions.
2. Related Work
3. System Model and Problem Formulation
3.1. QKDN Model and Constraints
3.1.1. Network Topology Model
3.1.2. Quantum Key Pool Model
3.1.3. Routing Constraints
1. Connectivity constraint: For any pair of adjacent nodes along the routing path, a key management link must exist to guarantee topological continuity.
2. Key transmission rate constraint: At any time t, for each link, the cumulative key transmission rate of all active tasks traversing this link must not exceed the maximum key transmission rate of that link.
3. Key pool size constraint: To ensure path viability, all links on the path must maintain a positive key pool size at time t.
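The three constraints can be checked together before a path is admitted. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Link` dataclass, its field names, and the function `path_is_feasible` are hypothetical, and the default `min_keys=1` mirrors the minimum usable key count listed in the simulation parameters.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical per-link state; field names are illustrative."""
    max_rate: float   # maximum key transmission rate of the link (keys/s)
    used_rate: float  # rate already committed to active tasks (keys/s)
    pool_size: float  # keys currently stored in the link's key pool


def path_is_feasible(path, links, demand_rate, min_keys=1):
    """Check connectivity, rate, and key pool constraints along a path.

    path        -- sequence of node identifiers
    links       -- dict mapping a (u, v) node pair to a Link
    demand_rate -- key transmission rate requested by the new task (keys/s)
    """
    for u, v in zip(path, path[1:]):
        # Links are treated as undirected (an assumption of this sketch).
        link = links.get((u, v)) or links.get((v, u))
        if link is None:
            return False  # connectivity constraint violated
        if link.used_rate + demand_rate > link.max_rate:
            return False  # key transmission rate constraint violated
        if link.pool_size < min_keys:
            return False  # key pool size constraint violated
    return True
```

A path is rejected as soon as one hop violates any constraint, so the check costs at most one pass over the hops.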
3.1.4. Evaluation Metrics and Definitions
1. Average Key Distribution Time: Quantifies the mean time required to deliver successfully distributed quantum keys during a single simulation cycle. It reflects the overall efficiency of routing and scheduling strategies in dynamic trusted-relay QKD networks. The total delivery time for the k-th distribution comprises a queuing delay, a fixed transmission delay, and a random perturbation delay. The queuing delay, driven by total key volume and effective path key transmission rate, reflects network load and dominates variations in the total delivery time, while the fixed transmission and random perturbation delays capture link characteristics and environmental fluctuations.
2. Maximum Key Utilization: The highest utilization among active links at the moment of path selection. This metric captures the link that is most critically depleted in terms of key pool size. A high value indicates severe congestion and highlights the need for dynamic rerouting or pool scaling. E is the set of all links in the network, with |E| denoting the total number of links; the maximum is taken over the active link set, a subset of E.
3. Proportion of Over-Threshold Key Resource Links: Given a utilization threshold, the over-threshold link set includes all links whose utilization exceeds that threshold. The metric is the proportion of links that belong to this set.
4. Network Throughput: Quantifies the efficiency of key distribution by measuring the total number of quantum keys successfully delivered per unit time across all links. It reflects the effectiveness of routing and key management strategies and provides insight into the trade-off between performance and congestion. It is computed from the total number of keys successfully transmitted during a single distribution task and the corresponding duration.
5. Key Distribution Failure Ratio: Quantifies the fraction of key distribution attempts that fail.
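The metrics above can be sketched as small Python helpers. This is an illustrative sketch only: the function names are hypothetical, per-link utilizations are taken as given inputs, the over-threshold proportion is computed relative to the supplied link set, and the delivery time is assumed to be the sum of a queuing term (key volume divided by effective path rate), a per-hop transmission term, and a jitter term. The defaults 0.65 and 0.002 s come from the simulation parameter table.

```python
def delivery_time(volume, path_rate, hops, per_hop_delay=0.002, jitter=0.0):
    """Assumed form: queuing delay + fixed transmission delay + perturbation."""
    if path_rate <= 0:
        raise ValueError("path_rate must be positive")
    return volume / path_rate + hops * per_hop_delay + jitter


def max_key_utilization(utilization):
    """Highest utilization among the supplied (active) links; 0.0 if none."""
    return max(utilization.values(), default=0.0)


def over_threshold_ratio(utilization, threshold=0.65):
    """Share of the supplied links whose utilization exceeds the threshold."""
    if not utilization:
        return 0.0
    over = [link for link, u in utilization.items() if u > threshold]
    return len(over) / len(utilization)


def network_throughput(keys_delivered, duration_s):
    """Keys successfully delivered per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return keys_delivered / duration_s


def failure_ratio(failed, attempts):
    """Fraction of key distribution attempts that failed; 0.0 if no attempts."""
    return failed / attempts if attempts else 0.0
```

Keeping each metric as a pure function of already-measured quantities makes it straightforward to evaluate different routing strategies against the same simulation trace.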
3.2. Online Q-Learning Model Design
3.2.1. State Space Definition
1. Normalization: Given the known maximum key pool size of each link, the remaining key amount is first normalized into a key occupancy ratio.
2. Interval Partitioning: The range of occupancy ratios is uniformly divided into M intervals of equal width 1/M, forming a discrete set of states.
3. State Mapping: The continuous occupancy ratio is mapped to the discrete state label of the interval that contains it.
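The three steps can be illustrated with a short Python sketch. It rests on assumptions that the text does not confirm: the occupancy ratio is taken as remaining keys divided by maximum pool size, clipped to [0, 1], and states are labelled 0 to M-1 with the upper endpoint assigned to the last interval. The function names are hypothetical; M = 10 follows the simulation parameter table.

```python
def occupancy_ratio(remaining_keys, max_pool_size):
    """Normalize the remaining key amount to [0, 1].

    Assumed form: remaining / maximum. The source equation is not available.
    """
    if max_pool_size <= 0:
        raise ValueError("max_pool_size must be positive")
    return min(max(remaining_keys / max_pool_size, 0.0), 1.0)


def state_index(ratio, m=10):
    """Map a ratio in [0, 1] to one of m equal-width intervals (0 .. m-1)."""
    # A ratio of exactly 1.0 would fall outside the last interval,
    # so it is clamped to label m - 1.
    return min(int(ratio * m), m - 1)
```

For example, a link holding 500 of at most 1000 keys has ratio 0.5 and falls into state 5 when M = 10.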
3.2.2. Action Space Definition
3.2.3. Reward Function Design
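The Introduction describes the reward as a composite of occupancy deviation, a consumption penalty, and a generation incentive. One way such a reward could be assembled is sketched below. The linear form, the normalization by `rate_scale`, and the pairing of the three tabulated weights (load balance, penalty, quality reward) with the three terms are all assumptions of this sketch, not the paper's equation; `composite_reward` and its parameter names are hypothetical. Default values are the first-stage weights, the ideal occupancy ratio of 0.50, and the 100 keys/s link rate from the parameter tables.

```python
def composite_reward(occupancy, consumption_rate, generation_rate,
                     ideal_occupancy=0.5, rate_scale=100.0,
                     w_balance=0.5, w_penalty=0.5, w_quality=0.2):
    """Hypothetical composite reward for selecting a link.

    occupancy        -- normalized key occupancy ratio of the link, in [0, 1]
    consumption_rate -- key consumption rate on the link (keys/s)
    generation_rate  -- key generation rate on the link (keys/s)
    """
    deviation = abs(occupancy - ideal_occupancy)   # occupancy deviation
    penalty = consumption_rate / rate_scale        # consumption penalty
    incentive = generation_rate / rate_scale       # generation incentive
    return -w_balance * deviation - w_penalty * penalty + w_quality * incentive
```

With this form, a link at the ideal occupancy scores higher than an otherwise identical link far from it, and faster key generation raises the reward while heavier consumption lowers it.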
3.2.4. Policy and Update Mechanism
Algorithm 1. Episodic online Q-Learning for key scheduling.
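Since only the caption of Algorithm 1 is reproduced here, the following Python sketch shows the standard tabular Q-Learning building blocks that an episodic online scheme of this kind typically relies on: epsilon-greedy action selection over the neighbor set and the one-step temporal-difference update. It is not the authors' algorithm. The class `QRouter` and its method names are hypothetical, and the defaults (learning rate 0.01, discount factor 0.8, initial Q-values drawn from [0, 0.01], seed 2025) are taken from the parameter tables.

```python
import random
from collections import defaultdict


class QRouter:
    """Minimal tabular Q-Learning agent for next-hop selection (a sketch)."""

    def __init__(self, alpha=0.01, gamma=0.8, epsilon=1.0, seed=2025):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate
        self.rng = random.Random(seed)
        # Q-values keyed by (state, action), lazily initialized in [0, 0.01].
        self.q = defaultdict(lambda: self.rng.uniform(0.0, 0.01))

    def choose(self, state, neighbors):
        """Epsilon-greedy choice among the adjacent neighbor set."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(neighbors)
        return max(neighbors, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_neighbors):
        """One-step Q-Learning update toward reward + gamma * max Q(next)."""
        best_next = max(
            (self.q[(next_state, a)] for a in next_neighbors), default=0.0
        )
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In an episodic loop, the agent would call `choose` at each hop, observe the reward and the next state, and then call `update`; the exploration and learning rates would be adjusted between episodes.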
4. Results and Performance Discussion
4.1. Simulation Setting
4.1.1. Simulation Parameters
4.1.2. Benchmark Methods
4.2. Experimental Results and Analysis
4.2.1. Reward Convergence Analysis
4.2.2. Total Quantum Key Volume: Performance Variation
4.2.3. Single-Node Key Transmission Rate: Performance Variation
4.2.4. Network Scale: Performance Variation
4.2.5. Link Failure Probability: Performance Variation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. QKD Network Architecture
Appendix A.2. Relay Routing Mechanism in QKD Network
1. In the initialization phase, the source node Alice (node 1) identifies relay nodes within its communication range. Using a classical network routing discovery protocol, it collects responses from the surrounding nodes. All responding nodes are included in the candidate set of next-hop relay nodes, thereby initializing the routing algorithm.
2. According to the routing algorithm, Alice (node 1) generates a global key (communication key) and simultaneously negotiates a local key with node 2 based on the BB84 quantum key distribution protocol. The global key is forwarded hop by hop through trusted-relay nodes. In the example shown in Figure 1, node 2 is selected to relay the global key.
3. Under frequently changing network topologies, the routing algorithm dynamically senses the availability of key resources within the network to select candidate relay nodes, ensuring efficient key distribution. A detailed description of this algorithm is provided in Section 4.
4. Node 1 and node 2 share a local key. Node 2 receives the encrypted message from node 1 and decrypts it using the shared local key to recover the global key. Concurrently, node 2 negotiates a local key with node 4 and uses it to encrypt the global key for the next hop.
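The hop-by-hop relay in the steps above can be illustrated with a small Python sketch. It assumes that each hop protects the global key with a one-time pad (bytewise XOR) under the local key shared by the two adjacent nodes, which is a common choice in trusted-relay designs but is not stated in the text. The function names are hypothetical, and the local keys are passed in directly rather than negotiated through BB84.

```python
def xor_bytes(data, key):
    """One-time-pad style XOR; key and data must have equal length."""
    if len(data) != len(key):
        raise ValueError("key and data must have equal length")
    return bytes(d ^ k for d, k in zip(data, key))


def relay_global_key(global_key, local_keys):
    """Forward global_key across successive hops.

    local_keys[i] is the local key shared by the two endpoints of hop i.
    Returns the key recovered at the final node and the ciphertext sent
    on each hop.
    """
    recovered = global_key
    ciphertexts = []
    for local_key in local_keys:
        ciphertext = xor_bytes(recovered, local_key)  # sender encrypts
        ciphertexts.append(ciphertext)
        recovered = xor_bytes(ciphertext, local_key)  # receiver decrypts
    return recovered, ciphertexts
```

Each relay node holds the global key in plaintext between decryption and re-encryption, which is why the relay nodes themselves must be trusted.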
Parameter | Value | Description/Purpose
---|---|---|
Number of nodes | 50–200 | Network scale; affects path lengths, congestion, and failure probability |
Topology model | Barabási–Albert (BA) model | Simulates a hub-centric scale-free topology to analyze the impact of node degree distribution on routing |
Average node degree | 4 | Network connectivity and path redundancy; influences routing feasibility and robustness |
Link maximum key transmission rate | 100 keys/s | Maximum key transmission rate per link; fundamental resource constraint for routing |
Single-node key transmission rate | 10–100 keys/s | Rate at which an individual node transmits keys |
Link key pool size | 1000 keys | Maximum key storage per link; ensures sustained service under transient high load |
Initial key consumption rate | 50 keys/s | Initial link key consumption; influences early-stage routing decisions and reward computation |
Initial key generation rate | 50 keys/s | Initial link key generation; guides early exploration in training |
Minimum usable key count | 1 key | Minimum key threshold below which a link is considered unavailable |
Ideal occupancy ratio | 0.50 | Target key utilization per link; guides load-balancing strategy |
Overload detection threshold | 0.65 | Key occupancy level above which a link is considered overloaded; used in scheduling and penalty calculation |
Time step duration | 1 s | Simulation time step; defines the interval for network updates and agent decisions |
Network load | 500–3500 | Total network key request volume; used to evaluate algorithm performance under varying load conditions |
Granularity level of discretization M | 10 | State discretization granularity for Q-learning; balances representation accuracy and computational complexity |
Traffic modulation pattern | Sinusoidal ±20% | Periodic traffic fluctuations; simulates environmental or user-induced load variations |
Link drift standard deviation | 0.10 | Random link performance drift; enhances training robustness under stochastic conditions |
Link failure probability | 0.01–0.5 per step | Probability of link failure at each time step; tests routing resilience |
Link recovery probability | 0.01 per step | Probability of link restoration; models dynamic network recovery |
Transmission delay per hop | 0.002 s | Per-hop propagation delay; contributes to total distribution time and reward computation |
Random transmission jitter | ±0.005 s | Random variation in transmission delay; simulates network or physical-layer noise |
Random seed | 2025 | Ensures reproducibility of simulation results |
Max steps per episode | 150 | Number of time steps per training episode; represents a complete key distribution cycle |
Initial Q-table | [0, 0.01] | Initial Q-values for the learning agent; balances early exploration and reward scaling |
Max training episodes | 50 | Total training episodes; determines accumulated experience and policy convergence |
Parameter | Ep.1–5 | Ep.6–15 | Ep.19–23 | Ep.24–30
---|---|---|---|---
Exploration rate | 1.0 → 0.5 (linear) | 0.5 → 0.1 (exp) | 0.1 | 0.01
Learning rate | 0.01 | 0.01 → 0.005 | 0.005 | 0.002
Load balance weight | 0.5 | 0.6 | 0.4 | 0.5
Penalty weight | 0.5 | 0.4 | 0.6 | 0.5
Quality reward weight | 0.2 | 0.3 | 0.3 | 0.3
Discount factor | 0.8 | 0.9 | 0.95 | 0.95
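The staged exploration-rate schedule in the table can be expressed as a function of the episode number. The sketch below is an interpretation rather than the authors' code: the linear and exponential stages are interpolated between the tabulated endpoints, and episodes the table does not cover (16 to 18, and anything after 30) are assumed to hold the nearest stage value. The function name `exploration_rate` is hypothetical.

```python
def exploration_rate(episode):
    """Exploration rate for a 1-based episode number, per the staged table."""
    if episode < 1:
        raise ValueError("episodes are numbered from 1")
    if episode <= 5:
        # Stage 1: linear decay from 1.0 (episode 1) to 0.5 (episode 5).
        return 1.0 - 0.5 * (episode - 1) / 4
    if episode <= 15:
        # Stage 2: exponential decay from 0.5 (episode 6) to 0.1 (episode 15).
        return 0.5 * (0.1 / 0.5) ** ((episode - 6) / 9)
    if episode <= 23:
        # Stage 3 value; episodes 16-18 are not tabulated (assumed 0.1).
        return 0.1
    # Stage 4 value; held for later episodes (assumption).
    return 0.01
```

The same pattern could be applied to the learning rate and the reward weights, which the table also varies by stage.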
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hao, Y.; Xie, Y.; Gao, W.; Tang, J. Q-Learning for Resource-Aware and Adaptive Routing in Trusted-Relay QKD Network. Photonics 2025, 12, 969. https://doi.org/10.3390/photonics12100969