Q-Learning for Resource-Aware and Adaptive Routing in Trusted-Relay QKD Network
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Pros:
- The problem formulation as a Markov Decision Process is clearly described, and the reward function is well designed to balance key consumption, generation, and occupancy.
- The simulation setup is carefully constructed, including multiple network sizes and realistic traffic/load variations. Comparative results against Dijkstra provide convincing evidence of the proposed method’s advantages.
Cons:
- The related work section needs a more thorough review. A clearer positioning of this study relative to other reinforcement learning or machine learning–based routing approaches in both quantum and classical networks would be beneficial. Some recent works on deep reinforcement learning for communication networks could also be briefly discussed.
- Scalability and limitations should be addressed. Although the authors evaluate 50-node and 100-node networks, real-world QKD deployments may involve much larger scales. A discussion on the computational complexity of the proposed method is needed.
- More detailed information on the simulation setup is required. Details such as code availability, parameter initialization, and training duration would be useful to ensure reproducibility.
- Figures 9 and 10 contain multiple subplots, but the current layout is crowded and difficult to read. The authors should consider reorganizing these plots, improving axis labels and legends, and possibly splitting them into separate figures for better readability.
- Additional baseline experiments are needed. Currently, only Dijkstra is used as a baseline; including at least one more learning-based or prior method for comparison would strengthen the evaluation.
The paper is well structured and clearly written overall, but careful proofreading would help. Some figures (e.g., Figures 5–8) are dense and could benefit from additional annotations or clearer legends to help readers interpret the trends.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presents a well-written and timely study on adaptive routing in trusted-relay quantum key distribution (QKD) networks using a Q-learning framework. The authors clearly motivate the need for resource-aware scheduling in QKD systems, where imbalanced key resource utilization and congestion can severely degrade performance. The proposed Markov Decision Process formulation, combined with a carefully designed reward function, represents an innovative approach. The simulation results convincingly demonstrate improvements over classical Dijkstra routing, particularly in terms of latency, key utilization, and failure ratio. The work is well-situated within the growing research area of QKD network optimization and will be of interest to both the quantum communications and network science communities.
Before recommendation, some matters should be addressed:
1. Figure captions: The captions (e.g., Figures 3–10) could be expanded to provide more context without requiring the reader to search the main text. The manuscript would benefit from explicit statements about what each axis represents, how quantities are normalized, and the main takeaway of each figure. For instance, the caption of Figure 1 should explain the logical flow, and the same goes for Figure 2. Figures 4–10 are very hard, if not impossible, to read; you need to increase the font size and revise them.
2. What is shown in Figures A1 and A2? Give a full explanation in the captions.
3. For Figures 4–8, the captions should briefly summarize the observed trends.
4. The related work section is thorough, but to capture some very recent developments in QKD network design, I encourage the authors to reference additional up-to-date literature in analogous areas, such as:
   - Gandelman, et al. "Hands-On Quantum Cryptography: Experimentation with the B92 Protocol Using Pulsed Lasers." Photonics. Vol. 12. No. 3. MDPI, 2025.
   - Dehingia, et al. "Hybrid Quantum Key Distribution Framework: Integrating BB84, B92, E91, and GHZ Protocols for Enhanced Cryptographic Security." Concurrency and Computation: Practice and Experience 37.21-22 (2025): e70221.
5. The simulation setup is solid, but more explanation of the chosen parameters (e.g., why Barabási–Albert topologies were used, or the rationale behind the reward function weights α, β, γ) would improve clarity.
6. In Section 4.2, some definitions (e.g., Eq. 15 for average hop count) would benefit from a more intuitive explanation before the formal definition.
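Reviewers 1 and 2 both refer to the MDP formulation and to the reward weights α, β, γ. For readers less familiar with the technique, the sketch below shows a generic tabular Q-learning next-hop router driven by a weighted reward. The reward form, the state encoding (current node, destination), and the names `W_CONS`, `W_GEN`, `W_OCC`, `lr`, `discount`, and `epsilon` are assumptions made for illustration only; they are not the manuscript's equations or parameter values.

```python
import random
from collections import defaultdict

# Hypothetical weights standing in for the reward weights alpha, beta, gamma.
W_CONS, W_GEN, W_OCC = 1.0, 0.5, 2.0

def reward(key_consumed, key_generated, occupancy):
    """Hypothetical weighted reward (not the manuscript's equation):
    penalise key consumption and link occupancy, reward key generation."""
    return -W_CONS * key_consumed + W_GEN * key_generated - W_OCC * occupancy

class QRouter:
    """Generic tabular Q-learning over (current node, destination, next hop)."""

    def __init__(self, neighbors, lr=0.1, discount=0.9, epsilon=0.1, seed=0):
        self.neighbors = neighbors   # node -> list of adjacent nodes
        self.q = defaultdict(float)  # (node, dest, next_hop) -> Q value
        self.lr, self.discount, self.epsilon = lr, discount, epsilon
        self.rng = random.Random(seed)

    def choose(self, node, dest):
        """Epsilon-greedy next-hop selection."""
        hops = self.neighbors[node]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(hops)
        return max(hops, key=lambda h: self.q[(node, dest, h)])

    def update(self, node, dest, hop, r):
        """Standard one-step Q-learning update after forwarding to `hop`."""
        if hop == dest:
            target = r  # terminal step: the key request reached its destination
        else:
            target = r + self.discount * max(
                self.q[(hop, dest, h)] for h in self.neighbors[hop])
        key = (node, dest, hop)
        self.q[key] += self.lr * (target - self.q[key])

# Usage on a toy 4-node ring (hypothetical topology).
nbrs = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
router = QRouter(nbrs, epsilon=0.0)
router.update("B", "D", "D", reward(key_consumed=1.0, key_generated=0.2, occupancy=0.8))
```

In a weighted-sum reward like this one, the ratio between the weights determines how the learned policy trades occupancy against key consumption, which is why a stated rationale for α, β, γ matters.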
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper at hand describes a machine-learning-assisted, congestion-aware route-finding strategy for key forwarding in QKD networks. A variety of metrics are introduced and used to compare the proposed algorithm with non-congestion-aware, shortest-path Dijkstra in a simulated network.
Although the paper is nice to read and is the best paper on this idea that I have reviewed so far, I still suggest a revision for the following reasons:
- The paper does not reach its full potential, because the benefits of machine learning and those of congestion awareness are conflated. The authors should provide an additional congestion-aware algorithm that does not use machine learning, so that the investigation can separate the benefits of machine learning from those of congestion awareness. E.g., path weights could be changed to reflect congestion and force Dijkstra to select a different route.
- Over 10 pages, Section 4 repeatedly states that congestion-aware routing outperforms non-congestion-aware routing. The authors should try to highlight additional insights. The quantitative analysis of the simulated network is not scientifically interesting, as it is only an example network. More focus should be put on the qualitative analysis, e.g., how the network reacts to changes, the impact of the tunable parameters, etc.
- The notation should be cleaned up. Several variables appear to mean the same thing; e.g., if I'm not mistaken, the set of nodes is defined as both V and N. In addition, the symbol Q is used for both the filling level of the key pool and the Q value of the algorithm.
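The non-learning, congestion-aware baseline suggested in the first point above can be sketched as Dijkstra with congestion-dependent link weights. This is a minimal sketch under stated assumptions: each link has an occupancy ratio in [0, 1), and the cost is a hypothetical linear penalty `1 + penalty * occupancy` (`congestion_weight` and `penalty` are illustrative names, not from the manuscript). With `penalty = 0` it reduces to plain hop-count Dijkstra, so one function covers both the original baseline and the congestion-aware variant.

```python
import heapq

def congestion_weight(occupancy, penalty):
    """Hypothetical link cost: one unit per hop plus a congestion penalty.
    `occupancy` is the fraction of the link's key pool already in use."""
    return 1.0 + penalty * occupancy

def dijkstra(graph, occupancy, src, dst, penalty=0.0):
    """graph: node -> list of neighbours; occupancy: frozenset({u, v}) -> ratio.
    Returns the lowest-cost path as a list of nodes, or None if unreachable."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v in graph[u]:
            w = congestion_weight(occupancy.get(frozenset((u, v)), 0.0), penalty)
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy topology: a short, congested route A-B-D and a longer, idle route A-C-E-D.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "E": ["C", "D"], "D": ["B", "E"]}
occ = {frozenset(("A", "B")): 0.9, frozenset(("B", "D")): 0.9}
print(dijkstra(graph, occ, "A", "D"))               # ['A', 'B', 'D']
print(dijkstra(graph, occ, "A", "D", penalty=5.0))  # ['A', 'C', 'E', 'D']
```

Comparing the Q-learning router against both settings of `penalty` would isolate how much of the reported gain comes from congestion awareness alone.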
Comments on the algorithm:
- The proposed algorithm treats key forwarding in a similar way to IP packet routing. This neither exploits nor allows for the fact that key forwarding can be performed on all path segments simultaneously, in parallel or hierarchically, because key forwarding is not the same as packet routing. The final key is a function of the keys on the different segments and not necessarily a key that starts at a source and arrives at a destination. The description of key forwarding in Appendix A.2 is only one (suboptimal) variant. Path length is not necessarily linearly related to key distribution time (Section 4.2.1).
- Equation (12) suggests that key forwarding should be decided based on relative rather than absolute key availability per link. A low-capacity link that has not been used so far would then receive a higher weight than a high-capacity link that is already in use, even though the remaining capacity of the latter is higher than that of the low-capacity link.
- How does the algorithm react to changes in the network?
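The point that key forwarding need not proceed hop by hop can be illustrated with one well-known trusted-relay variant. It is sketched here as an assumption, not as the scheme the reviewer or the authors have in mind: each relay publishes the XOR of its two segment keys, all relays can do so at the same time, and the destination XORs its own segment key with every announcement; the terms telescope and leave the source's segment key, which becomes the shared final key. The function names are hypothetical.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length keys."""
    if len(a) != len(b):
        raise ValueError("keys must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def relay_announcements(segment_keys):
    """Relay i (between segments i and i+1) publishes k_i XOR k_{i+1}.
    The announcements are independent of one another, so every relay
    can compute and publish its value in parallel."""
    return [xor_bytes(segment_keys[i], segment_keys[i + 1])
            for i in range(len(segment_keys) - 1)]

def destination_recover(last_segment_key, announcements):
    """k_m XOR (k_{m-1} XOR k_m) XOR ... XOR (k_1 XOR k_2) telescopes to k_1."""
    return reduce(xor_bytes, announcements, last_segment_key)

# Path with 4 segments (source, 3 trusted relays, destination), 16-byte keys.
keys = [secrets.token_bytes(16) for _ in range(4)]
final_key = destination_recover(keys[-1], relay_announcements(keys))
assert final_key == keys[0]  # the source already holds k_1
```

Because the relay steps in this variant do not have to run in sequence, it illustrates why path length need not translate linearly into key distribution time. The relays still see their own segment keys, so they must remain trusted.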
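The concern about Equation (12) can be made concrete with a small worked example. Equation (12) itself is not reproduced here; the sketch only compares a hypothetical relative-availability criterion (`remaining / pool_size`) with an absolute one (`remaining`) on two made-up links, plus a hypothetical request `demand` of 200 keys.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    pool_size: int  # hypothetical: maximum number of keys the link's pool holds
    remaining: int  # keys currently available on the link

def relative_availability(link: Link) -> float:
    return link.remaining / link.pool_size

def absolute_availability(link: Link) -> int:
    return link.remaining

links = [Link("low-capacity, unused", pool_size=100, remaining=100),
         Link("high-capacity, half used", pool_size=1000, remaining=500)]

best_rel = max(links, key=relative_availability)  # ratios 1.0 vs 0.5
best_abs = max(links, key=absolute_availability)  # keys 100 vs 500
print(best_rel.name)  # low-capacity, unused
print(best_abs.name)  # high-capacity, half used

demand = 200  # hypothetical key request size
can_serve = [l.name for l in links if l.remaining >= demand]
print(can_serve)  # ['high-capacity, half used']
```

In this example the relative criterion prefers the link that cannot serve the request, which is exactly the situation the reviewer describes.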
Minor comments
- Page 5 equation (2) could be enhanced with "max{…, 0}" to reflect that the amount of remaining usable keys cannot be negative.
- The filling of the key pool should not be called "key capacity of a link" as capacity in the context of communication is being used differently.
- Definitions are missing around Equation (13), especially the introduction of Q and what it is used for.
- In Section 4.1, description of Figure 3: does the graph really represent virtual key channels on the key management layer, or does it describe links on the QKD layer?
- What are the "epsilon gray areas" on line 461?
- The concept of delay is not properly introduced. Are key request queues maintained across discrete time steps? How does the distribution delay of Dijkstra grow? Do key requests wait until they are scheduled?
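The clamp proposed above for Equation (2) can be illustrated with a hypothetical per-time-step key pool update. The update form `remaining + generated - consumed` and the function name `update_key_pool` are assumptions; the manuscript's actual Eq. (2) is not reproduced here.

```python
def update_key_pool(remaining: int, generated: int, consumed: int) -> int:
    """Hypothetical key pool update with the suggested max{., 0} clamp:
    the number of remaining usable keys can never become negative."""
    return max(remaining + generated - consumed, 0)

print(update_key_pool(remaining=50, generated=10, consumed=30))  # 30
print(update_key_pool(remaining=50, generated=10, consumed=80))  # 0 (unclamped: -20)
```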
The English is good and the paper is nice to read. There are only some minor comments in this regard:
- Subject-verb agreement has to be fixed when using "QKD network". Consider using "QKD networks", "QKD networking", or "a QKD network". (page 2 line 66, page 3 line 90, page 3 line 99, etc.)
- Page 3 line 99: "demand"
- Page 18 line 549: "distribution"
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
There are significant improvements in the manuscript in this revision.
I have a question regarding Eq. 16, should you not use the same parenthesis from each side?
Other than that, I think it is ready for publication.
Author Response
Comments 1: I have a question regarding Eq. 16, should you not use the same parenthesis from each side?
Response 1: We appreciate the reviewer’s careful observation. The use of mixed brackets in Eq. (16) is intentional, as the intervals are defined in the left-closed, right-open form [a,b). This ensures that the partition of [0,1) is complete and non-overlapping, so that every occupancy ratio is uniquely assigned to exactly one interval. Using consistent brackets on both sides (e.g., [a,b] or (a,b)) would either introduce overlaps or exclude boundary points. Therefore, we retain the current formulation for mathematical rigor and clarity.
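The authors' argument about half-open intervals can be checked with a short sketch. The boundary values in `EDGES` are hypothetical (Eq. (16) is not reproduced here). With left-closed, right-open intervals, `bisect_right` assigns every ratio in [0, 1) to exactly one interval, whereas closed intervals assign a shared boundary point to two.

```python
from bisect import bisect_right

EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical interval boundaries

def occupancy_level(ratio, edges=EDGES):
    """Index i such that edges[i] <= ratio < edges[i+1] (left-closed, right-open)."""
    if not edges[0] <= ratio < edges[-1]:
        raise ValueError("occupancy ratio must lie in [0, 1)")
    return bisect_right(edges, ratio) - 1

def closed_matches(ratio, edges=EDGES):
    """All indices whose closed interval [edges[i], edges[i+1]] contains ratio."""
    return [i for i in range(len(edges) - 1) if edges[i] <= ratio <= edges[i + 1]]

print(occupancy_level(0.5))  # 2, i.e. only [0.5, 0.75)
print(closed_matches(0.5))   # [1, 2], i.e. overlap between [0.25, 0.5] and [0.5, 0.75]
```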
Reviewer 3 Report
Comments and Suggestions for Authors
All my previous points were addressed.
Page 6, Line 222/235/239: As we are in the field of communication engineering, I object to the use of bandwidth for data rate or key generation rate in this context.
Author Response
Comments 1: Page 6, Line 222/235/239: As we are in the field of communication engineering, I object to the use of bandwidth for data rate or key generation rate in this context.
Response 1: We sincerely appreciate the reviewer’s comment regarding the use of “bandwidth.” In communication engineering, this term usually refers to the physical channel spectrum, whereas our original intent was to denote the rate of key transmission over a logical link or path in a QKD network. To clarify, we have revised all instances of “bandwidth constraint” to “key transmission rate constraint” and “bandwidth” to “key transmission rate.” Specifically, the revisions on Page 6, Lines 222, 235, and 239 are highlighted in the marked-up manuscript. We hope this resolves any potential ambiguity.

