A Review of Routing and Resource Optimization in Quantum Networks

Shaon, Md. Shazzad Hossain; Akter, Mst Shapna

doi:10.3390/electronics15030557


by Md. Shazzad Hossain Shaon and Mst Shapna Akter *

Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309, USA

* Author to whom correspondence should be addressed.

Electronics 2026, 15(3), 557; https://doi.org/10.3390/electronics15030557

Submission received: 5 December 2025 / Revised: 19 January 2026 / Accepted: 20 January 2026 / Published: 28 January 2026

(This article belongs to the Special Issue AI-Driven Secure Communications and Networking in 6G Integrated Satellite-Terrestrial Networks)


Abstract

Quantum computing is an emerging discipline that applies the principles of quantum physics to perform calculations that are intractable for conventional computers. Unlike classical bits, quantum bits (qubits) can exist in superposition states, which enables forms of parallelism that classical hardware cannot reproduce. For complex problems such as proof simulation, optimization, and cryptography, quantum entanglement and quantum interference can provide substantial, and in some cases exponential, speedups. This survey focuses on recent advances in entanglement routing, quantum key distribution (QKD), and qubit management for short- and long-distance quantum communication. It examines optimization approaches such as integer programming, reinforcement learning, and collaborative methods, evaluating their efficacy in terms of throughput, scalability, and fairness. Despite this progress, challenges remain in dynamic network adaptation, resource limits, and error correction. Addressing these difficulties requires hybrid quantum–classical algorithms for efficient resource allocation, hardware-aware designs to improve real-world deployment, and fault-tolerant architectures. Therefore, this survey suggests that future research focus on integrating quantum networks with existing classical infrastructure to improve security, dependability, and mainstream acceptance. This integration matters for applications that require secure communication, financial transactions, and critical infrastructure protection.

Keywords:

quantum computing; entanglement routing; quantum networks; cryptography; quantum key distribution

1. Introduction

Quantum communication networks represent an important development in secure data distribution by exploiting principles of quantum physics such as superposition, entanglement, and the no-cloning theorem [1]. At the core of these networks lies quantum entanglement, a phenomenon in which two or more particles become so strongly correlated that a measurement on one particle fixes the correlated outcome on the other, regardless of the distance between them, although this correlation alone cannot carry information [2]. This non-local correlation is the basis of quantum communication protocols such as QKD [3], quantum teleportation [4], and distributed quantum computing [5]. Beyond the standard teleportation protocol, recent studies have explored learning-assisted cooperative teleportation using parameterized quantum circuits (PQCs) to optimize teleportation fidelity and coordination among nodes [4]. The concept of quantum networks broadens the scope of quantum communication beyond point-to-point links. A quantum network is constructed from linked nodes, each containing quantum processors, memory, and communication channels that can generate, store, and distribute entangled states throughout the network [6]. These networks have the potential to transform fields such as secure communication, where QKD provides information-theoretically secure key exchange that is resistant to eavesdropping [7], and distributed quantum computing, where entanglement supports the development of large-scale quantum computers [8]. However, the practical implementation of quantum networks presents major challenges. One key challenge is the fragility of quantum states, which are prone to decoherence and noise. The no-cloning theorem [9] states that an unknown quantum state cannot be copied, so quantum signals cannot be amplified the way classical signals are, and error correction and fault tolerance are therefore required for reliable communication.
Furthermore, the probabilistic nature of entanglement generation and the limited range of direct entanglement distribution pose substantial challenges for scaling quantum networks [10]. To address these limitations, researchers have proposed quantum repeaters, which divide a long-distance link into shorter segments connected through entanglement swapping, thereby extending the range of entanglement distribution [11]. Another significant difficulty in quantum networks is entanglement routing, the process of selecting the best path for distributing entangled states across nodes. Unlike classical routing, which is based on deterministic protocols, entanglement routing must account for the stochastic nature of entanglement generation, the dynamic availability of quantum resources, and the need to maintain high fidelity of entangled states [12]. Routing techniques designed for classical networks are therefore poorly suited to quantum settings. As a result, there is an increasing demand for adaptive and intelligent routing algorithms that can maximize entanglement distribution while reducing resource use.

In recent years, machine learning (ML) [13,14] has emerged as an effective tool for addressing challenging optimization problems in quantum systems. Reinforcement learning (RL) is a class of sequential decision-making methods, and deep reinforcement learning (DRL) is a subset of RL that uses deep neural networks to approximate value functions or policies [15,16,17,18]. RL algorithms, which learn optimal policies through trial and error, are well suited to the dynamic and uncertain conditions of quantum networks [19]. For example, RL has been used to optimize entanglement swapping strategies, in which intermediate nodes perform swapping operations to extend entanglement across the network [20]. Similarly, DRL has been used to create quantum routing agents capable of learning the best paths for entanglement distribution in large-scale networks [21]. The incorporation of machine learning into quantum network design represents a significant step toward solving the problems of entanglement routing and resource optimization. Researchers have shown that harnessing the flexibility and learning capabilities of RL and DRL can improve entanglement quality, resource efficiency, and network scalability [22]. For example, recent research has demonstrated that encoded quantum Bell pairs and generalized circuits can substantially improve the fidelity of entangled states over long distances, even in the presence of noise and decoherence [23]. These innovations open opportunities for establishing practical, large-scale quantum networks that support secure communication and distributed quantum computation.

Despite these promising advances, several major problems remain. One of the most pressing concerns is the effective distribution of entanglement among network nodes, which is necessary for protocols such as QKD, quantum teleportation, and distributed quantum computing [5]. Entanglement distribution is fundamentally probabilistic and sensitive to decoherence, noise, and losses, especially over long distances [10]. While quantum repeaters have been proposed to address these concerns, their implementation is still constrained by the availability of quantum resources such as memory and high-fidelity gates, as well as by the difficulty of controlling entanglement swapping procedures [11,24]. Furthermore, incorporating machine learning techniques into quantum systems requires a thorough assessment of the trade-offs between computing efficiency and learning performance [25]. Addressing these issues will require multidisciplinary collaboration among quantum scientists, technologists, and engineers, as well as ongoing developments in quantum hardware and software. Therefore, the primary objective of the present paper is to provide an in-depth understanding of the fundamental concepts, significant elements, and advanced developments in quantum communication. Complementing analytical studies that focus on evaluation or comparative analysis, this study aims to summarize the core concepts, innovations, and improvements underlying quantum computing. The analysis will include the following:

Introduce fundamental concepts, including quantum entanglement, QKD, and quantum teleportation, which are the foundations of quantum communication.
Explore cutting-edge advancements in quantum networking, with an emphasis on future technologies, machine learning applications, and novel routing algorithms.
Provide a comprehensive overview of the present status of the area, including trends, difficulties, and future prospects for study and development.

Ultimately, the purpose of the present study is to motivate deeper advancements and research in quantum networking, therefore contributing to the development of viable, large-scale quantum networks capable of unlocking emerging opportunities in secure communication, computer networking, and other domains.

Background and Preliminaries

QKD is a communication paradigm that enables two distant parties to establish a shared secret key by exploiting fundamental quantum properties. In QKD, any eavesdropping attempt disturbs quantum states and can be detected through observable error patterns, which provides information-theoretic security under standard assumptions. In practice, QKD protocols differ in state preparation and measurement strategies, but all rely on the same principle that quantum information cannot be measured without leaving a trace. Quantum teleportation is a protocol for transferring an unknown quantum state from a sender to a receiver without physically transmitting the qubit carrying that state. Teleportation requires pre-shared entanglement between the communicating parties and classical communication to convey measurement outcomes. As a result, the receiver can reconstruct the original state using simple correction operations, while the overall process remains consistent with causality because classical signaling is still required. Distributed quantum computing refers to executing quantum computation across multiple spatially separated nodes that each possess limited local quantum resources. Entanglement acts as a key enabling resource that allows remote gates, coordination, and state sharing across nodes. Therefore, scalable distributed quantum computing depends on reliable entanglement generation, storage, and delivery over a network, which motivates core networking problems such as entanglement routing, repeater placement, and resource-aware scheduling under noise and decoherence.
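The detection principle described above can be illustrated with a small simulation. The sketch below assumes BB84 as a representative protocol and an intercept-resend eavesdropper; neither is named in the text, and the function name `bb84_qber` and the idealized noiseless channel are illustrative assumptions. It estimates the error rate on the sifted key, which is zero without an eavesdropper and close to 25% when every qubit is intercepted.

```python
import random


def bb84_qber(n_qubits, eavesdrop, seed=0):
    """Estimate the sifted-key error rate of an idealized BB84 run.

    Assumptions (not from the source): noiseless channel, perfect devices,
    and an eavesdropper who intercepts and resends every qubit.
    """
    rng = random.Random(seed)
    sifted = 0
    errors = 0
    for _ in range(n_qubits):
        bit = rng.randint(0, 1)        # Alice's raw key bit
        basis_a = rng.randint(0, 1)    # 0 = rectilinear, 1 = diagonal
        state_bit, state_basis = bit, basis_a

        if eavesdrop:
            basis_e = rng.randint(0, 1)
            if basis_e != state_basis:
                # Measuring in the wrong basis gives a random outcome.
                state_bit = rng.randint(0, 1)
            state_basis = basis_e      # Eve resends in her own basis

        basis_b = rng.randint(0, 1)
        if basis_b == state_basis:
            result = state_bit
        else:
            result = rng.randint(0, 1)

        if basis_a == basis_b:         # sifting: keep matching-basis rounds
            sifted += 1
            errors += int(result != bit)
    return errors / sifted


qber_clean = bb84_qber(20000, eavesdrop=False)
qber_attacked = bb84_qber(20000, eavesdrop=True)
```

In this idealized model, any nonzero error rate on the sifted key signals a disturbance; practical protocols compare the observed rate against a threshold that accounts for channel noise.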

With these basic concepts in place, the following section reviews the recent literature and organizes prior studies by their primary methodological focus to clarify how different approaches address routing, resource constraints, fidelity robustness, and scalability in quantum networks.

2. Related Papers

Recent research on quantum networking has applied diverse methodologies to improve entanglement distribution, routing, reliability, and scalability under practical constraints, such as probabilistic link success, limited resources, and decoherence/noise. To present the field with a clearer structure, we organize prior studies by their primary methodological focus and relate them to the taxonomy shown in Figure 1.

(1) RL/DRL-driven adaptive routing and control.

A growing body of work leverages reinforcement learning to adapt routing and entanglement operations to dynamic and uncertain quantum conditions. Abreu et al. proposed a foundational RL framework for adaptive entanglement routing, demonstrating improved efficiency in probabilistic quantum settings [26]. Roik et al. examined RL adaptability under frequently changing network topologies, emphasizing scalability in small-to-medium quantum networks [27]. Islam et al. introduced RL-driven entanglement swapping strategies that mitigate route failures and improve network reliability [28]. Moving toward more complex settings, Le et al. developed DRL agents for multi-path entanglement distribution to improve performance in larger networks [29], and further proposed DRL architectures tailored to quantum constraints, explicitly considering decoherence and resource limitations [30]. Huang et al. advanced this direction through distributed decision-making where nodes coordinate to improve entanglement channels and reduce resource bottlenecks [31]. Viewed together and summarized in Figure 1, these studies reflect a progression from single-policy routing improvements toward topology-aware, failure-resilient, and distributed control, while also revealing recurring challenges related to coordination overhead and generalization across varying network conditions.

(2) Resource-aware optimization and multi-objective routing.

Beyond adaptive learning, several studies explicitly address resource scarcity as a core barrier to practical quantum networking. Zeng et al. designed a multi-objective routing protocol that balances fidelity, latency, and resource utilization, providing a structured basis for deployment-oriented decision-making [32]. Resource-centric approaches, including collaborative optimization and multi-path distribution strategies, similarly highlight the need to manage constrained entanglement and repeater resources while maintaining stable performance [27,29]. This line of work complements RL/DRL methods by making trade-offs explicit, but it also underscores that improvements in one objective can degrade another, reinforcing the need for mechanisms that can balance multiple constraints while remaining responsive to changing conditions.

(3) Fidelity, noise robustness, and secure communication.

Another major research direction focuses on robustness under noise and decoherence to preserve usable entanglement fidelity and enable secure communication. Shubha et al. showed that robust long-distance entanglement distribution can be supported by reducing noise-induced fidelity loss through encoded Bell pairs and generalized circuits [33]. Islam et al. validated single-qubit QKD methods under realistic noise settings, demonstrating feasibility for secure communication beyond idealized assumptions [34]. As outlined in Figure 1, these studies shift attention toward physical-layer reliability and security, while also suggesting that fidelity-preserving techniques may introduce additional overhead that must be reconciled with network-layer efficiency goals.

(4) Scalability and deployment-oriented strategies.

Scaling quantum networks requires both algorithmic advances and infrastructure-aware planning. Heuristic methods for optimal repeater placement have been explored to support low-cost, long-distance quantum communication, offering practical deployment guidance with lower computational complexity [35]. At the same time, DRL-based routing and distributed optimization approaches indicate scalable control mechanisms that can better adapt to evolving network states [30,31]. As shown in Figure 1, these strategies represent complementary paths toward real-world scalability, highlighting a tension between the simplicity and deployability of heuristics and the adaptivity of learning-based control as networks grow in size and variability.

Across these themes, RL/DRL approaches [26,28,30,31,35] have become central for adapting to stochastic links and dynamic configurations, while multi-objective and resource-aware methods [27,29,32] emphasize explicit trade-offs under constrained resources. Fidelity- and noise-robust techniques [33,34] highlight the increasing importance of realistic physical-layer constraints for dependable and secure operation. As synthesized in Figure 1, the literature collectively motivates integrated approaches that can jointly balance adaptivity, resource efficiency, fidelity robustness, and scalability for practical quantum network implementations.

To position our review within the broader literature, we compare it with representative existing surveys and overview papers on quantum networking. Table 1 summarizes the scope and focus of these works and highlights what they cover well and what they leave open. This comparison clarifies the need for the present review and motivates our emphasis on a thematic organization and cross-study synthesis across routing, QKD-oriented methods, and deployment-oriented perspectives.

PRISMA-ScR Reporting Statement: This scoping review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines to ensure transparent and comprehensive reporting of the methodology. The literature selection and screening process, including the identification of studies, exclusion of duplicates, abstract-level screening, and final inclusion of 64 studies, is summarized in Figure 2.

3. Methodological Analysis and Comparative Discussion

Focusing on prior studies, this section provides a methodological analysis and comparative discussion of representative approaches in quantum networking. Rather than describing studies independently, we compare them using common criteria, including the target objectives (e.g., fidelity, throughput, request success rate, latency), the network assumptions (topology scale, noise/decoherence model, and resource constraints, such as memory and Einstein–Podolsky–Rosen (EPR) pair availability), and the methodological mechanism (RL/DRL-driven control, heuristic/optimization-based routing, and fidelity or QKD-oriented designs). This unified perspective enables the identification of research trends and clarifies how existing studies address recurring challenges such as scalability, limited resources, dynamic topology, and robustness under realistic noise. Figure 3 summarizes the overall analysis flow adopted in this review, and the following subsections discuss strengths, limitations, and practical implications across method families.

3.1. RL-Based Quantum Routing

RL-based routing can improve quantum circuit performance on noisy intermediate-scale quantum (NISQ) devices. Efficient routing reduces the need for error-prone swap operations, resulting in reduced circuit depth and higher algorithmic success rates. RL adapts routing strategies to different quantum architectures and circuit requirements, overcoming the limits of inflexible heuristic approaches. This subsection summarizes Q-learning-based reinforcement learning (qRL) [26] and concludes with a brief comparative takeaway to support the systematic comparison of Section 3.

In the qRL protocol [26], Q-learning is used to improve routing decisions in dynamic quantum networks (Figure 4). The objective is to maximize end-to-end entanglement fidelity and request success rates while reducing resource consumption (qubits and EPR pairs). In qRL, an agent learns routing policies via Q-learning by interacting with a simulated quantum network environment [42,43]. The environment consists of nodes, communication channels, EPR pairs, and fidelity measures that provide real-time feedback on network status. The state (S) represents the current network configuration, including channel quality, EPR pair availability, and the number of qubits per node. Based on this state, the agent chooses an action (A), such as selecting a route or assigning EPR pairs to connections. The reward (R) is determined by the fidelity improvement of the chosen route and the effective use of network resources, which encourages the agent to prefer high-quality paths while reducing resource consumption. The agent follows a policy (π), specifically an ε-greedy strategy, that balances exploration (random actions) with exploitation (choosing the action with the highest estimated Q-value) to adapt to dynamic quantum networks. This allows qRL to optimize routing decisions dynamically, yielding high-fidelity communication and efficient resource use in quantum entanglement networks. We include this schematic to explicitly map the RL components (state–action–reward) to quantum networking primitives used in entanglement routing. The approach integrates two key quantities in quantum networking: the end-to-end (E2E) entanglement rate and the fidelity after purification. For a route with N repeaters, the E2E entanglement rate is determined as follows:

E2E = γ \cdot q^{\log N}

(1)

where γ is the EPR generation rate and q is the per-stage success probability of entanglement extension (including entanglement swapping, as applicable). Furthermore, when two low-fidelity EPR pairs with fidelities f_{1} and f_{2} are purified into one pair, the resulting fidelity is given by the following:

F (f_{1}, f_{2}) = \frac{f_{1} \cdot f_{2}}{f_{1} \cdot f_{2} + (1 - f_{1}) (1 - f_{2})}

(2)
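Equations (1) and (2) can be evaluated directly. The following minimal sketch assumes a base-2 logarithm in Equation (1), since the text writes only "log N"; the function and parameter names (`e2e_rate`, `purified_fidelity`, `gen_rate`) are illustrative and not taken from [26].

```python
import math


def e2e_rate(gen_rate, q, n_repeaters):
    """Equation (1): end-to-end rate = gen_rate * q ** log2(N).

    The base-2 logarithm is an assumption; the text does not state the base.
    """
    if n_repeaters < 1:
        raise ValueError("n_repeaters must be at least 1")
    return gen_rate * q ** math.log2(n_repeaters)


def purified_fidelity(f1, f2):
    """Equation (2): fidelity after purifying two EPR pairs into one."""
    return (f1 * f2) / (f1 * f2 + (1 - f1) * (1 - f2))


# Example values are arbitrary and chosen only for illustration.
rate = e2e_rate(gen_rate=100.0, q=0.5, n_repeaters=4)   # 100 * 0.5 ** 2
boosted = purified_fidelity(0.8, 0.8)                   # 0.64 / 0.68
```

For two pairs of equal fidelity f, Equation (2) yields a value above f only when f exceeds 0.5, so purification under this formula helps only for pairs that already start above that point.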

The Q-value for state-action pairs is updated using

q (s, a) \leftarrow q (s, a) + α [R (s, a) + γ \max_{a'} q (s', a') - q (s, a)]

(3)

where α denotes the learning rate, γ represents the discount factor (distinct from the EPR generation rate denoted by γ in Equation (1)), and R(s,a) is the reward.
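The update in Equation (3), combined with the ε-greedy policy described earlier, can be sketched as a small tabular agent. This is a generic Q-learning illustration rather than the implementation from [26]; the class name `QRouter`, the string-valued states and actions, and the default hyperparameters are assumptions.

```python
import random
from collections import defaultdict


class QRouter:
    """Tabular Q-learning agent with an epsilon-greedy policy (illustrative)."""

    def __init__(self, alpha=0.1, discount=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)   # Q-values keyed by (state, action)
        self.alpha = alpha            # learning rate
        self.discount = discount      # discount factor
        self.epsilon = epsilon        # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state, actions):
        """Explore with probability epsilon, otherwise pick the best action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        """Apply the Q-learning update of Equation (3)."""
        best_next = max(
            (self.q[(next_state, a)] for a in next_actions), default=0.0
        )
        td_error = reward + self.discount * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


agent = QRouter(alpha=0.5, discount=0.9, epsilon=0.0)
agent.update("s", "route_a", 1.0, "s_next", ["route_a", "route_b"])
agent.update("s_prev", "route_b", 0.0, "s", ["route_a", "route_b"])
best = agent.choose("s", ["route_a", "route_b"])
```

In a routing setting, the state would encode quantities such as channel quality and EPR availability, and the reward would reflect fidelity gain and resource use, as described in the text.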

The authors evaluate qRL under multiple experimental settings in which the network follows a lattice structure, modeled as a graph G = (V, E). This structure consists of teleportation nodes and an embedded system (ES), which handles communication and processing. The configurations of the experimental setup are presented in Table 1, and the key parameters (memory, initial fidelity, and EPR success probability) are varied across four configurations.

According to the study, the proposed qRL technique outperforms all baseline approaches, with around 20% greater fidelity in Config1, which models high-resource networks. Notably, in Config4, which corresponds to low-resource situations, qRL maintains a fidelity greater than 0.6, whereas the qDijkstra technique falls below 0.4. In larger networks with n > 30, redundant routes improve system stability and dependability. Moreover, qRL maintains a request success rate of more than 80% under a 3000-request demand, even in the resource-constrained Config4 environment. Baseline approaches, such as R0 and qDijkstra, show severe performance deterioration, with success rates falling below 50% under heavy network loads. This demonstrates the stability and scalability of the qRL framework in dynamic network situations.

Comparative takeaway: Compared with heuristic baselines (qDijkstra and rule-based routing), qRL improves adaptivity under changing resource and load conditions, which helps sustain both fidelity and request success in constrained regimes. The trade-off is additional learning and tuning overhead (e.g., reward design and exploration strategy), and performance generalization depends on the reported simulation assumptions.

In another study, the authors propose proactive entanglement swapping and entanglement caching to improve quantum routing algorithms [28], particularly SEER [44] and REPS [45]. The technique successfully stores idle entangled connections for future use, while applying RL to strategically swap entanglements in high-demand network segments. The workflow of the proposed approach in [28] is shown in Figure 5.

The study analyzes and optimizes the quantum routing process using several mathematical formulations, such as the Bellman equation, swapped segment, topology matrix, request matrix, and segment vector. All the equations can be stated as follows:

q_{1} (s_{1}, a_{1}) \leftarrow q_{1} (s_{1}, a_{1}) + α [R_{1} (s_{1}, a_{1}) + γ \max_{a_{1}'} q_{1} (s_{1}', a_{1}') - q_{1} (s_{1}, a_{1})]

(4)

Equation (4) is the Q-learning update rule, where α is the learning rate, γ is the discount factor, and R_{1} (s_{1}, a_{1}) is the reward for action a_{1} in state s_{1}.

The proposed approach’s reward system is based on three important components: positive rewards (PRs), negative rewards (NRs), and penalties (Ps).

\begin{cases} PR = R > 0 \\ NR = R < 0 \\ P = R < 0 \end{cases}

(5)

In this framework, PRs are awarded when proactively swapped segments are successfully used by future requests, which promotes effective allocation of entanglement resources. In contrast, NRs are assigned when proactively swapped segments expire without being used, resulting in wasted resources. Furthermore, a P is applied when the algorithm fails to select a segment that is later required, which affects overall routing. Finally, the authors define the deep Q-learning state as

S_{m}^{t} = [G^{t}, R^{t}, P_{m}^{t}]

(6)

Here, G^{t} represents the topology matrix, which serves as the adjacency matrix of entangled links; R^{t} denotes the request matrix, capturing the number of requests between node pairs at a given time step; and P_{m}^{t} refers to the segment vector, a binary vector that indicates the candidate segments available for routing entanglement.
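To make the state and reward structure concrete, the sketch below flattens the three components of Equation (6) into a single feature vector and maps the three outcome types to reward signs. The flattening order, the outcome labels, and the reward magnitudes are hypothetical choices for illustration; the text specifies only the components of the state and the signs of the rewards.

```python
def build_state(topology, requests, segments):
    """Flatten [G^t, R^t, P_m^t] into one feature vector (order is an assumption)."""
    state = [x for row in topology for x in row]    # adjacency of entangled links
    state += [x for row in requests for x in row]   # request counts per node pair
    state += list(segments)                         # binary candidate-segment mask
    return state


# Hypothetical magnitudes; only the signs follow the reward scheme in the text.
REWARDS = {
    "used": 1.0,       # PR: a proactively swapped segment served a later request
    "expired": -0.5,   # NR: a proactively swapped segment expired unused
    "missed": -1.0,    # P: a segment that was later required was not selected
}


def swap_reward(outcome):
    """Return the reward for one proactive-swapping outcome."""
    return REWARDS[outcome]


topology = [[0, 1], [1, 0]]    # two nodes sharing one entangled link
requests = [[0, 2], [0, 0]]    # two pending requests from node 0 to node 1
segments = [1, 0, 1]           # three candidate segments, two available
state = build_state(topology, requests, segments)
```

A deep Q-network would take a vector of this kind as input and output one Q-value per candidate segment.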

Furthermore, the authors use several features to formulate their proposed approach. Table 2 provides the specific setups of the experimental conditions utilized in that study.

The proposed DQRL technique improves request success rates for both SEER and REPS. For SEER, DQRL raises the success rate from 50% to 74%, a relative improvement of 48%. Similarly, the success rate of REPS rises from 45% to 73%, a relative improvement of about 61%. Additionally, the lifetime of entangled links has a significant impact on the effectiveness of entanglement caching. When entangled links remain usable for six or more time slots, caching improves request success rates by 12% to 19%. Furthermore, the probability of successful entanglement swapping is critical for network performance. When the swap probability is increased to 0.9, the system achieves a 55% improvement over the baseline, highlighting the importance of reliable entanglement swapping in improving quantum network stability.

Comparative takeaway: Compared with baseline SEER/REPS and heuristic proactive swapping, this RL-guided caching and swapping strategy provides higher request success rates by explicitly targeting high-demand segments and reusing idle entanglement resources. The approach is particularly effective when entanglement lifetime and swap success probability are sufficiently high, suggesting a trade-off in which gains depend on underlying physical-layer reliability and assumed timing parameters.

In another study, the authors propose a deep quantum routing agent (DQRA), a DRL architecture for routing entanglement requests in quantum networks, as illustrated in Figure 6 [30]. The proposed hybrid framework combines a scheduling neural network (SNN) with a qubit-preserving shortest-path strategy to maximize the number of requests served within each time window. DRL is primarily used for dynamic request scheduling to improve resource allocation across qubits and repeaters. The authors also argue that the approach supports scalability by keeping training and routing complexity polynomial with respect to network size, which targets large-scale quantum networks. The experimental setup of the study is presented in Table 3, and the reward is expressed as

R^{t} = n_{r}^{t} α + (| D | - n_{r}^{t}) β f + γ R^{t + 1} (1 - f)

(7)

where R^{t} is the reward at step t, designed to maximize the number of resolved requests; n_{r}^{t} represents the number of resolved requests; α > 0 denotes the reward for success; β < 0 is the penalty for failure; γ ∈ [0, 1] refers to the discount factor; and f is a binary flag that marks the end of the episode. For the Q-learning update, the study uses

Q^{t} (s, a) \leftarrow (1 - l r) Q^{t} (s, a) + l r (Q^{t + 1} (s^{'}, a^{'}) + γ R^{t})

(8)

In Equation (8), lr denotes the learning rate, and Q^{t + 1} (s^{'}, a^{'}) is the target network's estimate. The study also uses the following edge-weight metric, based on qubit availability, for shortest-path computation:

ω_{i, j}^{t} = \frac{1}{\min (c_{i}^{t}, c_{j}^{t})}

(9)

so that links between nodes with greater qubit availability receive lower weights and are preferred by the shortest-path search.
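The effect of the edge weight in Equation (9) can be seen by plugging it into a standard shortest-path search. The sketch below uses Dijkstra's algorithm with these weights; it is an illustration of the weighting idea, not the routing procedure from [30], and the graph representation, the function names, and the rule that nodes with no free qubits are skipped are assumptions.

```python
import heapq


def edge_weight(c_i, c_j):
    """Equation (9): weight is the inverse of the smaller endpoint capacity."""
    return 1.0 / min(c_i, c_j)


def qubit_aware_path(adjacency, capacity, src, dst):
    """Dijkstra search that favors links between qubit-rich nodes.

    adjacency: dict mapping node -> list of neighbor nodes
    capacity:  dict mapping node -> number of available qubits
    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v in adjacency[u]:
            # Assumption: a node without free qubits cannot carry a link.
            if capacity[u] <= 0 or capacity[v] <= 0:
                continue
            nd = d + edge_weight(capacity[u], capacity[v])
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
capacity = {"A": 4, "B": 1, "C": 4, "D": 4}
path = qubit_aware_path(adjacency, capacity, "A", "D")
```

Both routes in this example have two hops, but the route through node B costs 1 + 1 = 2 while the route through node C costs 0.25 + 0.25 = 0.5, so the search avoids the node with a single free qubit.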

The evaluation shows that both DQN and DRN achieve high success rates in qubit-rich settings, ranging from 80% to 90%. Even under tight constraints with c_i = 2, they retain a success rate of 60% to 75%, exceeding baseline approaches by 20% to 40%. In terms of scalability, training time grows polynomially with network size, whereas routing time remains linear, even for large networks such as 25 × 25 nodes. In terms of training dynamics, DQN converges faster than DRN but tends to plateau earlier. Additionally, the SNN scales quadratically with network size, indicating an increase in parameter count as network complexity rises.

Comparative takeaway: Relative to heuristic baselines that rely on fixed shortest-path scheduling, DQRA adds adaptivity through DRL-based scheduling and explicitly prioritizes qubit availability in routing decisions. This improves success rate under both qubit-rich and qubit-constrained regimes, but introduces additional training overhead and a growing parameter footprint as network size increases.

Another study [27] examines proximal policy optimization (PPO)-based routing for quantum networks, which optimizes the singlet fraction (F) to mitigate quantum-specific errors such as amplitude damping and phase noise [46]. This improves error resilience and uses a 6G-inspired low-density parity-check topology to provide reliable quantum communication [47]. The overall procedure is illustrated in Figure 7. The study relies on the following formulations:

F (ρ) = \max_{φ} \langle φ | ρ | φ \rangle

(10)

f = \frac{2 F + 1}{3}

(11)

Here, F denotes the singlet fraction of the state ρ, with the maximum taken over maximally entangled states φ, and f denotes the fidelity, which quantifies entanglement quality in terms of teleportation performance. The study defines both an intermediate (preliminary) reward and a final reward, and uses a threshold value of 0.8 in the reward design. In addition, the quantum-state model considers Werner states under amplitude-damping noise and phase-noise effects.
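The link between singlet fraction and teleportation fidelity can be checked numerically using the standard relation f = (2F + 1)/3. The sketch below also assumes a two-qubit Werner state parameterized as a mixture of a maximally entangled state (weight p) and the maximally mixed state (weight 1 - p), for which F = (3p + 1)/4; this parameterization and the function names are illustrative and not taken from [27].

```python
def werner_singlet_fraction(p):
    """Singlet fraction of a two-qubit Werner state with mixing weight p.

    Assumed form: p * |Psi><Psi| + (1 - p) * I / 4, with 0 <= p <= 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return (3.0 * p + 1.0) / 4.0


def teleportation_fidelity(singlet_fraction):
    """Standard relation between singlet fraction F and teleportation fidelity f."""
    return (2.0 * singlet_fraction + 1.0) / 3.0


F = werner_singlet_fraction(0.8)     # (3 * 0.8 + 1) / 4
f = teleportation_fidelity(F)        # (2 * F + 1) / 3
classical_limit = teleportation_fidelity(0.5)
```

Under this relation, a singlet fraction of 0.5 corresponds to a teleportation fidelity of 2/3, the best value reachable without entanglement, so only states with F above 0.5 offer a quantum advantage for teleportation.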

The proposed method is compared to conventional baselines such as Dijkstra and Monte Carlo, using a reinforcement learning framework designed to maximize routing efficiency. The overall setup is summarized in Table 4.

In complex networks with 16 nodes and white noise, PPO outperforms Monte Carlo by up to 13,000×. While Dijkstra performs well for additive errors, it fails when quantum noise is introduced. PPO successfully exploits error cancellation under amplitude damping, needing just 60,000 actions, as opposed to 546 million for Monte Carlo on the same network size. PPO also adapts to reversible phase shifts, lowering the number of steps required to 39,000 compared with Monte Carlo’s 60 million. In dynamic network scenarios, PPO retains 93.4% functionality, considerably outperforming Monte Carlo’s 33.1% under changing error conditions.

Comparative takeaway: Compared with classical shortest-path routing (Dijkstra) and sampling-heavy optimization (Monte Carlo), PPO introduces adaptive decision-making that explicitly accounts for quantum noise models, which improves robustness in amplitude damping and phase-noise settings. The main limitation is the reliance on extensive training interactions and the dependence of performance on the assumed noise model and topology configuration.

In another study, Le et al. [29] propose a deep reinforcement learning approach for entanglement routing in quantum networks and introduce the Deep Quantum Routing Agent (DQRA), a learning-based routing system designed to improve network performance by increasing the number of successfully served communication requests within a given time window. The method combines a deep neural network for request scheduling with a qubit-preserving shortest-path strategy for route selection. The quantum network is modeled as a graph G = (V, E), where V denotes quantum nodes and E denotes entangled links. The routing objective is formulated to maximize the effective routing rate by selecting efficient paths for source–destination pairs under resource constraints. The DQRA framework represents the environment state using network topology, available qubit capacities, and pending requests. Actions correspond to scheduling and routing decisions for entanglement requests, while the reward function is designed to encourage serving more requests and penalize inefficient decisions or early termination. The authors train and evaluate the agent using deep learning-based variants, including a DQN-based scheduler and an additional DRN-based model, to support decision-making under dynamic conditions. The reported experiments on grid-based topologies indicate that the approach achieves high request success rates even under limited qubit capacity and demonstrates favorable scaling trends as network size increases.

Comparative takeaway: Compared with heuristic scheduling and static shortest-path routing, this DRL-based design improves adaptivity by jointly optimizing scheduling and routing decisions under resource constraints, while introducing added training complexity and dependence on the assumed topology and evaluation setup.

3.2. Heuristic & Algorithmic Approaches

This subsection covers non-ML methods such as heuristic algorithms and collaborative routing strategies. One study [31] proposes the Collaboratively Optimized Selection of Paths (COSP) method, which is intended to improve entanglement routing in quantum networks by maximizing predicted throughput, service rate, and quantum resource use. The technique begins with a network model in which the quantum network is represented as a graph G = (V, E, C), where V comprises quantum processors and repeaters, E represents quantum channels, and C reflects channel use. The entanglement process is separated into four stages: request submission, route selection, link status broadcasting, and entanglement swapping, which ensures systematic distribution of quantum resources. The COSP algorithm consists of multiple optimization strategies. The P1 algorithm organizes requests into a multilevel queue based on priority and source-destination distance, reducing resource conflicts while ensuring fairness by prioritizing the furthest and closest pairs within the same priority level. The P2 algorithm selects candidate paths using Dijkstra’s and Yen’s algorithms [48], incorporating linear programming to estimate resource consumption. Additionally, Monte Carlo Tree Search (MCTS) is used to optimize resource allocation, treating the problem as a sequential decision-making process in which each source-destination pair chooses a path to maximize expected throughput. The P4 algorithm further enhances reliability by implementing a “fail-retransmit” mechanism, which gives failed entanglement attempts another opportunity to succeed in the final request queue [49]. COSP uses probabilistic models to calculate predicted throughput (E) and resource usage (U) for each path, depending on entanglement success rates and qubit consumption.
The MCTS method is formulated as a Markov Decision Process (MDP), with the goal of maximizing anticipated rewards (throughput) via optimal path selection. The UCB1 formula is adjusted to balance exploration and exploitation, which improves the efficiency of decision-making in the search tree. The COSP method is reported to improve network performance, substantially increasing predicted throughput and service rates, especially in high-concurrency settings, while ensuring fairness and effective resource usage. Table 5 summarizes the parameters and performance metrics.
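To make the exploration-exploitation balance concrete, the following minimal sketch applies the standard UCB1 selection rule to a set of candidate paths. It is not the adjusted formula used in COSP, whose details are not given here. The function name `ucb1_select`, the `stats` dictionary, the path labels, and the exploration constant `c` are all illustrative assumptions.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick the candidate path with the highest standard UCB1 score.

    stats maps a path label to (visit_count, total_reward). The reward
    stands in for observed throughput (an assumption of this sketch).
    """
    total_visits = sum(visits for visits, _ in stats.values())
    best_path, best_score = None, float("-inf")
    for path, (visits, total_reward) in stats.items():
        if visits == 0:
            return path  # unvisited candidates are always tried first
        mean_reward = total_reward / visits
        exploration_bonus = c * math.sqrt(math.log(total_visits) / visits)
        score = mean_reward + exploration_bonus
        if score > best_score:
            best_path, best_score = path, score
    return best_path


# Hypothetical statistics for three candidate paths.
example_stats = {"path_a": (10, 6.0), "path_b": (2, 1.0), "path_c": (0, 0.0)}
print(ucb1_select(example_stats))
```

With `c = 0` the rule becomes purely greedy on the mean reward, while larger values of `c` favor rarely visited paths, which is the trade-off any adjusted variant has to tune.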

The study reports that COSP achieved a 50% increase in average predicted throughput over earlier methods, particularly under high-concurrency conditions. COSP also increased service rates by 55%, which suggests greater efficiency in serving communication requests.

Comparative takeaway: Compared with RL-based routing methods, COSP relies on explicit path search and resource estimation, which can provide more interpretable decision logic and avoid training overhead. However, the approach may incur higher online computation due to multi-stage optimization and MCTS, and performance depends on the assumed topology model and probabilistic resource estimates.

In another study [35], the authors address the challenge of deploying quantum repeaters in large-scale quantum networks at reasonable cost. Quantum repeaters are required to extend communication distance by mitigating fidelity degradation during transmission. To reduce the number of repeaters while preserving end-to-end connectivity, the study proposes two heuristic placement strategies, the Multi-Center Approach (MCA) and the Single-Center Approach (SCA). These heuristics aim to obtain near-optimal solutions substantially faster than Integer Linear Programming (ILP), which can become computationally expensive for large networks [50]. The experimental settings and evaluation metrics are summarized in Table 6.

MCA operates in two phases: center selection and center connection. In center selection, nodes are chosen as candidate repeater locations based on connectivity and coverage, with additional centers added when the distance between leaf-node pairs exceeds the maximum transmission range Lmax. The process continues until the network is covered by the selected centers. In the center connection phase, inter-center links are established, and intermediate nodes are introduced when distances exceed Lmax. The study further explores connection strategies, including an MST-based method that adds repeaters where needed and a more exhaustive variant that considers additional nodes to improve connectivity [50,51,52]. Figure 8 illustrates the overall methodology.
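The role of Lmax in the center connection phase can be illustrated with a simplified sketch. The code below is not the MCA procedure itself. It assumes a single chain of candidate nodes with known distances between neighbors (the hypothetical `segment_lengths`), and it activates a repeater at a node only when extending the current repeater-free span would exceed `l_max`.

```python
def place_repeaters(segment_lengths, l_max):
    """Greedily choose repeater positions along a chain of nodes.

    segment_lengths[i] is the distance between node i and node i + 1.
    Returns the indices of intermediate nodes that host a repeater so
    that no repeater-free span is longer than l_max.
    """
    if any(length > l_max for length in segment_lengths):
        raise ValueError("a single segment already exceeds l_max")
    repeaters = []
    span = 0.0  # distance covered since the last repeater (or the source)
    for i, length in enumerate(segment_lengths):
        if span + length > l_max:
            repeaters.append(i)  # node i sits at the start of this segment
            span = 0.0
        span += length
    return repeaters


# Hypothetical chain: four 40 km segments with a 100 km transmission range.
print(place_repeaters([40, 40, 40, 40], 100))
```

On a chain this greedy rule postpones each repeater as long as possible. General topologies, which the MCA and SCA heuristics target, additionally require choosing which nodes to connect in the first place.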

The reported results indicate that the heuristic methods substantially reduce computation time compared with ILP. For example, the study notes that ILP can take days to solve a network with 54 nodes, whereas SCA can produce a feasible solution within seconds, illustrating the practical advantage of heuristic placement in large-scale settings.

Comparative takeaway: Compared with RL-based routing methods that optimize online decisions, repeater-placement heuristics target the infrastructure layer by reducing required hardware while keeping connectivity constraints. The main benefit is scalability and fast runtime relative to ILP, while the limitation is that solution quality depends on heuristic design choices and network topology assumptions.

3.3. Entanglement and QKD Based Methods

Recent quantum networking studies increasingly rely on entanglement-routing optimization and QKD-oriented protocol design to improve reliability, throughput, and security under realistic constraints [32,33,34]. This subsection summarizes how these studies structure their methods and what their findings imply for future deployment.

In [32], Zeng et al. propose a two-stage framework for entanglement routing that aims to maximize both the number of served quantum-user pairs and the expected throughput. The offline stage focuses on selecting a set of quantum-user pairs and their primary routes by solving an ILP formulation. Because the offline problem is NP-complete, the authors relax binary decision variables to continuous values, solve the resulting linear program, and recover an integer solution using branch-and-bound. Candidate paths are generated using shortest-path-based selection (via Yen’s method) to limit resource usage and form a set of primary routing paths.

In the online stage, the objective shifts to maximizing predicted throughput by optimizing qubit assignments along the selected routes. This stage is also formulated as an ILP and is NP-hard [53]. The study follows a similar relax-and-recover approach (continuous relaxation followed by branch-and-bound) to obtain a feasible integer solution and improve route-level qubit allocation. To enhance robustness, the authors incorporate a recovery-route mechanism in which each switch on a primary path maintains a precomputed local recovery path. When a link fails, a switch uses local information to trigger an alternative recovery path and perform entanglement swapping, thereby sustaining connectivity and improving stability under failures. The network model comprises a quantum network graph, quantum switches, and quantum links. The graph is expressed as

$$ G = (\bar{V}, \varepsilon) \tag{12} $$

where $G$ is an undirected graph, $\bar{V}$ is the set of quantum nodes, and $\varepsilon$ denotes the set of edges connecting those nodes. Each quantum switch has qubits available for entanglement, and the effective swapping rate at each switch is denoted $q \in [0, 1]$. Each link $e_{ij} \in \varepsilon$ then has an entanglement generation success rate of

$$ P_{ij} = e^{-\alpha L_{ij}} \tag{13} $$

Here, $\alpha$ is a constant determined by the link material, and $L_{ij}$ is the length of the link.

For routing, the authors use the notion of expected throughput for a quantum-user pair. For a route $A = \{v_0, v_1, \ldots, v_l\}$, where $v_0$ and $v_l$ are the user pair, the expected throughput is

$$ P = \frac{Q^{A}}{2} \prod_{i=0}^{l-1} p_{i(i+1)} \cdot q^{\,l-1} \tag{14} $$

Here, $Q^{A}$ is the number of qubits, $A$ is the path, and $p_{i(i+1)}$ is the success rate of generating entanglement on the link between $v_i$ and $v_{i+1}$. $q$ represents the swapping rate at each switch, and $l$ is the number of edges in the route. The shortest paths are determined using Yen’s method, which limits resource use and keeps routing efficient. When a link fails, the recovery route algorithm creates alternate paths within a K-hop distance to improve network resilience and reliability. The switching strategy drives entanglement swapping at switches, preferring quantum-user pairs with greater expected throughput to maintain connectivity and performance. To obtain near-optimal solutions, the branch-and-bound method is used, with an approximation ratio of 2, resulting in efficient resource use and high-performance quantum communication [54]. Figure 9 shows the overall problem-solving procedure of the study across its three stages. Additionally, Table 7 summarizes the experimental setups.
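As a worked illustration of Equations (13) and (14), the sketch below computes the per-link success rate and the expected throughput of a route. It reads the exponent of q as l − 1, that is, one swapping operation per intermediate switch. The function names and all numeric values are hypothetical and are not taken from the study.

```python
import math


def link_success(alpha, length):
    """Equation (13): entanglement generation success rate of one link."""
    return math.exp(-alpha * length)


def expected_throughput(num_qubits, link_lengths, alpha, q):
    """Equation (14) for a route whose edges have the given lengths.

    num_qubits plays the role of Q^A, and q is the swapping rate at each
    of the (number of edges - 1) intermediate switches.
    """
    num_edges = len(link_lengths)
    link_term = math.prod(link_success(alpha, length) for length in link_lengths)
    return (num_qubits / 2) * link_term * q ** (num_edges - 1)


# Hypothetical two-hop route: 4 qubits, two 10 km links, alpha = 0.01, q = 0.9.
print(expected_throughput(4, [10.0, 10.0], alpha=0.01, q=0.9))
```

Because every additional hop multiplies in one more link success rate and one more factor of q, the expression makes explicit why shorter candidate paths are preferred when qubits are scarce.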

The authors compare their proposed algorithms, MULTI-R and MR-REC, with benchmark methods. MULTI-R reliably serves all 20 quantum-user pairs, exceeding FER, Q-PASS [55,56], and B1, whereas FER struggles in resource-constrained networks. MR-REC achieves the highest predicted throughput, improving on FER by up to 55%, whereas MULTI-R falls behind MR-REC but still outperforms FER. Despite having the maximum throughput, ALG-4 sacrifices fairness by serving fewer pairs. In terms of resilience, when 10% of the switches go offline, MR-REC shows a larger decline in throughput but remains stable, with losses staying below twice the ratio of offline switches.

Comparative takeaway: Compared with greedy and heuristic baselines (FER, Q-PASS, and hop-based routing), the proposed MULTI-R/MR-REC framework provides more systematic control over the trade-off between fairness (serving more user pairs) and throughput (expected ebits per unit time) by jointly optimizing path selection and qubit allocation. Unlike approaches that only optimize routing paths, it explicitly includes a recovery mechanism to handle link failures, improving resilience. However, relative to learning-based routing methods, it relies on ILP and branch-and-bound procedures, which introduce higher computational overhead and make scalability and runtime more sensitive to network size and optimization constraints.

In another study [34], the authors propose a multi-qubit GHZ-state-based QKD scheme that employs single-qubit transmission and quantum non-demolition (QND) measurements to transmit multiple classical bits. In the proposed scheme, Alice generates an (L+1)-qubit GHZ state and sends one qubit to Bob. She then prepares L auxiliary qubits and encodes the key values (1 as |1⟩ and 0 as |0⟩). Alice performs a Bell State Measurement (BSM) between the first entangled qubit and the first auxiliary qubit and reports the result to Bob. Bob applies the appropriate gates depending on the BSM result and measures his qubit using the QND measurement. Table 8 presents the overall QKD method and key features of the study, Table 9 shows the overall setup, and Table 10 displays the experimental setup for the QKD scheme.

Comparative takeaway: Compared with standard single-qubit QKD protocols, the GHZ-based design improves protocol efficiency by enabling multiple key bits per transmitted qubit via multi-qubit entanglement and QND measurement. However, it requires more complex state preparation and control operations, and its practical benefit depends on the feasibility of maintaining GHZ entanglement and performing reliable measurements under realistic noise and resource constraints.

In another study [33], the authors aim to improve the fidelity of encoded Bell pairs for long- and short-distance communication, together with the generalized network illustrated in Figure 10. The major phases of the process are as follows. First, a Bell state |Φ+⟩ is created using Hadamard and CNOT gates.

The Bell state is then encoded using a Quantum Repetition Code (QRC) with 2k + 1 qubits. The ancilla qubits are initialized to |0⟩ and entangled with the main qubits via CNOT gates. The encoded qubits are sent over a quantum channel, where they may suffer bit-flip or phase-flip errors. For short-distance communication, the stabilizer formalism is used to measure error syndromes and correct errors locally. For long-distance communication, Alice and Bob use a classical communication channel to exchange syndrome measurement results and correct errors globally. Finally, the fidelity of the Bell state is computed after error correction to assess the protocol’s performance. The entire scheme was simulated using the IBM Qiskit QASM simulator.
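The benefit of a (2k + 1)-qubit repetition code can be illustrated with a textbook calculation. The sketch below does not reproduce the study's Qiskit simulation or its fidelity figures. It assumes independent bit-flip errors with probability `p` per qubit and majority-vote decoding, so decoding fails only when more than k qubits are flipped. The function name and the sample values are hypothetical.

```python
from math import comb


def logical_error_rate(p, k):
    """Failure probability of majority voting on a (2k + 1)-qubit repetition code.

    Assumes each qubit is flipped independently with probability p.
    Decoding fails when at least k + 1 of the 2k + 1 qubits are flipped.
    """
    n = 2 * k + 1
    return sum(
        comb(n, flips) * p**flips * (1 - p) ** (n - flips)
        for flips in range(k + 1, n + 1)
    )


# Hypothetical physical error rate of 10% for increasing code sizes.
for k in (0, 1, 2):
    print(2 * k + 1, logical_error_rate(0.1, k))
```

For p below 0.5 the failure probability shrinks as k grows, at the cost of the 2k + 1 qubit overhead per encoded qubit.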

Comparative takeaway: Compared with routing-centric or scheduling-centric methods, this work focuses on physical-layer robustness by improving entanglement fidelity through encoding and error correction. The approach can enhance reliability under noise, but it introduces additional qubit overhead (2k + 1 encoding) and operational complexity, which may increase resource demands when integrated into larger network-level protocols.

3.4. Cross-Study Trends and Comparative Synthesis

Across the reviewed studies, a clear trend is the transition from static heuristic or shortest-path routing toward adaptive RL/DRL-based control, mainly to handle stochastic entanglement generation and dynamic network conditions. In parallel, optimization and heuristic approaches increasingly adopt multi-objective formulations that balance fidelity, latency, and resource utilization, reflecting the practical scarcity of qubits, memories, and entanglement resources. At the physical layer, fidelity- and QKD-oriented designs place stronger emphasis on robustness under noise and decoherence, but often introduce additional operational overhead that must be reconciled with end-to-end throughput and request success. When comparing scalability, learning-based control can generalize to changing conditions but depends on state design, training cost, and simulator realism, whereas optimization-based methods provide structured decision-making but may become computationally expensive as network size grows. Overall, the literature indicates a convergence toward deployment-aware, cross-layer designs that jointly consider physical-layer noise, resource-aware scheduling, and adaptive routing; however, reproducible benchmarking remains challenging due to inconsistent assumptions, metrics, and reporting practices across studies. These observations motivate the consolidated challenges and future opportunities discussed in the next section.

4. Challenges & Future Scope

Despite rapid progress, practical quantum communication still faces a set of technical and engineering barriers that limit large-scale deployment.

The first major challenge is end-to-end scalability under loss and decoherence. As distance increases, photon loss in fiber or free-space links and time-dependent decoherence quickly degrade entanglement quality, which directly impacts achievable fidelity and secret-key rates in entanglement-enabled services. Long-haul networking therefore requires designs that jointly optimize performance metrics, such as throughput, fidelity, latency, and request success rate, rather than improving a single metric in isolation. This also exposes a fundamental trade-off: techniques that improve fidelity often consume more resources or introduce additional processing overhead, which can reduce overall throughput.

A second challenge concerns quantum repeaters and memory technologies. Repeaters are central to extending communication distance, yet practical deployments remain constrained by limited memory lifetimes, imperfect gates and measurements, and nontrivial coordination overhead across nodes. Recent studies emphasize that repeater design must balance performance goals with realistic resource overheads and infrastructure constraints rather than assuming idealized components. In parallel, repeater placement and planning methods must account for reliability and failure tolerance, because repeater failures can disconnect routes or sharply reduce service rates. This highlights the need for repeaters and placement strategies that are not only cost-efficient but also resilient under partial node/link failures [49,57,58,59,60].

A third challenge is robustness under realistic noise and device imperfections. While many protocols are theoretically secure, practical implementations must contend with state-preparation errors, detector inefficiencies, miscalibration, and environmental interference. QKD in particular must bridge the gap between ideal security assumptions and real hardware behavior, including device imperfections and finite-key effects. Large-scale QKD demonstrations and analyses consistently point to implementation realism and imperfect devices as key obstacles to moving from laboratory-scale to network-scale operation. Future work should therefore place stronger emphasis on unified noise models, sensitivity analysis across noise regimes, and explicit reporting of what assumptions are required for reported gains [58,61,62].

A fourth challenge is the lack of standardized and reproducible evaluation. Across the literature, studies often vary in topology generation, traffic models, noise models, baseline selection, and reporting granularity, which makes cross-paper comparison difficult. This directly affects the interpretability of “comparative” claims. A practical future direction is to establish a consistent evaluation template for quantum routing and QKD networking studies, including the following: topology size and model, traffic/request distributions, noise/channel models, resource budgets (qubits per node, channel capacities), training budgets for learning-based methods, and a common set of outcome metrics. Equally important is reporting experimental environments (simulation platforms, versions, and hardware details) to improve reproducibility and enable fair comparison.
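One possible shape for such an evaluation template is sketched below as a small data structure that mirrors the items listed above. It is an illustration only and not an established standard. The class name `EvaluationTemplate`, the field names, and the sample values are assumptions introduced for this sketch.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class EvaluationTemplate:
    """Hypothetical reporting template for routing and QKD networking studies."""

    topology_model: str = ""                 # e.g. grid or random graph
    num_nodes: Optional[int] = None          # topology size
    traffic_model: str = ""                  # request distribution
    noise_model: str = ""                    # noise/channel model
    qubits_per_node: Optional[int] = None    # resource budget
    channel_capacity: Optional[int] = None   # resource budget
    training_budget: str = ""                # learning-based methods only
    metrics: List[str] = field(default_factory=list)  # common outcome metrics
    environment: str = ""                    # simulator, version, hardware

    def missing(self):
        """Return the names of fields that were left unreported."""
        return [
            f.name for f in fields(self)
            if getattr(self, f.name) in ("", None, [])
        ]


# Hypothetical, partially filled report.
report = EvaluationTemplate(topology_model="grid", num_nodes=16,
                            noise_model="amplitude damping")
print(report.missing())
```

A check like `missing()` makes unreported assumptions visible before results from different studies are compared.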

A fifth challenge is integration with classical networks and operational constraints. Many near-term deployments will be hybrid, combining quantum links and classical control planes. This requires robust synchronization, classical messaging for routing control and error-correction coordination, and interoperability among heterogeneous hardware. Future systems must explicitly consider control-plane overheads, timing constraints, and deployment realities such as maintenance, monitoring, and fault management. Quantum networking is often criticized as being far from practical deployment; therefore, it is important to ground the discussion in realistic scenarios. Near-term deployments are most feasible in metro-scale settings where trusted-node QKD networks can be integrated with existing fiber infrastructure to provide practical key establishment and secure services. A second scenario is satellite-to-ground quantum links, where satellite-based QKD can complement terrestrial backbones and enable long-distance key distribution under intermittent connectivity constraints. A third scenario is inter-data-center secure networking, where entanglement distribution and high-fidelity links can support secure communication and remote quantum operations across geographically separated sites. Finally, distributed quantum computing is an emerging scenario in which multiple quantum processors collaborate through networked entanglement, making routing, resource-aware scheduling, and reliability central to achieving usable performance. These scenarios illustrate that practicality depends on matching methods to constraints such as loss/decoherence, repeater availability, control-plane overhead, and reproducible evaluation assumptions.

Looking forward, several research directions appear especially promising when grounded in these practical constraints. First, integrated cross-layer optimization is needed to connect physical-layer fidelity management (encoding, purification, and error mitigation) with network-layer routing and scheduling, so that improvements at one layer do not unintentionally degrade performance at another. Second, repeaters and memories remain a key enabling technology, and future work should focus on designs that minimize resource overhead while remaining compatible with existing telecom infrastructure and realistic device performance. Third, learning-based routing and control approaches should be studied under reproducible benchmarks and under noise conditions that match real devices, with explicit reporting of training costs and generalization behavior across topologies and traffic patterns. Recent directions emphasize self-organizing satellite quantum networks, where topology changes, link intermittency, and atmospheric effects require routing and resource management that can adapt in real time. This line of work motivates research on intermittency-aware routing, fast reconfiguration, cross-layer control between space and ground segments, and consistent security and performance evaluation under realistic channel dynamics. Another promising direction is quantum-computing-assisted 6G networking, where quantum computing and quantum networking concepts are explored for next-generation communication use cases. This perspective highlights opportunities in hybrid classical–quantum architectures, network optimization under uncertainty, security services, and system-level orchestration. In the context of this review, it reinforces the need for deployment-aware evaluation, interoperability with classical control planes, and scalable routing/scheduling methods that remain robust under realistic noise and resource constraints.

Finally, satellite-based quantum communication is an increasingly important complement to terrestrial networks because it can bypass some distance limitations of optical fiber links. China’s early satellite demonstrations and follow-on activities have motivated broader interest in space-enabled quantum networking, and multiple reports discuss the planned expansion of quantum satellite capability and experiments toward larger satellite-supported quantum networks. Future work in this direction should prioritize end-to-end system engineering, including atmospheric effects, link intermittency, integration with terrestrial backbone routing, and consistent security and performance evaluation [59,63].

5. Conclusions

Quantum networking holds great promise for secure communication and distributed computing, but resource limitations remain a major challenge. Constraints in qubit capacity, entangled-pair storage, and coherence time hinder both reliability and scalability in practical systems. Additionally, high energy demands, hardware inefficiencies, and the fragility of quantum states make large-scale deployment difficult outside of controlled laboratory environments. Network-level issues, such as routing entanglement over long distances, synchronizing quantum operations between distant nodes, and mitigating noise, further complicate real-world implementation. Advancements in quantum memory, fault-tolerant error correction, photonic interfaces, and hybrid quantum–classical integration are essential for overcoming these barriers and achieving robust, scalable quantum networks in the future.

Author Contributions

Conceptualization, M.S.A. and M.S.H.S.; Data curation, Formal analysis, Investigation, M.S.H.S.; Methodology, M.S.H.S.; Project administration, M.S.A.; Resources, Software, M.S.A., M.S.H.S.; Supervision, M.S.A.; Validation, M.S.H.S.; Visualization, M.S.H.S.; Writing—original draft, M.S.H.S.; Writing—review editing, M.S.H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

QKD: Quantum Key Distribution; EPR: Einstein–Podolsky–Rosen entangled pair; E2E: End-to-End; RL: Reinforcement Learning; DRL: Deep Reinforcement Learning (subset of RL using deep neural networks); qRL/QRL: Q-learning-based reinforcement learning for quantum routing (method name in [26]); DQRL: Deep Q-Reinforcement Learning; DQRA: Deep Quantum Routing Agent; PPO: Proximal Policy Optimization; ILP: Integer Linear Programming; MCTS: Monte Carlo Tree Search; UCB1: Upper Confidence Bound (UCB1) selection; SNN: Scheduling Neural Network.

References

Horodecki, R.; Horodecki, P.; Horodecki, M.; Horodecki, K. Quantum entanglement. Rev. Mod. Phys. 2009, 81, 865–942. [Google Scholar] [CrossRef]
Einstein, A.; Podolsky, B.; Rosen, N. Can quantum-mechanical description of physical reality be considered complete? Phys. Rev. 1935, 47, 777. [Google Scholar] [CrossRef]
Bennett, C.H.; Brassard, G. Quantum cryptography: Public key distribution and coin tossing. Theor. Comput. Sci. 2014, 560, 7–11. [Google Scholar] [CrossRef]
Bennett, C.H.; Brassard, G.; Crépeau, C.; Jozsa, R.; Peres, A.; Wootters, W.K. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Phys. Rev. Lett. 1993, 70, 1895. [Google Scholar] [CrossRef]
Kimble, H.J. The quantum internet. Nature 2008, 453, 1023–1030. [Google Scholar] [CrossRef] [PubMed]
Wehner, S.; Elkouss, D.; Hanson, R. Quantum internet: A vision for the road ahead. Science 2018, 362, eaam9288. [Google Scholar] [CrossRef]
Gisin, N.; Ribordy, G.; Tittel, W.; Zbinden, H. Quantum cryptography. Rev. Mod. Phys. 2002, 74, 145. [Google Scholar] [CrossRef]
Cirac, J.I.; Zoller, P.; Kimble, H.J.; Mabuchi, H. Quantum state transfer and entanglement distribution among distant nodes in a quantum network. Phys. Rev. Lett. 1997, 78, 3221. [Google Scholar] [CrossRef]
Wootters, W.K.; Zurek, W.H. A single quantum cannot be cloned. Nature 1982, 299, 802–803. [Google Scholar] [CrossRef]
Sangouard, N.; Simon, C.; de Riedmatten, H.; Gisin, N. Quantum repeaters based on atomic ensembles and linear optics. Rev. Mod. Phys. 2011, 83, 33–80. [Google Scholar] [CrossRef]
Muralidharan, S.; Li, L.; Kim, J.; Lütkenhaus, N.; Lukin, M.D.; Jiang, L. Optimal architectures for long distance quantum communication. Sci. Rep. 2016, 6, 20463. [Google Scholar] [CrossRef]
Chakraborty, K.; Rozpedek, F.; Dahlberg, A.; Wehner, S. Distributed routing in a quantum internet. arXiv 2019, arXiv:1907.11630. [Google Scholar] [CrossRef]
Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260. [Google Scholar] [CrossRef] [PubMed]
Mahesh, B. Machine learning algorithms—A review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386. [Google Scholar] [CrossRef]
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
Mathew, A.; Amudha, P.; Sivakumari, S. Deep learning techniques: An overview. In Advanced Machine Learning Technologies and Applications, Proceedings of the AMLTA 2020, Jaipur, India, 13–15 February 2020; Springer Nature: Berlin/Heidelberg, Germany, 2021; pp. 599–608. [Google Scholar]
Wiering, M.A.; van Otterlo, M. Reinforcement learning. Adapt. Learn. Optim. 2012, 12, 729. [Google Scholar]
Ernst, D.; Louette, A. Introduction to Reinforcement Learning; Feuerriegel, S., Hartmann, J., Janiesch, C., Zschech, P., Eds.; Springer Nature: Berlin/Heidelberg, Germany, 2024; pp. 111–126. [Google Scholar]
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
Komar, P.; Kessler, E.M.; Bishof, M.; Jiang, L.; Sørensen, A.S.; Ye, J.; Lukin, M.D. A quantum network of clocks. Nat. Phys. 2014, 10, 582–587. [Google Scholar] [CrossRef]
Yu, H.; Zhao, X. Deep reinforcement learning with reward design for quantum control. IEEE Trans. Artif. Intell. 2022, 5, 1087–1101. [Google Scholar] [CrossRef]
Cacciapuoti, A.S.; Caleffi, M.; Tafuri, F.; Cataliotti, F.S.; Gherardini, S.; Bianchi, G. Quantum internet: Networking challenges in distributed quantum computing. IEEE Netw. 2019, 34, 137–143. [Google Scholar] [CrossRef]
Munro, W.J.; Azuma, K.; Tamaki, K.; Nemoto, K. Inside quantum repeaters. IEEE J. Sel. Top. Quantum Electron. 2015, 21, 78–90. [Google Scholar] [CrossRef]
Pant, M.; Krovi, H.; Towsley, D.; Tassiulas, L.; Jiang, L.; Basu, P.; Englund, D.; Guha, S. Routing entanglement in the quantum internet. npj Quantum Inf. 2019, 5, 25. [Google Scholar] [CrossRef]
Dunjko, V.; Briegel, H.J. Machine learning & artificial intelligence in the quantum domain: A review of recent progress. Rep. Prog. Phys. 2018, 81, 074001. [Google Scholar] [CrossRef]
Abreu, D.; Abelém, A. qRL: Reinforcement Learning Routing for Quantum Entanglement Networks. In Proceedings of the IEEE Symposium on Computers and Communications (ISCC 2024), Paris, France, 26–29 June 2024; pp. 1–6. [Google Scholar]
Roik, J.; Bartkiewicz, K.; Černoch, A.; Lemr, K. Routing in quantum communication networks using reinforcement machine learning. Quantum Inf. Process. 2024, 23, 89. [Google Scholar] [CrossRef]
Islam, T.; Arifuzzaman, M.; Arslan, E. Reinforcement Learning Based Proactive Entanglement Swapping for Quantum Networks. In Proceedings of the International Conference on Quantum Communications, Networking, and Computing (QCNC 2024), Kanazawa, Japan, 1–3 July 2024; pp. 135–142. [Google Scholar]
Le, L.; Nguyen, T.N.; Lee, A.; Dumba, B. Entanglement routing for quantum networks: A deep reinforcement learning approach. In Proceedings of the IEEE International Conference on Communications (ICC 2022), Seoul, Republic of Korea, 16–20 May 2022. [Google Scholar]
Le, L.; Nguyen, T.N. DQRA: Deep quantum routing agent for entanglement routing in quantum networks. IEEE Trans. Quantum Eng. 2022, 3, 1–12. [Google Scholar] [CrossRef]
Huang, Z.; Lai, H.; Wan, L. An advanced collaborative routing algorithm for optimizing entanglement and resource efficiency in quantum networks. Int. J. Theor. Phys. 2025, 64, 18. [Google Scholar] [CrossRef]
Zeng, Y.; Zhang, J.; Liu, J.; Liu, Z.; Yang, Y. Entanglement routing design over quantum networks. IEEE/ACM Trans. Netw. 2023, 32, 352–367. [Google Scholar] [CrossRef]
Shubha, S.E.U.; Rahman, M.S.; Mahdy, M.R.C. Significant improvement of fidelity for encoded quantum bell pairs at long and short-distance communication along with generalized circuit. Heliyon 2023, 9, e19700. [Google Scholar] [CrossRef]
Islam, T.; Arslan, E. Quantum Key Distribution with Single Qubit Transmission. In Proceedings of the International Conference on Quantum Communications, Networking, and Computing (QCNC 2024), Kanazawa, Japan, 1–3 July 2024; pp. 357–358. [Google Scholar]
Islam, T.; Arslan, E. A Heuristic Approach for Scalable Quantum Repeater Deployment Modeling. In Proceedings of the IEEE 48th Conference on Local Computer Networks (LCN 2023), Daytona Beach, FL, USA, 1–5 October 2023; pp. 1–9. [Google Scholar]
Abane, A.; Cubeddu, M.; Mai, V.S.; Battou, A. Entanglement routing in quantum networks: A comprehensive survey. arXiv 2025, arXiv:2408.01234. [Google Scholar] [CrossRef]
Dervisevic, E.; Tankovic, A.; Fazel, E.; Kompella, R.; Fazio, P.; Voznak, M.; Mehic, M. Quantum Key Distribution Networks—Key Management: A Survey. arXiv 2024, arXiv:2408.04580. [Google Scholar] [CrossRef]
Kozlowski, W.; Wehner, S.; Van Meter, R.; Rijsman, B.; Cacciapuoti, A.S.; Caleffi, M.; Nagayama, S. RFC 9340: Architectural principles for a quantum internet. RFC 9340. 2023. Available online: https://datatracker.ietf.org/doc/rfc9340/ (accessed on 1 January 2025).
Jiang, W.; Zhang, Y.; Han, H.; Mu, J. Quantum Communication in Self-Organizing Satellite Networks: Challenges and Opportunities. IEEE Commun. Stand. Mag. 2025, 9, 15–25.
Javed, M.A.; Nkenyereye, L.; Nawaz, S.J.; Mirza, J.; Fortino, G.; Dev, K. Quantum Computing-Assisted 6G Networks: Use Cases and Future Opportunities. Tsinghua Sci. Technol. 2025.
Meddeb, A. Quantum internet building blocks state of research and development. Comput. Netw. 2025, 261, 111151.
Kumar, A.; Zhou, A.; Tucker, G.; Levine, S. Conservative Q-learning for offline reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1179–1191.
Huang, S.-M.; Chien, M.-H.; Cheng, C.-Y.; Kuo, J.-J.; Yang, L.-H. Socially-aware concurrent entanglement routing with path decomposition in quantum networks. In Proceedings of the IEEE Global Communications Conference (GLOBECOM 2022), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 3664–3669.
Qiang, W.; Zhongli, Z. Reinforcement learning model, algorithms and its application. In Proceedings of the International Conference on Mechatronic Science, Electric Engineering and Computer (MEC 2011), Jilin, China, 19–22 August 2011; pp. 1143–1146.
Zhao, Y.; Qiao, C. Redundant entanglement provisioning and selection for throughput maximization in quantum networks. In Proceedings of the IEEE INFOCOM 2021, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
Xu, J.S.; Yung, M.H.; Xu, X.Y.; Tang, J.S.; Li, C.F.; Guo, G.C. Robust bidirectional links for photonic quantum networks. Sci. Adv. 2016, 2, e1500672.
Gisin, N.; Thew, R. Quantum communication. Nat. Photonics 2007, 1, 165–171.
Yen, J.Y. An algorithm for finding shortest routes from all source nodes to a given destination in general networks. Q. Appl. Math. 1970, 27, 526–530.
Shi, S.; Qian, C. Concurrent entanglement routing for quantum networks: Model and designs. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), Online, 10–14 August 2020; pp. 62–75.
da Silva, F.F.; Avis, G.; Slater, J.A.; Wehner, S. Requirements for upgrading trusted nodes to a repeater chain over 900 km of optical fiber. Quantum Sci. Technol. 2024, 9, 045041.
Adhikary, K.; Pal, P.; Poray, J. The minimum spanning tree problem on networks with neutrosophic numbers. Neutrosophic Sets Syst. 2024, 63, 258–270.
SURF. SURF. Available online: https://www.surf.nl/en (accessed on 19 January 2026).
Energy Sciences Network (ESnet). Energy Sciences Network. Available online: https://www.es.net/ (accessed on 19 January 2026).
Even, S.; Itai, A.; Shamir, A. On the complexity of time table and multi-commodity flow problems. In Proceedings of the 16th Annual Symposium on Foundations of Computer Science (SFCS 1975), Berkeley, CA, USA, 13–15 October 1975; pp. 184–193.
Xiao, Y.; Ding, T.; Mu, C.; Pan, K.; Zhang, B.; Shahidehpour, M. Convex Hull for Self-Scheduling Energy-Intensive Enterprises with Demand Response Regulations. IEEE Trans. Power Syst. 2024, 40, 2003–2013.
Zhang, S.; Shi, S.; Qian, C.; Yeung, K.L. Fragmentation-aware entanglement routing for quantum networks. J. Light. Technol. 2021, 39, 4584–4591.
HPCwire. Does Quantum Entanglement Hold the Key to Unhackable Communications? Available online: https://www.hpcwire.com/2022/07/28/does-quantum-entanglement-hold-the-key-to-unhackable-communications/ (accessed on 19 January 2026).
Science News. Quantum Entanglement Makes Quantum Communication Even More Secure. Available online: https://www.sciencenews.org/article/quantum-entanglement-communication-security-bell-test (accessed on 19 January 2026).
Phys.org. The World is One Step Closer to Secure Quantum Communication on a Global Scale. Available online: https://phys.org/news/2024-03-world-closer-quantum-communication-global.html (accessed on 19 January 2026).
The Quantum Insider. Chinese Researchers Perform Space-to-Ground Communications with Lightweight Quantum Satellite. Available online: https://thequantuminsider.com/2024/08/24/chinese-researchers-perform-space-to-ground-communications-with-lightweight-quantum-satellite/ (accessed on 19 January 2026).
Rey-Domínguez, J.; Razavi, M. Rethinking Quantum Repeaters: Balancing Scalability, Feasibility, and Interoperability. arXiv 2025, arXiv:2508.16310.
Zhang, Q.; Xu, F.; Chen, Y.-A.; Peng, C.-Z.; Pan, J.-W. Large scale quantum key distribution: Challenges and solutions. Opt. Express 2018, 26, 24260–24273.
MERICS. China’s Long View on Quantum Tech Has the US and EU Playing Catch-Up. Available online: https://merics.org/en/report/chinas-long-view-quantum-tech-has-us-and-eu-playing-catch (accessed on 19 January 2026).

Figure 1. Theoretical summary of the significant components of recent research and their main contributions.

Figure 2. PRISMA-ScR flow diagram illustrating the literature identification, screening, and inclusion process.

Figure 3. Overview of the reviewed research and its future scope, covering development deficiencies, deployment, and improvements to current research.

Figure 4. RL agent–environment formulation for entanglement routing in quantum networks (state: channel quality/EPR availability/qubit resources; action: route/EPR allocation; reward: fidelity and request success under resource constraints).

Figure 5. Overall procedure of the deep Q-learning (DQRL) study, as described by its authors.

Figure 6. Overall methodology of the DQRA model, as described in the original study.

Figure 7. Overall procedure of the study [27].

Figure 8. Overall methodology of the study [35], as described by its authors.

Figure 9. Overall challenges: recovery procedure of the offline and online stages based on ILP and recovery path using local swapping methods for robust entanglement.

Figure 10. Overall procedure and methodology of the study [33].

Table 1. Overview of the existing survey papers.

Survey/Review	Year	Scope	Methods Covered	Strengths	Limitations	Our Review Adds
Entanglement Routing in Quantum Networks: A Comprehensive Survey (Abane et al., IEEE TQE; NIST news) [36]	2025	routing	taxonomy of entanglement routing strategies	structured routing-centric view; bridges classical quantum networking terminology	focuses mainly on routing; less emphasis on learning vs. heuristic side-by-side experimental setup comparison	provides a thematic synthesis across method families (RL/DRL, resource-aware optimization, fidelity/noise robustness, and scalability) and highlights trade-offs and recurring challenges (coordination overhead, generalization, resource constraints) rather than only routing taxonomy
Quantum Key Distribution Networks—Key Management: A Survey [37]	2025	QKD networks	key management + trusted-relay network mechanisms	strong focus on practical key management challenges	less coverage of entanglement routing and RL-driven control	complements QKD-network perspectives by positioning QKD as part of a broader quantum-network stack and linking secure communication needs with routing, fidelity/noise robustness, and deployment constraints discussed in the review
Architectural Principles for a Quantum Internet (IETF RFC 9340) [38]	2023	architecture	design principles, guidelines	authoritative baseline for architecture + terminology	not a deep method-by-method survey of routing/control	adds method-oriented discussion (what algorithms/protocol families do in practice) and connects them to practical constraints and evaluation concerns (metrics, robustness, scalability, reproducibility) highlighted across the literature
Quantum Communication in Self-Organizing Satellite Networks: Challenges and Opportunities [39]	2025	satellite	network challenges/opportunities perspective	highlights deployment realities for satellite/self-organizing settings	not focused on algorithmic routing comparisons	extends the discussion by treating satellite networking as a deployment-relevant direction and summarizing key system issues (link intermittency, end-to-end engineering, consistent evaluation) within the future-scope perspective of the review
Quantum Computing-Assisted 6G Networks: Use Cases and Future Opportunities [40]	2025	6G	QC/QN roles, use cases, opportunities	connects quantum computing + next-gen networking needs	high-level; not focused on entanglement routing/QKD method comparison	offers a quantum-networking–focused synthesis emphasizing cross-layer optimization (physical-layer fidelity vs. network-layer routing/scheduling) and calls for reproducible benchmarking and deployment-aware evaluation—useful as a grounding lens alongside broader 6G opportunity discussions
Quantum internet building blocks state of research and development [41]	2025	quantum internet	components/protocol blocks overview	broad “building blocks” summary	less focused on RL/DRL vs. heuristic tradeoffs	adds comparative discussion elements (comparative takeaways and cross-theme synthesis) and consolidates how different lines of work address common bottlenecks (resource scarcity, fidelity/noise, scalability)

Table 2. Overview of the qRL approach’s configuration setups.

Parameters	Config1	Config2	Config3	Config4
Memory (M_i)	1.5n, 2.5n	1.5n, 2.5n	n, 1.5n	n, 1.5n
Fidelity (F_i)	0.99, 0.95	0.85, 0.90	0.99, 0.95	0.85, 0.90
EPR Success (EPRP)	0.95–0.9	0.75–0.6	0.95–0.9	0.75–0.6
Baseline	qDijkstra, R1 (fidelity), R2 (EPR), R3 (qubits).
Metrics	Average E2E fidelity, request success rate.
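
The two metrics in Table 2 can be illustrated with a small calculation. The sketch below is not the evaluation code of the qRL study; it assumes that the fidelity values are per-link fidelities of Werner states, that swapping operations are noiseless, and that EPR generation attempts succeed independently. The function names are hypothetical.

```python
from functools import reduce


def swap_fidelity(f1: float, f2: float) -> float:
    """Fidelity after swapping two Werner pairs with fidelities f1 and f2.

    Assumption: both inputs are Werner states and the swap itself is noiseless.
    """
    return f1 * f2 + (1.0 - f1) * (1.0 - f2) / 3.0


def end_to_end_fidelity(link_fidelities):
    """Fold the swap formula over every link of a path (at least one link)."""
    return reduce(swap_fidelity, link_fidelities)


def path_success_probability(epr_success_probs):
    """Probability that every link of the path yields an EPR pair.

    Assumption: link generation attempts are independent.
    """
    prob = 1.0
    for p in epr_success_probs:
        prob *= p
    return prob


if __name__ == "__main__":
    # Two-link path using the Config1 values from Table 2.
    print(end_to_end_fidelity([0.99, 0.95]))
    print(path_success_probability([0.95, 0.9]))
```

Under these assumptions, the end-to-end fidelity does not depend on the order in which the swaps are performed, so a simple left-to-right fold is sufficient.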

Table 3. Overall specific setups of the study.

Category	Setup
Algorithms tested	SEER and REPS, enhanced with EC and PES
Entanglement caching (EC)	Stores unused links for 10 time slots
Proactive entanglement swapping (PES)	RL-driven segment creation
Entanglement generation probability	0.6–0.9
Swap probability	0.5–0.9
Entanglement lifetime	1–10 time slots
Baseline	SEER/REPS, heuristic-based PES
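
The entanglement caching row of Table 3 can be read as a time-to-live cache over link-level entanglement. The following is a minimal sketch under stated assumptions, not the implementation evaluated in the study: the class name and methods are hypothetical, links are undirected, and a pair created in slot t is assumed usable while fewer than 10 slots have elapsed.

```python
from dataclasses import dataclass, field


@dataclass
class EntanglementCache:
    """Keeps unused link-level pairs for a fixed number of time slots."""

    lifetime: int = 10  # slots a cached pair stays usable (value from Table 3)
    _store: dict = field(default_factory=dict)  # (u, v) -> creation slots

    @staticmethod
    def _key(u, v):
        # Undirected link: (u, v) and (v, u) share one entry.
        return tuple(sorted((u, v)))

    def put(self, u, v, slot):
        """Cache a pair generated on link (u, v) during the given slot."""
        self._store.setdefault(self._key(u, v), []).append(slot)

    def expire(self, now):
        """Drop pairs whose age has reached the lifetime."""
        for key in list(self._store):
            fresh = [t for t in self._store[key] if now - t < self.lifetime]
            if fresh:
                self._store[key] = fresh
            else:
                del self._store[key]

    def take(self, u, v, now):
        """Consume the oldest usable pair on (u, v); return True on success."""
        self.expire(now)
        key = self._key(u, v)
        slots = self._store.get(key)
        if not slots:
            return False
        slots.pop(0)
        if not slots:
            del self._store[key]
        return True


if __name__ == "__main__":
    cache = EntanglementCache()
    cache.put("a", "b", slot=0)
    print(cache.take("b", "a", now=5))
```

Consuming the oldest pair first is a design choice made here for illustration; it uses pairs before they expire but ignores any fidelity decay during storage.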

Table 4. Overall methodology setup of the DQRA model.

Category	Setup
Network topology	Grid networks $(n_G \times n_G)$ with $n_G \in \{5, 10, 15, \dots, 35\}$
Qubit capacities	$c_i = 2$, $c_i \in [2, 4]$, $c_i \in [3, 4]$, $c_i = 4$
Baseline	Random request scheduling, shortest path
Training	DQN and DRN for squared means and experience replay
Metrics	Success rate, training/routing time, scalability

Table 5. Experimental setup of the study [27].

Aspect	Description
Network topology	Low-density parity-check grid with 4 connections per node
Error model	White noise: Werner state F = 0.99; Amplitude damping: skewed Bell states; Phase noise: phase shift
Baselines	Dijkstra and Monte Carlo
Training	15 actions per episode; 10⁶ actions for dynamic scenarios

Table 6. Experimental setup of the study [31].

Category	Parameter	Description
Network Topology Generation	Nodes	V
	Edge probability (q)	Probability of an edge between two nodes in the Waxman model
	Node degree (Ed)	Average number of connections per node
	Qubit capacity	Number of qubits each node can hold, ranging from 10 to 14
	Channel capacity	Number of entanglement pairs each channel can hold, ranging from 3 to 7
Simulation	NSD	Number of source–destination (S-D) pairs, ranging from 2 to 10
	Entanglement pair success probability (Ep)	Probability of successfully establishing an entanglement pair, set to Ep ± 0.01
	Swapping success rate (q)	ranging from 0.6 to 0.9
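
To make the topology-generation rows of Table 6 concrete, the sketch below draws a random Waxman graph and assigns capacities in the stated ranges. It is an illustration under assumptions rather than the generator used in [31]: the standard Waxman edge probability β·exp(−d/(α·L)) is used, and the parameter names `alpha` and `beta`, their default values, the unit-square placement, and the function name are all hypothetical.

```python
import math
import random


def waxman_topology(n, alpha=0.4, beta=0.6, seed=None):
    """Return (qubit_capacity, channel_capacity) for a random Waxman graph.

    Nodes are placed uniformly in the unit square. An edge (u, v) is added
    with probability beta * exp(-d(u, v) / (alpha * L)), where L is the
    largest pairwise distance. alpha and beta are illustrative defaults.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    max_dist = max((math.dist(pos[u], pos[v]) for u, v in pairs), default=1.0)

    # Capacity ranges follow Table 6: 10-14 qubits per node, 3-7 pairs per channel.
    qubit_capacity = {v: rng.randint(10, 14) for v in range(n)}
    channel_capacity = {}
    for u, v in pairs:
        p_edge = beta * math.exp(-math.dist(pos[u], pos[v]) / (alpha * max_dist))
        if rng.random() < p_edge:
            channel_capacity[(u, v)] = rng.randint(3, 7)
    return qubit_capacity, channel_capacity


if __name__ == "__main__":
    qubits, channels = waxman_topology(20, seed=1)
    avg_degree = 2 * len(channels) / len(qubits)
    print(len(channels), avg_degree)
```

In a setup like the one in Table 6, the parameters would typically be tuned until the average node degree matches the target value.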

Table 7. Overall experimental setup based on the state of the art.

Metrics	ILP (SURFnet)	Heuristic/SCA (SURFnet)	ILP (ESnet)	Heuristic/SCA (ESnet)
Number of Quantum repeaters	3	3	-	142 (121 new + 21 existing)
Execution time	13,481 s	0.28 s	-	10.26 s
Failure resistance	Up to (K−1) failures (robustness parameter K; K = 3 ⇒ 2 failures)	same	same	same
Scalability	Limited/high	scalable	Scalable/non	Scalable/non

Note: T1–T4 denote execution time under different evaluated network instances/configurations (ILP vs. heuristic; SURFnet vs. ESnet) as reported by the original study; we keep the symbols to preserve comparability.

Table 8. Overall experimental setup based on the study.

Category	Details
Network	graph with switches and quantum links.
Switches	100 (50–200)
Quantum-user pairs	20 (10–40)
Avg. switch degree	10 (6–14)
Qubits per switch	4 (2–8)
Swapping rate	0.9 (0.1–0.9)
Link success rate	0.1 (0.1–0.4)
FER	Greedy
Q-PASS	sum of link success rates
B1	based on hop count
ALG-4	Maximizes throughput, skips STEP I
Throughput	Expected ebits per unit time
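
The throughput metric in Table 8 can be illustrated for the simplest case of a single path of width one. The sketch below is a simplification, not the routing algorithms compared in the table: it assumes that every link succeeds independently with the same probability and that every intermediate switch swaps successfully with the same probability. The function name is hypothetical.

```python
def expected_ebits(hops: int, link_success: float, swap_success: float) -> float:
    """Expected end-to-end ebits per time slot on one width-1 path.

    Assumptions: all `hops` links succeed independently with probability
    `link_success`, and each of the `hops - 1` intermediate switches swaps
    successfully with probability `swap_success`.
    """
    if hops < 1:
        raise ValueError("a path needs at least one hop")
    return (link_success ** hops) * (swap_success ** (hops - 1))


if __name__ == "__main__":
    # Default values from Table 8: link success 0.1, swapping rate 0.9.
    for h in (1, 2, 3):
        print(h, expected_ebits(h, link_success=0.1, swap_success=0.9))
```

The exponential decay with hop count is why path metrics that account for link success rates can outperform a plain hop-count metric.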

Table 9. Comparison of QKD methods based on the study.

QKD Method	Qubit Transmission	Efficiency (η)	Features
BB84	1	η < 1	single qubit polarization states.
B92	1	η < 1	two non-orthogonal states.
E91	1	η < 1	Bell state measurements.
GHZ	1 qubit per N − 1 key bits	η = N − 1	Multi-qubit GHZ states and QND measurements.

Table 10. Experimental setup for the QKD Scheme.

Component	Description
Quantum State Preparation	Alice prepares an L + 1 qubit GHZ state using a quantum simulator.
Quantum State Preparation	Hadamard and CNOT gates are applied to create the GHZ state.
Qubit Transmission	Alice sends one qubit to Bob.
Qubit Transmission	quantum repeaters are used for entanglement swapping.
BSM	Alice performs BSM between the ancillary qubit and the first GHZ qubit.
BSM	The BSM result is sent to Bob.
QND	Bob applies XX or ZZ gates.
QND	Bob performs QND measurement to estimate the key bit.
GHZ State Reset	Alice resets the remaining qubits to the GHZ state for the next key bit transmission.
Simulation Environment	Uses the NetSquid quantum network simulator.
	128-core AMD EPYC 2.6 GHz CPU and 1 TiB of memory.
	A 12-bit key was successfully transmitted using a 13-qubit GHZ state.
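
The state-preparation step in Table 10 (a Hadamard gate followed by CNOT gates) can be checked with a small state-vector calculation. The sketch below is plain Python and does not use NetSquid; the sparse dictionary representation, the function names, and the choice of a CNOT chain from each qubit to the next are assumptions made for illustration.

```python
import math


def apply_h(state, q):
    """Apply a Hadamard gate to qubit q of a sparse state {basis_index: amplitude}."""
    s = 1.0 / math.sqrt(2.0)
    out = {}
    for basis, amp in state.items():
        bit = (basis >> q) & 1
        b0 = basis & ~(1 << q)  # same basis state with qubit q set to 0
        b1 = basis | (1 << q)   # same basis state with qubit q set to 1
        out[b0] = out.get(b0, 0.0) + s * amp
        out[b1] = out.get(b1, 0.0) + (-s if bit else s) * amp
    return {b: a for b, a in out.items() if abs(a) > 1e-12}


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    return {
        ((b ^ (1 << target)) if (b >> control) & 1 else b): a
        for b, a in state.items()
    }


def ghz_state(n):
    """Prepare an n-qubit GHZ state from |0...0> with one H and n - 1 CNOTs."""
    state = {0: 1.0}
    state = apply_h(state, 0)
    for t in range(1, n):
        state = apply_cnot(state, t - 1, t)
    return state


if __name__ == "__main__":
    # 13 qubits, as in the 12-bit key example of Table 10.
    for basis, amp in sorted(ghz_state(13).items()):
        print(format(basis, "013b"), amp)
```

Only the all-zeros and all-ones basis states carry amplitude, which is the GHZ property that the scheme in Table 10 relies on. With a 13-qubit state and a 12-bit key, the efficiency listed in Table 9 is η = N − 1 = 12.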

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shaon, M.S.H.; Akter, M.S. A Review of Routing and Resource Optimization in Quantum Networks. Electronics 2026, 15, 557. https://doi.org/10.3390/electronics15030557

