Abstract
This paper presents a real-time handover and link assignment framework for low-Earth-orbit (LEO) satellite networks operating in dense urban canyons. The proposed Markov chain-guided simulated annealing (MCSA) algorithm optimizes user-to-satellite assignments under dynamic channel and capacity constraints. By using Markov chains to guide state transitions, MCSA converges faster and explores the solution space more effectively than conventional simulated annealing. Simulations conducted in Ku-band urban canyon environments show that the framework achieves an average user satisfaction of about 97%, an improvement of approximately 10% over genetic algorithm (GA) results. It also delivers 10–15% higher resource utilization, lower blocking rates that are comparable to those of integer linear programming (ILP), and superior runtime scalability with linear complexity. These results confirm that MCSA provides a scalable and robust real-time mobility management solution for next-generation LEO satellite systems.
Keywords: LEO; satellite; real-time; handover; simulated annealing; heuristic optimization; Markov chains
1. Introduction
Recent advances in wireless technologies have driven the need for reliable, wide-area connectivity. While terrestrial networks perform well in populated areas, they fall short in remote, rural, and maritime regions due to deployment limitations. Satellite communications, especially those based on low-Earth-orbit (LEO) satellites, offer a promising alternative with global coverage, low latency, and energy-efficient links. Due to their proximity to Earth, LEO satellites play a central role in the vision of seamless global 6G connectivity [,,].
While LEO satellite systems enhance coverage and communication, they also pose significant challenges, most notably frequent handovers. Unlike fixed terrestrial base stations, LEO satellites move rapidly (∼7 km/s), so user links change frequently, even within short sessions [,,]. For users under LEO constellations operating near 550 km altitude, this typically occurs every 20–40 s, resulting in dynamic user-satellite connectivity. The growth of mega-constellations like Starlink and OneWeb intensifies this issue by increasing both the number of handovers and the complexity of real-time user-satellite assignment [,,], making it essential to develop resilient handover strategies. In response to these challenges, early research explored a range of handover strategies, which are typically categorized by whether the serving satellite changes: beam handovers (within the same satellite) or satellite handovers (between different satellites) []. Earlier studies focused on beam handovers due to the relatively small number of satellites in legacy systems [,,]. However, the rise of mega-constellations has shifted focus toward satellite handovers. To this end, graph-based handover models have gained attention [,,]. For instance, ref. [] modeled satellite coverage and handover intervals as graph nodes and directed edges to apply path optimization techniques. The authors of [] extended this model to MIMO systems for energy-efficient switching, while [] incorporated user priority into the optimization model.
While graph-based methods show promise, their scalability becomes a concern in large-scale LEO networks because computational complexity grows with the number of satellites. To overcome this, reinforcement learning (RL) has emerged as a viable alternative. Unlike traditional optimization, RL learns optimal policies through interaction with dynamic environments, making it well suited to the variability of LEO networks []. Once trained, it can be deployed with low computational overhead, and it has shown effectiveness in related satellite tasks such as beam hopping [,,], power control [,,], and routing optimization [,,].
Despite RL’s growing role in satellite network optimization, its practical use in LEO environments remains limited. RL frameworks often entail substantial training overhead and require relatively stable environments for convergence, conditions that are difficult to meet in highly dynamic orbital geometries with rapidly changing satellite visibility and user mobility []. Recent studies further emphasize the difficulty of maintaining policy generalization under non-stationary topologies, where convergence instability and retraining cost degrade real-time feasibility [,]. Centralized training also introduces latency and communication overhead, especially in large-scale constellations [], while defining realistic state representations and reward functions remains an open challenge [,]. These constraints highlight the need for lightweight, stable, and scalable optimization strategies that maintain responsiveness under stochastic and time-sensitive LEO conditions.
In practical deployments, static propagation assumptions are unrealistic. Satellite-to-user links, particularly for ground mobile users, are affected by dynamic and unpredictable conditions. In urban areas, obstacles such as buildings and vegetation intermittently block signals, causing fluctuations in channel quality []. Maintaining a continuous quality of service (QoS) in such scenarios requires responsive switching to links with a better line of sight (LOS). However, the unpredictable propagation dynamics and high satellite density of mega-constellations pose significant challenges for real-time decision-making. The resulting increase in handover frequency and signaling complexity demands reactive, propagation-aware mechanisms that are both low-latency and scalable. These mechanisms must quickly detect link degradation and dynamically reassign users based on real-time connectivity and resource availability.
Beyond reinforcement learning approaches, heuristic optimization methods have also been investigated for LEO satellite management. For instance, Zhou et al. [] proposed the MOLM algorithm, which employs multi-objective simulated annealing for congestion-aware routing in LEO networks. While such methods demonstrate the potential of annealing for global optimization, they typically address static routing or load-balancing problems. In contrast, the proposed Markov chain-guided simulated annealing (MCSA) introduces a probabilistic exploration mechanism that adapts in real time to dynamic handover conditions, enabling both scalability and stability under mobility-induced uncertainty.
To address these challenges, we propose an adaptive real-time handover framework for LEO satellite networks based on Markov chain-guided simulated annealing (MCSA). The approach formulates user-satellite association as a constrained optimization problem that maximizes instantaneous preference scores while satisfying connectivity and resource constraints. Unlike predictive approaches, it relies on real-time metrics, such as the elevation angle, data rate, and resource availability, for fast, data-driven decisions. MCSA employs a Markov chain to guide probabilistic transitions and incorporates a dynamic cooling schedule to balance exploration and convergence in large-scale and time-sensitive scenarios. The main contributions are summarized as follows:
- A real-time, propagation-aware handover framework tailored to LEO networks, capturing the stochastic link dynamics due to user mobility and urban obstructions.
- An MCSA algorithm that optimizes user-satellite associations under real-time resource and connectivity constraints with adaptive probabilistic transitions.
- A dynamic exploration–exploitation balance through real-time annealing factor adjustment, improving convergence in dynamic environments.
- A resource- and preference-aware assignment strategy that ensures scalability and user satisfaction in LEO mega-constellations.
 
In addition, the proposed framework aligns with emerging 6G Non-Terrestrial Network (NTN) directions and the standardization efforts of ITU-R M.2150-0 [] and M.2514 [], which emphasize intelligent, adaptive mobility management across integrated terrestrial–satellite infrastructures.
The remainder of this paper is organized as follows. Section 2 outlines the system model and channel characteristics. Section 3 presents the problem formulation and the proposed MCSA algorithm; Section 3.4 details the implementation of the reactive handover strategy. Section 4 presents simulation results and performance evaluation. Finally, Section 5 concludes the paper and suggests future research directions.
2. System Model
This study categorizes network nodes by altitude into space-based, aerial, and terrestrial. The focus is on LEO satellites at altitudes of 500–1200 km, operating in dense urban canyons with high-rise buildings (over 50 m) and narrow streets (under 30 m wide). Channel modeling and non-terrestrial network (NTN) analyses are based on Ku-band frequencies (10.7–21.2 GHz downlink, 29.5–30.0 GHz uplink), aligned with ITU-R P.618-13 recommendations []. Terminal deployment considers fully outdoor, fully indoor, and mixed scenarios to support realistic evaluation under diverse propagation conditions.
This study focuses on ground mobile users in dense urban canyons, aiming to design handover mechanisms that integrate real-time satellite positioning, elevation angle, and user-specific service-level agreements (SLAs) into decision-making. Geostationary (GEO) and medium-Earth-orbit (MEO) systems, non-urban environments, and inter-satellite optimization are excluded to maintain scope.
To address these challenges, we propose a real-time handover framework that dynamically assigns users to satellites based on instantaneous link quality and network load. It integrates satellite position data, 3D urban mapping, live network feedback, and user session requirements to enable reactive, propagation-aware handover decisions. This ensures continuous connectivity and resource-efficient allocation in complex, fast-changing environments. The framework illustrated in Figure 1 comprises three main components: Real-Time Monitoring, the Scoring Engine, and the Assignment Module, detailed in Section 3.
      
    
Figure 1. Real-time handover and link assignment framework. AR: available resources; RT: remaining time; EA: elevation angle; DR: achievable data rate.
  
2.1. LEO Satellite Channel Model for Dense Urban Canyons
This study adopts a conventional 3D geometric multiple-input, multiple-output (MIMO) channel model to characterize the LEO satellite downlink to the user terminal. Geometric relationships among scatterers and transceivers define the channel properties. Handover is modeled as an inter-satellite event, with the communication channel assumed to be established beforehand. The model targets dense urban canyon environments, characterized by high-rise buildings, trees, and narrow streets, as shown in Figure 2. In these settings, non-line-of-sight (NLoS) conditions are highly probable, and scatterers around the user terminal are assumed to be uniformly distributed.
      
    
Figure 2. System model for LEO downlink in urban canyon.
  
Total path loss, , is modeled as the sum of free-space path loss (FSPL) and atmospheric absorption from oxygen () and water vapor () [], with other gases considered negligible above  elevation angles [].
      
        
      
      
      
      
    
      
        
      
      
      
      
    
        where  is the atmospheric attenuation (dB/km), and d is the distance in kilometers.
2.1.1. Free-Space Path Loss (FSPL)
The LoS probability is modeled as a function of the elevation angle:
      
        
      
      
      
      
    
where a and b are environment-specific parameters, $h$ is the satellite altitude, and $d$ is the slant distance between the satellite and user. The NLoS probability is defined as follows:
$$P_{\mathrm{NLoS}} = 1 - P_{\mathrm{LoS}} \tag{4}$$
The slant distance, $d$, illustrated in Figure 3, is geometrically derived from the satellite altitude, Earth’s curvature, and user elevation angle, expressed as follows:
$$d = \sqrt{R_E^2 \sin^2\theta + h^2 + 2 h R_E} - R_E \sin\theta \tag{5}$$
where $R_E$ is the Earth’s radius, $h$ the satellite altitude, and $\theta$ the elevation angle from the user’s perspective.
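The slant-distance geometry described above can be sketched numerically as follows. This is a minimal illustration that assumes the standard spherical-Earth slant-range expression and a mean Earth radius of 6371 km; the function name and constant are introduced here for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def slant_distance_km(altitude_km: float, elevation_deg: float) -> float:
    """Slant range between a ground user and a satellite (spherical Earth).

    altitude_km: satellite altitude above the surface.
    elevation_deg: elevation angle seen from the user, in degrees.
    """
    theta = math.radians(elevation_deg)
    r_sin = EARTH_RADIUS_KM * math.sin(theta)
    return math.sqrt(
        r_sin ** 2 + altitude_km ** 2 + 2.0 * altitude_km * EARTH_RADIUS_KM
    ) - r_sin
```

At zenith (90° elevation) the slant distance reduces to the altitude itself, and it grows as the elevation angle falls, which is why low-elevation links suffer higher path loss.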
      
    
Figure 3. Slant distance between satellite and ground user.
  
The LoS and NLoS path loss components are defined as follows:
      
        
      
      
      
      
    
      
        
      
      
      
      
    
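The expected FSPL is computed by weighting the LoS and NLoS components by their respective probabilities:
$$\mathbb{E}[PL_{\mathrm{FS}}] = P_{\mathrm{LoS}}\, PL_{\mathrm{LoS}} + P_{\mathrm{NLoS}}\, PL_{\mathrm{NLoS}} \tag{8}$$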
Expanding this, we define the following:
      
        
      
      
      
      
    
      
        
      
      
      
      
    
Atmospheric Absorption
Atmospheric attenuation is the sum of the specific attenuations due to oxygen and water vapor:
$$\gamma = \gamma_o + \gamma_w \tag{11}$$
Signal-to-Noise Ratio and Data Rate
The received signal-to-noise ratio (SNR) under additive white Gaussian noise is,
      
        
      
      
      
      
    
where the transmit power and noise power are both expressed in dBm. The achievable data rate is as follows:
      
        
      
      
      
      
    
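As a rough illustration of how the channel quantities in this subsection chain together, the sketch below computes a free-space path loss, adds a distance-proportional atmospheric term, averages over LoS and NLoS conditions, and converts the resulting SNR into a Shannon-type rate. The function names, the bandwidth parameter, and the omission of antenna gains are assumptions made here for illustration. The FSPL expression is the standard textbook form in dB and may differ from the exact LoS and NLoS models used in this study.

```python
import math


def fspl_db(distance_km: float, freq_ghz: float) -> float:
    """Standard free-space path loss in dB (distance in km, frequency in GHz)."""
    return 20.0 * math.log10(distance_km) + 20.0 * math.log10(freq_ghz) + 92.45


def atmospheric_loss_db(gamma_o: float, gamma_w: float, distance_km: float) -> float:
    """Oxygen plus water-vapor specific attenuation (dB/km) times path length."""
    return (gamma_o + gamma_w) * distance_km


def expected_loss_db(p_los: float, loss_los_db: float, loss_nlos_db: float) -> float:
    """Probability-weighted average of the LoS and NLoS losses."""
    return p_los * loss_los_db + (1.0 - p_los) * loss_nlos_db


def snr_db(tx_power_dbm: float, path_loss_db: float, noise_power_dbm: float) -> float:
    """Received SNR in dB; antenna gains are omitted (assumption)."""
    return tx_power_dbm - path_loss_db - noise_power_dbm


def shannon_rate_mbps(bandwidth_mhz: float, snr_db_value: float) -> float:
    """Shannon capacity in Mbps for a bandwidth given in MHz (assumed parameter)."""
    return bandwidth_mhz * math.log2(1.0 + 10.0 ** (snr_db_value / 10.0))
```

Working in dB keeps the link budget additive, so each loss term can be inspected separately before the final conversion to a linear SNR inside the rate expression.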
2.2. Handover Process
Building on the channel model, handovers in the proposed system are triggered on demand, based on real-time detection of link quality fluctuations. Figure 4 illustrates the handover process, which has three main stages. In Stage 1, Real-Time Context Monitoring, the system continuously tracks link-level metrics, such as the elevation angle and the achievable data rate, to evaluate connection quality. “Real-time context” includes user-specific parameters (e.g., location, mobility, SLA), satellite attributes (e.g., visibility window, resource availability), and channel conditions (e.g., signal quality). These data are periodically gathered from the user terminal, satellite telemetry (using satellite ephemeris), and urban geometry. Upon detecting degradation, such as falling elevation angles or reduced data rates, the system initiates Stage 2.
      
    
Figure 4. Real-time handover process.
  
Stage 2, Real-Time Handover Decision, begins once degradation is detected, for example a drop in signal quality or an SLA violation. The system then evaluates candidate satellites based on current conditions and selects the most suitable target in real time to maintain service continuity. Finally, Stage 3, Seamless Handover Execution, ensures fast and efficient switching with minimal service interruption. The transition uses previously gathered context and live measurements to sustain QoS throughout the session.
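The degradation check that moves the process from monitoring to decision can be sketched as a simple threshold test. This is only an illustration: the `LinkState` type, the function name, and the use of two thresholds (minimum elevation angle and SLA minimum rate) are assumptions introduced here, not the exact trigger logic of the framework.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Snapshot of the serving link (hypothetical container)."""
    elevation_deg: float
    data_rate_mbps: float


def needs_handover(link: LinkState, min_elevation_deg: float,
                   sla_min_rate_mbps: float) -> bool:
    """Return True when the serving link violates either threshold."""
    low_elevation = link.elevation_deg < min_elevation_deg
    low_rate = link.data_rate_mbps < sla_min_rate_mbps
    return low_elevation or low_rate
```

A reactive design of this kind evaluates only current measurements, so no trajectory prediction is needed before the decision stage is invoked.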
Section 3 introduces the Real-Time Handover and Link Assignment Framework, which formalizes decision-making and resource allocation for efficient user-to-satellite associations in multi-LEO environments.
3. Real-Time Handover and Link Assignment Framework
To support real-time, resource-aware mobility management, the proposed system introduces a Reactive Handover and Link Assignment Framework that dynamically assigns users to the most suitable satellites based on instantaneous conditions, measured data rates, and resource availability.
The process starts with the Context Acquisition Module, which continuously monitors real-time parameters including user equipment (UE) location, achievable data rate, SLA requirements, and link-specific metrics such as available resources (AR), remaining visibility time (RT), and elevation angle (EA). These measurements are passed to the Metrics Weight Adjuster (MWA), which dynamically tunes metric weights based on network conditions and operator policies, ensuring that the scoring process remains responsive to real-time network dynamics.
The Scoring Engine then uses these weights and real-time inputs to compute a score for each satellite-user pair, reflecting the satellite’s ability to meet user requirements while ensuring overall network efficiency. After that, the computed scores are forwarded to the Assignment Module, which evaluates candidate pairings under real-time constraints such as satellite capacity, SLA compliance, and load conditions. It then selects the best assignment to maximize user satisfaction and overall network performance.
By integrating real-time monitoring, adaptive scoring, and resource-aware assignment, the proposed framework offers a scalable and responsive management solution. It maintains continuous service quality while promptly adapting to link degradation and mobility-induced changes.
The following subsection presents the problem formulation, modeling handover and link assignment as a real-time optimization problem constrained by user SLA and satellite resources. Subsequent subsections describe the scoring mechanism and assignment strategy that enable responsive and efficient decision-making within the proposed framework.
3.1. Problem Formulation
The real-time handover and link assignment problem is modeled as a binary combinatorial optimization problem. The objective is to determine the optimal assignment of users to satellites at time t, considering real-time user context, link quality, and resource availability. A binary decision variable, $x_{ij}(t)$, is defined, where $x_{ij}(t) = 1$ if user i is assigned to satellite j, and 0 otherwise, as shown in Equation (14).
$$x_{ij}(t) = \begin{cases} 1, & \text{if user } i \text{ is assigned to satellite } j \text{ at time } t, \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$
The objective is to maximize user satisfaction by selecting assignments that yield the highest user-satellite scores, subject to user SLA constraints and satellite capacity.
      
        
      
      
      
      
    
      
        
      
      
      
      
    
      
        
      
      
      
      
    
      
        
      
      
      
      
    
      
        
      
      
      
      
    
      
        
      
      
      
      
    
Here, constraints (16) and (17) enforce that each user’s data rate remains within SLA-defined bounds. Constraint (18) ensures satellite resource limits are respected based on the current load, while (19) restricts each user to at most one satellite. Finally, constraint (20) guarantees that the elevation angle exceeds the minimum threshold for LOS connectivity.
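For reference, one way to write the program described by (15)–(20) is sketched below. The symbol names are assumptions introduced here for illustration: $x_{ij}(t)$ for the binary assignment variable, $S_{ij}(t)$ for the user-satellite score, $R_{ij}(t)$ for the achievable data rate, $R_i^{\min}$ and $R_i^{\max}$ for the SLA bounds, $d_i$ for the resource demand of user i, $C_j(t)$ for the capacity currently available on satellite j, and $\theta_{ij}(t)$ and $\theta_{\min}$ for the elevation angle and its threshold. Multiplying by $x_{ij}(t)$ is one common way to make a constraint bind only for assigned pairs; the exact form may differ.

```latex
\begin{align}
\max_{x}\quad & \sum_{i}\sum_{j} S_{ij}(t)\, x_{ij}(t) \tag{15} \\
\text{s.t.}\quad
 & x_{ij}(t)\,\bigl(R_{ij}(t) - R_i^{\min}\bigr) \ge 0, && \forall i, j \tag{16} \\
 & x_{ij}(t)\,\bigl(R_i^{\max} - R_{ij}(t)\bigr) \ge 0, && \forall i, j \tag{17} \\
 & \sum_{i} d_i\, x_{ij}(t) \le C_j(t), && \forall j \tag{18} \\
 & \sum_{j} x_{ij}(t) \le 1, && \forall i \tag{19} \\
 & x_{ij}(t)\,\bigl(\theta_{ij}(t) - \theta_{\min}\bigr) \ge 0, && \forall i, j \tag{20}
\end{align}
```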
It is worth noting that the weighting coefficients in the objective function (15) are not static. As formulated in Equation (22), each metric’s contribution is adaptively scaled through an entropy-based weighting factor , which evolves with the temporal variability of the corresponding metric (e.g., data rate, resource availability, elevation). This mechanism enables the optimization process to dynamically emphasize metrics that show higher discriminative power or volatility at time t, allowing the objective in (15) to remain responsive to real-time network conditions.
Rather than using conventional integer programming, the problem is solved with the proposed MCSA method. MCSA enables efficient solution space exploration and avoids premature convergence by allowing probabilistic transitions governed by a Markov Chain and accepting suboptimal solutions via an annealing schedule. This approach offers a scalable, adaptive solution that is well suited to the dynamic and time-sensitive nature of handover and link assignment in LEO networks.
3.2. Satellite Scoring Module
As part of the decision process, a scoring mechanism evaluates the suitability of each candidate satellite for each user at time t. The score reflects the satellite’s ability to meet the user’s requirements while satisfying network constraints. It is computed using three measured metrics, AR, RT, and achievable DR, each weighted according to its real-time impact on service quality.
3.2.1. Weight Calculation
To dynamically determine metric importance, an entropy-based approach is employed to quantify the variability of each metric across candidates, allowing data-driven weight adjustment in response to evolving network conditions. The entropy of metric $m$ for user $u_i$ is given as follows:
$$H_{i,m} = -\sum_{k=1}^{K} p_{k,m} \log p_{k,m} \tag{21}$$
where $p_{k,m}$ is the probability of the $k$-th cluster within metric $m$, and K is the number of clusters determined using the elbow method with KMeans clustering.
To address scale differences (e.g., Mbps, seconds, percentage), metrics are normalized. Once entropy is computed, the normalized weight  of metric  for user  is defined as follows:
      
        
      
      
      
      
    
          where M is the total number of considered metrics.
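The entropy-based weighting can be illustrated with a small dependency-free sketch. Two simplifications are assumed and should be read as such: equal-width binning stands in for the KMeans clustering used in the paper, and each weight is taken proportional to the normalized entropy of its metric, so that more variable metrics receive more weight. The function names and the default of three clusters are hypothetical.

```python
import math
from collections import Counter


def cluster_labels(values, k):
    """Equal-width binning, used here as a stand-in for KMeans (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


def normalized_entropy(values, k):
    """Shannon entropy of the cluster occupancy, scaled to [0, 1]."""
    labels = cluster_labels(values, k)
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(k) if k > 1 else 0.0


def entropy_weights(metrics, k=3):
    """Map each metric name to a weight; weights sum to 1.

    metrics: dict of metric name -> list of values across candidate satellites.
    Weights proportional to entropy is an assumed normalization.
    """
    entropies = {name: normalized_entropy(vals, k) for name, vals in metrics.items()}
    total = sum(entropies.values())
    if total == 0:
        return {name: 1.0 / len(metrics) for name in metrics}
    return {name: h / total for name, h in entropies.items()}
```

For example, a metric that is identical across all candidate satellites has zero entropy and receives zero weight, since it cannot help discriminate between candidates.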
3.2.2. Satellite Scoring
Using the computed weights, the score for satellite  with respect to user  at time t is given as follows:
      
        
      
      
      
      
    
          where  is the visibility time,  is the achievable data rate, and  is satellite j utilization. The weights , , and  correspond to RT, DR, and AR, respectively. Each component is normalized as follows:
      
        
      
      
      
      
    
      
        
      
      
      
      
    
          where  and  are the minimum and maximum visibility durations across all satellite–user pairs at time t. Similarly,  and  define the range of achievable data rates for user  across candidate satellites.
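A minimal sketch of the scoring step is given below. It assumes an additive weighted form in which visibility time and data rate are min-max normalized and the resource term is taken as the unused fraction of satellite capacity (one minus utilization). The function names, the dictionary keys, and the handling of a degenerate range are illustrative assumptions, not the exact utility function.

```python
def min_max(value: float, lo: float, hi: float) -> float:
    """Min-max normalization to [0, 1]; returns 0 when the range is degenerate."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def satellite_score(weights, rt, dr, utilization, rt_range, dr_range):
    """Weighted score of one user-satellite pair.

    weights: dict with keys "RT", "DR", "AR" (assumed naming).
    rt, dr: visibility time and achievable data rate for the pair.
    utilization: satellite load in [0, 1]; the AR term uses 1 - utilization.
    rt_range, dr_range: (min, max) tuples used for normalization.
    """
    rt_norm = min_max(rt, *rt_range)
    dr_norm = min_max(dr, *dr_range)
    free_fraction = 1.0 - utilization
    return (weights["RT"] * rt_norm
            + weights["DR"] * dr_norm
            + weights["AR"] * free_fraction)
```

Normalizing each component to a common range prevents a metric with large raw units, such as seconds of visibility, from dominating the weighted sum.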
3.3. Real-Time Handover and Assignment Strategy
The proposed real-time handover strategy integrates reactive mobility management with dynamic user-to-satellite assignment. By utilizing satellite position data, urban geometry, live user context, and satellite load metrics, the system enables on-demand handovers to ensure seamless connectivity and efficient resource use. A key challenge here is user admission under rapidly changing conditions due to mobility and link variability. To address this, we introduce an enhanced simulated annealing algorithm guided by a Markov chain model, offering an adaptive solution for real-time assignment.
In this framework, each state represents a set of user-to-satellite assignments at a given time. Transitions between states are guided by the preference scores (Section 3.2.2), which reflect the real-time suitability of satellites for individual users and influence the likelihood of moving toward higher-utility configurations. This probabilistic structure enhances both user satisfaction and system performance.
A transition matrix derived from these scores directs the search process. Integrating the Markov Chain model strengthens Simulated Annealing’s capacity to explore the solution space effectively, balancing global exploration and local exploitation and enabling convergence to near-optimal solutions within the tight latency and performance bounds of LEO networks.
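One simple way to turn scores into transition probabilities is to normalize each user's row of scores so that it sums to one. The sketch below assumes this proportional rule and non-negative scores; the function name and the uniform fallback for an all-zero row are illustrative choices rather than the construction used in the paper.

```python
def transition_probabilities(scores):
    """Row-normalize non-negative scores into per-user proposal distributions.

    scores[i][j] is the score of satellite j for user i. Each returned row
    sums to 1; an all-zero row falls back to a uniform distribution.
    """
    matrix = []
    for row in scores:
        total = sum(row)
        if total <= 0:
            matrix.append([1.0 / len(row)] * len(row))
        else:
            matrix.append([s / total for s in row])
    return matrix
```

With this rule, a satellite with three times the score of another is proposed three times as often for that user, while every satellite with a positive score keeps a non-zero chance of being explored.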
The next subsection presents the Adaptive MCSA algorithm, detailing the optimization procedures that support responsive, efficient, and SLA-compliant user-to-satellite assignments in real time.
3.4. Adaptive Handover via Markov Chain-Guided Simulated Annealing Algorithm
This section introduces the proposed adaptive handover framework, which reactively assigns users to the most suitable satellites by continuously assessing real-time network conditions and optimizing user associations. It combines a context-aware module with a metaheuristic optimization strategy to enhance handover decisions in dynamic, resource-constrained LEO environments. The framework consists of two components:
- Real-time handover algorithm: Acquires real-time user context, satellite visibility, and resource availability. It computes user-satellite preference scores based on SLAs and observed network conditions.
 - MCSA Assignment: A search-based algorithm that iteratively explores and improves user-to-satellite assignments. It employs a Markov chain to guide state transitions and a temperature-based acceptance criterion to balance exploration and exploitation.
 
The adaptive handover strategy is outlined in Algorithm 1.
        
Algorithm 1. Real-time handover algorithm.
Require: Set of users and satellites.
Algorithm 2. Real-time satellite-scoring engine.
Require: Real-time metrics for users U and satellites S.
Algorithm 3. Markov chain-guided simulated annealing.
3.5. Real-Time Satellite Scoring Engine
Algorithm 2 outlines the satellite scoring process used to compute  for all user–satellite pairs, reflecting the satellite’s ability to serve users based on real-time link metrics. The algorithm relies solely on live measurements, ensuring responsiveness to current network conditions.
The required inputs (Lines 1–3) include the achievable data rate, available resources, and visibility time for all relevant pairs. First (Lines 4–6), KMeans clustering is applied to standardize metric scales, enabling fair entropy computation and avoiding scale dominance. Next (Lines 7–12), the algorithm computes the entropy of each metric (Line 9) and normalizes the results into user-specific weights (Line 10), capturing the relative importance of each metric under current conditions. Finally (Lines 13–17), the scores are computed using the utility function in Equation (23) (Line 15). The resulting scores are then passed to the assignment module (Algorithm 3) to find the best user-to-satellite mappings.
The matching process is handled in the following subsection using the MCSA algorithm, detailed in Algorithm 3.
3.6. Markov Chain-Guided Simulated Annealing Assignment
To address the handover and link assignment problem under SLA and resource constraints, we adopt a metaheuristic optimization strategy based on simulated annealing (SA). To enhance convergence and guide the search process, the SA procedure is integrated with a Markov chain framework, which biases transitions toward higher-scoring user-satellite configurations. This subsection outlines the problem formulation, the relaxation strategy, and the algorithmic structure of the proposed assignment method.
Let $\mathcal{U}$ be the set of users and $\mathcal{S}$ be the set of satellites. The following definitions are used:
- $x_{ij}$: binary variable equal to 1 if user i is assigned to satellite j, and 0 otherwise.
- $R_{ij}$: current data rate between satellite j and user i.
- $R_i^{\min}$, $R_i^{\max}$: SLA-defined minimum and maximum data rates for user i, such that $R_i^{\min} \le R_{ij} \le R_i^{\max}$.
- $C_j$: satellite j capacity.
- $S_{ij}$: utility score for assigning user i to satellite j.
 
3.6.1. Objective Function
3.6.2. Continuous Relaxation and Penalty Method
To enable continuous optimization, the binary variable $x_{ij}$ is relaxed to the interval $[0, 1]$. A penalty term is introduced to enforce SLA and capacity constraints:
      
        
      
      
      
      
    
          where the penalty function, , penalizes violations of SLA and capacity constraints: 
      
        
      
      
      
      
    
          where  and  are large positive constants penalizing constraint violations.
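The penalty formulation can be illustrated with the following sketch, which evaluates a penalized objective for a relaxed assignment matrix. The exact penalty terms are assumed: SLA violations are measured as the rate shortfall or excess weighted by the assignment value, and capacity violations as the load exceeding each satellite's capacity. The parameter names `lam_sla` and `lam_cap` and their default values are hypothetical, and the one-satellite-per-user constraint is not penalized here.

```python
def penalized_objective(x, scores, rates, r_min, r_max, demand, capacity,
                        lam_sla=1e3, lam_cap=1e3):
    """Negative utility plus penalties, suitable for a minimizer.

    x[i][j] in [0, 1] is the relaxed assignment of user i to satellite j.
    """
    n_users, n_sats = len(x), len(x[0])
    utility = 0.0
    sla_violation = 0.0
    for i in range(n_users):
        for j in range(n_sats):
            utility += scores[i][j] * x[i][j]
            shortfall = max(0.0, r_min[i] - rates[i][j])
            excess = max(0.0, rates[i][j] - r_max[i])
            sla_violation += x[i][j] * (shortfall + excess)
    cap_violation = 0.0
    for j in range(n_sats):
        load = sum(demand[i] * x[i][j] for i in range(n_users))
        cap_violation += max(0.0, load - capacity[j])
    return -utility + lam_sla * sla_violation + lam_cap * cap_violation
```

Because the penalty constants are large relative to the scores, any infeasible assignment evaluates worse than every feasible one, which steers the search back toward the feasible region.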
3.6.3. Dual Annealing Algorithm
To minimize the penalized objective (Equation (26)), we adopt dual annealing, a hybrid global-local optimization strategy that combines global exploration with local refinement, allowing the algorithm to iteratively improve solutions and escape local optima.
3.6.4. Adaptive Decay Mechanism
To improve convergence, an adaptive decay mechanism is applied to the cooling rate, , adjusting its value dynamically based on the objective progress:
      
        
      
      
      
      
    
          where  and  are the lower and upper bounds of the cooling rate . This adaptive mechanism dynamically balances exploration and exploitation based on the optimization progress, encouraging a broader search in the early stages and promoting convergence as the algorithm evolves.
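A bounded update of the cooling rate might look like the sketch below. The additive step, the default bounds, and the function name are hypothetical values chosen for illustration; only the direction of the adjustment (raise the rate after an improvement, lower it otherwise, and clamp to the bounds) follows the description in the text.

```python
def update_cooling_rate(alpha: float, improved: bool,
                        alpha_min: float = 0.85, alpha_max: float = 0.99,
                        step: float = 0.01) -> float:
    """Adjust the cooling rate and clamp it to [alpha_min, alpha_max].

    With a temperature update of the form T <- alpha * T, a larger alpha
    cools more slowly (more exploration) and a smaller alpha cools faster.
    """
    alpha = alpha + step if improved else alpha - step
    return max(alpha_min, min(alpha_max, alpha))
```

Clamping keeps the schedule well behaved: the temperature always decreases, but never so quickly that the search freezes after a short run without improvement.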
3.7. Markov Chain-Guided Simulated Annealing Assignment
The assignment procedure is detailed in Algorithm 3, which implements the MCSA algorithm for real-time user-to-satellite assignment. Inspired by the annealing process in metallurgy, MCSA uses probabilistic state transitions to explore the solution space and approximate the global optimum. The objective is to maximize the total satisfaction score subject to satellite resource constraints. The algorithm begins by defining the required input parameters (Lines 1–7), namely the user and satellite sets, the user-satellite scores (Algorithm 2), the satellite capacities, the user resource demands, the initial temperature, the adaptive decay bounds, and the maximum number of iterations N. The initial temperature was empirically set to the mean absolute change in the scoring function between random neighboring states, ensuring an initial acceptance probability of approximately 0.8 for uphill moves, following common simulated-annealing heuristics.
The process initializes with a valid random assignment (Line 10) and score evaluation (Line 11), both stored as the current and best-known solution (Lines 12–13). The temperature T and the decay factor are initialized (Line 14), along with a counter to monitor stagnation (Lines 15–16).
The main loop runs up to an iteration limit (Line 17) that scales with problem size through a tunable constant k, which controls the depth of exploration. The impact of k is analyzed in Section 4. At each iteration, a new candidate assignment is generated (Line 18) by modifying the current state (e.g., reassigning a user), and invalid configurations are discarded (Lines 19–21). For valid candidates, the new score is computed (Line 22) and evaluated using the Metropolis criterion (Lines 23–24). If accepted, the current state is updated (Lines 25–26). If the new score improves upon the best-known solution, the best state is updated (Lines 27–30), the decay factor is increased to encourage exploration (Line 31), and the stagnation counter is reset (Line 32). Otherwise, the counter is incremented (Line 34) and the decay factor is decreased to promote exploitation (Line 35). The temperature is then updated (Line 37). Early termination occurs if no improvement is observed over a predefined threshold (Lines 38–40). Upon completion, the best assignment and score are returned (Line 42). This adaptive mechanism balances exploration and exploitation, making it well suited for dynamic LEO environments with user mobility and real-time service demands. The effectiveness of MCSA depends on the accuracy of the calculated scores, which are generated by the scoring engine. This engine evaluates satellite suitability based on user- and link-specific metrics and is described in Section 3.5, building on [].
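The loop described above can be sketched compactly as follows. This is an illustrative approximation rather than a transcription of Algorithm 3: the score-proportional proposal, the 10% chance of proposing an unassignment, the greedy random initialization, and all default parameter values are assumptions introduced here.

```python
import math
import random


def mcsa_assign(scores, demand, capacity, t0=1.0, alpha_min=0.85,
                alpha_max=0.99, step=0.01, max_iter=2000, patience=200, seed=0):
    """Sketch of Markov chain-guided simulated annealing for assignment.

    scores[i][j]: non-negative score of satellite j for user i.
    Returns (assignment, total_score); assignment[i] is a satellite index,
    or None when user i is left unassigned.
    """
    rng = random.Random(seed)
    n_users, n_sats = len(scores), len(scores[0])

    def total(assign):
        return sum(scores[i][j] for i, j in enumerate(assign) if j is not None)

    def feasible(assign):
        load = [0.0] * n_sats
        for i, j in enumerate(assign):
            if j is not None:
                load[j] += demand[i]
        return all(load[j] <= capacity[j] for j in range(n_sats))

    # Valid random start: admit users in random order where capacity allows.
    current = [None] * n_users
    for i in rng.sample(range(n_users), n_users):
        for j in rng.sample(range(n_sats), n_sats):
            current[i] = j
            if feasible(current):
                break
            current[i] = None
    cur_score = total(current)
    best, best_score = current[:], cur_score
    temp, alpha, stale = t0, (alpha_min + alpha_max) / 2.0, 0

    for _ in range(max_iter):
        # Score-guided proposal: pick a user, then a satellite with
        # probability proportional to its score (or unassign, assumed 10%).
        i = rng.randrange(n_users)
        if rng.random() < 0.1:
            j = None
        else:
            j = rng.choices(range(n_sats), weights=[s + 1e-9 for s in scores[i]])[0]
        candidate = current[:]
        candidate[i] = j
        if not feasible(candidate):
            continue  # invalid configurations are discarded
        cand_score = total(candidate)
        delta = cand_score - cur_score
        # Metropolis criterion for a maximization problem.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
        if cur_score > best_score:
            best, best_score = current[:], cur_score
            alpha = min(alpha_max, alpha + step)  # slow the cooling
            stale = 0
        else:
            stale += 1
            alpha = max(alpha_min, alpha - step)  # speed up the cooling
        temp = max(temp * alpha, 1e-12)
        if stale >= patience:
            break  # early termination on stagnation
    return best, best_score
```

The returned assignment is always feasible with respect to capacity, because infeasible candidates never replace the current state. Solution quality depends on the chosen parameters and the random seed, so the result should be treated as a near-optimal candidate rather than a guaranteed optimum.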
Parameter Sensitivity
To ensure the robustness of the proposed algorithm, a brief sensitivity analysis was conducted on the key parameters governing exploration and convergence. The cooling decay rate, , was varied in the range of , where lower values caused premature convergence, while higher values delayed stabilization. An intermediate setting of  offered the best trade-off between runtime and convergence quality. Similarly, the exploration depth, k, was tuned between 5 and 20, with  providing consistent convergence behavior and solution diversity, as will be detailed in Section 4.3. These parameter choices were retained for all subsequent simulations.
3.8. Discussion on MCSA Convergence and Effectiveness
We argue that the proposed algorithm with adaptive decay demonstrates strong convergence behavior and yields high-quality solutions within practical constraints. This is supported by the following key characteristics:
- Adaptive cooling for balanced search: The annealing schedule adjusts dynamically based on optimization progress. When improvements stall, cooling accelerates to promote convergence; when an improvement is detected, cooling slows to allow deeper exploration of promising regions, maintaining a balance between exploration and exploitation.
- Probabilistic acceptance: At higher temperatures, the algorithm probabilistically accepts worse solutions, enabling escape from local optima. As the temperature declines, acceptance becomes more selective, guiding the search toward high-quality configurations.
- Score-guided Markov transitions: Unlike purely random transitions, state changes are biased by a Markov chain structured around utility scores, encouraging movement toward higher-quality solutions and enhancing convergence stability.
- Periodic local refinement: Global transitions are complemented by local search, enabling rapid exploitation of promising states. This hybrid approach improves convergence speed and solution quality.
- Empirical convergence within finite iterations: Although theoretical convergence guarantees hold only asymptotically, empirical results show consistent convergence to near-optimal solutions within a practical number of iterations, regardless of initialization.
- Stochastic completeness and ergodicity: Assuming ergodicity, the Markov chain assigns non-zero probability to reaching every feasible state, which supports comprehensive exploration and robustness against local entrapment.
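The score-guided transition and the ergodicity assumption listed above can be made concrete with a short Python sketch. It is a hedged illustration only: the function names (`transition_probs`, `propose`), the additive floor `eps`, and the proportional weighting are assumptions introduced here, not the transition rule used in the paper. The floor keeps every feasible satellite reachable with non-zero probability, while higher scores remain more likely to be proposed.

```python
import random


def transition_probs(user_scores, feasible, eps=0.05):
    """Turn one user's satellite scores into a proposal distribution.

    Infeasible satellites get probability 0; every feasible satellite gets at
    least a share proportional to eps, so no feasible state is unreachable.
    """
    weights = [
        (max(score, 0.0) + eps) if ok else 0.0
        for score, ok in zip(user_scores, feasible)
    ]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(weights)     # no feasible satellite for this user
    return [w / total for w in weights]


def propose(user_scores, feasible, rng, eps=0.05):
    """Sample a candidate satellite index, or None if nothing is feasible."""
    probs = transition_probs(user_scores, feasible, eps)
    if not any(probs):
        return None
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical example: three satellites, the third one infeasible.
example_rng = random.Random(0)
example_choice = propose([0.9, 0.1, 0.5], [True, True, False], example_rng)
```

A larger `eps` flattens the distribution toward uniform random proposals, whereas a smaller one makes the search greedier.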
 
Taken together, these properties support the claim that MCSA provides a scalable and effective optimization framework for real-time user-to-satellite assignment in dynamic LEO environments.
The next section presents simulation results, evaluation metrics, and a comparative analysis that validate the framework’s effectiveness under realistic deployment scenarios.
4. Numerical Results
This section evaluates the performance of the proposed MCSA framework using key metrics, including computational complexity, assignment quality, user satisfaction, and resource utilization. Each subsection examines a specific aspect and compares MCSA with benchmark approaches such as integer linear programming (ILP) and a genetic algorithm (GA). For fair comparison, GA parameters were configured following standard convergence-balanced settings (population size = 50, crossover rate = 0.8, mutation rate = 0.1), providing a similar computational budget and termination condition to MCSA.
The effectiveness of the real-time handover and link assignment framework is demonstrated through comprehensive simulations tailored to dense urban canyon environments characterized by high user density and severe signal scattering caused by tall buildings and vegetation. The main simulation parameters are listed in Table 1. Several values are derived from Starlink specifications, while others reflect current trends in LEO satellite network design [,]. To capture urban propagation challenges, elevated values for the path loss parameters, A and B, are adopted, following 3GPP guidelines []. A relatively high minimum elevation angle, , is also enforced to ensure sufficient LOS probability, in line with 3GPP Release 15 recommendations [].
       
    
Table 1. Simulation parameters.
  
4.1. Simulation Setup
The MCSA algorithm was implemented in Python 3.12, and the simulation dataset was generated using STK 17.1 by analyzing Starlink constellation coverage, as summarized in Table 2. At the time of evaluation, the study area was covered by 10 LEO satellites; however, not all were capable of meeting the required data rates, especially when the elevation angle dropped below 20 degrees.
       
    
Table 2. Satellite access report.
  
Although satellite visibility durations ranged from 491 to 763 s, the periods during which data rate constraints were satisfied were significantly shorter. To emulate a group handover scenario, the dataset comprises 1000 users, enabling a comprehensive assessment of the proposed handover and assignment framework.
In addition to orbital and ephemeris computation, STK 17.1 was also used to model the three-dimensional urban propagation environment, including terrain elevation and building obstruction effects. This ensured that user–satellite visibility and handover dynamics accurately reflected realistic urban canyon conditions. To complement the empirical evaluation, we analyze MCSA time complexity to assess its scalability and suitability for real-time execution.
4.2. Time Complexity Analysis
To evaluate the computational efficiency of the MCSA, we analyze its time complexity and compare it with benchmark methods: GA and ILP. Figure 5 shows a log-scaled comparison of runtime growth with respect to the number of users U.
      
    
Figure 5. Comparison of time complexity for ILP, GA, and SA with an increasing number of users.
  
In the left subfigure, ILP shows exponential growth in runtime as the user population increases, an expected outcome of the exhaustive search space involved, which limits its scalability for large-scale assignment problems (e.g., at 1000 users, ILP becomes computationally infeasible). The right subfigure compares the proposed framework (denoted SA in the figures and in the remainder of this section) with GA. Although GA exhibits a near-logarithmic trend, its runtime remains consistently higher than that of SA. This is attributed to GA’s population-based operations, which require evolving multiple candidate solutions per generation. These runtime trends are empirical and depend on the parameter settings used for GA and MCSA (e.g., population size, mutation rate, and initial temperature); thus, the comparison reflects observed behavior under equivalent computational budgets rather than universal asymptotic bounds.
The time complexity of the integrated framework (Algorithms 1–3) is , where k is a tunable constant controlling the depth of exploration,  is the set of users, and  is the set of their candidate satellites. This overall complexity stems from three main components:
- Satellite filtering (Algorithm 1): operates over all users and their candidate satellites, contributing .
 - Satellite scoring (Algorithm 2): processes all user-satellite pairs, contributing .
 - MCSA assignment (Algorithm 3): dominates the overall cost with .
 
While classical simulated annealing can exhibit exponential time complexity in its unrestricted form, the proposed MCSA operates under a bounded iteration schedule , which constrains runtime growth to linear scaling with respect to the number of users and satellites. This bounded design explains the polynomial runtime trend observed in Figure 5. Compared to ILP’s exponential complexity and GA’s higher per-iteration cost, MCSA offers superior scalability with near-optimal performance, confirming its suitability for real-time use in large-scale LEO networks.
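One way to check an empirical runtime trend such as the one discussed above is to fit a straight line to log(runtime) versus log(number of users): a slope near 1 indicates linear growth and a slope near 2 indicates quadratic growth. The Python sketch below shows this least-squares fit. The function name `loglog_slope` and the sample numbers are hypothetical; the numbers are synthetic and are not measurements from this study.

```python
import math


def loglog_slope(sizes, runtimes):
    """Least-squares slope of log(runtime) against log(size).

    For runtime ~ c * size**p, the returned slope estimates the exponent p.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic illustration only: runtimes that double whenever the user count doubles.
synthetic_users = [100, 200, 400, 800]
synthetic_seconds = [0.5, 1.0, 2.0, 4.0]
estimated_exponent = loglog_slope(synthetic_users, synthetic_seconds)
```

Because the fit uses only observed runtimes, it describes behavior under the tested parameter settings and does not establish an asymptotic bound.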
4.3. Effect of Exploration Depth, k, on Assignment Quality
The exploration depth k significantly influences the performance of the MCSA assignment framework, as it determines the total number of iterations (). This subsection analyzes the impact of varying k on assignment quality, stability, and consistency. Two key metrics are evaluated: maximum assignment score and the distribution of user satisfaction, providing practical insights for selecting an optimal k that balances performance with computational cost.
Figure 6 illustrates the impact of the exploration parameter k on assignment quality. Specifically, Figure 6a shows that increasing k from 1 to 20 produces a rapid rise in the assignment score, highlighting the benefits of deeper exploration. Beyond k = 20, improvements plateau, indicating diminishing returns and unnecessary computational overhead. These results suggest that setting k between 10 and 20 achieves an effective trade-off between solution quality and runtime.
      
    
Figure 6. Effect of k on (a) highest achieved score and (b) average satisfaction and variance.
  
Moreover, Figure 6b presents average user satisfaction and its variance as functions of k. Satisfaction increases sharply up to , stabilizing around 97%, while variance decreases significantly until , indicating more consistent service quality. Beyond this point, both metrics level off, confirming that the solution space has been sufficiently explored. These findings confirm that moderate values of k not only improve assignment but also enhance fairness across users.
In summary, moderate exploration values (k between 10 and 20) deliver the highest assignment quality, improved satisfaction, and reduced variance at a manageable computational cost. Higher values yield marginal gains at the expense of runtime, while lower values risk suboptimal convergence. The next subsection examines how these gains influence resource utilization.
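The diminishing-returns argument above can be turned into a simple selection rule: sweep k, record the best score for each value, and keep the smallest k whose score lies within a tolerance of the plateau. The Python sketch below illustrates this rule. The function name `smallest_sufficient_k`, the 1% default tolerance, and the sweep values are hypothetical and do not come from the reported experiments; scores are assumed to be positive.

```python
def smallest_sufficient_k(score_by_k, tolerance=0.01):
    """Return the smallest k whose score is within `tolerance` (relative)
    of the best score observed in the sweep. Assumes positive scores."""
    best = max(score_by_k.values())
    threshold = (1.0 - tolerance) * best
    for k in sorted(score_by_k):
        if score_by_k[k] >= threshold:
            return k
    return max(score_by_k)  # unreachable for non-empty input; kept for safety


# Synthetic sweep for illustration only (not data from this study).
synthetic_sweep = {1: 50.0, 5: 80.0, 10: 96.0, 15: 97.0, 20: 97.2, 30: 97.3}
chosen_k = smallest_sufficient_k(synthetic_sweep)
```

Loosening the tolerance favors smaller k and shorter runtimes, while tightening it favors deeper exploration.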
4.4. Satellite Resource Utilization Comparison
This subsection analyzes satellite resource utilization across different methods, namely SA, ILP, and GA, under both normal- and high-load conditions. The evaluation focuses on load balancing efficiency and the adaptability of each method to growing network demands.
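For readers who wish to quantify load balance rather than read it from the bar charts, the Python sketch below computes per-satellite utilization and summarizes its evenness with Jain's fairness index. Jain's index is a standard metric introduced here for illustration; it is not a metric reported in this study, and the function names and example numbers are hypothetical. The index equals 1 when all satellites are equally utilized and approaches 1/n when a single satellite carries the whole load.

```python
def utilization(used, capacity):
    """Fraction of each satellite's capacity that is in use."""
    return [u / c for u, c in zip(used, capacity)]


def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    total_sq = sum(v * v for v in values)
    if total_sq == 0.0:
        return 1.0                      # all-zero load is trivially even
    return sum(values) ** 2 / (len(values) * total_sq)


# Hypothetical loads (arbitrary capacity units) for three satellites.
example_util = utilization([60.0, 55.0, 20.0], [100.0, 100.0, 100.0])
example_fairness = jain_index(example_util)
```

A single scalar of this kind makes it easier to compare methods across load levels, although it hides which satellites are over- or underused.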
4.4.1. Simulated Annealing (SA)
Figure 7 illustrates the resource utilization achieved by the SA method under both normal- and high-load conditions. At normal load (Figure 7a), SA achieves balanced utilization across most satellites. Resource usage is evenly distributed, avoiding both bottlenecks and underutilization. Satellite 6 shows notably lower utilization because it could not meet user data rate requirements during the simulation, making it less favorable for assignment. Overall, the results demonstrate SA’s ability to optimize assignments while promoting fairness in resource consumption. Under high load (Figure 7b), SA continues to perform effectively. While overall utilization increases, the load remains evenly distributed across satellites, with no instance of critical overload. This confirms SA’s strong load-balancing capability, even under increased network demand.
      
    
Figure 7. Satellite resource utilization under (a) normal- and (b) high-load conditions using SA.
  
4.4.2. Integer Linear Programming (ILP)
Figure 8 illustrates ILP satellite utilization under normal- and high-load conditions. Under normal load (Figure 8a), ILP exhibits a relatively uneven distribution. Satellites 1, 6, and 10 are heavily utilized, while others retain substantial unused capacity. This imbalance stems from ILP’s deterministic behavior, where score maximization may override fair load distribution. At high load (Figure 8b), the imbalance becomes more pronounced. Several satellites approach saturation, while others remain underutilized. Although ILP meets overall demand, the skewed distribution may cause service bottlenecks and increased handover frequency in practical scenarios.
      
    
Figure 8. Satellite resource utilization under (a) normal- and (b) high-load conditions using ILP.
  
4.4.3. Genetic Algorithm (GA)
Figure 9 shows GA satellite utilization under normal- and high-load conditions. At normal load (Figure 9a), GA shows moderate variability in satellite utilization. While some satellites are used efficiently, others display noticeable imbalances due to the stochastic nature of GA’s evolutionary process. Under high load (Figure 9b), GA struggles to maintain balanced allocations. Resource usage becomes more erratic, with some satellites underutilized and others congested, indicating less reliable convergence than SA and ILP.
      
    
Figure 9. Satellite resource utilization under (a) normal- and (b) high-load conditions using GA.
  
In summary, the SA approach consistently outperforms ILP and GA in satellite resource utilization. Under both normal- and high-load conditions, it achieves a more balanced load distribution across the constellation, minimizing severe underutilization and overload. These results underscore the effectiveness of the proposed framework in adapting to dynamic network conditions while maintaining operational efficiency. All reported results represent the average of 20 independent STK-based simulation runs with randomized user and satellite positions. The observed variance across runs remained below 3%, confirming consistency at the 95% confidence level and supporting the statistical reliability of the presented comparisons.
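To make the run-to-run consistency statement above reproducible, a 95% confidence interval for the mean of a metric over 20 independent runs can be computed with the Student-t distribution. The Python sketch below is a generic illustration, not the authors' analysis script: the function name and the sample values are hypothetical, and the default critical value 2.093 is the two-sided 95% value for 19 degrees of freedom, so it applies only when exactly 20 runs are supplied.

```python
import math
from statistics import mean, stdev

T_CRIT_95_DF19 = 2.093  # two-sided 95% Student-t critical value, 19 degrees of freedom


def confidence_interval(samples, t_crit=T_CRIT_95_DF19):
    """Return (lower, upper) bounds of the confidence interval for the mean.

    The default t_crit is valid only for len(samples) == 20; pass the
    appropriate critical value for other sample sizes.
    """
    center = mean(samples)
    half_width = t_crit * stdev(samples) / math.sqrt(len(samples))
    return center - half_width, center + half_width


# Synthetic per-run satisfaction values for illustration only.
synthetic_runs = [0.96, 0.98] * 10
lower_bound, upper_bound = confidence_interval(synthetic_runs)
```

Reporting the interval alongside the mean lets readers judge whether differences between methods exceed the run-to-run spread.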
4.5. Assigned vs. Required Data Rate Analysis
This subsection evaluates the accuracy of each assignment strategy in meeting user requirements under both normal- and high-load conditions. Performance is assessed by plotting assigned versus required data rates, using an equality line as the ideal reference for perfect assignment.
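The comparison against the equality line can also be summarized numerically. The Python sketch below computes a per-user satisfaction ratio, the mean absolute deviation from the equality line, and the number of blocked users. It rests on assumptions that are not taken from the paper: satisfaction is defined here as the assigned rate divided by the required rate, capped at 1; a user with an assigned rate of 0 is counted as blocked; and all required rates are positive. The function names and example rates are hypothetical.

```python
from statistics import mean


def satisfaction(assigned, required):
    """Per-user satisfaction in [0, 1]; assumes every required rate is > 0."""
    return [min(a / r, 1.0) for a, r in zip(assigned, required)]


def summarize(assigned, required):
    """Aggregate how closely assigned rates follow the equality line."""
    sat = satisfaction(assigned, required)
    return {
        "mean_satisfaction": mean(sat),
        # Average vertical distance from the line assigned == required.
        "mean_abs_deviation": mean(abs(a - r) for a, r in zip(assigned, required)),
        "blocked": sum(1 for a in assigned if a == 0),
    }


# Hypothetical rates in Mbps: one fully served, one half served, one blocked user.
example_summary = summarize([10.0, 5.0, 0.0], [10.0, 10.0, 20.0])
```

Points on the equality line contribute zero deviation, so a smaller mean absolute deviation corresponds to the tighter clustering visible in the scatter plots.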
4.5.1. Simulated Annealing (SA)
Figure 10a,b show assigned versus required data rates using SA-based assignment under normal and high loads. In both cases, assigned rates align closely with the equality line, indicating high precision. While slight dispersion appears at higher data rates under heavy load, SA consistently maintains strong assignment accuracy. A few users remain unserved at high load due to resource constraints, but overall, SA effectively balances performance and feasibility across conditions.
      
    
Figure 10. Assigned vs. required data rates under (a) normal- and (b) high-load conditions for SA.
  
4.5.2. Integer Linear Programming (ILP)
Figure 11 shows ILP-based results. Under normal load (Figure 11a), ILP aligns well with user demands. However, at high load (Figure 11b), deviations increase, especially for users with higher data rate demands. A notable number of users were blocked due to limited satellite resources, highlighting the limitations of hard optimization approaches, such as ILP, under heavy-load conditions.
      
    
Figure 11. Assigned vs. required data rates under (a) normal- and (b) high-load conditions for ILP.
  
4.5.3. Genetic Algorithm (GA)
Figure 12 shows GA-based results. Under normal load (Figure 12a), GA follows the equality line, but with greater dispersion than SA and ILP. At high load (Figure 12b), assignment quality declines further, with many users underserved and a higher number of blocked users. GA’s stochastic nature results in less consistent performance, particularly in resource-constrained scenarios.
      
    
Figure 12. Assigned vs. required data rates under (a) normal- and (b) high-load conditions for GA.
  
Overall, these findings confirm that all methods perform well under normal load. However, under high load, resource scarcity leads to increased user blocking, most notably with ILP and GA. The SA-based framework demonstrates the most consistent and resilient performance, maintaining high satisfaction levels and superior adaptability to resource constraints.
4.6. User Satisfaction Distribution
Beyond aggregate performance, it is essential to assess the distribution of user satisfaction, especially in multi-user satellite networks where fairness and consistency are crucial. This subsection compares individual user satisfaction across the three methods, using violin plots to illustrate the distribution density and heatmaps to visualize per-user satisfaction.
4.6.1. Simulated Annealing (SA)
Figure 13 and Figure 14 show the user satisfaction distribution under the SA-based assignment strategy for both normal- and high-load conditions.
      
    
Figure 13. User satisfaction under normal load: (a) Individual user satisfaction. (b) User satisfaction density using SA.
  
      
    
Figure 14. User satisfaction under high load: (a) Individual user satisfaction. (b) User satisfaction density using SA.
  
Under normal load (Figure 13a,b), most users achieve near-complete satisfaction, as indicated by the dense clustering around 100%. The violin plot reveals a narrow peak with minimal variance, confirming consistent service quality across users.
In high-load scenarios (Figure 14a,b), satisfaction remains high for a large portion of users, though a secondary cluster near 0% emerges, corresponding to users who were blocked due to resource constraints. Despite these limitations, served users maintain high satisfaction levels, demonstrating SA’s ability to prioritize quality under stress.
These trends reflect the design philosophy of the SA-based strategy: rather than spreading limited resources thinly, it maximizes overall satisfaction by fully serving as many users as capacity allows. In high-load conditions, this results in selective admission, blocking users whose demands would significantly degrade overall performance. This resource-aware approach ensures high and consistent QoS for connected users, balancing service coverage with user experience.
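The selective-admission behavior described above can be illustrated with a deliberately simple stand-in. The Python sketch below admits users on a single satellite in order of increasing demand until capacity is exhausted, which maximizes the number of fully served users for that satellite. This greedy rule is a substitute chosen for clarity: it is not the MCSA assignment procedure, and the function name and example values are hypothetical.

```python
def admit_smallest_first(demands, capacity):
    """Greedy admission on one satellite: serve the smallest demands first.

    Returns (admitted, blocked) as sorted lists of user indices. Every
    admitted user receives the full requested rate; nobody is partially served.
    """
    admitted, blocked, used = [], [], 0.0
    for user in sorted(range(len(demands)), key=lambda u: demands[u]):
        if used + demands[user] <= capacity:
            admitted.append(user)
            used += demands[user]
        else:
            blocked.append(user)        # admitting would exceed capacity
    return sorted(admitted), sorted(blocked)


# Hypothetical demands in Mbps against a 10 Mbps satellite budget.
example_admitted, example_blocked = admit_smallest_first([5.0, 3.0, 8.0, 2.0], 10.0)
```

Under this rule, served users always sit at 100% satisfaction and blocked users at 0%, which reproduces the two-cluster shape seen in the high-load violin plot.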
4.6.2. Integer Linear Programming (ILP)
Figure 15 and Figure 16 illustrate the user satisfaction distribution under the ILP-based assignment strategy for both normal- and high-load conditions.
      
    
Figure 15. User satisfaction under normal load: (a) Individual user satisfaction. (b) User satisfaction density using ILP.
  
      
    
Figure 16. User satisfaction under high load: (a) Individual user satisfaction. (b) User satisfaction density using ILP.
  
Under normal load (Figure 15a,b), most users achieve high satisfaction levels near 100%. However, compared to SA, slightly higher variance is observed. The violin plot shows a broader peak, indicating less uniform satisfaction across users. A small subset experiences lower satisfaction, reflecting ILP’s deterministic yet less flexible handling of resource allocation.
In the high-load scenario (Figure 16a,b), the number of users with near-zero satisfaction increases, reflecting those blocked due to capacity constraints. Among the served users, satisfaction remains high, at nearly 100%, although the distribution is slightly more dispersed than in the SA-based results.
These trends stem from ILP’s strict optimization behavior. While ILP seeks to maximize the global objective, it does not account for fairness among users. Consequently, it may achieve high satisfaction for many users but leave others underserved, especially under high-demand conditions. This highlights ILP’s rigid yet precise resource allocation, which can lead to uneven user experiences when resources are constrained.
4.6.3. Genetic Algorithm (GA)
Figure 17 and Figure 18 show user satisfaction distribution for the GA-based assignment under normal- and high-load conditions. With a normal load (Figure 17a,b), most users achieve high satisfaction, though with slightly greater variability than ILP. The violin plot reveals a strong peak near 100% satisfaction, accompanied by a modest spread toward lower values, suggesting occasional inconsistencies in service allocation. With a high load (Figure 18a,b), a substantial portion of users are still fully satisfied. However, a noticeable rise in users with near-zero satisfaction is observed, reflecting increased resource contention. This suggests that, under limited capacity, some users are either blocked or receive poor assignments.
      
    
Figure 17. User satisfaction under normal load: (a) Individual user satisfaction. (b) User satisfaction density using GA.
  
      
    
Figure 18. User satisfaction under high load: (a) Individual user satisfaction. (b) User satisfaction density using GA.
  
These observations highlight the characteristics of GA optimization: it aims for near-optimal solutions through iterative improvement but lacks strong prioritization mechanisms under a heavy load. As a result, while many users maintain high satisfaction, greater disparities emerge compared to more structured methods such as ILP. Still, GA manages to deliver acceptable service to a substantial subset of users despite resource constraints. In contrast, SA consistently achieves higher satisfaction with greater fairness and uniformity. While ILP offers optimal solutions, it may sacrifice equitable distribution. GA shows the least consistent satisfaction profile. These findings underscore the suitability of the SA-based framework for user-centric, delay-sensitive, and high-QoS applications in dynamic, resource-limited LEO satellite environments.
4.7. Overall Comparison Across Scenarios
To consolidate the evaluation, this subsection compares the overall performance of the three assignment strategies under varying traffic loads. The analysis focuses on three key metrics: average satellite utilization, average user satisfaction, and the number of blocked users. Figure 19 shows the average utilization for each method from light to high load conditions. The SA strategy consistently achieves higher utilization than ILP and GA, highlighting its efficiency in resource allocation. Even under high load, SA maintains balanced usage across satellites, demonstrating resilience in managing limited capacity without causing underutilization or excessive congestion.
      
    
Figure 19. Average satellite utilization under light and high load.
  
Figure 20 shows that SA achieves higher user satisfaction than GA and performs comparably to ILP under both light and high load conditions.
      
    
Figure 20. Average user satisfaction under light to high load.
  
Finally, Figure 21 shows that SA matches ILP in blocking rate while outperforming GA under both load conditions. This highlights SA’s effectiveness in maximizing user admission while managing limited resources.
      
    
Figure 21. Number of blocked users under light and high loads.
  
Collectively, these results confirm that the SA framework strikes an effective balance between user satisfaction, resource utilization, and blocking rate, making it well-suited for dynamic and resource-limited LEO satellite environments. While the proposed MCSA demonstrates consistent scalability and adaptability across simulation scenarios, real-world deployment may involve additional considerations such as signaling coordination and on-board processing constraints, which are further outlined in the conclusion.
5. Conclusions
This paper has proposed an adaptive real-time handover framework for LEO satellite networks in dynamic environments like urban canyons. By integrating a real-time scoring engine with a Markov chain-guided simulated annealing algorithm, the framework ensures SLA compliance and efficient resource allocation.
Simulation results demonstrate that the proposed approach outperforms ILP and GA in terms of user satisfaction, handover reliability, scalability, and resource balance, confirming its practicality for large-scale LEO deployments. While performance comparisons were made under identical runtime and parameter budgets, we note that, in line with the No-Free-Lunch theorem, the relative advantage of one heuristic over another may vary under different parameterizations or problem instances. On average, the MCSA framework improves user satisfaction by about 10% and satellite resource utilization by 10–15% compared to GA, enabling support for nearly 50% more users under dense-load conditions without requiring additional capacity.
Future work will focus on real-time deployment, integration with multi-layer NTN architectures, and adaptive policies to address diverse mobility and service demands.
Author Contributions
Conceptualization, M.A.M., A.Y.A., and H.S.H.; methodology, M.A.M., A.Y.A., and H.S.H.; software, M.A.M.; validation, M.A.M., A.Y.A., and H.S.H.; formal analysis, M.A.M., A.Y.A., and H.S.H.; investigation, M.A.M., A.Y.A., and H.S.H.; data curation, M.A.M.; writing—original draft, M.A.M.; visualization, M.A.M.; supervision, A.Y.A. and H.S.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research was also supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant RGPIN-2025-05001.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
References
- Liu, J.; Shi, Y.; Fadlullah, Z.M.; Kato, N. Space-Air-Ground Integrated Network: A Survey. IEEE Commun. Surv. Tutor. 2018, 20, 2714–2741.
- Fang, X.; Feng, W.; Wei, T.; Chen, Y.; Ge, N.; Wang, C.-X. 5G Embraces Satellites for 6G Ubiquitous IoT: Basic Models for Integrated Satellite Terrestrial Networks. IEEE Internet Things J. 2021, 8, 14399–14417.
- Lin, X.; Cioni, S.; Charbit, G.; Chuberre, N.; Hellsten, S.; Boutillon, J.-F. On the Path to 6G: Embracing the Next Wave of Low Earth Orbit Satellite Access. IEEE Commun. Mag. 2021, 59, 36–42.
- Al-Hourani, A. Session Duration Between Handovers in Dense LEO Satellite Networks. IEEE Wirel. Commun. Lett. 2021, 10, 2810–2814.
- Ali, I.; Al-Dhahir, N.; Hershey, J. Predicting the visibility of LEO satellites. IEEE Trans. Aerosp. Electron. Syst. 1999, 35, 1183–1190.
- Su, Y.; Liu, Y.; Zhou, Y.; Yuan, J.; Cao, H.; Shi, J. Broadband LEO Satellite Communications: Architectures and Key Technologies. IEEE Wirel. Commun. 2019, 26, 55–61.
- Al Homssi, B.; Al-Hourani, A.; Wang, K.; Conder, P.; Kandeepan, S.; Choi, J.; Allen, B.; Moores, B. Next Generation Mega Satellite Networks for Access Equality: Opportunities, Challenges, and Performance. IEEE Commun. Mag. 2022, 60, 18–24.
- Osoro, O.B.; Oughton, E.J. A Techno-Economic Framework for Satellite Networks Applied to Low Earth Orbit Constellations: Assessing Starlink, OneWeb and Kuiper. IEEE Access 2021, 9, 141611–141625.
- Liu, H.; Wang, Y.; Wang, Y. A Successive Deep Q-learning Based Distributed Handover Scheme for Large-Scale LEO Satellite Networks. In Proceedings of the 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring), Helsinki, Finland, 19–22 June 2022; pp. 1–6.
- Chowdhury, P.K.; Atiquzzaman, M.; Ivancic, W. Handover schemes in satellite networks: State-of-the-art and future research directions. IEEE Commun. Surv. Tutor. 2006, 8, 2–14.
- Re, E.D.; Fantacci, R.; Giambene, G. Efficient dynamic channel allocation techniques with handover queuing for mobile satellite networks. IEEE J. Sel. Areas Commun. 1995, 13, 397–405.
- Maral, G.; Restrepo, J.; Re, E.D.; Fantacci, R.; Giambene, G. Performance analysis for a guaranteed handover service in an LEO constellation with a ‘satellite-fixed cell’ system. IEEE Trans. Veh. Technol. 1998, 47, 1200–1214.
- Re, E.D.; Fantacci, R.; Giambene, G. Handover queuing strategies with dynamic and fixed channel allocation techniques in low earth orbit mobile satellite systems. IEEE Trans. Commun. 1999, 47, 89–102.
- Wu, Z.; Jin, F.; Luo, J.; Fu, Y.; Shan, J.; Hu, G. A graph-based satellite handover framework for LEO satellite communication networks. IEEE Commun. Lett. 2016, 20, 1547–1550.
- Feng, L.; Liu, Y.; Wu, L.; Zhang, Z.; Dang, J. A satellite handover strategy based on MIMO technology in LEO satellite networks. IEEE Commun. Lett. 2020, 24, 1505–1509.
- Zhang, S.; Liu, A.; Han, C.; Ding, X.; Liang, X. A network-flows-based satellite handover strategy for LEO satellite networks. IEEE Wirel. Commun. Lett. 2021, 10, 2669–2673.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Hu, X.; Zhang, Y.; Liao, X.; Liu, Z.; Wang, W.; Ghannouchi, F.M. Dynamic beam hopping method based on multi-objective deep reinforcement learning for next generation satellite broadband systems. IEEE Trans. Broadcast. 2020, 66, 630–646.
- Lin, Z.; Ni, Z.; Kuang, L.; Jiang, C.; Huang, Z. Dynamic beam pattern and bandwidth allocation based on multi-agent deep reinforcement learning for beam hopping satellite systems. IEEE Trans. Veh. Technol. 2022, 71, 3917–3930.
- Xu, G.; Tan, F.; Ran, Y.; Zhao, Y.; Luo, J. Joint beam-hopping scheduling and coverage control in multibeam satellite systems. IEEE Wirel. Commun. Lett. 2023, 12, 267–271.
- Tsuchida, H.; Kawamoto, Y.; Kato, N.; Kaneko, K.; Tani, S.; Uchida, S.; Aruga, H. Efficient power control for satellite-borne batteries using Q-learning in low-earth-orbit satellite constellations. IEEE Wirel. Commun. Lett. 2020, 9, 809–812.
- Huang, J.; Yang, Y.; Yin, L.; He, D.; Yan, Q. Deep reinforcement learning-based power allocation for rate-splitting multiple access in 6G LEO satellite communication system. IEEE Wirel. Commun. Lett. 2022, 11, 2185–2189.
 - Li, X.; Zhang, H.; Li, W.; Long, K. Multi-agent DRL for user association and power control in terrestrial-satellite network. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–5. [Google Scholar]
 - Tsai, K.; Fan, L.; Wang, L.; Lent, R.; Han, Z. Multi-commodity flow routing for large-scale LEO satellite networks using deep reinforcement learning. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 626–631. [Google Scholar]
 - Zuo, P.; Wang, C.; Wei, Z.; Li, Z.; Zhao, H.; Jiang, H. Deep reinforcement learning based load balancing routing for LEO satellite network. In Proceedings of the IEEE 95th Vehicular Technology Conference (VTC), Helsinki, Finland, 19–22 June 2022; pp. 1–6. [Google Scholar]
 - Liu, J.; Zhao, B.; Xin, Q.; Su, J.; Ou, W. DRL-ER: An intelligent energy-aware routing protocol with guaranteed delay bounds in satellite mega-constellations. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2872–2884. [Google Scholar] [CrossRef]
 - Jiang, C.; Zhang, H.; Ren, Y.; Han, Z.; Chen, K.-C.; Hanzo, L. Machine Learning Paradigms for Next-Generation Wireless Networks. IEEE Wirel. Commun. 2017, 24, 98–105. [Google Scholar] [CrossRef]
 - Song, M.; Tian, J.; Zhang, H.; Xu, T.; Hu, H.; Zhou, T. Deep reinforcement learning based multi-objective handover for LEO satellite networks. In Proceedings of the IEEE Vehicular Technology Conference (VTC-Spring), Oslo, Norway, 17–20 June 2025; pp. 1–6. [Google Scholar] [CrossRef]
 - Sun, Y.; Zhai, Y.; Wu, W.; Si, P.; Yu, F.R. Handover for multi-beam LEO satellite networks: A multi-objective reinforcement learning method. IEEE Commun. Lett. 2024, 28, 2834–2838. [Google Scholar] [CrossRef]
 - Liu, Y.; Ma, T.; Tang, Z.; Qin, X.; Zhou, H.; Shen, X. Ultra-Dense LEO Satellite Access Network Slicing: A Deep Reinforcement Learning Approach. In Proceedings of the GLOBECOM 2023—2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 5043–5048. [Google Scholar]
 - Xie, H.; Zhan, Y.; Zeng, G.; Pan, X. LEO Mega-Constellations for 6G Global Coverage: Challenges and Opportunities. IEEE Access 2021, 9, 164223–164244. [Google Scholar] [CrossRef]
 - Zhu, H.; Cao, Y.; Wang, W.; Jiang, T.; Jin, S. Deep Reinforcement Learning for Mobile Edge Caching: Review, New Features, and Open Issues. IEEE Netw. 2018, 32, 50–57. [Google Scholar] [CrossRef]
 - Fontan, F.P.; Vazquez-Castro, M.; Cabado, C.E.; Garcia, J.P.; Kubista, E. Statistical modeling of the LMS channel. IEEE Trans. Veh. Technol. 2001, 50, 1549–1567. [Google Scholar] [CrossRef]
 - Zhou, Y.; Chen, H.; Dou, Z. MOLM: Alleviating Congestion through Multi-Objective Simulated Annealing-Based Load Balancing Routing in LEO Satellite Networks. Future Internet 2024, 16, 109. [Google Scholar] [CrossRef]
 - ITU-R Recommendation M.2150-2. Detailed Specifications of the Terrestrial Radio Interfaces of International Mobile Telecommunications-2020 (IMT-2020), December 2023. Available online: https://www.itu.int/rec/R-REC-M.2150/en (accessed on 31 October 2025).
 - ITU-R Report M.2514. Vision, Requirements and Evaluation Guidelines for Satellite Component of IMT-2020. 2022. Available online: https://www.itu.int/hub/publication/r-rep-m-2514-2022/ (accessed on 31 October 2025).
 - ITU-R. Propagation Data and Prediction Methods Required for the Design of Earth-Space Telecommunication Systems. 2017. Available online: https://www.itu.int/rec/R-REC-P.618 (accessed on 31 October 2025).
 - Rosenkranz, P.W. Absorption of Microwaves by Atmospheric Gases. In Atmospheric Remote Sensing by Microwave Radiometry; Janssen, M.A., Ed.; John Wiley & Sons, Inc.: New York, NY, USA, 1993; Chapter 2; pp. 37–74. [Google Scholar]
 - ITU-R. Recommendation ITU-R P.676—Attenuation by Atmospheric Gases. 2022. Available online: https://www.itu.int/rec/R-REC-P.676 (accessed on 31 October 2025).
 - Massad, M.A.; Alma’Aitah, A.Y.; Hassanein, H.S. STEBS: Spatio-Temporal Entropy-Based Scoring Handover Model for LEO Satellite. In Proceedings of the 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 27–31 May 2024; pp. 761–767.
 - Federal Communications Commission (FCC). Application for Modification of Authorization for SpaceX NGSO Satellite System, File Number SAT-MOD-20181108-00083. 2018. Available online: https://fcc.report/IBFS/SAT-MOD-20181108-00083 (accessed on 31 October 2025).
 - Gao, Z.; Liu, A.; Liang, X. The Performance Analysis of Downlink NOMA in LEO Satellite Communication System. IEEE Access 2020, 8, 93723–93732.
 - 3GPP. Technical Specification Group Radio Access Network; NR; Overall Description; Stage 2 (Release 18), 3GPP TS 38.300, 2021. Available online: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3747 (accessed on 31 October 2025).
 - 3GPP. Technical Specification Group Radio Access Network; Study on New Radio (NR) to Support Non-Terrestrial Networks (Release 15). Available online: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3234 (accessed on 31 October 2025).
 
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).