1. Introduction
With the continuous expansion of low-Earth-orbit (LEO) satellite networks, awareness of the status of multi-dimensional heterogeneous network resources provides the basic data support for delivering high-quality network services and has become a key technology for the operation and maintenance of satellite networks. However, the highly dynamic topology of satellite networks, coupled with the instability of inter-satellite links (ISLs) and satellite-ground links (SGLs), can substantially delay the transmission of awareness data, reducing the real-time performance of network awareness [1]. Meanwhile, traditional terrestrial network awareness methods cannot meet the differentiated and dynamic service awareness requirements of satellite networks [2]. Therefore, efficient network awareness methods have emerged as a key area of focus in the ongoing development of LEO satellite networks.
Network awareness primarily consists of a measurement process and a reporting process. Measurement mainly involves collecting the resource status information of nodes and transmitting awareness data in the network. Reporting refers to aggregating awareness data and transmitting them to the ground control center [3]. Research on resource-aware optimization methods for terrestrial networks has lasted for decades. However, to the best of our knowledge, few relevant studies address LEO satellite network scenarios. Additionally, most studies of terrestrial networks have concentrated on expanding the awareness scope [4] and optimizing the awareness overhead [5], while research on the flexibility and timeliness of awareness methods is lacking [6]. For example, Yang et al. [7] employed an active telemetry method for giant satellite constellations, in which both the measurement and reporting processes use separate links or frequency bands for transmission. By optimizing resource allocation during the awareness process, the volume of awareness data transmitted was increased. However, this method occupied substantial on-board resources, thereby increasing the awareness overhead and the complexity of the constellation. To minimize the awareness overhead, in-band network telemetry (INT) technology has been proposed as a lightweight passive telemetry approach [8]. It conveys awareness data in additional fields appended to the service data packet headers [9]. The transmission route of the service concurrently serves as the awareness path, and programmable technology enables operations such as reading, writing, and deleting awareness data [10], which suits multi-service concurrent scenarios in large-scale satellite networks [11]. Liu et al. [12] proposed using an Euler path planning algorithm to decrease the redundancy of INT paths, thereby optimizing the telemetry paths within LEO constellations. Meanwhile, regarding the reporting process, INT-based awareness methods often report after the target node separates the awareness data from the service data. For long-distance multi-hop paths, a multi-node reporting method based on fixed hop intervals can also be adopted to mitigate cumulative costs and improve the timeliness of reporting.
Unlike terrestrial networks, the unique, complex, and dynamic characteristics of satellite networks pose three main challenges to awareness technology: (1) The real-time challenge brought by the large delay on the transmission path during network measurement [13]. Due to the high-speed movement of satellite nodes and the considerable distance between satellites, the propagation delay is significant compared to terrestrial networks [14]. Moreover, the delay generated at each node accumulates over multiple hops and grows with the path length, greatly affecting real-time awareness and reducing the accuracy of awareness data [15]. (2) The instability of the SGL results in a limited number of optional reporting nodes and poor dynamic selection of reporting nodes. The SGLs of nodes have short duration and poor stability, leading to a limited number of reporting nodes that can be selected on the path. When the SGL of the intended reporting node is unavailable, the awareness data must either wait on the satellite or switch to another reporting node, thereby increasing the reporting time. (3) The high complexity of dynamic awareness caused by the dual heterogeneity of network topology and services. The dynamism of the topology reduces the timeliness of awareness and increases the difficulty of overall network awareness. The dynamism of services increases the complexity of adjusting awareness methods in different time-varying scenarios. A fixed measurement and reporting strategy offers limited flexibility for optimizing the measurement process or the selection of reporting nodes in dual dynamic scenarios, so it cannot be adjusted adaptively to actual awareness requirements.
In response to the challenges of network awareness in LEO satellite networks, a dynamic reporting nodes selection method for network awareness based on active-passive integrated network telemetry (DRNSM-APINT) is proposed, which can flexibly adjust the telemetry mechanism and dynamically select the optimal number and position of reporting nodes for different services. Specifically, the method builds an active-passive integrated network telemetry (APINT) model. Unlike existing active and passive telemetry systems, which are both initiated by a single node, in this method passive telemetry and active telemetry are initiated by the source and target nodes, respectively, and converge toward the center of the path to shorten the awareness path. The optimal location for the reporting node is selected based on constraints related to the transmission time of ISLs and the status of SGLs, thereby reducing the measurement and reporting time and enhancing the timeliness of awareness in single-node reporting scenarios. Furthermore, considering that the single-node reporting mechanism in LEO satellite networks cannot meet the awareness timeliness requirements of multi-hop, long-distance paths, an optimized selection method for multiple reporting nodes is proposed. The concept of virtual boundaries is introduced to divide the awareness path into segments, and within each segment the APINT model is used to select the optimal reporting node. Then, by comparing the effectiveness of selecting multiple reporting nodes across various virtual boundary partitioning scenarios, the most suitable number of reporting nodes for the current awareness path is determined, thereby enabling optimized and flexible selection of multiple reporting nodes. However, due to variations in path hops, the distances of ISLs and SGLs, and the characteristics of these links among different services, the optimal number of reporting nodes for each service path varies in multi-service scenarios. Consequently, it is necessary to dynamically devise an optimal reporting scheme for each service, which further complicates the solution process.
Considering the varying selection requirements for reporting nodes across diverse service scenarios, and exploiting the model-free nature of Q-learning and its strong adaptability to dynamic scenarios, a lightweight Q-learning algorithm based on pre-planned virtual boundaries is proposed. The boundaries are initialized using the equal division method, and the selection of virtual boundaries for each service path is pre-planned through dynamic boundary floating. This excludes discontinuous boundary combinations, decreases the number of search iterations required in Q-learning, and reduces the complexity of the algorithm’s solution process. Based on the proposed lightweight Q-learning algorithm, the optimal number and position of reporting nodes for different paths in multi-service scenarios are solved dynamically, achieving timely awareness and flexible dynamic selection of reporting nodes in LEO satellite networks.
The main contributions of this paper are as follows:
- (1) An APINT model is built for LEO satellite networks. By combining active and passive telemetry, the awareness path is shortened. Additionally, the optimal location for the reporting node is selected based on constraints related to the transmission time and the status of SGLs, thereby reducing awareness time and enhancing awareness timeliness.
- (2) A dynamic reporting nodes selection method for network awareness based on active-passive integrated network telemetry (DRNSM-APINT) is proposed. It introduces the concept of virtual boundaries to divide the awareness path into segments and uses the APINT model to find the optimal reporting node of each segment. By comparing the reporting node selections across various virtual boundary settings, the best reporting solution is chosen for the current awareness path, solving the problem of flexible selection of multiple reporting nodes.
- (3) A multi-reporting nodes selection algorithm based on lightweight Q-learning for various services is proposed. The dynamic pre-planning of virtual boundaries reduces the computational complexity of the Q-learning solution. The proposed lightweight Q-learning algorithm dynamically determines the optimal number and location of reporting nodes for different service paths, achieving timely awareness and flexible selection of reporting nodes in scenarios where both services and paths vary over time.
The remainder of this paper is organized as follows:
Section 2 introduces the composition of LEO satellite network and presents the APINT model.
Section 3 provides the details of optimized selection method for multiple reporting nodes.
Section 4 introduces a lightweight Q-learning algorithm based on pre-planned virtual boundary.
Section 5 validates the proposed method through simulation experiments.
Section 6 provides a brief summary of the paper.
3. Dynamic Multi-Reporting Nodes Selection Method
In LEO satellite networks, the length of the end-to-end service path is relatively long, and the single-node reporting method can lead to significant awareness delay and overhead due to multi-hop accumulation. Therefore, multiple reporting nodes need to be selected for segmented reporting. When the INT method is applied to terrestrial networks, a fixed hop interval approach is commonly employed to select multiple reporting nodes, due to the relative stability of terrestrial network nodes and links. However, when applied to satellite networks, such methods may result in significant cumulative awareness delays due to unstable ISLs and may also prevent timely reporting of awareness data due to the intermittency of SGLs. In this section, a dynamic multi-reporting nodes selection method of network awareness based on the APINT model is proposed, which can select the optimal number and location of reporting nodes for the awareness path.
The procedure for selecting multi-reporting nodes is illustrated in Figure 3. The selection of the optimal reporting node requires consideration of both the number of reporting nodes and the length of their respective intervals, as both factors jointly influence the telemetry time of the awareness path and the specific location of the reporting node. The issue of selecting multiple reporting nodes can be converted into an interval partitioning optimization problem.
Therefore, the concept of virtual boundaries is introduced, dividing the multi-hop path into $K$ segments, with the set of virtual boundaries for all segments represented as $B = \{B_1, B_2, \ldots, B_K\}$, where the number of satellites in segment $k$ is $n_k$. Using each segment as the optimization object, the best reporting node is searched for within each segment based on the APINT model. Specifically, upon reaching the virtual boundary starting point of the segment, the service initiates passive telemetry along the route. Conversely, upon reaching the virtual boundary endpoint, it sends active telemetry packets, which are aggregated and reported at the optimal reporting node within the segment. The service then continues into the subsequent segment, and the process repeats until it arrives at the target node.
However, different virtual boundary divisions yield different reporting node selections, so the optimal solution must be chosen from numerous candidates. Therefore, an awareness evaluation function is constructed from the awareness time within segments, the awareness overhead of segments, and the number of reporting nodes. The optimal telemetry and reporting solution for the path, that is, the number of reporting nodes and their respective locations, is the one that minimizes the awareness evaluation function across the different virtual boundary segmentations. The optimization objective function is:

$$\min F = \omega_1\,\phi(\bar{T}) + \omega_2\,\phi(\bar{O}) + \omega_3\,\phi(K) \tag{10}$$

where $\omega_1$, $\omega_2$, and $\omega_3$ are the corresponding optimization weights, with $\omega_1 + \omega_2 + \omega_3 = 1$. The function $\phi(\cdot)$ is the normalization function. $\bar{T}$ is the average awareness time of segments. $\bar{O}$ is the average awareness overhead of segments. The specific calculation is as follows.
The telemetry and reporting times for each segment after selecting the optimal reporting node based on the APINT model are $T_1, T_2, \ldots, T_K$. Then, the average telemetry and reporting time for the segments along the entire path is:

$$\bar{T} = \frac{1}{K}\sum_{k=1}^{K} T_k \tag{11}$$

Meanwhile, the identifiers of the nodes within each segment are denoted as $i = 1, 2, \ldots, n_k$, and the awareness overhead generated by passive and active telemetry within the segment is as follows:

$$O_k = L_p + L_a + \sum_{i=1}^{n_k} o_i \tag{12}$$

where $L_p$ and $L_a$, respectively, represent the lengths of the passive telemetry header and the active telemetry header. $o_i$ represents the awareness overhead generated while the metadata of node $i$ is transmitted within the segment, and its calculation equation is as follows:

$$o_i = m_i\,h_i \tag{13}$$

where $m_i$ is the length of the metadata generated by node $i$, and $h_i$ represents the number of hops required for node $i$ to reach the reporting node.
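The evaluation described in this section can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the authors' implementation: the names `Segment`, `segment_overhead`, `min_max`, and `evaluation` are hypothetical, the normalization is assumed to be min-max scaling over caller-supplied bounds, both telemetry headers are assumed to be 32 bytes, and the segment overhead is assumed to be the two headers plus each node's metadata length multiplied by its hop count to the reporting node.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Segment:
    """One virtual segment after its reporting node has been chosen (hypothetical type)."""
    awareness_time_ms: float         # telemetry + reporting time of the segment
    metadata_bytes: Sequence[int]    # metadata length generated by each node
    hops_to_reporter: Sequence[int]  # hops from each node to the reporting node


def segment_overhead(seg: Segment, passive_header: int = 32, active_header: int = 32) -> int:
    """Assumed form: both headers plus each node's metadata weighted by the hops it travels."""
    carried = sum(m * h for m, h in zip(seg.metadata_bytes, seg.hops_to_reporter))
    return passive_header + active_header + carried


def min_max(value: float, bounds: Tuple[float, float]) -> float:
    """Min-max scaling; a stand-in for the normalization function, which is an assumption."""
    low, high = bounds
    return 0.0 if high == low else (value - low) / (high - low)


def evaluation(
    segments: Sequence[Segment],
    weights: Tuple[float, float, float],
    time_bounds: Tuple[float, float],
    overhead_bounds: Tuple[float, float],
    count_bounds: Tuple[float, float],
) -> float:
    """Weighted sum of normalized average time, average overhead, and segment count."""
    w_time, w_overhead, w_count = weights
    avg_time = sum(s.awareness_time_ms for s in segments) / len(segments)
    avg_overhead = sum(segment_overhead(s) for s in segments) / len(segments)
    return (
        w_time * min_max(avg_time, time_bounds)
        + w_overhead * min_max(avg_overhead, overhead_bounds)
        + w_count * min_max(len(segments), count_bounds)
    )


# Example: a single three-node segment whose middle node is the reporting node.
example = Segment(awareness_time_ms=5.0, metadata_bytes=[4, 4, 4], hops_to_reporter=[1, 0, 1])
score = evaluation([example], (0.4, 0.4, 0.2), (0.0, 10.0), (0.0, 100.0), (1.0, 5.0))
```

A lower score indicates a better division under these assumptions; comparing the scores of different candidate divisions of the same path mirrors the comparison across virtual boundary segmentations described above.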
4. A Multi-Reporting Nodes Selection Algorithm Based on Lightweight Q-Learning for Various Services
The dynamic selection method for multiple reporting nodes can determine the optimal reporting solution for any service path. However, differences in path hops, distances of ISLs, and available statuses of SGLs among various services further enhance the dynamism and complexity of multi-reporting nodes selection in concurrent multi-service scenarios. Therefore, it is necessary to develop a dynamic solving algorithm to select the optimal reporting solution for various service scenarios. The difficulty of this problem lies in not only adaptively adjusting to the unique characteristics of different service paths, but also reducing the computational complexity of solving the reporting solution within each service path.
Traditional mathematical solving algorithms, such as dynamic programming and linear programming methods, rely on precise environmental models. For multi-service scenarios, such methods require continuous re-modeling and cannot support dynamic strategy updates. Machine learning and other artificial intelligence algorithms require substantial data support and lack proactive exploration mechanisms. Q-learning is a model-free reinforcement learning algorithm based on finite Markov decision processes that learns directly from interaction with an uncertain or unknown environment [22]. It is particularly suitable for solving dynamic and complex problems and supports timely decision-making.
In response to the above problems, the Q-learning algorithm is introduced to solve the selection of multiple reporting nodes dynamically in multi-service scenarios. However, during the dynamic partitioning of virtual boundaries, the Q-learning algorithm may encounter non-adjacent virtual boundary combinations, meaning that the virtual boundaries are discontinuous and the segments fail to cover all nodes along the path. In addition, the Q-learning-based dynamic optimization algorithm traverses the effects of all potential actions on each state and generates the corresponding $Q$-value table. As the number of virtual segments grows, the scope of search and calculation expands exponentially, posing a significant challenge to the limited computational capabilities of satellites.
A Q-learning algorithm based on virtual boundary preplanning is proposed, which can optimize the Q-learning search process and reduce the number of iterations. The algorithm flow is shown in Figure 4. Specifically, because the flexible selection method for multiple reporting nodes requires awareness of all nodes, and the APINT model needs to start telemetry separately from the starting point of each segment's virtual boundary, the virtual boundaries of the subsegments must be adjacent, with no nodes left between them. Meanwhile, the APINT model can expand the awareness range, and to improve the average telemetry time of the segments, the segment lengths are kept relatively similar. Thus, the equal division method is used to initialize the virtual boundary division. The number of satellites within the path is
, and the satellite number of each segment is
, with
. Then, starting from the end of the first segment and the starting point of the second segment, the boundary nodes float back and forth within a certain range
, where
is the range adjustment factor, which can dynamically adjust the virtual boundary. The process is repeated, and each segment boundary can float based on
. Considering that the number of nodes within the segment after dynamic adjustment is:
when
is fixed,
, which means
is monotonically decreasing with respect to
. Therefore, to ensure the search scope and maximize both the floating range and the number of segment nodes, the floating range can be set as follows:
At the same time, by dynamically adjusting the number of reporting nodes
, repeating the virtual boundary partitioning and dynamic floating process mentioned above, the optimization of the Q-learning search process can be achieved. Taking the division of the virtual boundary as state
. The increase or decrease in the number of segments and the changes to the virtual boundaries are regarded as the action space
. The reward function based on the awareness evaluation function can be constructed as:
where
is the awareness evaluation function under the new state
, where
is the regularization penalty coefficient, and
represents the number of segments and the division of virtual boundaries in the new state. The
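$Q$-values are updated iteratively, and the update formula is as follows:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \tag{17}$$

where $s$ and $a$ are the current state and action, $s'$ is the new state, and $R$ is the reward; $\alpha$ is the learning rate; $\gamma$ is the update rate (discount factor), and $0 < \alpha \le 1$, $0 \le \gamma < 1$. The pseudocode of the algorithm is presented in Algorithm 1: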
Algorithm 1: Dynamic selection of multiple reporting nodes based on the improved Q-learning
Input: number of services; transmission path of each service; maximum number of segments for each service
for each service i do
    for each candidate number of segments j, from 1 to the maximum, do (adjust the number of segments)
        Divide the path equally into j segments with initial virtual boundaries;
        Calculate the floating range with Equation (15) and the number of potential virtual boundary combinations;
        for each potential combination k do
            Adjust the virtual boundaries within the floating range to generate a new division, which is the state;
            for each segment h do (using the APINT model for each segment)
                Find potential reporting nodes within the segment using Equation (6);
                Identify the optimal reporting node in the subsegment based on Equations (8) and (9), which is the action;
                Calculate the telemetry and reporting time, as well as the awareness overhead, within the subsegment;
            end
            Calculate the awareness evaluation function as the reward based on Equation (10);
            Update the Q-value table using Equation (17);
        end
        Find the best virtual boundary division and the best reporting node locations;
    end
    Compare the maximum Q-values associated with the optimal virtual boundary divisions across different numbers of segments;
    Output the optimal number and locations of reporting nodes, the awareness time, and the awareness overhead of the current service;
end
Output: optimal reporting node selections for the various services.
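The idea of pre-planned, floating virtual boundaries combined with a tabular Q-learning update can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the algorithm exactly as specified in Algorithm 1: a state is the tuple of interior cut positions, an action moves one cut by one node (or keeps the division), `delta` is a caller-chosen stand-in for the floating range, `cost` is a stand-in for the awareness evaluation function, and the reward is assumed to be the negative cost. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Cuts = Tuple[int, ...]  # interior cut positions; segment k covers nodes [cut[k-1], cut[k])


def equal_cuts(num_nodes: int, num_segments: int) -> Cuts:
    """Initial virtual boundaries: near-equal contiguous segments."""
    return tuple((k * num_nodes) // num_segments for k in range(1, num_segments))


def neighbours(cuts: Cuts, base: Cuts, delta: int, num_nodes: int) -> List[Cuts]:
    """Reachable divisions: stay, or move one cut by one node within +/-delta of its
    pre-planned position while keeping every segment non-empty and contiguous."""
    result = [cuts]
    for idx in range(len(cuts)):
        lower = cuts[idx - 1] if idx > 0 else 0
        upper = cuts[idx + 1] if idx + 1 < len(cuts) else num_nodes
        for step in (-1, 1):
            moved = cuts[idx] + step
            if abs(moved - base[idx]) <= delta and lower < moved < upper:
                result.append(cuts[:idx] + (moved,) + cuts[idx + 1:])
    return result


def q_learn_cuts(
    num_nodes: int,
    num_segments: int,
    delta: int,
    cost: Callable[[Cuts], float],
    episodes: int = 200,
    steps: int = 10,
    alpha: float = 0.5,
    gamma: float = 0.9,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Tuple[Cuts, float]:
    """Tabular Q-learning over boundary divisions; returns the best division visited."""
    rng = random.Random(seed)
    base = equal_cuts(num_nodes, num_segments)
    q: Dict[Tuple[Cuts, Cuts], float] = {}
    best, best_cost = base, cost(base)
    for _ in range(episodes):
        state = base
        for _ in range(steps):
            options = neighbours(state, base, delta, num_nodes)
            if rng.random() < epsilon:  # explore
                nxt = rng.choice(options)
            else:                       # exploit the current Q-table
                nxt = max(options, key=lambda o: q.get((state, o), 0.0))
            nxt_cost = cost(nxt)
            future = max(q.get((nxt, o), 0.0) for o in neighbours(nxt, base, delta, num_nodes))
            old = q.get((state, nxt), 0.0)
            # Standard Q-learning update with reward = -cost (an assumption).
            q[(state, nxt)] = old + alpha * (-nxt_cost + gamma * future - old)
            if nxt_cost < best_cost:
                best, best_cost = nxt, nxt_cost
            state = nxt
    return best, best_cost


# Toy usage: a 20-node path split into 2 segments, with a made-up cost that prefers a cut at node 8.
best_cuts, best_value = q_learn_cuts(20, 2, delta=3, cost=lambda c: abs(c[0] - 8))
```

Restricting each cut to a small window around its equal-division position is what keeps the state space small in this sketch: only contiguous divisions near the pre-planned boundaries are ever generated, so discontinuous combinations never enter the Q-table.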
5. Simulation Results and Analysis
In this section, the proposed DRNSM-APINT method will be evaluated through simulation methods. A network of 1584 satellites has been simulated, with the structure similar to the first shell of Starlink, at an average orbital altitude of 550 km. There are 72 orbital planes, each containing 22 satellites. The network comprises 300 traffic services, with an average path length of 20 hops. The telemetry packet is placed at the head of the service packet and the telemetry header has a fixed size of 32 bytes. The maximum cumulative hop for telemetry data is 8 hops. Each node can insert 4 to 8 bytes of metadata, with an insertion time of 0.01 ms, and the propagation time for the active telemetry packet is uniformly set to 0.08 ms.
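For readers who want to reproduce the setup, the parameters above can be collected in a small Python configuration object. This is only a convenience sketch: the class and field names are hypothetical, and the accumulated-size estimate assumes one metadata item is inserted per hop up to the stated maximum of 8 cumulative hops, which is an interpretation rather than something the setup states explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimulationConfig:
    """Simulation parameters as stated in the text (field names are hypothetical)."""
    orbital_planes: int = 72
    satellites_per_plane: int = 22
    altitude_km: float = 550.0
    services: int = 300
    average_path_hops: int = 20
    telemetry_header_bytes: int = 32
    max_cumulative_hops: int = 8
    metadata_bytes_min: int = 4
    metadata_bytes_max: int = 8
    insertion_time_ms: float = 0.01
    active_propagation_time_ms: float = 0.08

    @property
    def total_satellites(self) -> int:
        return self.orbital_planes * self.satellites_per_plane

    def accumulated_bytes_range(self) -> Tuple[int, int]:
        """Header plus metadata after the maximum cumulative hops
        (assumes one metadata item per hop; an interpretation)."""
        low = self.telemetry_header_bytes + self.max_cumulative_hops * self.metadata_bytes_min
        high = self.telemetry_header_bytes + self.max_cumulative_hops * self.metadata_bytes_max
        return low, high


config = SimulationConfig()
```

The derived satellite count gives a quick consistency check against the stated constellation size of 1584.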
To the best of our knowledge, limited studies have been conducted on satellite network measurement and selection of reporting nodes. Therefore, this paper adopts the latest methods in terrestrial network awareness research for application in satellite network scenarios, and compares them with the methods presented in this paper.
- (1) Flexible and active network telemetry scheme (FAINT): using programmable technology, telemetry packets are actively gathered and measured from specified nodes along the path [23].
- (2) INT method for segmented routing (SRINT): hop-by-hop awareness of each node along the path using the INT method [20].
- (3) Energy-efficient data aggregation method (EEDA): nodes with relatively high remaining bandwidth on their SGLs are selected for reporting [24].
Figure 5 compares the awareness time of the FAINT, SRINT, and EEDA methods and the APINT model at different segment lengths. Because the FAINT method initiates active telemetry from the reporting node and requires round-trip collection, its awareness time increases notably as the number of hops grows and is significantly higher than that of the other methods. The SRINT method employs INT to reduce the awareness time, but the awareness time remains relatively high under the target node reporting mechanism. The EEDA method selects nodes with abundant resources for reporting, thereby sacrificing a portion of the awareness time. The awareness time of the APINT model is notably shorter than that of the other methods, and it grows steadily as the number of hops increases. When the hop length is 9, the awareness time of the APINT model is reduced by 34.5% compared to FAINT, 13.7% compared to SRINT, and 8.4% compared to EEDA, indicating a significant improvement in awareness timeliness.
Figure 6 illustrates the comparison of awareness evaluation function values (EFV) and the awareness overhead under varying numbers of reporting nodes based on the DRNSM-APINT method. As can be seen from Figure 6a, when the path length is less than 10, the EFV of the single reporting node scenario is smaller, so the DRNSM-APINT prioritizes selecting the optimal single reporting node. Figure 6b illustrates that within the path length interval of 11 to 16, the EFV is minimal in the scenario with two reporting nodes, followed by the scenario with three reporting nodes, while the EFV of the single reporting node scenario is the highest. The DRNSM-APINT therefore prioritizes selecting two reporting nodes. From Figure 6c, it can be observed that when the path length further increases from 18 to 30, the DRNSM-APINT gives priority to deploying three reporting nodes, whose EFV is smaller than that of the scenarios with two or four reporting nodes. To sum up, the DRNSM-APINT can flexibly and dynamically adjust the number of reporting nodes based on varying service path lengths and determine the optimal location of the reporting nodes in the corresponding scenarios.
Figure 6d shows the awareness overhead associated with various reporting node scenarios across different path lengths. An increase in the number of reporting nodes reduces the segment length, which in turn decreases the awareness overhead caused by the accumulation of awareness data within the segments. However, with three or four reporting nodes, the difference in awareness overhead between the two scenarios is small, indicating that the same awareness effect can be achieved with fewer reporting nodes.
Figure 7 shows the performance comparison between the optimal reporting scheme selected by the DRNSM-APINT and other reporting node selection algorithms. The FHINT method, based on the official INT specification [11], conducts periodic reporting after a fixed number of hops. The SRINT and EEDA methods are the same as before.
Figure 7a presents a comparison of EFV across four methods. Specifically, the DRNSM-APINT algorithm selects varying numbers of reporting nodes across different paths. Therefore, the EFV values presented in the figure represent the optimal solutions achieved by the DRNSM-APINT algorithm in various scenarios. From the figure, it can be seen that the FHINT method has the highest EFV, while the SRINT and EEDA methods have smaller EFV. The EFV of DRNSM-APINT algorithm is lower than that of the comparison algorithm under various path scenarios. Taking the 20 hops as an example, DRNSM-APINT can reduce EFV by 17.1% compared to FHINT, 12.3% compared to SRINT, and 8.2% compared to EEDA. This indicates that the DRNSM-APINT method can maintain better awareness performance across different path scenarios.
Figure 7b compares the average awareness time of segments obtained through different methods. It can be seen that the awareness time reported periodically by the FHINT method is generally large, but the fluctuation is small. The SRINT method is limited by the size of the onboard awareness data, resulting in differences in the number of service hops and significant fluctuations in awareness time. EEDA aims to maximize reporting capability, which may result in an increase in some segment links, thereby increasing awareness time. The awareness time of DRNSM-APINT is significantly lower than other compared methods. Taking the 20 hops as an example, DRNSM-APINT reduces awareness time by 17.4% compared to FHINT, 12.4% compared to SRINT, and 13% compared to EEDA, indicating that the proposed method can significantly reduce awareness time. Meanwhile, the DRNSM-APINT has relatively small time fluctuations, indicating that the algorithm has good robustness in complex dynamic scenes.
Figure 7c presents a comparison of the awareness overhead associated with various methods. There is no significant difference among the four methods when the service path is short. However, as the path length increases, the overhead of the FHINT method gradually increases, the overheads of the SRINT and DRNSM-APINT algorithms remain relatively close, and that of the EEDA method is the smallest. This is because DRNSM-APINT adopts an active-passive joint telemetry method, which introduces additional telemetry packet headers and increases the awareness overhead. EEDA sacrifices some awareness time to achieve low-overhead measurement and reporting, while DRNSM-APINT still has lower overhead than the other two methods.
Figure 7d compares the total number of reporting nodes selected by the four methods across varying numbers of services. From the figure, it can be seen that the number of reporting nodes of EEDA is the largest. This is due to the fact that the method strives to select as many idle nodes with abundant resources as possible for reporting, leading to a substantial and fluctuating number of reporting nodes. The number of reporting nodes for FHINT and SRINT methods is relatively small. The DRNSM-APINT can select fewer reporting nodes than the other comparison methods. When the number of services is 300, the total number of reporting nodes in DRNSM-APINT decreases by 34% compared to FHINT, 20.1% compared to SRINT, and 42.8% compared to EEDA, which can significantly optimize the number of reporting nodes.
Figure 8 presents a comparison of the number of combinations of virtual segment divisions using various methods. The number of combinations affects the computational complexity of DRNSM-APINT. The traditional solving algorithm traverses all possible segment partitions along the path, including those in which segments are discontinuous. Therefore, as the path grows, the number of partition combinations grows exponentially. In this case, the number of combinations is given by the Stirling number of the second kind [25], calculated as $S(n, k) = \frac{1}{k!}\sum_{j=0}^{k}(-1)^{j}\binom{k}{j}(k-j)^{n}$, where $n$ is the number of satellites and $k$ is the number of reporting nodes. The Q-learning algorithm also traverses all the possible outcomes, but based on the reward function it can reduce the problem to a combinatorial planning task over continuous segment divisions. The calculation formula of combination quantity is
. The dynamic partitioning of continuous segments division is conducted using a heuristic algorithm, resulting in a further reduction of the number of combinations to
, but it is prone to getting trapped in local optimal solutions. The lightweight Q-learning algorithm based on virtual boundary preplanning proposed in this paper, combined with the APINT model’s feature of being less influenced by the segment range in selecting reporting nodes, can markedly decrease the number of combinations. Compared with the Q-learning and heuristic algorithms, this method can reduce the number of combinations by one or two orders of magnitude. Meanwhile, considering that the figure only illustrates the computational complexity of selecting reporting nodes within a single service path, the proposed algorithm is significantly more lightweight in massive service scenarios, alleviating the computational burden in large-scale satellite constellations.
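To give a feel for the gap between unrestricted and contiguous partitioning, the following Python sketch counts both. It is illustrative only: the Stirling number of the second kind is computed with its standard recurrence, while counting contiguous divisions of a path of `n` nodes into `k` non-empty segments as the binomial coefficient C(n-1, k-1) is a standard combinatorial fact used here as an assumption and is not necessarily the exact formula used for Figure 8. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind: ways to partition n labelled
    nodes into k non-empty groups (groups need not be contiguous)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Node n either joins one of the k existing groups or forms a new group.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def contiguous_divisions(n: int, k: int) -> int:
    """Ways to cut a path of n nodes into k non-empty contiguous segments:
    choose k-1 cut positions among the n-1 gaps between neighbouring nodes."""
    if k < 1 or k > n:
        return 0
    return comb(n - 1, k - 1)


# A 20-hop path with 3 reporting nodes (one per segment).
unrestricted = stirling2(20, 3)
contiguous = contiguous_divisions(20, 3)
```

For this example the unrestricted count is 580,606,446 while the contiguous count is 171, which illustrates why excluding discontinuous boundary combinations shrinks the search space so sharply.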