1. Introduction
Jamming resource scheduling is a core technology in communication countermeasures, aiming to efficiently accomplish jamming tasks while conserving jamming resources [1, 2, 3]. In air–ground dynamic communication countermeasures, the number of communication nodes and jammers is large, and their spatial layout is complex. Communication parties exhibit sophisticated wireless links, strong adaptive power adjustment, and robust self-organizing network capabilities, which pose challenges to the system operating duration required for comprehensive communication jamming.
To disrupt air–ground communication network systems, jammers must apply dynamic jamming to all communication-receiving nodes. Unlike centralized, high-power jammers, distributed jammers feature spatially dispersed deployment and limited energy; their power synthesis and cooperative jamming rely more heavily on resource scheduling. Physical-layer security (PLS) techniques have emerged as effective approaches to address security challenges in wireless communication systems, including UAV communications and cooperative jamming [4]. Studies such as Hamamreh and Arslan’s comprehensive survey [5] have classified PLS techniques into signal-to-interference-plus-noise ratio-based and complexity-based approaches, providing a foundational framework for designing secure communication and jamming strategies. Specifically, artificial noise injection and channel adaptation techniques proposed in PLS research [5, 6, 7] offer valuable insights for optimizing jamming resource allocation, as they leverage wireless channel characteristics to enhance the effectiveness of intended signals (or jamming signals) while suppressing unintended receivers (or communication nodes). Additionally, UAV communication security studies [4, 8] highlight the unique challenges of airborne communication scenarios, such as dominant line-of-sight propagation and high mobility, which are highly relevant to air–ground joint jamming and require tailored resource scheduling strategies. Novel physical-layer key generation methods [9, 10] that exploit subcarrier indices and channel gain characteristics also inspire innovations in jamming effect estimation and resource coordination for distributed jammers.
Existing research on resource scheduling has predominantly focused on radar jamming [11, 12, 13, 14, 15, 16, 17, 18, 19], yet the mathematical models and scheduling strategies of these studies are not applicable to communication jamming. Within communication jamming studies, greater attention has been paid to static resource scheduling problems. Progress has been achieved through conventional optimization approaches such as global search algorithms [20], convex optimization theory [21], various genetic algorithms [22, 23, 24], intelligent optimization algorithms [25, 26, 27, 28], and knowledge-based Bayesian neural network algorithms [29]. These methods establish mathematical models for communication countermeasures by taking communication links, jamming power, and jamming frequency bands as optimization objectives, and seek optimal jamming solutions within the solution space. Most of these studies start from aspects such as jamming patterns, targets, and power, and explore optimal jamming strategies by constructing adversarial models between communicators and jammers. However, when dynamically scheduling jamming resources, the aforementioned methods suffer from shortcomings such as long search time and poor effectiveness. Path-loss-based prioritization has been widely used as a heuristic in wireless communication and cooperative jamming [30]; for example, Deng et al. [30] proposed a resource hopping mechanism in OTFS-SCMA systems that leverages channel characteristics to suppress jamming, demonstrating the effectiveness of path-loss-related strategies in dynamic scenarios. A notable innovation of that work is its integration of resource hopping with OTFS’s delay-Doppler domain advantages, which provides a new perspective for dynamic jamming resource scheduling. Building on this idea, this paper extends path-loss-based scheduling to the distributed air–ground joint jamming domain, addressing the unique challenges of multi-node coordination and dynamic power adjustment that are not fully covered in existing studies.
The rapid development of machine learning technologies has facilitated research on communication countermeasure algorithms [31, 32, 33, 34]. Nevertheless, the difficulty in acquiring communication countermeasure datasets has limited the application of deep learning techniques [35]. Reinforcement learning algorithms can autonomously interact with the environment without prior information to learn optimal strategies, thus being widely used in dynamic resource scheduling [36, 37, 38, 39, 40, 41, 42, 43]. However, most of these studies assume that jammers can directly and accurately obtain jamming effects. From a practical scenario perspective, two limitations exist: first, they overlook implementation approaches (e.g., jamming index estimation and communication quality assessment); second, they fail to consider counter-jamming strategies adopted by communicators, such as power adjustment and channel switching [44, 45]. Proactive eavesdropping and monitoring strategies, such as the UAV-based scheme proposed by Mobini et al. [46], have also explored dynamic resource optimization in wireless systems, but their focus on information collection rather than jamming resource scheduling leaves gaps in addressing air–ground joint jamming requirements.
To address such issues, some scholars have modeled the communication countermeasure process as a Markov Decision Process (MDP) based on the decision-making principles of communication jamming and proposed corresponding jamming methods [47]. They use channel alignment as an evaluation metric for effective jamming [48], employ Q-learning algorithms [49] to predict changes in communication channels, and develop jamming effectiveness metrics by integrating fundamental jamming principles and variations in communication target behaviors [50]. However, these jamming effect evaluation methods remain inadequate for air–ground joint jamming. They target independent and deterministic bidirectional communication links, assuming that jammers can quickly decipher communication link information and identify both the transmitting and receiving ends of communications. In complex communication networks, each communication node receives signals from multiple links, making it difficult for jammers to identify specific communication links. Notably, the aforementioned studies do not take the maximum received power of communication nodes as the jamming target.
The optimization objectives of the aforementioned algorithms focus on physical-layer parameters (e.g., jamming patterns and power) for single jammers, and their objective functions or reward functions rarely involve distributed jamming resource scheduling. These methods rely on heuristic iterative search strategies of algorithms or pre-trained neural networks in machine learning, failing to leverage information such as communication jamming scenarios and electromagnetic propagation characteristics to reduce decision dimensionality. By simply migrating relevant algorithms, they lack research on distributed communication jamming strategies and generally suffer from slow convergence and susceptibility to local optima.
References [51, 52, 53, 54, 55] have adopted techniques such as multi-agent reinforcement learning and hierarchical reinforcement learning to study a small number of high-power, long-range ground-deployed communication jammers. Reference [56] designed a proximal policy optimization algorithm to address the problem of scheduling jamming resources for a small number of airborne mobile jammers. However, the application scenarios of these algorithms do not involve a large number of communication nodes or jammers, and they pay little attention to the coordination between airborne and ground jammers.
In summary, after the static deployment of jammers, existing studies consider limited environmental information and exhibit poor practicality. They cannot rapidly perform dynamic jamming resource scheduling under conditions of sparse scenarios, multi-dimensionality, and incomplete prior information, leading to waste of jamming resources and degradation of jamming effectiveness. The reasons are as follows:
The training and actual deployment of algorithms such as reinforcement learning and deep reinforcement learning are separated. When facing unknown and dynamic communication power scheduling strategies, first, the rapid decision-making of algorithms relies on long training durations and known air–ground communication scheduling strategies, making them unable to adapt to flexible and changeable battlefield environments; second, after jammers are deployed, multiple interactions with the environment are required, making it impossible to quickly achieve the goal of comprehensive communication jamming and rendering them unsuitable for jammers with limited power.
Communication jammers can identify information about communication nodes and jammers from the electromagnetic environment and locate the source direction of signals. However, there is a lack of specific algorithmic strategies for distributed jammers to use reconnaissance information for environmental cognition and jamming effect evaluation.
There is a scarcity of mathematical models suitable for air–ground joint jamming, as well as strategies for jamming power superposition and operational timing scheduling.
To address the above issues, the contributions of this paper are as follows:
Based on the requirements of high-speed communication countermeasures, a deterministic power scheduling strategy is adopted, eliminating the need for complex round-by-round iterative searches in intelligent algorithms and training strategies with poor interpretability in machine learning algorithms.
Based on communication information reconnoitered by jammers, a jamming effect evaluation strategy is designed by integrating mathematical estimation and changes in communication targets.
A simulation experiment is designed: based on differentiated electromagnetic propagation models, the traditional method of selecting jammers based on spatial distance is abandoned, and a strategy of selecting jammers by sorting transmission path loss in ascending order is proposed.
2. Communication Countermeasure Mode
The air–ground communication network system has self-organizing and encryption functions. It is difficult for the jamming side to obtain the transmitting and receiving ends of the communication link through reconnaissance [57, 58, 59, 60, 61, 62]. Therefore, it is necessary to apply jamming to the maximum receiving power of communication nodes. In a complex electromagnetic environment, a single jammer has limited information and cannot obtain overall feedback for decision-making, which makes cooperation difficult, as shown in Figure 1.
2.1. Mathematical Model of the Air–Ground Communication Network System
The path loss between aerial devices, as well as between aerial and ground devices, follows line-of-sight (LoS) propagation [23]. Equation (1) expresses the path loss of the LoS propagation model in terms of the electromagnetic wave frequency (MHz), an environmental factor that varies with propagation conditions, and the LoS transmission distance (km).
The path loss between ground devices follows two-ray propagation [28]. Equation (2) expresses the path loss of the two-ray propagation model in terms of the terrain influence exponent (which varies with propagation conditions), the two-ray transmission distance (km), and the transmitter height and receiver height of the electromagnetic wave (m).
Equation (3) relates the received power of the communication signal (dBW) to the transmission power of the communication signal, the antenna gain of the communication transmitter in the direction of the communication receiver, the antenna gain of the communication receiver in the direction of the communication transmitter, the path loss of the communication transmission, and the cable and connector loss at the communication receiver (gains and losses in dB).
The received power at the communication receiver must satisfy the communication link margin requirement, taking into account the environmental noise power. Equation (4) states this requirement in terms of the receiver sensitivity of the communication equipment, the environmental noise power, and the system fade margin.
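As a rough illustration of the link model in this subsection, the sketch below computes a LoS path loss, a two-ray path loss, and a received power. The closed forms of Equations (1) to (4) are not reproduced in this text, so the formulas in the code are common textbook forms and should be read as assumptions rather than as the paper's exact equations: the LoS loss is taken as 32.44 + 20 log10(f) + 10 n log10(d), the two-ray loss as 120 + 10 n log10(d) - 20 log10(h_t) - 20 log10(h_r), and the margin check compares the received power with the larger of sensitivity and noise power plus the fade margin. All function and parameter names are hypothetical.

```python
import math


def los_path_loss_db(freq_mhz, dist_km, env_factor=2.0):
    """Assumed LoS form: 32.44 + 20*log10(f) + 10*n*log10(d), f in MHz, d in km."""
    return 32.44 + 20.0 * math.log10(freq_mhz) + 10.0 * env_factor * math.log10(dist_km)


def two_ray_path_loss_db(dist_km, h_tx_m, h_rx_m, terrain_exp=4.0):
    """Assumed two-ray form: 120 + 10*n*log10(d) - 20*log10(h_t) - 20*log10(h_r)."""
    return (120.0 + 10.0 * terrain_exp * math.log10(dist_km)
            - 20.0 * math.log10(h_tx_m) - 20.0 * math.log10(h_rx_m))


def received_power_dbw(p_tx_dbw, g_tx_db, g_rx_db, path_loss_db, rx_loss_db):
    """Link budget: transmit power plus both antenna gains minus all losses."""
    return p_tx_dbw + g_tx_db + g_rx_db - path_loss_db - rx_loss_db


def link_margin_ok(p_rx_dbw, sensitivity_dbw, noise_dbw, fade_margin_db):
    """Assumed reading of the margin requirement: the received power must
    exceed the larger of sensitivity and noise power by the fade margin."""
    return p_rx_dbw >= max(sensitivity_dbw, noise_dbw) + fade_margin_db
```

For example, under these assumed forms a 100 MHz LoS link over 1 km with an environmental factor of 2 has a loss of 72.44 dB, and a 1 km two-ray link between two 10 m antennas has a loss of 80 dB.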
2.2. Mathematical Model of Air–Ground Joint Communication Jamming
2.2.1. Jamming Range
Distributed communication jamming benefits from power superposition. To avoid computing negligible contributions from weak jamming power, an upper limit on the calculable jamming spacing is set according to the environment and device performance. Because the communication jamming signal propagates along the line of sight, the distance between a jammer and its target should also not exceed the line-of-sight propagation distance. In Equation (5), the jamming range used for calculation between a given jammer and a given communication device is the minimum of these two distance limits.
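The range rule above can be sketched in a few lines. The line-of-sight distance is approximated here with the common radio-horizon rule of thumb d = 4.12 (sqrt(h1) + sqrt(h2)), with heights in meters and distance in kilometers; this approximation, like the function names, is an assumption and may differ from the expression used in Equation (5).

```python
import math


def radio_horizon_km(h_a_m, h_b_m):
    """Assumed LoS limit: 4.12 * (sqrt(h_a) + sqrt(h_b)), heights in m, result in km."""
    return 4.12 * (math.sqrt(h_a_m) + math.sqrt(h_b_m))


def jamming_range_km(calc_limit_km, h_jammer_m, h_node_m):
    """Jamming range used for calculation: the smaller of the two distance limits."""
    return min(calc_limit_km, radio_horizon_km(h_jammer_m, h_node_m))


def within_jamming_range(dist_km, calc_limit_km, h_jammer_m, h_node_m):
    """True if a jammer-node pair should be included in the power superposition."""
    return dist_km <= jamming_range_km(calc_limit_km, h_jammer_m, h_node_m)
```

Pairs outside this range are simply left out of the superposition, which is what keeps the number of terms per communication node small.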
2.2.2. Quantized Jammer-to-Signal Ratio
Within the computable jamming range, Equation (6) defines the ratio of the total received jamming power (from both aerial and ground jammers) to the maximum communication received power at each communication receiver. The jamming power of each jammer, whether aerial or ground, enters a sum over the total number of jammers. The equation also uses, for each communication receiver, the maximum received power from aerial communication devices and the maximum received power from ground communication devices (all units in dBW).
To evaluate the impact of jamming signals on communication signals intuitively, and to simplify calculation of the jamming power required for successful jamming, the jammer-to-signal ratio (JSR) is normalized. If the power ratio of the jamming signal to the received communication signal is not less than the threshold, the jamming is considered successful and the normalized JSR of that communication device equals 1, as shown in Equation (7).
The air–ground communication network employs anti-jamming measures such as multi-hop relaying, so the jamming system must effectively disrupt the received signals at all communication devices. Equation (8) expresses this condition over the total number of communication devices: the equality holds if and only if comprehensive jamming is achieved across the entire system.
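A minimal sketch of the JSR bookkeeping described in this subsection follows. It assumes that jamming powers given in dBW are superposed in the linear (watt) domain, that the threshold is the 4.77 dB value quoted later in the paper, and that an unsuccessful node maps to 0 in the quantized form; the unsuccessful branch of Equation (7) is not recoverable from this text, so that mapping and all names below are assumptions.

```python
import math


def dbw_to_w(p_dbw):
    return 10.0 ** (p_dbw / 10.0)


def jsr_db(jam_rx_dbw, max_comm_rx_dbw):
    """JSR at one node: superposed jamming power minus maximum signal power (dB)."""
    total_jam_w = sum(dbw_to_w(p) for p in jam_rx_dbw)
    if total_jam_w <= 0.0:          # no jammer within range of this node
        return float("-inf")
    return 10.0 * math.log10(total_jam_w) - max_comm_rx_dbw


def quantized_jsr(jsr, threshold_db=4.77):
    """1 if the node is successfully jammed; 0 otherwise (assumed mapping)."""
    return 1 if jsr >= threshold_db else 0


def comprehensive_jamming(jam_rx_per_node, max_comm_rx_per_node, threshold_db=4.77):
    """True only if every communication node reaches the JSR threshold."""
    flags = [quantized_jsr(jsr_db(j, s), threshold_db)
             for j, s in zip(jam_rx_per_node, max_comm_rx_per_node)]
    return sum(flags) == len(flags)
```

Two jammers that each deliver -100 dBW to a node superpose to about -96.99 dBW, so against a -103 dBW signal the JSR is about 6.01 dB and the node counts as jammed, whereas a single such jammer gives only 3 dB.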
2.2.3. System Operating Duration
If comprehensive communication jamming cannot be achieved during system operation, the operational duration of the jamming system is considered terminated. The total operational duration of the jamming system is the sum of the durations of individual scheduling instances.
Equation (9) gives the operating duration, in hours (h), of a given jammer after a given scheduling instance. It depends on the battery power limit required for normal jamming, the battery operating voltage in volts (V), the battery capacity consumed in that scheduling instance in ampere-hours (Ah), the jamming output power of the jammer, and the power consumed by the jammer to maintain other functions, in watts (W).
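The energy accounting behind the operating duration can be sketched as follows. The exact form of Equation (9) is not reproduced in this text; the code assumes the simplest energy balance, in which the duration of one scheduling instance equals the consumed battery energy (voltage times ampere-hours) divided by the total power draw, with no conversion losses. The function names are hypothetical.

```python
def scheduling_duration_h(voltage_v, capacity_used_ah, jam_power_w, overhead_power_w):
    """Assumed energy balance: hours = (V * Ah) / (jamming power + overhead power)."""
    total_power_w = jam_power_w + overhead_power_w
    if total_power_w <= 0.0:
        raise ValueError("total power draw must be positive")
    return voltage_v * capacity_used_ah / total_power_w


def system_duration_h(instance_durations_h):
    """Total operating duration: the sum over individual scheduling instances."""
    return sum(instance_durations_h)
```

Under this assumption, a 24 V battery that gives up 10 Ah while the jammer radiates 50 W and spends 10 W on other functions sustains 4 h of operation.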
2.3. Strategies of the Air–Ground Communication Network System
In the absence of jamming, the communication system’s initial power scheduling strategy employs reduced transmission power to maintain basic transmitting and receiving functions, achieving dual objectives of energy conservation and minimized electromagnetic signature exposure.
During signal transmission, communication devices select the path with minimal transmission loss and employ the lowest power level sufficient to meet communication requirements. Equation (10) gives the minimum transmission power of each communication device in terms of the path loss for communication with other devices.
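The minimum-power rule can be illustrated by inverting a link budget over the best available path. The sketch assumes that the required received power (for example, sensitivity plus fade margin) is already known and that gains and losses add in dB; it is an illustration under those assumptions, not the paper's Equation (10), and the names are hypothetical.

```python
def min_tx_power_dbw(path_losses_db, required_rx_dbw,
                     g_tx_db=0.0, g_rx_db=0.0, rx_loss_db=0.0):
    """Lowest transmit power (dBW) that still meets the required received
    power over the path with the smallest loss."""
    best_loss_db = min(path_losses_db)   # the device picks its lowest-loss path
    return required_rx_dbw + best_loss_db - g_tx_db - g_rx_db + rx_loss_db
```

With candidate path losses of 110, 95, and 102 dB and a required received power of -100 dBW, the device would transmit at -5 dBW over the 95 dB path.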
When comprehensive communication jamming is achieved, the received power at all communication nodes becomes suppressed. In response, the communication system adaptively adjusts its transmission power based on the current conditions: it may increase power to counteract the jamming, cease transmission to evade jamming, or reduce power to maintain operations while confusing the jammer.
2.4. Air–Ground Communication Jamming Strategy
Intelligent jamming adopts two approaches for effect evaluation. First, based on the information reconnoitered by jammers, the JSR is estimated according to the positional relationship between jammers and communication devices, as well as electromagnetic wave propagation characteristics. Second, changes in communication parameters such as transmission rate, power, and channel are reconnoitered to determine jamming effectiveness. However, existing studies only introduce the above approaches in the background section and do not mention them in the algorithm design. There is a lack of effective strategies for processing reconnaissance information and estimating the JSR in “many-to-many” communication jamming scenarios.
This paper uses historical and real-time communication information reconnoitered by jammers to estimate parameters such as the source of communication signals, transmit power, receive power, and JSR, as detailed in Algorithm 1. The estimation of communication signal sources involves the following process: after jammers detect electromagnetic signals from the electromagnetic environment, they identify the direction of the signal source, compare it with the pre-stored positions of communication nodes and jammers, and determine whether the detected signals are communication signals. On this basis, the transmit power and receive power of communication nodes are estimated according to the spatial positional relationship between communication nodes and electromagnetic propagation characteristics. Jammers maintain unobstructed communication among themselves, and the jamming power of each jammer is known, enabling the estimation of the JSR at communication nodes. Compared with the simple judgment method that only relies on changes in communication state information, the proposed method—after estimating the jamming-to-signal ratio—can identify misleading information (e.g., communication parties deliberately reducing communication power to feign jamming effectiveness), resulting in a more comprehensive cognitive function.
In Algorithm 1, the quantities estimated for each communication device are its transmission power and the maximum received power of the communication signal, in watts (W).
| Algorithm 1: Cognitive and Jamming Effect Estimation Strategy |
| Input: The detected communication information |
| Output: Estimated communication and jamming effects |
| 1: | Select in order, for = 1 to |
| 2: | Select in order, for = 1 to |
| 3: | If the distance between and is less than |
| 4: | Store the azimuth angle and elevation angle information of relative to |
| 5: | End if |
| 6: | If the distance between and is less than |
| 7: | detected the source direction and power of |
| 8: | End if |
| 9: | Select in order, for = 1 to |
| 10: | Compare and , and infer the information of |
| 11: | | Estimate (round it to two decimal places using the method of rounding down for 0 and rounding up for 1) |
| 12: | Jammer Information Sharing |
| 13: | Select in order, for = 1 to |
| 14: | Update |
| 15: | Select in order, for = 1 to |
| 16: | Update (with an estimation margin of 0.3 dB) |
| 17: | Select in order, for = 1 to |
| 18: | Estimate |
| 19: | If |
| 20: | The communication jamming is successful |
| 21: | Else: |
| 22: | The communication jamming fails |
After estimating the status of the communication network system, the Transmission Loss Order Algorithm (TLOA) is designed to leverage the advantages of short distance and power superposition. Assuming that all communication nodes operate at full power and targeting the maximum received power of all communication devices, all jammers first execute full-power jamming to achieve an overall jamming layout, which is a static layout. On this basis, TLOA performs dynamic jamming resource scheduling: it randomly selects communication nodes as jamming targets, preferentially uses jammers with smaller transmission path loss (following the path loss ordering) to save energy, and uses an estimation strategy to schedule the jamming power of jammers, avoiding the inefficiency of random search, as shown in Algorithm 2.
| Algorithm 2: Transmission Loss Order Algorithm |
| Input: Estimated communication information and real-time information of the interfering party |
| Output: New real-time information of the interfering party |
| 1: | Initialize: The power required of for successful jamming is , the path loss between and is |
| 2: | Select out of order, for = 1 to |
| 3: | Select in order, for = 1 to |
| 4: | If the distance between and is less than |
| 5: | Calculate , sort them in ascending order, and form an jamming list |
| 6: | End if |
| 7: | Select in order, for = 1 to |
| 8: |
|
| 9: | Traverse within the jamming list. |
| 10: | Estimate (the jamming power requirement of ) |
| 11: | If the battery power cannot guarantee |
| 12: | = 0 |
| 13: | Else: |
| 14: |
|
| 15: | If (upper limit of the jammer power) |
| 16: |
|
| 17: |
|
| 18: | Else: |
| 19: |
|
| 20: |
|
| 21: | If > 0 |
| 22: | If |
| 23: | The jamming fails, break |
| 24: | Else: |
| 25: |
|
| 26: | Else: |
| 27: | If |
| 28: | The jamming is successful, break |
| 29: | Else: |
| 30: |
|
In TLOA, the operations are carried out sequentially. First, traverse the jammers and then the communication nodes. Second, traverse the communication nodes and then the jamming list. Third, traverse the communication nodes twice.
In Algorithm 2, the jamming party adopts real-number encoding, which can represent continuous real values and reduces coding conversion errors when scheduling jamming resources. Transmission path loss ordering refers to sorting the transmission path losses between jammers and communication nodes in ascending order, and selecting jammers one by one according to this order. This strategy effectively solves the problems of heterogeneous jammer decision-making, jammer selection, and continuous power value scheduling, while reducing the decision dimension.
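The ordering idea can be illustrated with a simplified greedy allocation for a single communication node. This is a sketch under stated assumptions, not a transcription of Algorithm 2: powers are handled in watts, each jammer is described by a hypothetical dictionary with the keys `name`, `path_loss_db`, `max_power_w`, and an optional `battery_ok` flag, and the jamming power that must arrive at the node (`required_rx_w`) is assumed to have been estimated beforehand.

```python
def allocate_by_path_loss(required_rx_w, jammers):
    """Greedy allocation for one node: use the lowest-path-loss jammers first.

    Returns (allocation, success), where allocation maps jammer name to
    transmit power in watts, as a continuous real value.
    """
    allocation = {}
    remaining_w = required_rx_w          # jamming power still missing at the node
    tol = 1e-9 * required_rx_w           # tolerance for floating-point residue
    for jam in sorted(jammers, key=lambda j: j["path_loss_db"]):
        if remaining_w <= tol:
            break                        # threshold met; later jammers stay idle
        if not jam.get("battery_ok", True):
            allocation[jam["name"]] = 0.0    # depleted jammer contributes nothing
            continue
        gain = 10.0 ** (-jam["path_loss_db"] / 10.0)   # linear path gain
        tx_w = min(remaining_w / gain, jam["max_power_w"])
        allocation[jam["name"]] = tx_w
        remaining_w -= tx_w * gain
    return allocation, remaining_w <= tol
```

Because the candidates are visited in ascending order of path loss, each watt is spent where it arrives with the least attenuation, and a jammer farther down the list is only activated when the closer ones have reached their power limits.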
By using the above four algorithms, this paper designs the confrontation scenario between the air–ground communication network system and air–ground joint jamming, as shown in Algorithm 3.
| Algorithm 3: Air–ground Communication Countermeasure Simulation |
| Input: Initial State of the Air-ground Communication Network System |
| Output: |
| 1: | Initial state scheduling of the air-ground communication network system |
| 2: | The jammer perceives the environment and schedules jamming resources first |
| 3: | Loop |
| 4: | The communication party adjusts its power after being jammed |
| 5: | The jamming party schedules its jamming resources |
| 6: | If the jamming is successful and the remaining battery power is sufficient |
| 7: | Continue the loop |
| 8: | Else: |
| 9: | Break |
| 10: | Calculate |
2.5. Conversion from a Dynamic-Stochastic Problem to a Static-Deterministic Problem
The air–ground joint jamming resource scheduling problem is inherently dynamic and stochastic, stemming from two key factors. On the one hand, communication systems adaptively adjust their transmission power based on actual scenarios. This time-varying power adjustment directly leads to continuous fluctuations in the received power of communication nodes, keeping the signal strength of jamming targets in a dynamic state. On the other hand, communication nodes switch dynamically between active and dormant states. Meanwhile, the uncertainty of environmental factors during electromagnetic propagation (such as environmental impact parameters in line-of-sight propagation scenarios) results in random characteristics in the scope of jamming target sets and path loss values, further increasing the complexity of the problem.
From the perspective of optimization objectives, the core of this problem is to maximize the effective operating duration of the entire jamming system while satisfying various constraints. These constraints mainly include the following: the output power of each jammer must be controlled within its minimum and maximum rated power range; the total energy consumption of all jammers cannot exceed the total energy limit of their on-board batteries; the jamming effect must meet the preset JSR threshold, meaning the ratio of jamming signal power to communication signal power at each communication node must reach the specified standard. Due to the aforementioned dynamic and stochastic factors, key parameters such as the JSR and effective jammer sets change over time, making the entire problem a time-varying optimization problem with stochastic constraints.
The core logic of TLOA in converting a dynamic-stochastic problem to a static-deterministic one lies in two design concepts: “cognitive certainty” and “decision-time decoupling”.
In terms of cognitive certainty of dynamic parameters, instead of tracking the dynamic changes in the received power of communication nodes in real time, the algorithm estimates the maximum possible received power of communication nodes by fusing historical reconnaissance data and real-time electromagnetic perception information. This approach is fully justified: the transmission power adjustment of communication systems is always limited by their maximum hardware-rated power (25 W for ground communication nodes and 10 W for airborne communication nodes), so the upper limit of their received power is deterministic. The maximum received power estimated based on the maximum transmission power can serve as a stable reference benchmark.
For the handling of random path loss, the algorithm uses the expected value of path loss instead of its random fluctuation value. This expected value is calculated through preset electromagnetic propagation models, where the environmental factor for line-of-sight propagation is fixed at 3, and the terrain influence exponent for two-ray propagation is fixed at 4. The rationality of this simplification is mainly reflected in two aspects: first, in short-term battlefield scenarios, environmental conditions are relatively stable (the power adjustment interval of the communication system is 90 s), so environmental factors will not change drastically; second, the random fluctuation range of path loss is extremely small (the fluctuation value corresponding to the variance is less than 1 dB), which is much smaller than the preset JSR threshold (4.77 dB), and its impact on the final jamming effect is negligible.
In terms of decision–time decoupling, the algorithm no longer performs separate optimization decisions on jamming power and activation status for each time slot. Instead, it designs a set of static jamming resource allocation schemes, including a fixed jammer activation set and a unified jamming power allocation strategy. This scheme must satisfy that the JSR of all communication nodes meets the preset threshold. The calculation of JSR is based on the estimated maximum received power and the expected value of path loss, thereby converting the originally time-varying dynamic optimization problem into a static-deterministic problem that can be solved in one go.
Regarding the proof of the effectiveness of the conversion strategy, first, consider the impact of communication power fluctuations. Since the actual transmission power of the communication system will never exceed its maximum rated power, the actual received power of communication nodes must be less than or equal to the maximum received power estimated by the algorithm. The JSR is inversely proportional to the received power; the smaller the actual received power, the larger the JSR. Therefore, the actual JSR must be greater than or equal to the JSR calculated based on the maximum received power. Second, consider the impact of path loss fluctuations. The greater the path loss, the more severe the attenuation of the jamming signal, so the most unfavorable scenario is when the path loss takes the maximum value. At this time, the JSR will decrease accordingly. However, since the TLOA reserves a 0.3 dB margin when designing the static solution, even considering the most unfavorable path loss fluctuations, the final JSR can still meet the preset threshold.
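The two steps of this argument can be written compactly in decibel form. The symbols are introduced here only for illustration and are not the paper's notation: $J$ is the superposed jamming power received at a node, $S$ the actual and $S_{\max}$ the estimated maximum communication received power, $\gamma$ the JSR threshold, $\Delta$ the reserved margin (0.3 dB), and $\delta$ the reduction in received jamming power caused by an unfavorable path loss deviation.

```latex
\begin{align}
S \le S_{\max}
  &\;\Longrightarrow\;
  \mathrm{JSR}_{\mathrm{actual}} = J - S \;\ge\; J - S_{\max} = \mathrm{JSR}_{\mathrm{est}}, \\
\mathrm{JSR}_{\mathrm{est}} \ge \gamma + \Delta
  \ \text{and}\ \delta \le \Delta
  &\;\Longrightarrow\;
  \mathrm{JSR}_{\mathrm{actual}} - \delta \;\ge\; \gamma .
\end{align}
```

Written this way, the second implication makes the underlying condition explicit: the guarantee holds as long as the path loss deviation does not exceed the reserved margin.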
3. Simulation Experiment
3.1. Parameter Setting of the Simulation Scenario
The differences in the composition structure of the air–ground communication network system and the jammer layout will affect dynamic jamming resource scheduling. In response, this paper designs four simulation scenarios, as shown in Figure 2, Figure 3, Figure 4 and Figure 5, and in Table 1. The scenario design is not arbitrary but is strategically tailored to mimic real-world battlefield characteristics of “dynamic node activation” and cover key environmental variables, ensuring comprehensive verification of the TLOA’s effectiveness, robustness, and scalability.
In Figure 2, Figure 3, Figure 4 and Figure 5, the green triangles represent airborne communication nodes, the blue triangles denote ground communication nodes, the black pentagrams signify ground jammers, and the red pentagrams indicate airborne jammers. The horizontal axis and vertical axis represent spatial distance, with the unit of kilometers (km).
The key design logic of the four scenarios is as follows:
First, dynamic activation of deployed nodes mimics real-world communication networks where different air–ground communication nodes are activated at varying times based on mission requirements. This forms a time-varying air–ground communication network system, which directly alters the network’s anti-jamming capabilities (e.g., the number of active nodes influences the effect of jamming signal superposition and the self-organizing recovery capability of the communication network) and the characteristics of electromagnetic propagation paths (e.g., activation of airborne nodes increases line-of-sight links, while ground node activation enhances two-ray propagation dominance).
Second, heterogeneous jammer deployments (Scenarios 1 vs. 2) test the algorithm’s optimization performance when airborne jamming resources are adjusted—a common tactical adjustment in air–ground coordinated jamming missions.
Third, scaled node counts and density (Scenarios 3 vs. 4) validate the algorithm’s scalability: Scenario 4 increases the total number of communication and ground jammer nodes to simulate large-scale joint operations, introducing higher decision dimensionality and electromagnetic complexity to challenge the algorithm’s efficiency and robustness.
In the experiment, an Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz processor with 16.0 GB of RAM and an NVIDIA GTX1080Ti graphics card were used, and Anaconda 23.7.4 was used as the software environment for verification. The parameters of the simulation experiment are shown in Table 2.
3.2. Results and Analysis of the Comparative Experiment
In this paper, the TLOA is compared with DQN [63], DDQN [64], the probability mutation artificial bee colony algorithm (PMABCA) [26], the simple random search algorithm (SRSA), and the genetic algorithm (GA) [65]. Each algorithm is run 10 times, and the average value is taken.
DQN approximates the action-value function (Q-function) using a deep neural network (DNN) and adopts experience replay to break the correlation between training samples, while DDQN mitigates the overestimation bias of Q-values by separating the target network (for calculating target Q-values) and the evaluation network (for selecting actions) [
63,
64]. PMABCA is an improved variant of the artificial bee colony (ABC) algorithm; it introduces a probability-based mutation mechanism to enhance global search capability. The algorithm divides bees into employed bees, onlooker bees, and scout bees: employed bees exploit food sources (candidate solutions), onlooker bees select food sources based on fitness values, and scout bees abandon poor-quality food sources (via probability mutation) to avoid local optima. It is widely used in continuous optimization problems such as resource scheduling [
26]. SRSA is a search algorithm with minimal computational complexity. It randomly generates candidate solutions within the feasible solution space without leveraging prior information or iterative optimization. Each solution is independently sampled, and the optimal solution is selected based on the fitness function. While simple to implement, it suffers from low search efficiency and is prone to missing global optima in complex high-dimensional scenarios. GA is a population-based evolutionary algorithm inspired by biological natural selection and genetic variation. It initializes a population of candidate solutions, and iteratively optimizes via three core operations: selection (reserving high-fitness individuals), crossover (combining genetic information of parent individuals), and mutation (randomly altering individual genes to maintain population diversity). It is suitable for complex combinatorial optimization problems but may face slow convergence in dynamic scheduling scenarios [
65].
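The difference between the DQN and DDQN targets described above can be sketched in a few lines. This is a minimal illustration of the standard textbook formulation, not the implementation used in the experiments; the function names, the discount factor, and the Q-value lists are hypothetical.

```python
def dqn_target(reward, q_next_target, gamma=0.99, done=False):
    """DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)


def ddqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """Double DQN target: the evaluation (online) network selects the action,
    and the target network supplies the value of that action."""
    if done:
        return reward
    best_action = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best_action]


# Hypothetical Q-value estimates for three actions in the next state.
q_online = [1.0, 2.0, 1.5]   # evaluation network
q_target = [1.2, 1.1, 3.0]   # target network

y_dqn = dqn_target(1.0, q_target, gamma=0.9)
y_ddqn = ddqn_target(1.0, q_online, q_target, gamma=0.9)
print(y_dqn, y_ddqn)
```

In this toy case the DQN target follows the largest target-network value, while the DDQN target uses the value of the action preferred by the evaluation network, which is how the overestimation bias is reduced.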
In terms of computational complexity, the key characteristics of each algorithm are analyzed as follows:
DQN/DDQN: The time complexity is O(T × B × (D + H)), where T denotes the number of training steps (set to 1000 in this study), B is the batch size (32), D represents the state dimension (equal to the number of jammers), and H is the number of hidden layer neurons (64). The complexity is dominated by the forward and backward propagation of the deep neural network, and DDQN introduces an additional target network update overhead (O(U × (D + H)), U = 100 update frequency), which is negligible compared to the overall complexity.
PMABCA: Its time complexity is O(G × N × (D + Tlimit)), where G = 500 (maximum number of generations), N = 10 (population size), D = M, and Tlimit = 100 (food source trial limit). The complexity is higher than that of GA due to the additional trial count monitoring and probability mutation operations for scout bees.
SRSA: With a time complexity of O(S × D) (S = 1000 random sampling times, D = M), it exhibits the lowest complexity among all algorithms. This is because it avoids iterative optimization and only involves random sampling and fitness evaluation of candidate solutions.
GA: The time complexity is O(G × N × D), where G = 100 (maximum number of generations), N = 10 (population size), and D = M. The complexity is linearly related to the number of jammers, dominated by iterative selection, crossover, and mutation operations, as well as fitness evaluation for each individual in the population.
TLOA: The time complexity is O(n × m + m log m), where n is the number of communication nodes and m is the number of jammers. The m log m term originates from sorting the transmission path losses, and the n × m term comes from traversing the jammer–communication node pairs. For a fixed number of jammers, the cost grows linearly with the number of communication nodes, and for a fixed number of nodes it grows quasi-linearly with the number of jammers, which keeps the algorithm efficient in large-scale scenarios.
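To give a feel for the expressions above, the following sketch evaluates their leading terms with the parameter values quoted in the text. It is only an order-of-magnitude illustration: constants are ignored, the unit of work differs between algorithms, the logarithm base (2) is an assumption, and the scenario size (20 jammers, 30 communication nodes) is hypothetical.

```python
import math


def dqn_ops(D, T=1000, B=32, H=64):
    """Leading term T x B x (D + H) for DQN/DDQN."""
    return T * B * (D + H)


def pmabca_ops(D, G=500, N=10, T_limit=100):
    """Leading term G x N x (D + Tlimit) for PMABCA."""
    return G * N * (D + T_limit)


def srsa_ops(D, S=1000):
    """Leading term S x D for SRSA."""
    return S * D


def ga_ops(D, G=100, N=10):
    """Leading term G x N x D for GA."""
    return G * N * D


def tloa_ops(n, m):
    """Leading term n x m + m log m for TLOA (base-2 logarithm assumed)."""
    return n * m + m * math.log2(m)


M = 20        # hypothetical number of jammers (D = M)
n_nodes = 30  # hypothetical number of communication nodes

counts = {
    "DQN/DDQN": dqn_ops(M),
    "PMABCA": pmabca_ops(M),
    "SRSA": srsa_ops(M),
    "GA": ga_ops(M),
    "TLOA": tloa_ops(n_nodes, M),
}
for name, ops in counts.items():
    print(f"{name:9s} {ops:12.1f}")
```

These counts compare growth rates only; measured running times depend on the cost of each fitness evaluation or network update.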
Table 3 presents the detailed parameter settings of all comparative algorithms to ensure the reproducibility of the experiment.
The TLOA can schedule continuous jamming power values, the number of jammers, and operational timing sequences—a capability absent in the compared algorithms. Therefore, the comparison algorithms are configured with fixed power levels and utilize all available jammers. Before executing the first jamming operation, distributed jammers have ample computation time, allowing extended decision-making duration for the algorithm’s initial deployment. However, since communication systems employ power-adjustment countermeasures and dynamic jamming requires high timeliness while aiming for prolonged comprehensive jamming effects, each subsequent scheduling cycle is constrained by significantly shorter time limits.
Reinforcement learning algorithms have been rarely studied for dynamic distributed communication jamming, lacking corresponding reward functions. Therefore, reward functions for DQN and DDQN are designed as shown in Equation (11).
In Equation (11), is the sum of the jamming powers of the current jammers, and is the sum of the upper limit values of the jamming powers of the jammers. A positive reward is obtained when the jamming is successful; otherwise, a negative reward is obtained.
Table 4 compares the first scheduling time. As a deterministic algorithm, TLOA solves the problem robustly in a single iteration. Unlike DQN and DDQN, it requires neither training nor an iterative search for solutions.
Figure 6 shows the scheduling time per iteration. Compared with the benchmark algorithms, the proposed algorithm reduces the scheduling time by 42%, 28.5%, and 80% in Scenarios 1, 2, and 4, respectively, whereas in Scenario 3 its scheduling time is 42% longer (a reduction of −42%). The scheduling time of DQN and DDQN reaches the predefined time limit because reinforcement learning algorithms struggle to converge under small-sample, short-time conditions and often require many iterations to improve solution quality. In most scenarios, TLOA demonstrates significant advantages in decision-making speed, addressing the inefficiency of stochastic search strategies.
The “time of each decision” refers to the average time the algorithm needs to complete one jamming resource scheduling iteration (including jammer selection, power adjustment, and jamming effect estimation). Keeping this time short is critical both for responding to the dynamic countermeasures of the communication side and for the practical feasibility of distributed jammers. The comparison between Scenario 1 and Scenario 2 reveals that our algorithm maintains superior computational efficiency when dealing with different proportions of airborne and ground communication nodes. In Scenario 4, where the number of air–ground communication nodes increases and the network structure becomes more complex, the solution efficiency of the intelligent optimization algorithms declines sharply because more iterations are required to search for optimal solutions. In contrast, TLOA exhibits a much smaller increase in time cost, highlighting its robustness and scalability in complex environments.
Figure 6 illustrates the comparison of system operation duration. Compared with the benchmark algorithms, the proposed algorithm extends the operation duration by 41.6%, 102.3%, 82.2%, and 2827.3% in Scenarios 1, 2, 3, and 4, respectively. The comparison of the four scenarios indicates that the system operation duration of dynamic jammer scheduling depends on the static layout of jammers that can achieve overall communication jamming. In the complex Scenario 4, the benchmark algorithms fail to solve the problem quickly, resulting in a short system operation duration that cannot meet the continuous, dynamic, and overall jamming requirements of the jamming side, while TLOA operates stably and efficiently.
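The percentage figures quoted for scheduling time and operation duration can be converted into plain ratios, which makes the Scenario 3 and Scenario 4 values easier to read. The sketch below assumes that each percentage is measured relative to the benchmark value; that convention and the function names are assumptions, since the text does not state the formula.

```python
def time_ratio_from_reduction(reduction_pct):
    """TLOA time divided by benchmark time, assuming
    reduction% = (t_bench - t_tloa) / t_bench * 100."""
    return 1.0 - reduction_pct / 100.0


def duration_ratio_from_extension(extension_pct):
    """TLOA duration divided by benchmark duration, assuming
    extension% = (d_tloa - d_bench) / d_bench * 100."""
    return 1.0 + extension_pct / 100.0


# Percentages quoted in the text, Scenarios 1 to 4.
time_reductions = [42.0, 28.5, -42.0, 80.0]
duration_extensions = [41.6, 102.3, 82.2, 2827.3]

time_ratios = [time_ratio_from_reduction(p) for p in time_reductions]
duration_ratios = [duration_ratio_from_extension(p) for p in duration_extensions]

for i, (t, d) in enumerate(zip(time_ratios, duration_ratios), start=1):
    print(f"Scenario {i}: time ratio {t:.3f}, duration ratio {d:.3f}")
```

Under this convention, a reduction of −42% corresponds to a scheduling time 1.42 times that of the benchmark, and an extension of 2827.3% corresponds to an operation duration roughly 29 times longer.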
In air–ground joint jamming, the large number of communication nodes and jammers, the wide power range, the sparse scenarios, and the complex signal propagation links increase the decision dimensionality. In traditional centralized jamming and simple jamming scenarios, heuristic and traversal-based stochastic search strategies exhibit low efficiency. Because they lack knowledge of distributed communication jamming and power scheduling strategies, the iterative searches of the genetic algorithm, simple random search, and the probability mutation artificial bee colony algorithm are inefficient, which makes it difficult to obtain feasible solutions quickly in complex scenarios.
Reinforcement learning algorithms rely on the setting of reward functions and require training strategies and dynamic jamming strategies adapted to air–ground joint jamming. The existing research lacks such adaptations, leading to long training time, poor training effect, low search efficiency in dynamic scheduling, and low solution quality of reinforcement learning algorithms.
3.3. Results and Analysis of Experiments Under Non-Ideal Channel State Information
To examine how the channel state information (CSI) assumptions affect the simulation results, we supplement the original experiments with a set of comparative tests under non-ideal CSI scenarios. Specifically, two key modifications are implemented to simulate practical wireless channel characteristics:
To further simulate the channel attenuation fluctuation in non-ideal scenarios, we modify the path loss calculation models by adding a random disturbance term in the range of −2 to 2 dB for both the LoS and two-ray propagation loss functions [
5]. The modified models are expressed as follows:
where
∼U(−2, 2) denotes the uniform random disturbance term for simulating non-ideal channel attenuation fluctuations.
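The disturbance described above can be sketched as follows. The paper's own LoS and two-ray loss functions are not reproduced here, so the sketch substitutes the textbook free-space and far-field two-ray formulas as stand-ins; these formulas, the function names, and the numerical inputs are assumptions for illustration only.

```python
import math
import random


def free_space_loss_db(distance_m, freq_hz):
    """Textbook free-space path loss in dB (stand-in for the LoS model)."""
    c = 299_792_458.0  # speed of light in m/s
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / c)


def two_ray_loss_db(distance_m, h_tx_m, h_rx_m):
    """Textbook far-field two-ray ground-reflection loss in dB (stand-in)."""
    return 40 * math.log10(distance_m) - 20 * math.log10(h_tx_m * h_rx_m)


def perturbed_loss_db(loss_db, rng, spread_db=2.0):
    """Add a uniform disturbance drawn from U(-spread_db, +spread_db) in dB."""
    return loss_db + rng.uniform(-spread_db, spread_db)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical link: 1 km range, 300 MHz carrier, 10 m and 2 m antenna heights.
nominal_los = free_space_loss_db(1000.0, 300e6)
nominal_two_ray = two_ray_loss_db(1000.0, 10.0, 2.0)

noisy_los = perturbed_loss_db(nominal_los, rng)
noisy_two_ray = perturbed_loss_db(nominal_two_ray, rng)

print(f"LoS:     {nominal_los:.2f} dB -> {noisy_los:.2f} dB")
print(f"Two-ray: {nominal_two_ray:.2f} dB -> {noisy_two_ray:.2f} dB")
```

Each call draws a fresh disturbance, so a scheduler that plans with the nominal loss sees an error of at most 2 dB on every link.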
Meanwhile, we adjust the algorithm constraint conditions: the benchmark algorithms (DQN, DDQN, PMABCA, SRSA, GA) are not forced to adopt fixed power levels and full jammer activation, while our proposed TLOA retains the advantages of continuous power control and adaptive jammer selection.
The performance comparison results of six algorithms under non-ideal CSI scenarios are shown in
Table 5. The evaluation metrics include three core indicators: first decision time, each decision time, and system operating duration.
It should be clarified that the original design of all involved algorithms (including TLOA, DQN, DDQN, PMABCA, SRSA, and GA) is based on the assumption of ideal CSI. Under non-ideal CSI scenarios with ±2 dB power estimation error and channel attenuation disturbance, the performance of all algorithms shows varying degrees of degradation compared with the ideal CSI scenario. This indicates that non-ideal CSI is a common challenge for interference optimization algorithms in wireless communication systems, rather than a defect specific to the proposed TLOA. Despite the performance degradation caused by non-ideal CSI, TLOA still maintains significant advantages over all benchmark algorithms in terms of decision latency and system operating stability.
Decision Latency: In all four test scenarios, TLOA’s first decision time (12.7 ms–21 ms) and each decision time (9.2 ms–18 ms) are 2–3 orders of magnitude lower than those of DQN and DDQN (1 s–10 s), and 1.2–8.1 times lower than those of PMABCA, SRSA, and GA.
System Operating Duration: TLOA maintains a stable operating duration of 0.05 h across all scenarios, which is lower than most benchmark algorithms (0.09 h–0.85 h). Even in Scenario 4 where DQN, DDQN, and GA achieve short operating durations (0.01 h), their decision latency is still far higher than that of TLOA, which means they sacrifice decision efficiency for shorter operating time.
The core reason for TLOA’s superiority is its continuous power control and adaptive jammer selection mechanism, which avoids the performance bottleneck caused by fixed power levels and full jammer activation in benchmark algorithms.
4. Conclusions and Future Prospects
This paper has established mathematical models for air–ground communication networks and air–ground joint communication jamming. It has also designed cognition and jamming effect estimation strategies to reduce the dimensionality of jamming decisions and evaluate jamming effects based on jammer reconnaissance information. The TLOA has been proposed to schedule the number of jammers, jamming power, and operation time slots by leveraging cognitive communication node information, eliminating redundant operations of stochastic search and inefficient training processes of reinforcement learning to accelerate algorithmic speed.
Simulation results demonstrate that compared with DQN, DDQN, stochastic search algorithms, genetic algorithms, and improved artificial bee colony algorithms, the TLOA exhibits superior capabilities in finding feasible solutions, faster jamming decision-making, and efficient resource allocation by exploiting propagation path loss differences. This approach conserves jamming power and extends system operation time.
Future work could study estimation strategies for the jamming path loss under complex terrain and for noise in inhomogeneous electromagnetic environments [
66,
67]. In addition, for communication countermeasure games in which the communication nodes are unknown, it is important to study cognitive decision-making that lets the jammers gradually detect and estimate information such as the positions and types of the communication nodes [
68], as well as multi-agent strategies for the case where the communication network between the jammers is disconnected. Further directions include designing strategies that reduce the decision dimensionality in distributed communication countermeasure scenarios, exploiting the strengths of reinforcement learning algorithms in game confrontation, and introducing intelligent algorithms such as deep learning and image recognition to enrich the methods for scheduling jamming resources [
69,
70].
We acknowledge that the performance of the proposed algorithm is affected by non-ideal CSI, which is a key limitation of the current work. In practical applications, inaccurate CSI measurements directly cause deviations in power control and jammer selection decisions. Designing interference optimization algorithms that are robust to non-ideal CSI will therefore be the core direction of our future research. Follow-up work will focus on integrating CSI estimation error compensation mechanisms into the algorithm framework and on verifying the algorithm’s performance through hardware-in-the-loop experiments, to further improve the practical value of the research.