Sightless but Not Blind: A Non-Ideal Spectrum Sensing Algorithm Countering Intelligent Jamming for Wireless Communication

Abstract: Existing intelligent anti-jamming communication methods fail to account for inaccurate sensing. To address this, this paper puts forward an intelligent anti-jamming method for wireless communication under non-ideal spectrum sensing (NISS). In a malicious jamming environment, the wireless communication system uses Q-learning (QL) to learn the jamming pattern, takes the false alarm and missed detection probabilities of jamming sensing into account, and in each time slot selects the channel with the best long-term return for communication. The simulation results show that under linear sweep jamming and intelligent blocking jamming, the proposed algorithm converges faster than QL at the same decision accuracy. Compared with wide-band spectrum sensing (WBSS), an algorithm that does not consider non-ideal spectrum sensing, the proposed algorithm achieves higher decision accuracy at the same convergence rate.


Introduction
In the past 20 years, wireless communication technology has been widely used. However, because the wireless channel is open, the challenge of artificial interference in wireless communication is becoming more and more serious. Artificial interference mainly includes two types: unintentional interference [1] and intentional jamming. Intentional jamming refers to jamming carried out for the purpose of destroying the information transmission process of a wireless communication system. According to whether the strategy is fixed or not, intentional jamming can be divided into fixed-strategy jamming and dynamic strategy jamming. Fixed-strategy jamming mainly includes multi-tone jamming, partial-band jamming, periodic-pulse jamming, linear-sweep jamming, etc.; its strategy is fixed and its pattern is easily perceived. Dynamic strategy jamming mainly includes dynamic probability jamming, intelligent blocking jamming, etc.; its pattern is not easily obtained through simple observation or sensing.

Related Works
In recent years, the continuous development of machine learning algorithms has provided new intelligent approaches to communication anti-jamming. In the frequency domain, the author in [2] modeled the problem of multi-channel jamming and anti-jamming as a Markov decision process (MDP). The best defense strategy is obtained through value iteration, under the condition that the channel transition probabilities and rewards are completely known. Similarly, in [3], the author modeled the game between the secondary user and the jammer in a cognitive radio system as an MDP, using Q-learning and maximum likelihood estimation to obtain the attacker parameters and the optimal channel switching strategy. In [4], the author also modeled the jamming and anti-jamming process as an MDP, and proposed a game ...
... traditional Q-learning [16] and the conventional wide-band spectrum sensing algorithm [14] through the MATLAB simulation results.

Contribution and Structure
The contributions of this paper are as follows:
• This paper proposes a NISS algorithm, which combines the advantages of Q-learning and the WBSS algorithm. The proposed algorithm has a fast convergence rate and high decision accuracy.
• This paper takes the probabilities of false alarm and missed detection into account in anti-jamming communication for the first time, which is closer to the actual electromagnetic environment and fills a gap in intelligent anti-jamming wireless communication in the case of non-ideal sensing.
The remainder of this paper is organized as follows. Section 2 presents the system model and problem formulation. In Section 3, we introduce the detailed derivation of the NISS algorithm. The simulation results and analysis are presented in Section 4. Our concluding remarks are given in Section 5.

System Model
Figure 2 shows the structure of the communication time slot. Each jamming time slot corresponds to a communication time slot, which is divided into a transmission sub-slot, a sensing sub-slot, and a learning sub-slot. In the transmission sub-slot, the transmitter selects an undisturbed channel to transmit information to the receiver, according to the judgment made in the previous time slot about the jammed channel. In the sensing sub-slot, the receiver senses each channel and passes the sensing results to the agent for learning. The agent obtains the judgment of the available channels for the same time slot in the next jamming period. The transmitter selects the optimal channel for transmission in the next time slot according to this judgment.

Problem Formulation
To simplify the analysis, the following assumptions are made:

•
The communication frequency band is divided into N channels of equal bandwidth with no frequency overlap between them, and every channel experiences the same flat fading.

•
Sensing inaccuracy is caused only by false alarm and missed detection; no other source of inaccuracy is considered.

•
Within a time slot, the jammed channel does not change.
Based on the above assumptions, since the agent cannot accurately perceive the state of the system, we use the improved Markov decision process (IMDP) for modeling and solving. The IMDP can be expressed as a five-tuple $\langle S, A, P, O, r \rangle$, in which, in addition to the state space $S$, action space $A$, state transition probability $P$, and real-time reward function $r$ of the general MDP, there is also an observation space $O$.
The state space $S$ can be expressed as:
$$S = \{\, s \mid s = \{n_1, n_2, \ldots, n_N\},\ n_k \in \{0, 1, \ldots, N\} \,\}$$
where $n_k = j$ indicates that the $k$-th jammed channel is channel $j$; at most $N$ channels can be jammed in the same time slot.
The action space $A$ can be expressed as:
$$A = \{\, a \mid a \in \{1, 2, \ldots, N\} \,\}$$
where $a = i$ indicates that the transmitter chooses to communicate on channel $i$.
The observation space $O$ can be expressed as:
$$O = \{\, o_t \mid o_t = \{o_t^1, o_t^2, \ldots, o_t^N\},\ o_t^i \in \{0, i\} \,\}$$
where $o_t^i = 0$ and $o_t^i = i$ indicate, respectively, that channel $i$ is observed as undisturbed and as disturbed in time slot $t$.
The reward function $r$ is defined as follows:
$$r_t^i = \begin{cases} E, & i \notin s_t \\ -L, & i \in s_t \end{cases}$$
where $r_t^i$ represents the reward of channel $i$ in time slot $t$, $E$ represents the return of a successful message transmission, and $-L$ represents the loss when the message transmission fails. $s_t$ represents the jamming status of each channel at time $t$, and $s_t \in S$.
Figure 3 is a state transition diagram between the actual state and the observed (sensing) state of channel $i$ in the presence of false alarm and missed detection. Here $p_i$ is the missed detection probability (the observed state is no jamming while the actual state is jamming), and $q_i$ is the false alarm probability (the observed state is jamming while the actual state is no jamming).
The transitions between the observed state and the actual state in each time slot are shown in Figure 4. The observed states in time slot $t$ and time slot $t + 1$ are obtained through wide-band spectrum sensing. At the same time, the false alarm probability $P_f$ and the missed detection probability $P_m$ can be calculated according to energy detection theory [17] (Equations (5) and (6)), in which $u = TW$ is the time-bandwidth product, $\gamma$ is the jamming-to-noise ratio (JNR), and $\lambda$ is the decision threshold of energy detection.
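As a rough illustration of how false alarm and missed detection probabilities depend on u, γ, and λ, the sketch below evaluates the textbook closed forms for energy detection over a non-fading channel, $P_f = \Gamma(u, \lambda/2)/\Gamma(u)$ and $P_d = Q_u(\sqrt{2\gamma}, \sqrt{\lambda})$, with $P_m = 1 - P_d$. This is an assumption: Equations (5) and (6) of the paper are not legible in this copy and may take a different form. The function names, the restriction to integer u, and the threshold value in the usage lines are hypothetical.

```python
import math


def upper_gamma_ratio(n: int, x: float) -> float:
    """Regularized upper incomplete gamma Q(n, x) for integer n >= 1."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return math.exp(-x) * total


def false_alarm_probability(u: int, lam: float) -> float:
    """P_f: probability that noise-only energy exceeds the threshold lam."""
    return upper_gamma_ratio(u, lam / 2.0)


def detection_probability(u: int, jnr: float, lam: float, terms: int = 200) -> float:
    """P_d as a Poisson-weighted sum of central terms (generalized Marcum Q).

    jnr is the linear (not dB) jamming-to-noise ratio.
    """
    weight = math.exp(-jnr)  # Poisson weight for j = 0 with mean jnr
    total = 0.0
    for j in range(terms):
        total += weight * upper_gamma_ratio(u + j, lam / 2.0)
        weight *= jnr / (j + 1)  # next Poisson weight
    return total


def missed_detection_probability(u: int, jnr: float, lam: float) -> float:
    """P_m = 1 - P_d."""
    return 1.0 - detection_probability(u, jnr, lam)


# Usage with hypothetical values: u = 5, JNR = 10 dB, threshold lam = 20.
jnr_linear = 10 ** (10 / 10)
p_f = false_alarm_probability(5, 20.0)
p_m = missed_detection_probability(5, jnr_linear, 20.0)
```

Raising the threshold lowers $P_f$ and raises $P_m$, which is the trade-off that the sensing model has to account for.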

Detailed Derivation of Algorithm
Q-learning is a typical model-free learning method. Its basic idea is to build a Q table whose entries represent the long-term cumulative reward of following the current strategy after action $a_t$ is selected in state $s_t$. The long-term cumulative reward can be expressed as follows:
$$R = \sum_{t=0}^{\infty} \gamma^{t} r_t$$
where $\gamma$ is the discount factor, indicating the importance of future returns, and $r_t$ is the immediate reward obtained in step $t$. The goal of Q-learning is to find a strategy $\pi$ that maximizes the long-term cumulative reward.
To solve for the optimal strategy, the state value function $V$ and the state-action value function $Q$ are defined as follows:
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R \mid s_0 = s \right], \qquad Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ R \mid s_0 = s,\ a_0 = a \right]$$
Since the MDP model is satisfied, this can be converted into a recursive form as follows:
$$Q^{\pi}(s, a) = \sum_{s' \in S} P(s' \mid s, a)\left[ r(s' \mid s, a) + \gamma V^{\pi}(s') \right]$$
where $P(s' \mid s, a)$ represents the probability of taking action $a$ in state $s$ and transferring to state $s'$, and $r(s' \mid s, a)$ represents the corresponding reward.
According to the Bellman optimality principle, the optimal value $Q^{\pi^*}(s, a)$ can be obtained as follows [18]:
$$Q^{\pi^*}(s, a) = \sum_{s' \in S} P(s' \mid s, a)\left[ r(s' \mid s, a) + \gamma \max_{a' \in A} Q^{\pi^*}(s', a') \right]$$
Therefore, the optimal strategy $\pi^*$ can be obtained as follows:
$$\pi^*(s) = \arg\max_{a \in A} Q^{\pi^*}(s, a)$$
Since the Q-learning algorithm does not need prior knowledge, such as the state transition probability, its update formula is as follows:
$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha\left[ r_t + \gamma \max_{a_{t+1} \in A} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$$
where $\alpha$ is the learning rate. References [19] and [20] proved that if $\alpha$ meets the conditions:
$$\sum_{t=0}^{\infty} \alpha_t = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2 < \infty$$
then the Q-learning algorithm is guaranteed to converge. When the Q table converges, the action corresponding to the maximum Q value in each state is the optimal action in that state.
The wide-band spectrum sensing (WBSS) algorithm senses multiple channels in the same time slot, obtains the actual state of each channel (whether it is jammed) from the sensing result, and updates the Q value of every channel in that time slot. Therefore, compared with the conventional Q-learning algorithm [16], which only updates the Q value of the selected channel in each time slot, the WBSS algorithm converges much faster.
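The update rule described above can be sketched as a single tabular step. This is a minimal illustration of standard Q-learning, not the authors' implementation; the function name, the dictionary-based Q table, and the values of `alpha`, `gamma`, and the reward are assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Best value reachable from the next state under the current table.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: three channels, all Q values start at zero.
q_table = defaultdict(float)
channels = [1, 2, 3]
new_value = q_learning_update(q_table, state=0, action=2, reward=1.0,
                              next_state=1, actions=channels)
```

Only the entry for the selected channel changes in this step, which is the source of the slow convergence that WBSS is meant to address.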
In the WBSS algorithm, the Q value is updated for every channel $n_i \in \{1, 2, 3, \ldots, M\}$; that is, in any time slot, the Q values of all M channels are updated at the same time. Here $r_t$ represents the instant benefit of selecting channel $n_i$ for communication in time slot $t + 1$. It can be seen from Equations (11) and (12) that updating the Q value and obtaining the optimal strategy requires knowing the state of the current time slot. However, in the actual electromagnetic environment, because observation is inaccurate, the state of the current time slot is not completely determined; instead, there are multiple possible states whose likelihood depends on the probabilities of false alarm and missed detection. Therefore, unlike the conventional WBSS algorithm, the proposed algorithm takes the false alarm and missed detection probabilities into account when calculating the Q value and making decisions, which yields the NISS algorithm.
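The idea of updating all channels in one time slot can be sketched as follows. This is an illustrative assumption about the form of the WBSS update (the standard Q-learning step applied to each channel with its sensed reward); the function name, the list-based rows, and the reward values E = L = 1 are hypothetical.

```python
def wbss_update(q_row, next_row, rewards, alpha=0.1, gamma=0.9):
    """Return the updated Q values of every channel for one time slot."""
    # The bootstrap term is shared by all channels of this time slot.
    best_next = max(next_row)
    return [q + alpha * (r + gamma * best_next - q)
            for q, r in zip(q_row, rewards)]


# Hypothetical sensing result: channel index 1 is sensed as jammed.
sensed_jammed = [False, True, False]
rewards = [-1.0 if jammed else 1.0 for jammed in sensed_jammed]
updated = wbss_update([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], rewards)
```

Note that the rewards here are derived directly from the sensing result, so any false alarm or missed detection feeds straight into the Q table.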
Different states, such as $s_t^1 = \{1, 0, 0, \ldots, 0\}$ (only channel 1 is jammed) and $s_t^2 = \{1, 2, 0, \ldots, 0\}$ (only channel 1 and channel 2 are jammed), produce the same communication result and the same benefit for an action such as $a = 1$ (select channel 1 for communication). Therefore, instead of treating the set of all channel states as the state, we use the time slot as the state. When calculating the immediate return $r$, we only need to consider the actual state of the selected communication channel, not the actual state of the other channels.
Then, for the time slots $n_{t-1}$ and $n_t$ in which the system is located, the observations $o_{t-1}, o_t \in O$ are obtained. For each $a_t \in A$, the Q value is calculated as follows. If the selected channel is observed as jammed, that is, $a_t \in o_{t-1}$, the Q value is updated according to Equation (18). If the selected channel is observed as not jammed, that is, $a_t \notin o_{t-1}$, the Q value is updated according to Equation (19):
$$Q(n_{t-1}, a_t) = Q(n_{t-1}, a_t) + \alpha\left[ (1 - q_i) r_1 + q_i r_2 + \gamma \max_{a_{t+1} \in A} Q(n_t, a_{t+1}) - Q(n_{t-1}, a_t) \right] \quad (19)$$
For the time slot $n_t$, the observation is $o_t$, and the action is obtained according to Equation (20).
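To make the structure of Equation (19) concrete, the sketch below applies it to a Q table indexed by time slot and channel. It covers only the case in which the selected channel is observed as not jammed; Equation (18) is not reproduced here. The function name, the table layout, and the numeric values of `q_i`, `r1`, and `r2` are assumptions for illustration, since the text does not fix them.

```python
def niss_update_observed_free(q_table, slot_prev, slot_now, action,
                              q_i, r1, r2, alpha=0.1, gamma=0.9):
    """Apply the Equation (19) update in place and return the new Q value."""
    # Reward weighted by the sensing-error probability q_i, as in Equation (19).
    weighted_reward = (1.0 - q_i) * r1 + q_i * r2
    best_next = max(q_table[slot_now])
    old = q_table[slot_prev][action]
    q_table[slot_prev][action] = old + alpha * (
        weighted_reward + gamma * best_next - old)
    return q_table[slot_prev][action]


# Hypothetical usage: 10 time slots per jamming period, 3 channels (0-based).
q_table = [[0.0] * 3 for _ in range(10)]
value = niss_update_observed_free(q_table, slot_prev=0, slot_now=1, action=2,
                                  q_i=0.1, r1=1.0, r2=-1.0)
```

Compared with an update that trusts the observation outright, the reward term is a probability-weighted mix of the two possible outcomes, which is how sensing errors enter the Q value.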
Algorithm 1 gives the flow of the intelligent anti-jamming communication decision algorithm based on NISS.

Algorithm 1
1. Initialize the Q table as a zero matrix with $N_T$ rows and $M$ columns; that is, for any $n_T$ and $a$, let $Q(n_T, a) = 0$.
2. for t = 1, 2, ..., T do
3. In the current transmitter state $s_t$, the transmitter performs the action $a_t$ given by the optimal policy obtained in the last time slot, or the initial action $a_0$.
4. The transmitter detects the energy of each channel.
5. Calculate the false alarm probability $P_f$ and the missed detection probability $P_m$ according to the detection results.
6. According to the detection results, false alarm, and missed detection, calculate the real-time reward $r$ and predict the next state $s_{t+1}$ to obtain the optimal communication channel $a_{t+1}$.
7. The agent updates the Q value according to (18) and (19).
8. The agent obtains the optimal strategy $\pi^*$ according to (20) and instructs the transmitter to transmit in the next time slot.
9. t = t + 1
10. end for

First, the system is initialized. Second, the optimal communication channel is selected according to the last decision result. Third, the transmitter detects the energy of all channels. Fourth, the false alarm probability and missed detection probability are calculated according to the energy detection results, the reward of each channel is obtained, and the Q table is updated. Finally, the optimal communication channel for the next slot is selected according to the Q table, which completes one iteration.
Table 1 shows the simulation parameters. According to the parameters in Table 1, the jamming-to-noise ratio (JNR) is 10 dB and the time-bandwidth product is 5, so the false alarm probability is $P_f = 0.0549$ and the missed detection probability is $P_m = 0.1021$ according to Equations (5) and (6). To evaluate its performance, the proposed algorithm is compared with the traditional QL algorithm and the WBSS algorithm.

Parameter Settings
In this paper, the effectiveness and generality of the algorithm are verified by simulation under both fixed-strategy jamming and dynamic strategy jamming. For fixed-strategy jamming, linear sweep jamming is selected as the research object. Figure 5 shows the time-frequency distribution of linear sweep jamming, where the red background indicates jamming. The jammed channel changes linearly from one time slot to the next, and one jamming period is 10 time slots. Within a time slot, the jammed channel does not change.
For dynamic strategy jamming, intelligent blocking jamming is selected as the research object. Intelligent blocking jamming refers to a jamming strategy in which the jammer targets the channels with the highest number of communications, based on the relative number of communications on each channel in the previous period. Figure 6 shows the probability distribution of intelligent blocking jamming. Each number represents the jamming probability of the channel in that time slot, which is determined by the proportion of that channel's communication count in the total communication count of the previous period. One jamming period is 10 time slots.

Result Analysis
Figure 7 compares the decision accuracy (the ratio of the number of successful transmissions to the total number of communications) of the proposed algorithm against linear sweep jamming with that of traditional Q-learning and the wide-band spectrum sensing algorithm. The decision accuracy of traditional Q-learning converges to 100% after about 30 rounds of algorithm iteration.
To accelerate convergence and address the problem that Q-learning only updates the Q value of one state-action pair at a time, the WBSS algorithm senses the jamming states of all channels in each time slot and updates the Q values of all actions in each state at the same time, which greatly accelerates the convergence of the algorithm. However, due to false alarm ($P_f = 0.0549$) and missed detection ($P_m = 0.1021$), the sensing results cannot be completely accurate. For example, in a certain time slot the channel may be perceived as free of jamming, but this result may be caused by missed detection, and the channel may actually be jammed. The WBSS algorithm nevertheless takes the sensing result directly as the actual state of the system and does not consider the impact of false alarm and missed detection on the sensing result. Its accuracy after convergence is only 90%, which is lower than that of the Q-learning algorithm. By taking the false alarm and missed detection probabilities into account, the NISS algorithm accounts for the inaccuracy of the sensing results, so its decision on the optimal channel is more accurate and reasonable. Therefore, the decision accuracy of the NISS algorithm against linear sweep jamming can also reach 100%.
Since actual jamming will not be as regular as linear sweep jamming, the algorithm is also simulated against intelligent blocking jamming to verify its applicability under complex jamming patterns. The results are shown in Figure 8.
As can be seen from Figure 8, since intelligent blocking jamming interferes with different channels in each time slot according to a probability distribution, it is impossible to fully predict the jammed channel of the next time slot. Therefore, even with Q-learning, the decision accuracy is only 80%.
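The probabilistic nature of intelligent blocking jamming can be illustrated with a small sketch: the jamming probability of each channel is taken as its share of the previous period's communications, and the jammed channel is then drawn at random. The function names, the example counts, and the uniform fallback for an empty history are assumptions made for this illustration, not the simulation code behind Figure 8.

```python
import random


def blocking_probabilities(comm_counts):
    """Jamming probability per channel, proportional to its past usage."""
    total = sum(comm_counts)
    if total == 0:
        # Assumption: with no usage history, every channel is equally likely.
        return [1.0 / len(comm_counts)] * len(comm_counts)
    return [count / total for count in comm_counts]


def sample_jammed_channel(probabilities, rng):
    """Draw one jammed channel index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Hypothetical communication counts of four channels in the previous period.
probs = blocking_probabilities([6, 3, 1, 0])
rng = random.Random(0)
draws = [sample_jammed_channel(probs, rng) for _ in range(1000)]
```

Because the jammed channel is a random draw, even a learner that knows these probabilities exactly cannot avoid every jammed slot, which is consistent with an accuracy ceiling below 100%.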
From the results of Figures 7 and 8, we can see that the NISS algorithm converges faster than Q-learning at the same decision accuracy, and has higher decision accuracy than the WBSS algorithm at the same convergence rate. Figure 9 compares how the successful transmission rate of the NISS algorithm and the WBSS algorithm changes with JNR when the false alarm probability is $P_f = 0.0549$. It can be seen from Figure 9 that the decision accuracy of both algorithms increases with JNR. For JNR between 2 dB and 12 dB, the decision accuracy of the NISS algorithm is significantly higher than that of the WBSS algorithm, which shows that the NISS algorithm performs better than the WBSS algorithm in this range. When JNR is low, the decision accuracy of the two algorithms is similar, because it is difficult to distinguish whether the channel contains jamming. When JNR is very high, both decision accuracies reach the same maximum. The reason is that the energy of the jamming signal is very strong, the probability of missed detection is almost negligible, and the observation accuracy depends only on the probability of false alarm. Since the probability of false alarm is equal for both algorithms, the decision accuracy is equal. The results show that NISS outperforms WBSS at medium JNR.
Figure 10 is a channel decision diagram of the WBSS algorithm against intelligent blocking jamming. The vertical coordinate indicates the channel serial number, and the horizontal coordinate indicates the current communication slot. The green area indicates that the current channel is predicted to be jammed, but there is no jamming in the actual channel. The light red area indicates that the current channel is jammed but was not predicted. The dark red area indicates that the current channel is jammed and was successfully predicted. For the WBSS algorithm, even after the algorithm converges, some jammed channels are not predicted by the agent. If such a channel is selected for communication, the communication will fail and cause great losses.
Figure 11 is a channel decision diagram of the proposed algorithm against intelligent blocking jamming. Compared with Figure 10, after the algorithm converges, the NISS algorithm in this paper can accurately identify the channels that are actually jammed, and there is no missed detection. Although some channels are predicted to be jammed when they are not, the loss caused by such false alarms is very small. Therefore, the algorithm proposed in this paper can effectively solve the problem of intelligent anti-jamming for wireless communication in the case of non-ideal spectrum sensing.
Comparing the performance of the algorithms in the anti-jamming channel decision under the two jamming patterns, we can see the following. Although the accuracy of the Q-learning algorithm is high, its convergence is slow. Against jammers with dynamic jamming patterns, it may not be able to learn the jamming rules in a short time and make effective anti-jamming decisions. The WBSS algorithm converges much faster than Q-learning, but it has a major defect: because of false alarm and missed detection, the sensing results are not necessarily accurate, so the accuracy of the jamming channel decision after convergence is low. The NISS algorithm improves on the WBSS algorithm. By taking into account the sensing inaccuracy caused by false alarm and missed detection, together with the confidence in the actual channel state, it describes the current channel state in the time slot more accurately. Therefore, the NISS algorithm has the same convergence rate as the WBSS algorithm without losing accuracy in the jamming channel decision.
Figure 11. NISS algorithm anti-intelligent blocking jamming channel decision diagram.

Conclusions
This paper proposes a NISS intelligent anti-jamming algorithm. The proposed algorithm addresses two problems. One is the slow convergence of Q-learning, which updates the Q value of each channel one by one; the other is the non-ideal sensing that the WBSS algorithm does not account for. By adopting the Q value update strategy of the WBSS algorithm and incorporating the probabilities of false alarm and missed detection into the calculation of the Q value, the proposed algorithm achieves a good anti-jamming effect. The simulation is carried out under linear sweep jamming and intelligent blocking jamming. The results show that, compared with the traditional Q-learning algorithm, the proposed algorithm converges faster at the same decision accuracy; compared with the WBSS algorithm, it makes more accurate jamming channel decisions at the same convergence rate. This shows that the algorithm has better anti-jamming performance in the face of complex and changeable intelligent jamming.