A Centralized Multi-User Anti-Composite Intelligent Interference Algorithm Based on Improved Q-Learning
Abstract
1. Introduction
2. System Model and Problem Modeling
2.1. System Model
- A typical multi-user wireless communication network has a set of transmission channels accessible to every active user, in addition to a set of active communication users. Such a network generally provides enough channels for its users, so the number of available channels is set to be no smaller than the number of users. To improve the success rate of channel access, a central base station is added to the multi-user wireless network in this paper to unify channel coordination and allocation for every user. The structure is shown schematically in Figure 1.
- All network users have broadband spectrum-sensing capability and can identify, through sensing, the channel on which malicious interference is currently located. The central base station serves as the decision-making center of the network and is able to learn and make decisions. It also coordinates user access to the transmission channels through a command signal, which enables joint anti-jamming across the system. All system users are within range of the central base station. Neither the system users nor the central base station have prior information about the external malicious interference.
- Competition arises when system users simultaneously use the same channel for transmission, and users in competition cannot transmit data successfully. For the central base station to coordinate and allocate channel access sensibly and so avoid competition during transmission, the users of the system must be able to communicate with each other and exchange their sensing results. In addition, channel noise is assumed in this paper to be too weak to affect the communication performance of users within the system.
- Communication time is divided into time slots of fixed duration, which serve as the minimum time unit for continuous transmission; interference time is divided in the same way. Each time slot is further divided according to the role of each element in the system. The communication time slot of each system user consists of an observation sub-time slot, used for observing the external interference and the actions of other users in the system, and an action sub-time slot, used for communication transmission, as shown in Figure 2a. The communication time slot of the central base station consists of a decision sub-time slot and a learning sub-time slot, used for transmitting decision information, such as user information and channel selection information, and for executing the algorithm, as shown in Figure 2b.
- External malicious interference is modeled as high-power interference in this paper, and the users' communication takes place within the effective range of the malicious jammer. The external jammer senses the channel on which each network user is located and selects the channel with the highest time-slot utilization up to the current time slot; a single interference lasts for a fixed number of communication slots. This interference style is Multi-channel Probabilistic Tracking jamming (MPT-jamming). On top of this, sweep interference is directed at a specific user in the network to shrink the set of channels from which the central base station can coordinate and allocate that user's channel access. The combined interference style is defined as compound intelligent jamming (CIJ).
- The constituent elements of the network, namely the central base station and each user, share the same communication time. Time slots are strictly synchronized, and the sensing capability and sensing results of all users are consistent. The external malicious interference uses the same time slots and is synchronized with the communication time within the network.
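The jammer behaviour described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the tracking jammer deterministically picks the most-used channels (the probabilistic part of MPT-jamming is not modeled), the sweep is a simple linear sweep over all channels, and the multi-slot dwell time of a single interference is omitted. The function names `tracking_jam_targets`, `sweep_channel`, and `compound_jammed_channels` are hypothetical.

```python
from collections import Counter


def tracking_jam_targets(usage_history, num_targets=1):
    """Return the channels with the highest cumulative usage so far.

    usage_history: list of per-slot lists of channel indices used by the users.
    Assumption: deterministic top-k selection, ties broken by channel index.
    """
    counts = Counter(ch for slot in usage_history for ch in slot)
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return ranked[:num_targets]


def sweep_channel(slot_index, num_channels):
    """Assumed linear sweep: one channel per slot, wrapping around."""
    return slot_index % num_channels


def compound_jammed_channels(usage_history, slot_index, num_channels, num_targets=1):
    """Union of tracked channels and the swept channel for this slot."""
    tracked = set(tracking_jam_targets(usage_history, num_targets))
    return tracked | {sweep_channel(slot_index, num_channels)}


# Three users, ten channels, three observed slots (illustrative values).
history = [[0, 1, 2], [0, 1, 3], [0, 4, 5]]
jammed = compound_jammed_channels(history, slot_index=12, num_channels=10, num_targets=2)
print(sorted(jammed))
```

In this sketch the sweep removes one extra channel per slot from the set the base station can safely allocate, which is the squeezing effect the compound jammer is meant to produce.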
2.2. Problem Modeling
- 1. State space: this mainly reflects the current state of each channel, as provided by the environment. Taking a single channel as an example, its status is idle when it is unused, jamming when it is jammed, and transmission when it is occupied by users in the system. In the last two cases the channel is defined as busy in this paper, meaning that it is occupied. The communication decision of the system under external malicious interference is based on the sensing result, i.e., whether or not each channel is occupied. However, what occupies a channel, whether interference or users of the system, is knowledge that the system must learn and apply. Therefore, the system state space is defined as:
- 2. Action space: in the multi-user scenario, the action space is the collection of actions chosen independently by each system user. In this paper, the action of each communication user is to choose a single channel from the channels offered by the environment in order to complete its transmission. Therefore, every user has the same independent action subspace within the joint action space of the system. The independent action space of a single user can be defined as follows.
- 3. State transition probability: the probability that the set of agents transitions to the next state after taking a joint action in the current channel state.
- 4. Reward function: the reward obtained by a communication user for the action it takes in a given channel state; it depends on how crowded the chosen transmission channel is. The single-transmission reward function for users in the system is defined in the following way:
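As a hedged illustration of the state and reward elements listed above, the following Python sketch labels each channel as idle or busy and scores one transmission per user. The reward values (1.0 for a successful transmission, 0.0 otherwise) and the success condition (channel not jammed and used by exactly one user) are assumptions made for illustration; the paper's own reward formula is not reproduced in this text. The names `channel_states` and `transmission_rewards` are hypothetical.

```python
from collections import Counter


def channel_states(num_channels, jammed, user_actions):
    """Label every channel 'busy' (jammed or used by a user) or 'idle'."""
    occupied = set(jammed) | set(user_actions)
    return tuple("busy" if ch in occupied else "idle" for ch in range(num_channels))


def transmission_rewards(jammed, user_actions, success=1.0, failure=0.0):
    """Assumed reward: success only on an unjammed channel used by one user."""
    load = Counter(user_actions)
    return [
        success if (action not in jammed and load[action] == 1) else failure
        for action in user_actions
    ]


# Users 2 and 3 collide on channel 3; channel 1 is jammed.
print(channel_states(5, {1}, [0, 3, 3]))
print(transmission_rewards({1}, [0, 3, 3]))
```

The joint action here is simply the list of per-user channel choices, which matches the description of identical independent action subspaces for every user.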
3. Central Anti-Jamming Algorithm Based on Improved Q-Learning
Algorithm 1: CAJA
1: Initialisation: set the learning parameters; for any state and action, initialise the Q values of each user in its current state
2: for each time slot do
3: Each user executes its action according to the joint action sent back by the central base station in the last time slot
4: Each user senses the channel on which the current external interference occurs and obtains the current state of the external environment
5: The users' sensing results are shared with the central base station, which updates the status of each user according to its serial number
6: Each user receives an instant reward from the environment for performing its action in its current state
7: The central base station makes sequential decisions in order of user number, updating each user's Q values according to Formula (12)
8: The central base station summarises the results and calculates the overall Q value of the system at this time using the following formula:
9: The central base station assigns each user a channel for the next time slot based on the following action selection strategy:
10: Update the respective states and actions; the central base station distributes the command signal to the respective users in the system
11:
12: end for
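The control flow of Algorithm 1 can be sketched in Python as follows. This is not the authors' improved update: Formula (12), the overall Q value, and the action selection strategy are not reproduced in this text, so the sketch substitutes the standard tabular Q-learning update and an ε-greedy rule. The learning rate 0.8 and discount factor 0.6 follow the parameter table; the value `epsilon=0.1`, the class name `CentralStation`, and the rule that users are served in index order with already assigned channels excluded are assumptions. The sketch also assumes at least as many channels as users.

```python
import random


class CentralStation:
    """Hypothetical central base station holding one Q-table per user."""

    def __init__(self, num_users, num_channels, alpha=0.8, gamma=0.6,
                 epsilon=0.1, seed=0):
        self.num_users = num_users
        self.num_channels = num_channels
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # q[user][state] -> list of Q values, one per channel (created lazily).
        self.q = [dict() for _ in range(num_users)]

    def _row(self, user, state):
        return self.q[user].setdefault(state, [0.0] * self.num_channels)

    def update(self, user, state, action, reward, next_state):
        """Standard Q-learning update (stand-in for the paper's Formula (12))."""
        row = self._row(user, state)
        target = reward + self.gamma * max(self._row(user, next_state))
        row[action] += self.alpha * (target - row[action])

    def assign(self, states):
        """Sequentially give each user a channel no earlier user has taken."""
        taken, joint = set(), []
        for user in range(self.num_users):
            free = [c for c in range(self.num_channels) if c not in taken]
            if self.rng.random() < self.epsilon:
                choice = self.rng.choice(free)  # explore
            else:
                row = self._row(user, states[user])
                choice = max(free, key=lambda c: row[c])  # exploit
            taken.add(choice)
            joint.append(choice)
        return joint


station = CentralStation(num_users=3, num_channels=10, epsilon=0.0)
joint_action = station.assign(["s0", "s0", "s0"])
station.update(user=0, state="s0", action=joint_action[0], reward=1.0, next_state="s0")
print(joint_action, station.q[0]["s0"][joint_action[0]])
```

Serving users one after another and excluding channels that are already assigned is one simple way to avoid internal competition; any such rule would play the role of the command signal that the base station distributes at the end of each slot.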
4. Simulation Results and Experimental Analysis
4.1. Parameter Settings
4.2. Analysis of Simulation Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Parameter | Numerical Value |
---|---|
Number of users | 3 |
Number of available channels | 10 |
Length of communication time slot | 0.3 ms |
Observation sub-time slot | 0.1 ms |
Decision sub-time slot | 0.1 ms |
Action sub-time slot | 0.2 ms |
Learning sub-time slot | 0.2 ms |
Selective number of tracking jamming | 2 |
Number of continuous time slots of follower jamming | 3 |
Discount factor | 0.6 |
Learning rate factor | 0.8 |
Greedy factor | |
Total number of time slots | 2000 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Niu, Y.; Wan, B.; Chen, C. A Centralized Multi-User Anti-Composite Intelligent Interference Algorithm Based on Improved Q-Learning. Electronics 2023, 12, 1803. https://doi.org/10.3390/electronics12081803