Deep Reinforcement Learning-Based Relay Selection Algorithm in Free-Space Optical Cooperative Communications
Abstract
1. Introduction
1.1. Related Works
1.2. Motivation and Contribution
- A Markov decision process (MDP) model is developed to describe the relay selection problem.
- Unlike the abovementioned literature, the switching loss between relay nodes is considered, and the corresponding expression for the channel capacity is derived.
- We propose a DQN-RS algorithm based on the dueling DQN method, which addresses relay selection in a cooperative DF-FSO system in order to achieve a higher channel capacity.
- In the proposed DQN-RS algorithm, a state contains both the previously selected relay and the current CSI. An action represents the currently selected relay, and the corresponding reward equals the derived channel capacity.
1.3. Paper Structure
2. System Structure and Problem Formulation
2.1. System Structure
2.2. Problem Formulation
3. DRL-Based Solution
3.1. RL Framework-Based Optimization Problem
3.2. Markov Decision Process
- State Space: The state of the whole system at the current TS consists of both the previously selected relay node and the current CSI. The previously selected relay is represented by a one-hot code.
- Action Space: In our system model, the agent, as the decision maker, decides which relay to select. The action at the current TS is the relay selected for that TS.
- Immediate Reward Function: After the agent takes an action in the current state, the system transitions to the next state and the agent obtains an immediate reward. In our optimization problem, the immediate reward is defined as the current channel capacity given in Equation (3).
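The three MDP elements above can be illustrated with a minimal Python sketch. It assumes that the CSI consists of the source-to-relay and relay-to-destination channel gains, and it concatenates them with the one-hot code of the previous relay. All function names are hypothetical. In particular, `toy_capacity` is only a placeholder for Equation (3), whose expression is not reproduced here: it uses a simple bottleneck capacity and scales it by an assumed switching loss of 0.2 whenever the relay changes.

```python
import math
from typing import Callable, List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("relay index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_state(prev_relay: int,
                sr_gains: Sequence[float],
                rd_gains: Sequence[float]) -> List[float]:
    """State = one-hot code of the previous relay + current CSI.

    Assumption: the CSI is the list of source-to-relay gains followed by
    the list of relay-to-destination gains.
    """
    if len(sr_gains) != len(rd_gains):
        raise ValueError("both gain lists need one entry per relay")
    return one_hot(prev_relay, len(sr_gains)) + list(sr_gains) + list(rd_gains)


def toy_capacity(prev_relay: int, action: int,
                 sr_gains: Sequence[float], rd_gains: Sequence[float],
                 switching_loss: float = 0.2) -> float:
    """HYPOTHETICAL stand-in for Equation (3), not the derived expression.

    Uses the weaker of the two hops as the bottleneck and reduces the
    capacity by `switching_loss` when the selected relay changes.
    """
    bottleneck = min(sr_gains[action], rd_gains[action])
    capacity = math.log2(1.0 + bottleneck)
    if action != prev_relay:
        capacity *= 1.0 - switching_loss
    return capacity


def immediate_reward(prev_relay: int, action: int,
                     sr_gains: Sequence[float], rd_gains: Sequence[float],
                     capacity_fn: Callable[..., float] = toy_capacity) -> float:
    """Reward of the current TS = channel capacity achieved by `action`."""
    return capacity_fn(prev_relay, action, sr_gains, rd_gains)


# Example with four relays; the gain values are made up for illustration.
sr = [0.5, 0.6, 0.7, 0.8]
rd = [0.4, 0.3, 0.2, 0.1]
state = build_state(prev_relay=1, sr_gains=sr, rd_gains=rd)
reward_stay = immediate_reward(1, 1, sr, rd)    # keep relay 1
reward_switch = immediate_reward(1, 0, sr, rd)  # switch to relay 0
```

Passing the capacity expression in as `capacity_fn` keeps the state and reward plumbing independent of the exact form of Equation (3), so the placeholder can be swapped for the derived expression without touching the rest of the sketch.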
3.3. The DQN-RS Algorithm
Algorithm 1. Pseudocode of the proposed DQN-RS algorithm.
Input: The cooperative DF-FSO simulator and its parameters.
Output: The optimal action for each time slot.
1: Initialize the experience replay memory (buffer) with a fixed size.
2: Initialize the online network parameters with random weights, and initialize the target network parameters by copying them from the online network.
3: Initialize the minibatch size.
4: for each episode = 1, 2, … do
5:   Initialize the environment and observe its initial state.
6:   for each time slot do
7:     With the exploration probability, select a random action; otherwise, select the action with the largest Q-value given by the online network.
8:     Execute the action, receive the immediate reward, and observe the next state.
9:     Store the transition data in the buffer.
10:     if the buffer is full then
11:       Sample a random minibatch of transitions from the buffer.
12:       Update the online network by (14).
13:       Update the target network by (15).
14:   end for
15: end for
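The building blocks used by Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the helper names are hypothetical, the dueling aggregation is the standard mean-subtracted form of the dueling DQN method, and `td_target` is the generic one-step Q-learning target, which may differ from the update rules in Equations (14) and (15).

```python
import random
from collections import deque
from typing import List, Sequence, Tuple

Transition = Tuple[list, int, float, list]  # (state, action, reward, next_state)


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine the two streams: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Step 7: random relay with probability epsilon, else the greedy relay."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayBuffer:
    """Fixed-size experience replay memory (steps 1 and 9 to 11)."""

    def __init__(self, capacity: int) -> None:
        self._data = deque(maxlen=capacity)  # oldest entries drop out first

    def store(self, state: list, action: int, reward: float,
              next_state: list) -> None:
        self._data.append((state, action, reward, next_state))

    def is_full(self) -> bool:
        return len(self._data) == self._data.maxlen

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        return rng.sample(list(self._data), batch_size)


def td_target(reward: float, next_q_target: Sequence[float], gamma: float) -> float:
    """Generic one-step target: r + gamma * max_a' Q_target(s', a')."""
    return reward + gamma * max(next_q_target)


# Tiny usage example with made-up numbers.
rng = random.Random(0)
q = dueling_q_values(1.0, [0.0, 2.0, 4.0])
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)
buffer = ReplayBuffer(capacity=3)
for step in range(3):
    buffer.store([float(step)], step % 2, 0.5, [float(step + 1)])
batch = buffer.sample(2, rng) if buffer.is_full() else []
```

As in step 10 of Algorithm 1, the sketch samples a minibatch only once the buffer is full; the network updates themselves are left out because they depend on the chosen deep learning framework.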
4. Simulation Results
5. Conclusions and Prospects
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
- Channel gain from the source node to a relay node
- Channel gain from a relay node to the destination node
- Channel capacity of a TS
- Capacity gain of our proposed algorithm over the greedy algorithm
- Switching loss when switching between neighboring relay nodes
- Switching loss between two given relay nodes
- Number of relay nodes
- Parameters of the convolutional layers and the two streams of fully connected layers in the online network
- Parameters of the convolutional layers and the two streams of fully connected layers in the target network
- Current state space, action space, and reward of the whole system at a given TS
- Discount factor of the DQN-RS algorithm
- Learning rates of the online network and the target network
Parameters | Values |
---|---|
Number of Relay Nodes | 4 |
Number of Time Slots | 10,100 |
Switching Loss | 0.2 |
Photodetector Responsivity | 0.9 |
Fading parameters | 5.9776, 4.3980, 0.0032, 6.2552 |
Normalized Power | 5 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gao, S.-J.; Li, Y.-T.; Geng, T.-W. Deep Reinforcement Learning-Based Relay Selection Algorithm in Free-Space Optical Cooperative Communications. Appl. Sci. 2022, 12, 4881. https://doi.org/10.3390/app12104881