Two-Dimensional Thompson Sampling for Joint Beam and Power Control for Uplink Maritime Communications
Abstract
1. Introduction
2. System Model
Statistical Modeling of Ocean Beam Channel
3. Joint Beamwidth and Power Selection Using 2DTS
3.1. Standard Thompson Sampling
3.1.1. Bayesian Multi-Armed Bandit Formulation
3.1.2. Bayesian Perspective and Thompson Sampling Logic
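- (1) Initialization. Place a prior distribution on the unknown expected reward $\theta_k$ of each arm $k$.
  - For Bernoulli rewards (i.e., rewards are either 0 or 1), a beta prior is typically selected, $\theta_k \sim \mathrm{Beta}(\alpha_k, \beta_k)$, where $\alpha_k$ and $\beta_k$ are positive hyperparameters. Because the beta distribution is the conjugate prior for the Bernoulli likelihood, the posterior is again a beta distribution and its updates take a simple form.
  - For Gaussian rewards, a normal prior is often chosen, $\theta_k \sim \mathcal{N}(\mu_0, \sigma_0^2)$, where $\mu_0$ is the initial mean and $\sigma_0^2$ is the initial variance. When the reward variance is known, the normal distribution is the conjugate prior for the mean of a Gaussian likelihood, which keeps posterior updates tractable.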
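- (2) Sampling.
  - Bernoulli rewards (beta prior): for each arm $k$, draw $\tilde{\theta}_k \sim \mathrm{Beta}(\alpha_k + S_k, \beta_k + F_k)$, where $S_k$ is the total number of observed “successes” (reward = 1) and $F_k$ the total number of observed “failures” (reward = 0) for arm $k$. Each time arm $k$ is played, these counts are updated accordingly.
  - Gaussian rewards (normal prior with mean $\mu_0$ and variance $\sigma_0^2$): for each arm $k$, draw $\tilde{\theta}_k \sim \mathcal{N}(\hat{\mu}_k, \hat{\sigma}_k^2)$ with $\hat{\sigma}_k^2 = \left(\frac{1}{\sigma_0^2} + \frac{n_k}{\sigma^2}\right)^{-1}$ and $\hat{\mu}_k = \hat{\sigma}_k^2 \left(\frac{\mu_0}{\sigma_0^2} + \frac{Y_k}{\sigma^2}\right)$. Here, $Y_k$ is the sum of all rewards observed from arm $k$, $n_k$ is the number of times arm $k$ has been selected, and $\sigma^2$ is the known variance of the underlying reward distribution.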
- (3) Action Selection.
  - Play the arm with the highest sampled value, $k_t = \arg\max_k \tilde{\theta}_k$.
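- (4) Update.
  - After playing arm $k_t$, observe the reward $X_t$.
  - Bernoulli rewards: if $X_t = 1$, increment the success count, $S_{k_t} \leftarrow S_{k_t} + 1$; otherwise, increment the failure count, $F_{k_t} \leftarrow F_{k_t} + 1$.
  - Gaussian rewards: increment the selection count $n_{k_t}$ and add $X_t$ to the reward sum $Y_{k_t}$, then recompute the mean and variance of the normal posterior with the standard Gaussian conjugate formulas.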
This feedback loop repeats in every subsequent round, progressively refining the posterior estimate of each arm's expected reward.
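The four steps above can be sketched in Python for both reward models. This is a minimal illustration, not code from the paper: the function names `thompson_bernoulli` and `thompson_gaussian`, the uniform Beta(1, 1) prior, the default N(0, 1) prior in the Gaussian case, and the simulated reward environments are all assumptions made for the example. Only the standard-library `random` module is used (`Random.betavariate` and `Random.gauss`).

```python
import math
import random


def thompson_bernoulli(true_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns per-arm success and failure counts."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha0, beta0 = 1.0, 1.0   # (1) Initialization: uniform Beta(1, 1) prior (assumed)
    successes = [0] * k        # S_k
    failures = [0] * k         # F_k
    for _ in range(horizon):
        # (2) Sampling: one draw per arm from Beta(alpha0 + S_k, beta0 + F_k)
        samples = [rng.betavariate(alpha0 + successes[a], beta0 + failures[a])
                   for a in range(k)]
        # (3) Action selection: play the arm with the largest sample
        arm = max(range(k), key=samples.__getitem__)
        # Simulated environment (assumption): Bernoulli reward with prob true_probs[arm]
        reward = 1 if rng.random() < true_probs[arm] else 0
        # (4) Update: increment the success or failure count of the played arm
        if reward == 1:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


def thompson_gaussian(true_means, horizon, noise_var=1.0, mu0=0.0, var0=1.0, seed=0):
    """Gaussian Thompson sampling with known reward variance; returns per-arm play counts."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k    # n_k
    sums = [0.0] * k    # Y_k (sum of observed rewards)
    for _ in range(horizon):
        samples = []
        for a in range(k):
            # (2) Conjugate posterior: precisions add; the mean is precision-weighted
            post_var = 1.0 / (1.0 / var0 + counts[a] / noise_var)
            post_mean = post_var * (mu0 / var0 + sums[a] / noise_var)
            samples.append(rng.gauss(post_mean, math.sqrt(post_var)))
        # (3) Action selection: play the arm with the largest sample
        arm = max(range(k), key=samples.__getitem__)
        # Simulated environment (assumption): Gaussian reward with known variance
        reward = rng.gauss(true_means[arm], math.sqrt(noise_var))
        # (4) Update the sufficient statistics n_k and Y_k of the played arm
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `thompson_bernoulli([0.1, 0.9], 2000)` should direct most of its 2000 plays to the second arm, because that arm's beta posterior quickly concentrates near 0.9 while the first arm's concentrates near 0.1.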
3.2. Problem Formulation
3.3. Thompson Sampling with Joint Beamwidth and Power Selection
- Initialize the beta-distribution parameters $\alpha_i$ and $\beta_i$ for each beamwidth $i$, and $\alpha_{i,j}$ and $\beta_{i,j}$ for each power level $j$ of each beamwidth $i$.
- For each trial $t = 1, \dots, T$:
  - Sample $\tilde{\theta}_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ for each beamwidth $i$.
  - Select the beamwidth with the highest sampled value: $i^* = \arg\max_i \tilde{\theta}_i$.
  - For the selected beamwidth $i^*$, sample $\tilde{\theta}_{i^*,j} \sim \mathrm{Beta}(\alpha_{i^*,j}, \beta_{i^*,j})$ for each power level $j$, then select the power level with the highest sampled value: $j^* = \arg\max_j \tilde{\theta}_{i^*,j}$. The goal is to minimize power usage by choosing the lowest power level that still satisfies the target rate threshold $r$; the update rule below biases the sampling toward such levels. Here, $R$ denotes the instantaneous channel capacity (the maximum achievable rate), whereas $r$ is the actual transmission rate, which must not exceed this capacity for the transmission to succeed. The algorithm therefore treats $R \ge r$ as the success criterion for a transmission.
  - Calculate the SNR for the selected beamwidth and power level, $\mathrm{SNR} = P_{j^*} G_{i^*} / N$, where $P_{j^*}$ is the selected transmit power, $G_{i^*}$ is the effective channel gain associated with beamwidth $i^*$, and $N$ is the noise power. Using this SNR, compute the rate $R = \log_2(1 + \mathrm{SNR})$.
  - Determine the reward based on whether the achievable rate meets or exceeds the target threshold $r$: $X_t = 1$ if $R \ge r$, and $X_t = 0$ otherwise.
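  - Update the beta-distribution parameters of the selected beamwidth and power level based on the observed reward:
    - For the selected beamwidth: $\alpha_{i^*} \leftarrow \alpha_{i^*} + X_t$ and $\beta_{i^*} \leftarrow \beta_{i^*} + (1 - X_t)$.
    - For the selected power level of the selected beamwidth, with power levels indexed in ascending order ($P_1 < P_2 < \dots < P_J$):
      - If $X_t = 1$, the transmission succeeded (the rate threshold is met or exceeded). Increase $\alpha$ for the selected power level and for the level immediately below it (if it exists): $\alpha_{i^*,j^*} \leftarrow \alpha_{i^*,j^*} + 1$ and, if $j^* > 1$, $\alpha_{i^*,j^*-1} \leftarrow \alpha_{i^*,j^*-1} + 1$. This update encourages the selection of lower power levels that can meet the threshold, improving power efficiency.
      - If $X_t = 0$, the transmission failed (the rate threshold is not met). Increase $\beta$ for the selected power level and all power levels below it: $\beta_{i^*,j} \leftarrow \beta_{i^*,j} + 1$ for all $j \le j^*$. Because a lower transmit power yields a lower rate under the same channel, those levels would have failed as well; penalizing them encourages the selection of higher power levels when necessary.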
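The SNR, rate, and reward steps reduce to simple arithmetic, and inverting the rate condition gives the minimum power that can succeed for a given gain: $\log_2(1 + P G / N) \ge r \iff P \ge N(2^r - 1)/G$. The Python sketch below illustrates this under stated assumptions: the rate is taken as the Shannon spectral efficiency $\log_2(1 + \mathrm{SNR})$ in bit/s/Hz (unit bandwidth), the helper names are hypothetical, and the numbers ($G = 10^{-9}$, $N = 10^{-11}$ W, $r = 3$ bit/s/Hz) are illustrative rather than values from the paper.

```python
import math


def snr(power_w, gain, noise_w):
    """SNR = P * G / N for transmit power P, effective beam gain G and noise power N."""
    return power_w * gain / noise_w


def rate(snr_linear):
    """Achievable rate in bit/s/Hz, assumed to be the Shannon bound log2(1 + SNR)."""
    return math.log2(1.0 + snr_linear)


def reward(power_w, gain, noise_w, threshold):
    """Binary reward: 1 if the achievable rate meets the target threshold r, else 0."""
    return 1 if rate(snr(power_w, gain, noise_w)) >= threshold else 0


def min_power(gain, noise_w, threshold):
    """Smallest power with log2(1 + P*G/N) >= r, i.e. P >= N * (2**r - 1) / G."""
    return noise_w * (2.0 ** threshold - 1.0) / gain


# Hypothetical numbers: G = 1e-9, N = 1e-11 W, r = 3 bit/s/Hz
G, N, r = 1e-9, 1e-11, 3.0
print(min_power(G, N, r))      # about 0.07 W
print(reward(0.05, G, N, r))   # 0: SNR = 5, rate = log2(6) < 3
print(reward(0.10, G, N, r))   # 1: SNR = 10, rate = log2(11) > 3
```

With these numbers the minimum power is $10^{-11} \cdot 7 / 10^{-9} = 0.07$ W, so of the two candidate levels 0.05 W and 0.10 W only the latter earns a reward of 1.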
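| Algorithm 1: 2DTS for Beamwidth and Power Selection | |
|---|---|
| Require: | Threshold $r$, noise power $N$, beamwidth options $i = 1, \dots, I$, power levels $P = \{P_1, \dots, P_J\}$, rate $R$. |
| 1: | Initialize $\alpha_i$, $\beta_i$ for all $i$, and $\alpha_{i,j}$, $\beta_{i,j}$ for all $i$, $j$. |
| 2: | for $t = 1$ to $T$ do |
| 3: | Sample $\tilde{\theta}_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$ for each $i$. |
| 4: | Select beamwidth $i^* = \arg\max_i \tilde{\theta}_i$. |
| 5: | For the selected $i^*$, sample $\tilde{\theta}_{i^*,j} \sim \mathrm{Beta}(\alpha_{i^*,j}, \beta_{i^*,j})$ for each $j$. |
| 6: | Select power level $j^* = \arg\max_j \tilde{\theta}_{i^*,j}$. |
| 7: | Compute SNR: $\mathrm{SNR} = P_{j^*} G_{i^*} / N$. |
| 8: | Compute rate: $R = \log_2(1 + \mathrm{SNR})$. |
| 9: | Compute reward $X_t = 1$ if $R \ge r$, else $X_t = 0$. |
| 10: | if $X_t = 1$ then |
| 11: | Update $\alpha_{i^*} \leftarrow \alpha_{i^*} + 1$. |
| 12: | Update $\alpha_{i^*,j^*} \leftarrow \alpha_{i^*,j^*} + 1$. |
| 13: | if level $j^* - 1$ exists (i.e., $j^* > 1$) then |
| 14: | Update $\alpha_{i^*,j^*-1} \leftarrow \alpha_{i^*,j^*-1} + 1$. |
| 15: | end if |
| 16: | else |
| 17: | Update $\beta_{i^*} \leftarrow \beta_{i^*} + 1$. |
| 18: | Update $\beta_{i^*,j} \leftarrow \beta_{i^*,j} + 1$ for all $j \le j^*$. |
| 19: | end if |
| 20: | end for |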
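Algorithm 1 can be sketched as a runnable Python loop. This is a hedged illustration under assumptions the algorithm does not state: the Beta(1, 1) initial parameters, the function name `two_d_thompson_sampling`, the user-supplied `sample_gain(i, rng)` callback that stands in for the ocean beam channel model, and the rate expression $R = \log_2(1 + \mathrm{SNR})$ are all hypothetical. Power levels must be passed in ascending order so that "the level immediately below" corresponds to index `j - 1` (indices here are 0-based).

```python
import math
import random


def two_d_thompson_sampling(num_beams, power_levels, sample_gain, noise_power,
                            rate_threshold, trials, seed=0):
    """Sketch of 2DTS (Algorithm 1). power_levels must be sorted in ascending order.

    sample_gain(i, rng) is a hypothetical callback returning the effective
    channel gain of beamwidth i for the current trial.
    Returns a list of (beam index, power index, reward) tuples, one per trial.
    """
    rng = random.Random(seed)
    num_powers = len(power_levels)
    # Step 1: Beta(1, 1) parameters everywhere (the initial values are an assumption)
    a_beam = [1.0] * num_beams
    b_beam = [1.0] * num_beams
    a_pow = [[1.0] * num_powers for _ in range(num_beams)]
    b_pow = [[1.0] * num_powers for _ in range(num_beams)]
    history = []
    for _ in range(trials):
        # Steps 3-4: sample each beamwidth's posterior and keep the largest draw
        theta_beam = [rng.betavariate(a_beam[i], b_beam[i]) for i in range(num_beams)]
        i_star = max(range(num_beams), key=theta_beam.__getitem__)
        # Steps 5-6: sample only the power-level posteriors of the chosen beamwidth
        theta_pow = [rng.betavariate(a_pow[i_star][j], b_pow[i_star][j])
                     for j in range(num_powers)]
        j_star = max(range(num_powers), key=theta_pow.__getitem__)
        # Steps 7-9: SNR, rate (assumed Shannon form) and binary reward
        snr = power_levels[j_star] * sample_gain(i_star, rng) / noise_power
        rate = math.log2(1.0 + snr)
        reward = 1 if rate >= rate_threshold else 0
        if reward == 1:
            # Steps 11-14: credit the beam, the chosen level and the level just below it
            a_beam[i_star] += 1
            a_pow[i_star][j_star] += 1
            if j_star > 0:
                a_pow[i_star][j_star - 1] += 1
        else:
            # Steps 17-18: penalize the beam, the chosen level and every lower level
            b_beam[i_star] += 1
            for j in range(j_star + 1):
                b_pow[i_star][j] += 1
        history.append((i_star, j_star, reward))
    return history


if __name__ == "__main__":
    # Hypothetical toy setup: two beams with fixed effective gains (no fading)
    gains = [1e-9, 1e-10]
    hist = two_d_thompson_sampling(
        num_beams=2, power_levels=[0.01, 0.05, 0.1, 0.5],
        sample_gain=lambda i, rng: gains[i], noise_power=1e-11,
        rate_threshold=3.0, trials=3000, seed=1)
    last = hist[-1000:]
    print("success rate (last 1000 trials):", sum(x for _, _, x in last) / len(last))
```

In this toy configuration, beam 1 cannot reach $r = 3$ bit/s/Hz even at 0.5 W, and beam 0 needs at least 0.07 W, so the lowest feasible level for beam 0 is 0.1 W. The asymmetric power update is intended to concentrate selections on that pair rather than on the equally successful 0.5 W level; a `sample_gain` built from the ocean beam channel model of Section 2 would be needed for anything beyond such a sanity check.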
3.4. Energy Efficiency
4. Simulation Results
5. Discussion
5.1. Key Findings
5.2. Joint vs. Independent Selection
5.3. Effect of Explore–Then–Commit
5.4. Sensitivity and Limitations
5.5. Practical Implications and Future Work
Summary and Novelty
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest



| Sea State | Optimal Beamwidth (°) | Minimum Power (W) |
|---|---|---|
| 1 | 2.81 | 0.0013 |
| 2 | 5.62 | 0.0031 |
| 3 | 11.25 | 0.0106 |
| 4 | 11.25 | 0.0304 |
| 5 | 22.50 | 0.0416 |
| 6 | 22.50 | 0.0563 |
| 7 | 22.50 | 0.0779 |
| 8 | 45.00 | 0.1373 |
| 9 | 45.00 | 0.1625 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, K.J.; Jo, J.-H.; Cho, S.; Kwon, K.-W.; Kim, D. Two-Dimensional Thompson Sampling for Joint Beam and Power Control for Uplink Maritime Communications. J. Mar. Sci. Eng. 2025, 13, 2034. https://doi.org/10.3390/jmse13112034

