A Multi-Stage Deep Learning Framework for Antenna Array Synthesis in Satellite IoT Networks
Abstract
1. Introduction
1.1. Related Works
1.2. Contributions
- Data efficiency: Ensemble ML reduces the volume of RL training data by constraining the search space to plausible regions.
- Computational scalability: Offline RL eliminates repeated electromagnetic (EM) solver calls during learning, unlike optimization heuristics or online RL.
- Adaptability: The combined framework generalizes across diverse conformal geometries and mission profiles, which single-stage ML or heuristic optimization alone struggles to achieve.
2. Materials and Methods
2.1. Overview
2.2. Dataset
2.3. Related Works
2.3.1. Antenna Gain Calculation
- λ is the wavelength of the signal;
- d is the physical dimension (e.g., diameter of the antenna aperture);
- e is the efficiency factor;
- The term inside the logarithm represents the physical and geometric factors contributing to the antenna’s directivity and efficiency.
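The gain equation itself did not survive extraction, but the listed symbols match the standard aperture-gain expression G = 10 log10[e (πd/λ)²] dBi. Assuming that is the intended form, the sketch below evaluates it; the function name and the 12 GHz example are illustrative only.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def aperture_gain_dbi(d, wavelength, efficiency):
    """Gain in dBi, assuming the circular-aperture form G = e * (pi * d / wavelength)**2."""
    if d <= 0 or wavelength <= 0 or not 0 < efficiency <= 1:
        raise ValueError("d and wavelength must be positive; efficiency must lie in (0, 1]")
    linear_gain = efficiency * (math.pi * d / wavelength) ** 2  # term inside the logarithm
    return 10.0 * math.log10(linear_gain)

# Usage (hypothetical values): 0.5 m aperture at 12 GHz with 60% efficiency.
wavelength_12ghz = C / 12e9
print(f"{aperture_gain_dbi(0.5, wavelength_12ghz, 0.6):.2f} dBi")
```

Because gain scales with (d/λ)², doubling the aperture diameter at a fixed frequency adds about 6 dB.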
2.3.2. Resonant Frequency and Bandwidth
- Resonant frequency:
- Bandwidth:
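The resonant-frequency and bandwidth expressions were lost in extraction, so their exact form in this work is unknown. As an illustrative sketch only, assuming a rectangular patch, the textbook first-order resonance f_r ≈ c / (2L√ε_r) (fringing ignored) and the usual fractional-bandwidth definition can be computed as follows:

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def patch_resonant_frequency(length_m, eps_r):
    """First-order resonance of a rectangular patch, ignoring fringing fields:
    f_r = c / (2 * L * sqrt(eps_r))."""
    if length_m <= 0 or eps_r < 1:
        raise ValueError("length must be positive and eps_r >= 1")
    return C / (2.0 * length_m * math.sqrt(eps_r))

def fractional_bandwidth(f_low, f_high, f_center):
    """Fractional bandwidth (f_high - f_low) / f_center, e.g. from the -10 dB S11 band edges."""
    if not f_low < f_high:
        raise ValueError("f_low must be below f_high")
    return (f_high - f_low) / f_center

# Usage (hypothetical values): 29 mm patch on a substrate with eps_r = 4.4.
f_r = patch_resonant_frequency(0.029, 4.4)
print(f"f_r = {f_r / 1e9:.3f} GHz")
```

Fringing fields make the patch electrically longer, so this estimate is slightly higher than the measured resonance.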
2.3.3. Reflection Coefficient
- k is a sensitivity constant related to antenna design;
- is the antenna’s effective radius or a related geometric feature;
- is a baseline reference for effective radius;
- is an offset constant related to baseline reflection characteristics.
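The fitted model above expresses the reflection coefficient through design constants whose symbols were lost in extraction. For reference, the standard transmission-line definition Γ = (Z_L − Z_0)/(Z_L + Z_0) and its usual dB form |S11| = 20 log10|Γ| can be sketched as follows. This is the generic definition, not the synthetic model used here, and the 50 Ω reference impedance is an assumption.

```python
import math

def reflection_coefficient(z_load, z0=50.0):
    """Gamma = (Z_L - Z_0) / (Z_L + Z_0); complex impedances are accepted."""
    return (z_load - z0) / (z_load + z0)

def s11_db(gamma):
    """|S11| in dB; more negative means a better match (undefined for a perfect match)."""
    return 20.0 * math.log10(abs(gamma))

def vswr(gamma):
    """Voltage standing-wave ratio (1 + |Gamma|) / (1 - |Gamma|); requires |Gamma| < 1."""
    mag = abs(gamma)
    return (1.0 + mag) / (1.0 - mag)

# Usage: a 100-ohm load on a 50-ohm line.
g = reflection_coefficient(100.0)
print(f"Gamma = {g:.3f}, S11 = {s11_db(g):.2f} dB, VSWR = {vswr(g):.2f}")
```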
2.3.4. Synthetic Dataset Generation Procedure
2.3.5. Reinforcement Learning Dataset
2.4. Stacking Ensemble Model
2.4.1. Base Learner
2.4.2. Primary Learners
- Support Vector Regression (SVR): Captures nonlinear relationships using kernel methods [35].
- Gradient Boosting (GB): Sequentially fits an ensemble of weak learners (typically shallow decision trees) to model complex data patterns [36].
- Extreme Gradient Boosting (XGBoost): An optimized boosting algorithm that enhances model robustness and generalization [37].
2.4.3. Meta-Learner
2.4.4. Input Features
2.4.5. Output
2.5. Reinforcement Learning Optimization
2.5.1. Markov Decision Process Formulation
- States (S): Current geometric parameters of the antenna array.
- Actions (A): Adjustments to antenna parameters (e.g., element spacing or orientation changes) [11].
- Rewards (R): Feedback based on the improvement in beam-steering quality [7].
- Policy (π): Mapping from states to actions.
- Value function (V): Expected cumulative (discounted) reward obtained from a state when following policy π [40].
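For a concrete view of the value function, the sketch below computes the discounted return G = Σ_t γ^t r_t of one episode and a Monte Carlo estimate of V(s₀) as the mean return over episodes that start in s₀. The discount factor γ is not stated in this section, so the value used is an assumption.

```python
def discounted_return(rewards, gamma=0.99):
    """G_0 = sum_t gamma**t * r_t, accumulated backwards (G_t = r_t + gamma * G_{t+1})."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def mc_value_estimate(episodes, gamma=0.99):
    """Monte Carlo estimate of V(s0): mean discounted return of episodes starting in s0."""
    if not episodes:
        raise ValueError("need at least one episode")
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)

# Usage: two short reward sequences observed from the same start state.
print(mc_value_estimate([[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]], gamma=0.5))
```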
2.5.2. Deep Q-Network (DQN)
2.5.3. Batch DQN with Offline Learning
- Input layer: Dimension equal to the number of antenna elements (4), fully connected to 128 neurons.
- Hidden layer 1: Fully connected, 128 neurons with ReLU activation.
- Hidden layer 2: Fully connected, 128 neurons with ReLU activation.
- Output layer: Fully connected, 256 neurons corresponding to the discrete action space of the array, where each element can shift its phase by , 0, or .
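The layer list above can be read as a 4 → 128 → 128 → 256 multilayer perceptron in which the input's 128-unit projection is hidden layer 1. A minimal NumPy forward pass under that reading follows. He initialization and a linear (no-activation) output layer are assumptions; the latter is the usual choice for Q-value regression.

```python
import numpy as np

def init_q_network(n_inputs=4, n_hidden=128, n_actions=256, seed=0):
    """Weights for 4 -> 128 -> 128 -> 256 (sizes from the text); He init is an assumption."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, n_hidden, n_hidden, n_actions]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((w, np.zeros(fan_out)))
    return params

def q_values(params, states):
    """Forward pass: ReLU on the two hidden layers, linear output (one Q-value per action)."""
    h = np.atleast_2d(np.asarray(states, dtype=float))
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

# Usage: greedy action for one 4-element phase state.
params = init_q_network()
state = np.array([0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4])
greedy_action = int(np.argmax(q_values(params, state)[0]))
```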
2.5.4. Loss Function: Huber Loss
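The Huber loss [Huber 1964] is quadratic for small errors and linear for large ones: L_δ(x) = ½x² if |x| ≤ δ, and δ(|x| − ½δ) otherwise. As a result, its gradient with respect to the TD error is clipped to [−δ, δ]. A minimal sketch follows. The threshold δ = 1.0 is an assumption, since the value used here is not recoverable.

```python
import numpy as np

def huber(td_error, delta=1.0):
    """Elementwise Huber loss: quadratic inside |x| <= delta, linear outside."""
    x = np.asarray(td_error, dtype=float)
    abs_x = np.abs(x)
    return np.where(abs_x <= delta, 0.5 * x ** 2, delta * (abs_x - 0.5 * delta))

def huber_grad(td_error, delta=1.0):
    """dL/dx: equals x inside the quadratic zone, +/-delta outside (gradient clipping)."""
    return np.clip(np.asarray(td_error, dtype=float), -delta, delta)

# Usage: mean loss over a mini-batch of TD errors (prediction minus target).
td = np.array([0.5, 2.0, -3.0])
batch_loss = float(huber(td).mean())
```

The two branches agree in value and slope at |x| = δ, so the loss stays smooth while large TD errors contribute only linearly.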
2.5.5. Batch DQN Algorithm
- Network initialization: Both the Q-network and target network are initialized with identical weights. The target network is updated periodically to stabilize training, reducing oscillations and the risk of divergence.
- Experience replay buffer: Transitions are stored in a fixed-size replay buffer, which allows mini-batch sampling. This breaks the correlation between consecutive transitions and improves convergence stability.
- Epsilon-greedy policy: Actions are selected randomly with probability ε to encourage exploration and are otherwise chosen greedily according to the policy (maximum Q-value). The exploration rate ε decays over episodes, shifting gradually from exploration to exploitation.
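- Target calculation (Double DQN): For non-terminal states, the online Q-network selects the greedy next action and the target network evaluates it, giving the target y = r + γ·Q_target(s′, argmax_a Q(s′, a)); for terminal states, y = r. Decoupling action selection from action evaluation mitigates the overestimation bias inherent in standard Q-learning.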
- Huber loss: The Huber loss is applied to the TD error to provide robustness against outliers in reward estimation and stabilize gradient updates.
- Weight updates: Gradients of the loss are backpropagated to update the Q-network weights, and periodic synchronization of the target network with the Q-network helps keep learning stable.
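The training components listed above can be sketched in NumPy as a fixed-size replay buffer with uniform mini-batch sampling, a decaying ε-greedy rule, Double DQN targets (the online network selects the next action and the target network evaluates it), and a hard target-network copy. The decay schedule, γ, and capacity values are assumptions, and the parameter format (a list of `(weights, bias)` pairs) is hypothetical.

```python
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-size FIFO store of (s, a, r, s_next, done); the oldest entries are evicted first."""

    def __init__(self, capacity, seed=0):
        self.data = deque(maxlen=capacity)
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.data)

    def add(self, s, a, r, s_next, done):
        self.data.append((np.asarray(s, dtype=float), int(a), float(r),
                          np.asarray(s_next, dtype=float), float(done)))

    def sample(self, batch_size):
        """Uniform mini-batch without replacement, breaking temporal correlation."""
        idx = self.rng.choice(len(self.data), size=batch_size, replace=False)
        s, a, r, s2, d = zip(*(self.data[int(i)] for i in idx))
        return np.stack(s), np.array(a), np.array(r), np.stack(s2), np.array(d)

def epsilon_by_episode(episode, eps_start=1.0, eps_end=0.05, decay=0.99):
    """Exponentially decaying exploration rate, floored at eps_end (schedule assumed)."""
    return max(eps_end, eps_start * decay ** episode)

def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise the argmax-Q action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))."""
    best = np.argmax(q_online_next, axis=1)                # online network selects
    evaluated = q_target_next[np.arange(len(best)), best]  # target network evaluates
    return rewards + gamma * (1.0 - dones) * evaluated

def sync_target(online_params):
    """Hard update: copy online (weights, bias) pairs into a new target parameter list."""
    return [(w.copy(), b.copy()) for w, b in online_params]
```

In a full loop, the Huber loss would be applied to `q_online(s)[a] - y` for each sampled transition, and `sync_target` would be called every fixed number of gradient steps.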
Algorithm 1 Batch DQN algorithm
3. Results
3.1. Ensemble Model Performance
Stacking Ensemble Model
3.2. Reinforcement Learning-Based Optimization for IoT Beam Steering
3.3. Generalization and Robustness
3.4. Limitations and Future Directions
4. Discussion
4.1. Strengths of the Approach
4.2. Comparison with Traditional Methods
4.3. Robustness and Assumptions
4.4. Practical Implications and Future Work
5. Conclusions
6. Data and Code Availability
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
DE | Differential Evolution |
DNN | Deep Neural Network |
DQN | Deep Q-Network |
GA | Genetic Algorithm |
GB | Gradient Boosting |
IoT | Internet of Things |
LEO | Low Earth Orbit |
LR | Linear Regression |
MDP | Markov Decision Process |
MIMO | Multiple-Input and Multiple-Output |
ML | Machine Learning |
MSE | Mean Squared Error |
PPO | Proximal Policy Optimization |
PSO | Particle Swarm Optimization |
RL | Reinforcement Learning |
SA | Simulated Annealing |
SVR | Support Vector Regression |
UAV | Unmanned Aerial Vehicle |
XGBoost | Extreme Gradient Boosting |
References
- Balanis, C.A. Antenna Theory: Analysis and Design; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Ferreira, D.B.; de Paula, C.B.; Nascimento, D.C. Design Techniques for Conformal Microstrip Antennas and Their Arrays. In Advancement in Microstrip Antennas with Recent Applications; InTech: London, UK, 2013.
- Veera, S.A.; Suganthi, J.; Kavitha, T. Conformal Antenna for Aircraft Applications. In Proceedings of the 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bangalore, India, 2–4 November 2023; pp. 1–7.
- Jensen, N.S.; Christiansen, L.H. Real-time Antenna Array Synthesis Using Machine Learning. TICRA News, 27 May 2024.
- Usmani, W.U.; Chietera, F.P.; Mescia, L. Flexible Phased Antenna Arrays: A Review. Sensors 2025, 25, 4690.
- Goudos, S. Swarm intelligence algorithms for antenna design and wireless communications. In Swarm Intelligence—Volume 3: Applications; The Institution of Engineering and Technology: Stevenage, UK, 2018; pp. 755–784.
- Valdez-Cervantes, L.; Núñez, C.; Ripoll, L.; Guerrero-Granados, B. Optimizing Linear Antenna Arrays with Genetic Algorithms. In Proceedings of the 2024 IEEE Colombian Conference on Communications and Computing (COLCOM), Barranquilla, Colombia, 21–23 August 2024; pp. 1–4.
- Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by Simulated Annealing. Science 1983, 220, 671–680.
- Suman, B.; Kumar, P. A survey of simulated annealing as a tool for single and multiobjective optimization. J. Oper. Res. Soc. 2006, 57, 1143–1160.
- El Misilmani, H.; Naous, T. Machine Learning in Antenna Design: An Overview on Machine Learning Concept and Algorithms. In Proceedings of the 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 15–19 July 2019.
- Gajbhiye, P.; Singh, S.; Kumar Sharma, M. A comprehensive review of AI and machine learning techniques in antenna design optimization and measurement. Discov. Electron. 2025, 2, 46.
- Ramasamy, R.; Bennet, M.A. An Efficient Antenna Parameters Estimation Using Machine Learning Algorithms. Prog. Electromagn. Res. C 2023, 130, 169–181.
- Benoni, A.; Poli, L. Pattern Matching Approach for the Synthesis of Sub-Arrayed Linear Antenna Arrays. In Proceedings of the 2022 IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting (AP-S/URSI), Denver, CO, USA, 10–15 July 2022; pp. 1620–1621.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Zhang, B.; Jin, C.; Cao, K.; Lv, Q.; Mittra, R. Cognitive Conformal Antenna Array Exploiting Deep Reinforcement Learning Method. IEEE Trans. Antennas Propag. 2022, 70, 5094–5104.
- Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; Silver, D. Rainbow: Combining Improvements in Deep Reinforcement Learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 3216–3224.
- Zhang, S.; Huang, D.; Niu, B.; Bai, M. High-efficient Optimisation Method of Antenna Array Radiation Pattern Synthesis Based on Multi-layer Perceptron Network. IET Microwaves Antennas Propag. 2022, 16, 763–770.
- Rengarajan, D.; Ragothaman, N.; Kalathil, D.; Shakkottai, S. Federated Ensemble-Directed Offline Reinforcement Learning. In Proceedings of Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 10–15 December 2024.
- Zhao, K.; Hao, J.; Ma, Y.; Liu, J.; Zheng, Y.; Meng, Z. ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles. arXiv 2024, arXiv:2306.06871.
- Fakharian, M.M. Machine Learning Approach for Evaluation of Beam-String in a Metasurface-Based Terahertz Antenna for 6G Networks. Mater. Today Commun. 2025, 43, 111671.
- Gao, P.; Chen, Z. Hybrid Sparse Array Design Based on Pseudo-Random Algorithm and Convex Optimization with Wide Beam Steering. Electronics 2024, 13, 4422.
- Huang, Z.; Sun, X.; Wang, Y.; Wei, Z.; Wang, C.; Fan, Y.; Zhao, J. A soft actor–critic reinforcement learning approach for over the air active beamforming with reconfigurable intelligent surface. Phys. Commun. 2024, 66, 102474.
- Yuan, Y.; Zhang, L.; Wang, J. Actor-Critic Learning-Based Energy Optimization for UAV-Assisted Networks. J. Wirel. Commun. Netw. 2021, 2021, 78.
- Sadiq, M.; Sulaiman, N.; Isa, M.; Hamidon, M.N. A Review on Machine Learning in Smart Antenna: Methods and Techniques. TEM J. 2022, 11, 695–705.
- Rao, S.C.; McAllister, P.E.; Kelsall, T. Antenna Engineering Handbook, 4th ed.; McGraw-Hill: New York, NY, USA, 1999.
- Lu, Y.; Chen, L.; Zhang, Y.; Shen, M.; Wang, H.; Wang, X.; van Rechem, C.; Fu, T.; Wei, W. Machine Learning for Synthetic Data Generation: A Review. arXiv 2025, arXiv:2302.04062v10.
- Kraus, J.D.; Marhefka, R.J. Antennas: For All Applications, 3rd ed.; McGraw-Hill: New York, NY, USA, 2002.
- Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
- Rana, M.; Rahman, M. Study of Microstrip Patch Antenna for Wireless Communication System. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; pp. 1–4.
- Pozar, D.M. Microwave Engineering, 4th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012.
- Landron, O.; Feuerstein, M.; Rappaport, T. A comparison of theoretical and empirical reflection coefficients for typical exterior wall surfaces in a mobile radio environment. IEEE Trans. Antennas Propag. 1996, 44, 341–351.
- Rappaport, T.S. Wireless Communications: Principles and Practice, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2014.
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
- Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995.
- Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Wolpert, D.H. Stacked Generalization. Neural Netw. 1992, 5, 241–259.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Bellman, R. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957.
- Fujimoto, S.; Meger, D.; Precup, D. Off-Policy Deep Reinforcement Learning without Exploration. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 3, pp. 2051–2060.
- Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964, 35, 73–101.
- Mattar, S.E.; Baghdad, A. Design and optimization of a rectangular microstrip patch antenna for dual-band 2.45 GHz/5.8 GHz RFID application. Int. J. Electr. Comput. Eng. (IJECE) 2022, 12, 5114–5122.
- Cullen, A. Microstrip Antenna Theory and Design. Electron. Power 1982, 28, 193.
- Shah, R.; Haque, M.J.; Samsuzzaman, M.; Masud, M.A.; Azim, R.; Hossain, I. Patch Antenna Design and Optimization Using Machine Learning Techniques. In Proceedings of the 2024 6th International Conference on Sustainable Technologies for Industry 5.0 (STI), Narayanganj, Bangladesh, 14–15 December 2024; pp. 1–6.
- Jin, N.; Rahmat-Samii, Y. Particle Swarm Optimization for Antenna Designs in Engineering Electromagnetics. J. Artif. Evol. Appl. 2008, 2008, 728929.
- Schlosser, E.R.; Tolfo, S.M.; Heckler, M.V.T. Particle Swarm Optimization for antenna arrays synthesis. In Proceedings of the 2015 SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference (IMOC), Porto de Galinhas, Brazil, 3–6 November 2015; pp. 1–6.
- Anchidin, L.; Lavric, A.; Mutescu, P.-M.; Petrariu, A.I.; Popa, V. The Design and Development of a Microstrip Antenna for Internet of Things Applications. Sensors 2023, 23, 1062.
- Singh, S.; Singh, H.; Mittal, N.; Kaur Punj, G.; Kumar, L.; Fante, K.A. A hybrid swarm intelligent optimization algorithm for antenna design problems. Sci. Rep. 2025, 15, 4444.
- Ye, X.; Mao, Y.; Yu, X.; Sun, S.; Fu, L.; Xu, J. Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach. arXiv 2024, arXiv:2412.04074.
- Xie, C.; Xiu, Y.; Yang, S.; Miao, Q.; Chen, L.; Gao, Y.; Zhang, Z. Deep Reinforcement Learning for Robust Beamforming in Integrated Sensing, Communication and Power Transmission Systems. Sensors 2025, 25, 388.
- Zhou, X.; Chen, X.; Tong, L.; Wang, Y. Attention-deep reinforcement learning jointly beamforming based on tensor decomposition for RIS-assisted V2X mmWave massive MIMO system. Complex Intell. Syst. 2024, 10, 145–160.
Component | Description |
---|---|
State | Phase distribution across the elements of a patch antenna array, represented by discrete phase values. |
Action | Phase change applied to each element: , 0, or . |
Next State | Resulting state after applying the action to the current state. |
Gain Array | Output from the stacking ensemble model: 360-length array, each index representing gain at a specific angle. |
Max Gain Direction | Angle corresponding to the maximum value in the gain array for the given configuration. |
Reward | Maximum value of the gain array computed for the next state, i.e., the gain in the max gain direction. |
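The table above can be read as one environment transition. The sketch below applies a per-element phase shift, predicts a 360-sample gain array, and uses its maximum as the reward. It rests on several assumptions. The stacking ensemble is replaced by a hypothetical surrogate, the array factor of a 4-element isotropic linear array with half-wavelength spacing. The phase increment π/4 stands in for the values lost in extraction. The joint action index is decoded as one base-3 digit per element. All names are illustrative.

```python
import numpy as np

N_ELEMENTS = 4                                      # array size used in the Q-network input
PHASE_STEP = np.pi / 4                              # hypothetical phase increment
CHOICES = np.array([-PHASE_STEP, 0.0, PHASE_STEP])  # per-element shift: -step, 0, +step
N_ACTIONS = len(CHOICES) ** N_ELEMENTS
ANGLES = np.deg2rad(np.arange(360))                 # one gain sample per degree

def surrogate_gain_array(phases, spacing_wl=0.5):
    """Hypothetical stand-in for the stacking ensemble: 20*log10|AF| of a uniform linear array."""
    kd = 2.0 * np.pi * spacing_wl
    n = np.arange(len(phases))
    # Element n contributes exp(j * (n * k * d * cos(theta) + phase_n)).
    af = np.exp(1j * (np.outer(np.cos(ANGLES), n) * kd + phases)).sum(axis=1)
    return 20.0 * np.log10(np.abs(af) + 1e-12)  # small floor avoids log10(0) at nulls

def decode_action(index):
    """Joint action index -> one phase shift per element (base-3 digits, an assumption)."""
    if not 0 <= index < N_ACTIONS:
        raise ValueError("action index out of range")
    digits = []
    for _ in range(N_ELEMENTS):
        index, digit = divmod(index, len(CHOICES))
        digits.append(digit)
    return CHOICES[digits]

def step(state, action_index):
    """Return (next_state, gain_array, max_gain_direction_deg, reward)."""
    next_state = np.mod(state + decode_action(action_index), 2.0 * np.pi)
    gains = surrogate_gain_array(next_state)
    max_dir = int(np.argmax(gains))   # max gain direction, in degrees
    reward = float(gains[max_dir])    # reward: gain in that direction
    return next_state, gains, max_dir, reward
```

With all phases equal, this surrogate peaks broadside at 20·log10(4) ≈ 12 dB. Non-progressive phase distributions lower the peak, so the reward favors coherent steering.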
Model | MSE |
---|---|
Base Model (Linear Regression) | 0.48 |
Ensemble Model | 0.20 |
Meta-Learner | 0.22 |
Overall Model (IoT Antenna Prediction) | 0.06 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arunachalam, V.; Rosen, L.; Akinsiku, M.R.; Dey, S.; Gomes, R.; Mitra, D. A Multi-Stage Deep Learning Framework for Antenna Array Synthesis in Satellite IoT Networks. AI 2025, 6, 248. https://doi.org/10.3390/ai6100248
Chicago/Turabian Style: Arunachalam, Valliammai, Luke Rosen, Mojisola Rachel Akinsiku, Shuvashis Dey, Rahul Gomes, and Dipankar Mitra. 2025. "A Multi-Stage Deep Learning Framework for Antenna Array Synthesis in Satellite IoT Networks" AI 6, no. 10: 248. https://doi.org/10.3390/ai6100248