A Multi-Stage Deep Learning Framework for Antenna Array Synthesis in Satellite IoT Networks

Arunachalam, Valliammai; Rosen, Luke; Akinsiku, Mojisola Rachel; Dey, Shuvashis; Gomes, Rahul; Mitra, Dipankar

doi:10.3390/ai6100248


by Valliammai Arunachalam ¹, Luke Rosen ¹, Mojisola Rachel Akinsiku ², Shuvashis Dey ², Rahul Gomes ³,* and Dipankar Mitra ¹,*

¹ Department of Computer Science and Computer Engineering, University of Wisconsin-La Crosse, La Crosse, WI 54601, USA
² Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND 58105, USA
³ Department of Computer Science, University of Wisconsin-Eau Claire, Eau Claire, WI 54701, USA
* Authors to whom correspondence should be addressed.

AI 2025, 6(10), 248; https://doi.org/10.3390/ai6100248

Submission received: 20 August 2025 / Revised: 18 September 2025 / Accepted: 24 September 2025 / Published: 1 October 2025

(This article belongs to the Topic Innovations in AI and Signal Processing for Advanced Sensing, Radar, RFID, and Communication Systems)


Abstract

This paper presents an innovative end-to-end framework for conformal antenna array design and beam steering in Low Earth Orbit (LEO) satellite-based IoT communication systems. We propose a multi-stage learning architecture that integrates machine learning (ML) for antenna parameter prediction with reinforcement learning (RL) for adaptive beam steering. The ML module predicts optimal geometric and material parameters for conformal antenna arrays with an accuracy of 99%, based on mission-specific performance requirements such as frequency, gain, coverage angle, and satellite constraints. These predictions are then passed to a Deep Q-Network (DQN)-based offline RL model, which learns beamforming strategies to maximize gain toward dynamic ground terminals without requiring real-time interaction. To enable this, a synthetic dataset grounded in statistical principles and a static dataset are generated using CST Studio Suite and COMSOL Multiphysics simulations, capturing the electromagnetic behavior of various conformal geometries. The results from both the machine learning and reinforcement learning models show that the predicted antenna designs and beam steering angles closely align with simulation benchmarks. Our approach demonstrates the potential of combining data-driven ensemble models with offline reinforcement learning for scalable, efficient, and autonomous antenna synthesis in resource-constrained space environments.

Keywords: conformal antenna array; Low Earth Orbit (LEO); satellite IoT; beam steering; antenna synthesis; machine learning; reinforcement learning; Deep Q-Network (DQN); offline reinforcement learning; electromagnetic simulation; CST Studio; COMSOL Multiphysics

1. Introduction

Phased antenna arrays are essential components in modern communication systems, radar technologies, and satellite networks due to their ability to dynamically steer beams and control radiation patterns [1]. Among these, conformal antenna arrays—those mounted on curved surfaces such as satellite bodies or Unmanned Aerial Vehicle (UAV) fuselages—are gaining increased attention for their mechanical adaptability and seamless integration into complex platforms [2,3]. However, designing efficient conformal arrays remains challenging due to the geometric constraints introduced by non-planar surfaces. When arrays are mounted on cylindrical or spherical structures, conventional design methodologies often fail to meet stringent performance requirements [4,5].

Traditional approaches for antenna array synthesis have primarily relied on manual tuning or semi-automated optimization methods. Techniques such as particle swarm optimization (PSO), genetic algorithms (GAs), and differential evolution (DE) have been employed to optimize array parameters according to predefined objectives [6,7]. While effective in some scenarios, these methods are computationally expensive, require repeated electromagnetic (EM) simulations, and exhibit limited scalability in high-dimensional or nonconvex design spaces. Gradient-based algorithms and simulated annealing (SA) face similar drawbacks, particularly when applied to conformal geometries [8,9].

Recent advances in machine learning (ML) have introduced new possibilities for automating and accelerating antenna design. Supervised learning using deep neural networks (DNNs) has been applied to predict optimal parameters from labeled datasets, achieving significant speedups compared to iterative optimization [10,11]. However, supervised models are heavily dependent on large and diverse datasets and often struggle to generalize to novel antenna geometries [12]. Unsupervised methods, such as clustering, have been explored for design space exploration, but they do not directly address performance optimization [13].

A more promising direction involves reinforcement learning (RL), which enables agents to iteratively improve performance through feedback-driven interaction with the environment [14,15]. In antenna engineering, RL has been applied for beam steering and array configuration optimization, showing the ability to adaptively learn from communication performance metrics [16]. Nonetheless, conventional RL approaches are hindered by their dependence on large numbers of real-time interactions, making them impractical for EM-driven antenna synthesis due to the computational cost of simulations. To address these limitations, offline reinforcement learning, particularly the Deep Q-Network (DQN) algorithm, has emerged as a resource-efficient solution. By leveraging precomputed datasets, offline RL significantly reduces simulation overhead while maintaining robust policy learning [17].

1.1. Related Works

The integration of machine learning (ML) and reinforcement learning (RL) into antenna synthesis has gained significant attention in recent years. Early works by Liang et al. [10] and Weng et al. [11] demonstrated the use of deep learning for predicting optimal parameters of planar antenna arrays, while Lu et al. [13] employed clustering techniques for design space exploration, highlighting the role of unsupervised ML in pattern recognition. In the domain of RL, Raj et al. [16] applied Q-learning to beam-steering tasks, showing promise but facing scalability and efficiency challenges. More recently, Zhang et al. [18] surveyed hybrid optimization methods for conformal arrays, underscoring the need for algorithms that balance accuracy, efficiency, and adaptability. Despite these advances, few works have specifically addressed conformal antenna synthesis in resource-constrained environments such as Low Earth Orbit (LEO) satellite networks.

Several other approaches explore offline RL and hybrid optimization relevant to broader engineering tasks. Federated Ensemble-Directed Offline RL [19] investigates distributed agents leveraging ensembles to enhance learning stability and efficiency; however, it is not antenna-specific and does not address conformal geometries. Methods that improve offline-to-online RL [20] enhance learning stability during online fine-tuning but lack integration with antenna parameter prediction. Machine learning for beam-steering evaluation [21] demonstrates the potential of ML to optimize antenna performance but does not incorporate reinforcement learning for adaptive control. Hybrid sparse array optimization [22] combines pseudo-random and convex optimization for planar arrays but does not generalize to curved or conformal surfaces.

Recent works have also explored actor–critic RL for antenna and wireless system optimization. Actor–critic approaches, such as the Soft Actor–Critic method applied to beamforming with reconfigurable intelligent surfaces [23] or UAV-assisted networks [24], offer continuous action space optimization and improved stability compared to traditional Q-learning. Similarly, actor–critic RL has been used for adaptive antenna beamforming in dynamic environments. While these methods advance RL for wireless systems, they typically require online interaction with the environment and incur high computational cost, which limits their applicability to resource-constrained LEO satellite antenna design.

In contrast, our framework combines ensemble ML for rapid prediction of geometric and material parameters with offline RL for adaptive beam steering. By leveraging precomputed simulation data, it avoids costly online interaction while explicitly accounting for conformal geometries and satellite mission constraints. This offers better scalability, data efficiency, and adaptability than prior works that consider only ML, traditional optimization, or generic RL techniques.

1.2. Contributions

While prior works have explored optimization heuristics (e.g., PSO, GA, DE) and even hybrid ML–optimization methods [6,18], these approaches still suffer from scalability issues, heavy reliance on EM solvers, or lack of adaptability to new environments. Similarly, reinforcement learning has been applied in antenna design [16], but traditional online RL requires extensive real-time interaction with high-fidelity simulations, making it impractical for resource-constrained environments such as LEO satellite networks.

In contrast, our framework combines ensemble ML with offline RL to overcome these challenges in a complementary manner. The ensemble ML stage rapidly predicts near-optimal geometric and material parameters from mission requirements, thereby reducing the dimensionality of the design space. The offline RL stage then leverages precomputed simulation data to efficiently learn beam-steering strategies without costly online interaction. This synergy provides three key advantages over existing approaches:

Data efficiency: Ensemble ML reduces the volume of RL training data by constraining the search space to plausible regions.
Computational scalability: Offline RL eliminates repeated EM solver calls during learning, unlike optimization heuristics or online RL.
Adaptability: The combination generalizes across diverse conformal geometries and mission profiles, something single-stage ML or heuristic optimization alone struggles with.

This work highlights the power of combining ensemble machine learning with offline reinforcement learning for intelligent, scalable, and resource-efficient antenna design. It presents a viable alternative to traditional optimization techniques, particularly suited to constrained environments such as space-based IoT networks, where adaptability, efficiency, and autonomous operation are critical [25].

2. Materials and Methods

2.1. Overview

This work proposes a two-stage model to optimize antenna array design for conformal applications by combining machine learning and reinforcement learning techniques, as outlined in Figure 1. The first stage involves predicting antenna geometric parameters using a stacking ensemble model. The second stage optimizes these parameters through a reinforcement learning approach to maximize beam-steering performance.

2.2. Dataset

To mitigate the challenges posed by the limited availability of real-world antenna datasets, we generate a synthetic dataset by applying statistical distributions to key antenna parameters, including resonant frequency, bandwidth, gain, and reflection coefficient [1,26]. These parameters are derived from fundamental antenna theory [1], enabling the creation of a diverse dataset that significantly reduces the computational costs associated with traditional simulation-based methods [27].

2.3. Dataset Description

The synthetic dataset generated for this study consists of 50,000 unique antenna configurations, covering a wide range of resonant frequencies (2–10 GHz), bandwidths, gains, and reflection coefficients. To ensure diversity, multiple antenna types—including patch, horn, and spiral antennas—were considered. Geometric parameters such as element spacing, orientation, array shape (linear and cylindrical), and surface curvature were uniformly sampled across realistic ranges derived from CST 2025.04 and COMSOL 6.1 simulations. This extensive coverage enables the model to generalize across different antenna types and conformal geometries relevant to satellite and IoT applications.

The stacking ensemble model was trained on 80% of the dataset, with the remaining 20% held out for validation. The model was trained over 150 epochs with early stopping applied based on validation loss. The offline reinforcement learning model was trained for 100,000 episodes using a batch size of 64 and a learning rate of 0.001, with an epsilon-greedy exploration strategy decaying from 1.0 to 0.01.
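As a rough illustration of the exploration schedule, the sketch below computes the multiplicative decay factor that would take $ϵ$ from 1.0 to 0.01 over 100,000 episodes. The start value, end value, and episode count come from the text; applying the decay once per episode, and the function names, are assumptions made for illustration.

```python
def epsilon_decay_factor(eps_start: float, eps_end: float, num_episodes: int) -> float:
    """Return d such that eps_start * d**num_episodes equals eps_end."""
    return (eps_end / eps_start) ** (1.0 / num_episodes)


def epsilon_at(episode: int, eps_start: float, decay: float, eps_end: float) -> float:
    """Exploration rate after `episode` decay steps, floored at eps_end."""
    return max(eps_end, eps_start * decay ** episode)


# Values taken from the training setup: 1.0 -> 0.01 over 100,000 episodes.
decay = epsilon_decay_factor(1.0, 0.01, 100_000)
eps_halfway = epsilon_at(50_000, 1.0, decay, 0.01)
eps_final = epsilon_at(100_000, 1.0, decay, 0.01)
print(f"decay factor per episode: {decay:.8f}")
print(f"epsilon halfway: {eps_halfway:.4f}, epsilon at end: {eps_final:.4f}")
```

Under this assumption the factor is very close to 1, so exploration falls off slowly and the agent still acts randomly about one time in ten at the halfway point.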

To account for practical deployment scenarios, we explicitly consider the conformal setting of the antenna array surface. In this study, we focus on a $1 \times 4$ microstrip patch array implemented on a flexible NinjaFlex substrate with Electrifi conductive traces, which enables bending and surface attachment. The conformal geometry is modeled primarily on cylindrical surfaces, as this represents the most common form of bending encountered in satellite, aerospace, and wearable applications. Conformality is not a trivial aspect: bending alters the inter-element spacing, current distribution, and mutual coupling, which in turn impacts key performance metrics, such as gain, reflection coefficient, and side-lobe levels. By incorporating conformal configurations in our dataset generation and analysis, we ensure that the proposed machine learning and reinforcement learning framework generalizes beyond ideal flat arrays to realistic curved surfaces relevant to space environments.

2.3.1. Antenna Gain Calculation

A key performance indicator of antenna design is the antenna’s gain, which measures how well the antenna directs energy in a particular direction. To calculate gain, we use an aperture efficiency model based on fundamental principles of antenna theory [1,28]. The gain G is given by

$G = 10 \log_{10} \left( \frac{π \left( \frac{d}{λ} \right)^{2} \cdot e^{2}}{2} \right)$ (1)

where

$λ$ is the wavelength of the signal;
d is the physical dimension (e.g., diameter of the antenna aperture);
e is the efficiency factor;
The term inside the logarithm represents the physical and geometric factors contributing to the antenna’s directivity and efficiency.

This model approximates the antenna’s gain based on its physical dimensions and efficiency, key factors influencing antenna performance [1,28,29].
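Equation (1) can be evaluated directly. The following is a minimal sketch; the aperture size, carrier frequency, and efficiency used in the example are hypothetical values chosen only to show the arithmetic, and the function name is not from the paper.

```python
import math


def aperture_gain_db(d: float, wavelength: float, efficiency: float) -> float:
    """Gain in dB per Equation (1): 10*log10(pi * (d/lambda)^2 * e^2 / 2)."""
    ratio = d / wavelength
    return 10.0 * math.log10(math.pi * ratio ** 2 * efficiency ** 2 / 2.0)


# Hypothetical example: 2.4 GHz carrier, 0.25 m aperture, efficiency factor 0.8.
SPEED_OF_LIGHT = 299_792_458.0  # m/s
wavelength = SPEED_OF_LIGHT / 2.4e9
gain = aperture_gain_db(d=0.25, wavelength=wavelength, efficiency=0.8)
print(f"wavelength = {wavelength:.4f} m, gain = {gain:.2f} dB")
```

Because the gain depends on $(d/λ)^{2}$, doubling the aperture dimension at a fixed wavelength raises the result by $20 \log_{10} 2 \approx 6$ dB.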

2.3.2. Resonant Frequency and Bandwidth

The resonant frequency and bandwidth are crucial parameters determining the antenna’s operational range [1,28]. These parameters are uniformly sampled within ranges observed from our CST simulations and the reported literature [1,30] to ensure realistic variability.

Resonant frequency:

$f_{res} = \mathrm{Uniform}(f_{\min}, f_{\max})$ (2)

where $f_{\min}$ and $f_{\max}$ define the frequency range of interest.

Bandwidth:

$B = \mathrm{Uniform}(B_{\min}, B_{\max})$ (3)

where $B_{\min}$ and $B_{\max}$ define the bandwidth range.

The use of uniform distributions ensures exploration of a broad design space reflecting real-world antenna performance [30].

2.3.3. Reflection Coefficient

The reflection coefficient $S_{11}$ characterizes how well the antenna matches the transmission line [1,31]. Lower values indicate better matching and signal transmission efficiency.

We estimate $S_{11}$ using the empirical relation [32]

$S_{11} = k \times (e_{dr} - e_{drd}) + e_{r 11}$ (4)

where

k is a sensitivity constant related to antenna design;
$e_{dr}$ is the antenna’s effective radius or a related geometric feature;
$e_{drd}$ is a baseline reference for effective radius;
$e_{r 11}$ is an offset constant related to baseline reflection characteristics.

This formulation provides realistic reflection characteristics based on physical dimensions and design parameters [31,32].

2.3.4. Synthetic Dataset Generation Procedure

To create the synthetic dataset, the following procedure is applied:

Resonant frequency and bandwidth are sampled uniformly within ranges obtained from CST simulations and the literature [1,31].
Gain is computed using the aperture efficiency model to ensure realistic antenna performance [1,33].
Reflection coefficient is estimated via the empirical formula to simulate antenna matching characteristics [31,32].

This approach yields a comprehensive dataset capturing the complex relationships among antenna parameters while avoiding the computational expense of full electromagnetic simulations. The dataset serves as a robust foundation for training machine learning models to optimize antenna array performance across diverse design spaces [1,31].
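The three steps of the procedure can be sketched as a small generator. Only the 2–10 GHz frequency range comes from the text. The bandwidth, aperture, and efficiency ranges, the constants `k`, `e_drd`, and `e_r11`, and the choice of half the aperture dimension as the effective radius are all placeholders assumed for illustration, not values from the paper.

```python
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def synth_sample(rng, f_range_hz, b_range_hz, d_range_m, e_range, k, e_drd, e_r11):
    """Draw one synthetic antenna configuration following Equations (1)-(4)."""
    f_res = rng.uniform(*f_range_hz)        # Equation (2)
    bandwidth = rng.uniform(*b_range_hz)    # Equation (3)
    d = rng.uniform(*d_range_m)             # aperture dimension (assumed sampled)
    e = rng.uniform(*e_range)               # efficiency factor (assumed sampled)
    wavelength = SPEED_OF_LIGHT / f_res
    gain_db = 10.0 * math.log10(math.pi * (d / wavelength) ** 2 * e ** 2 / 2.0)  # Equation (1)
    e_dr = d / 2.0                          # assumption: effective radius = d / 2
    s11_db = k * (e_dr - e_drd) + e_r11     # Equation (4)
    return {"f_res_hz": f_res, "bandwidth_hz": bandwidth,
            "gain_db": gain_db, "s11_db": s11_db}


rng = random.Random(42)  # fixed seed so the draw is reproducible
dataset = [
    synth_sample(rng, f_range_hz=(2e9, 10e9), b_range_hz=(50e6, 500e6),
                 d_range_m=(0.05, 0.30), e_range=(0.5, 0.9),
                 k=-40.0, e_drd=0.025, e_r11=-10.0)  # placeholder constants
    for _ in range(5)
]
for row in dataset:
    print({name: round(value, 3) for name, value in row.items()})
```

Each row takes a handful of arithmetic operations to produce, which is the source of the cost savings over full electromagnetic simulation.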

2.3.5. Reinforcement Learning Dataset

The dataset used for training both the convolutional neural network and the offline reinforcement learning model consists of tuples with the following components in addition to the output from the stacking ensemble model. The structure of the reinforcement learning dataset is described in Table 1.

Figure 2 illustrates the gain distribution for the first 50 samples in the dataset. Each row corresponds to one antenna configuration, while the color intensity encodes the relative gain values. The horizontal axis represents gain levels, and the vertical axis indexes the samples. From the heatmap, it can be observed that the gain is not uniformly distributed across configurations. Instead, certain samples exhibit localized regions of higher gain, suggesting that antenna geometry and parameter variations have a strong influence on performance. This visualization highlights the diversity in the dataset, which is crucial for training and evaluating the proposed learning framework.

Figure 3 illustrates the 2D radiation patterns of three representative samples from our reinforcement learning dataset. In each plot, the blue curve denotes the radiation pattern corresponding to the current antenna state, while the yellow curve represents the next antenna state obtained after applying an action. This comparison highlights how the reinforcement learning framework captures state transitions in terms of beam steering and gain distribution.

2.4. Stacking Ensemble Model

The stacking ensemble model predicts the geometric parameters of the antenna array based on its input features. It integrates multiple learners to improve prediction accuracy.

2.4.1. Base Learner

A Linear Regression (LR) model serves as the base learner, capturing linear relationships between input features and antenna parameters [34].

2.4.2. Primary Learners

The primary learners consist of the following:

Support Vector Regression (SVR): Captures nonlinear relationships using kernel methods [35].
Gradient Boosting (GB): An ensemble of weak learners to model complex data patterns [36].
Extreme Gradient Boosting (XGBoost): An optimized boosting algorithm that enhances model robustness and generalization [37].

2.4.3. Meta-Learner

A Linear Regression meta-learner combines the outputs from the primary learners to generate final geometric parameter predictions [38].

2.4.4. Input Features

The model inputs include key antenna geometric parameters, such as array shape (e.g., linear, cylindrical), element spacing, element orientation, element size, surface curvature, operational frequency range, beamwidth, and radiation pattern.

2.4.5. Output

The output is the predicted set of antenna geometric parameters used as input for the reinforcement learning optimization stage.
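A stacking arrangement of this kind can be sketched with scikit-learn's `StackingRegressor`. This is an illustrative sketch, not the authors' implementation: the data are random placeholders, a single target stands in for the full set of geometric parameters, and XGBoost is left out so the example depends only on scikit-learn and NumPy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Placeholder data: 3 input features and one nonlinear target (not antenna data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 3))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.05 * rng.normal(size=400)

# 80/20 split, matching the train/validation ratio stated for the ensemble.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("lr", LinearRegression()),                         # linear learner
        ("svr", SVR(kernel="rbf")),                         # kernel-based learner
        ("gb", GradientBoostingRegressor(random_state=0)),  # boosted trees
    ],
    final_estimator=LinearRegression(),                     # meta-learner
)
stack.fit(X_train, y_train)
r2_val = stack.score(X_val, y_val)  # coefficient of determination on held-out data
print(f"validation R^2: {r2_val:.3f}")
```

If the `xgboost` package is installed, an `xgboost.XGBRegressor` could be added as a further entry in `estimators`. Predicting several geometric parameters at once would require one stack per target or a multi-output wrapper.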

2.5. Reinforcement Learning Optimization

Reinforcement learning (RL) is utilized to optimize the predicted antenna parameters by interacting with a defined environment and maximizing a reward signal based on beam steering performance [14,39].

2.5.1. Markov Decision Process Formulation

The RL problem is formulated as a Markov Decision Process (MDP) defined by the following:

States (S): Current geometric parameters of the antenna array.
Actions (A): Adjustments to antenna parameters (e.g., element spacing or orientation changes) [11].
Rewards (R): Feedback based on the improvement in beam-steering quality [7].
Policy ( $π$ ): Mapping from states to actions.
Value function (V): Expected cumulative reward for states following a policy [40].

2.5.2. Deep Q-Network (DQN)

A Deep Q-Network approximates the optimal action-value function using a neural network to select actions that maximize expected rewards [14].

2.5.3. Batch DQN with Offline Learning

A Batch DQN trains on a fixed dataset of experience tuples $(s, a, r, s')$ to avoid costly real-time interactions [41]. Experience replay buffers store these samples to improve learning stability.

The Q-network used in the offline reinforcement learning model is designed to efficiently learn beam-steering policies for the $1 \times 4$ antenna array. The network architecture is as follows:

Input layer: Dimension equal to the number of antenna elements (4), fully connected to 128 neurons.
Hidden layer 1: Fully connected, 128 neurons with ReLU activation.
Hidden layer 2: Fully connected, 128 neurons with ReLU activation.
Output layer: Fully connected, 256 neurons corresponding to the discrete action space of the $1 \times 4$ array, where each element can shift its phase by $- π / 8$ , 0, or $+ π / 8$ .

The network is trained using the Adam optimizer with a Huber loss function to stabilize training against outliers in reward estimation. Rewards are normalized to improve convergence, and an epsilon-greedy exploration strategy is used, decaying from 1.0 to 0.01 across training episodes.

This architecture balances sufficient representational capacity to capture the mapping from antenna states to optimal actions while remaining computationally efficient for offline RL training on precomputed datasets.
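The forward pass of such a network can be sketched in plain Python. The layer widths (4, 128, 128, 256) follow the description above, read as two ReLU hidden layers between the input and output. The weight initialization, the helper names, and the example state vector are assumptions, and training with Adam is not shown.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights with a He-style scale (an assumed initialization)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu):
    """Fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def build_q_network(sizes=(4, 128, 128, 256), seed=0):
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def q_values(network, state):
    """Return one Q-value per discrete action for the given state."""
    x = list(state)
    last = len(network) - 1
    for i, layer in enumerate(network):
        x = dense(x, layer, relu=(i < last))  # no activation on the output layer
    return x


network = build_q_network()
state = [0.0, 0.39, 0.79, 1.18]  # hypothetical per-element phases in radians
q = q_values(network, state)
greedy_action = max(range(len(q)), key=q.__getitem__)
print(f"{len(q)} Q-values, greedy action index = {greedy_action}")
```

The greedy action is the index of the largest output, which is the choice made by the exploitation branch of the epsilon-greedy policy.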

2.5.4. Loss Function: Huber Loss

The Huber loss function is used for training, providing robustness against noisy data and outliers [42]. It combines mean squared error and absolute error characteristics:

$L(δ) = \begin{cases} \frac{1}{2} δ^{2}, & |δ| \leq δ_{\max} \\ δ_{\max} \left( |δ| - \frac{1}{2} δ_{\max} \right), & \text{otherwise} \end{cases}$ (5)

where $δ = y_{true} - y_{pred}$, and $δ_{\max}$ is a threshold hyperparameter.
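Equation (5) translates directly into code. The sketch below also includes the derivative with respect to the prediction, which shows why the loss is robust: beyond the threshold the gradient magnitude is capped at $δ_{\max}$. The default threshold of 1.0 and the function names are assumptions.

```python
def huber_loss(y_true: float, y_pred: float, delta_max: float = 1.0) -> float:
    """Huber loss per Equation (5)."""
    delta = y_true - y_pred
    if abs(delta) <= delta_max:
        return 0.5 * delta ** 2                         # quadratic region
    return delta_max * (abs(delta) - 0.5 * delta_max)   # linear region


def huber_grad(y_true: float, y_pred: float, delta_max: float = 1.0) -> float:
    """Derivative of the loss with respect to y_pred (error clipped to +/- delta_max)."""
    delta = y_true - y_pred
    return -max(-delta_max, min(delta_max, delta))


small = huber_loss(1.0, 0.5)   # |delta| = 0.5, quadratic region
large = huber_loss(3.0, 0.0)   # |delta| = 3.0, linear region
print(small, large, huber_grad(3.0, 0.0))
```

The two branches agree at $|δ| = δ_{\max}$, so the loss is continuous and has a continuous first derivative there.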

2.5.5. Batch DQN Algorithm

The proposed Batch DQN algorithm follows a structured reinforcement learning procedure adapted for antenna beam steering. Algorithm 1 presents the pseudocode, and the following discussion highlights the purpose and rationale of each step:

Network initialization: Both the Q-network and target network are initialized with identical weights. The target network is updated periodically to stabilize training, preventing oscillations and divergence.
Experience replay buffer: Transitions are stored in a fixed-size replay buffer, which allows mini-batch sampling. This breaks the correlation between consecutive transitions and improves convergence stability.
Epsilon-greedy policy: Actions are selected randomly with probability $ϵ$ to encourage exploration, while the remainder are chosen according to the policy (max Q-value). The exploration rate decays over episodes to shift gradually from exploration to exploitation.
Target calculation (Double DQN): For non-terminal states, the target Q-value is computed using the target network to mitigate overestimation bias inherent in standard Q-learning.
Huber loss: The Huber loss is applied to the TD error to provide robustness against outliers in reward estimation and stabilize gradient updates.
Weight updates: Gradients computed from the loss are backpropagated to update the Q-network weights. Periodic synchronization with the target network ensures stable learning.

This procedure enables efficient offline reinforcement learning from precomputed datasets. The batch updates and replay buffer ensure data efficiency and stability, while the Double DQN strategy and Huber loss mitigate overestimation and divergence. Overall, the algorithm is designed to learn robust beam-steering policies for conformal antenna arrays in LEO satellite environments without requiring costly online interactions.

Algorithm 1 Batch DQN algorithm

Initialize Q-network and Q-target network
Initialize experience replay buffer
Set hyperparameters: learning_rate, $γ$ (discount factor), batch_size, $ϵ$ (exploration rate), etc.
for episode = 1 to num_episodes do
    state ← reset_environment() ▹ Reset the environment to initial state
    done ← false
    while not done do
        if random() < $ϵ$ then
            action ← random_action()
        else
            action ← arg max Q-network(state)
        end if
        next_state, reward, done ← take_action(state, action)
        store (state, action, reward, next_state, done) in replay_buffer
        state ← next_state ▹ Advance to the next state
        batch ← sample_batch(replay_buffer, batch_size)
        for (s, a, r, s', terminal) in batch do
            if terminal then
                target ← r
            else
                target ← $r + γ \max$ Q-target_network $(s')$
            end if
            current_q_value ← Q-network(s)[a]
            $δ \leftarrow$ target - current_q_value
            if $| δ | \leq δ_{\max}$ then
                loss ← 0.5 * $δ^{2}$
            else
                loss ← $δ_{\max}$ * ( $| δ |$ - 0.5 * $δ_{\max}$ )
            end if
            gradients ← compute_gradients(Q-network, loss)
            Q-network.update_weights(gradients)
        end for
    end while
    if episode % target_update_freq == 0 then
        Q-target_network.copy_weights(Q-network)
    end if
    $ϵ$ ← $ϵ$ * $ϵ_{decay}$ ▹ Decay epsilon for exploration–exploitation trade-off
end for
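The core update of Algorithm 1 can be exercised on a toy problem. The sketch below deliberately swaps the neural Q-network for a lookup table and trains purely from a fixed list of transitions, so there is no environment interaction and no epsilon-greedy step. The three-state chain, its rewards, and all hyperparameter values are hypothetical. The target uses the maximum over the target table, as in Algorithm 1, and the step is the Huber gradient (the TD error clipped to $\pm δ_{\max}$).

```python
import copy


def clipped_error(delta, delta_max=1.0):
    """Huber-loss gradient direction: the TD error clipped to +/- delta_max."""
    return max(-delta_max, min(delta_max, delta))


def batch_q_learning(dataset, n_states, n_actions, gamma=0.9, lr=0.5,
                     sweeps=400, sync_every=20):
    """Tabular stand-in for Batch DQN trained on a fixed set of transitions."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_target = copy.deepcopy(q)             # target table starts identical to q
    for sweep in range(1, sweeps + 1):
        for s, a, r, s_next, done in dataset:
            # Terminal transitions use the reward alone as the target.
            target = r if done else r + gamma * max(q_target[s_next])
            q[s][a] += lr * clipped_error(target - q[s][a])
        if sweep % sync_every == 0:
            q_target = copy.deepcopy(q)     # periodic target synchronization
    return q


# Hypothetical chain: action 1 advances, action 0 stays; reaching state 2 ends
# the episode with reward 1. Tuples are (s, a, r, s_next, done).
dataset = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]
q_table = batch_q_learning(dataset, n_states=3, n_actions=2)
policy = [max(range(2), key=row.__getitem__) for row in q_table[:2]]
print([[round(v, 3) for v in row] for row in q_table], policy)
```

On this chain the learned greedy policy advances in both non-terminal states, and the table values settle at the discounted returns implied by $γ = 0.9$.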

3. Results

3.1. Ensemble Model Performance

This section presents the results of the proposed end-to-end model for optimizing conformal antenna arrays in IoT applications, focusing on the stacking ensemble model used for predicting antenna design parameters and reinforcement learning (RL) optimization for beam steering. The evaluation metrics include prediction accuracy, optimization efficiency, and beamforming performance relevant to typical IoT communication scenarios.

Stacking Ensemble Model

The stacking ensemble model, incorporating Linear Regression (LR), Support Vector Regression (SVR), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost) as base learners, demonstrated strong predictive capability for key geometric parameters of the IoT antenna array design [43]. Performance was assessed by comparing predicted parameters against ground-truth values from standard antenna design references [44].

Figure 4 shows the comparison between the true geometric parameters and the predictions generated by the proposed ensemble model for the test dataset. Most points lie close to the diagonal line, indicating that the ensemble model is accurately capturing the underlying mapping from input features to optimal antenna parameters.

The outliers correspond to configurations where the model predictions deviate significantly from the ground truth. These typically occur for antenna geometries with extreme curvature or edge-case material parameters that were underrepresented in the training dataset. Although the number of outliers is small, they highlight regions of the design space where the model’s generalization is limited. Recognizing these cases is important for understanding model reliability and for guiding future dataset expansion or refinement. Overall, the figure demonstrates that the ensemble model provides robust predictions across the majority of the design space while also identifying rare challenging configurations for further analysis.

The stacking ensemble reduced the prediction error significantly, achieving an average MSE of 0.06. The individual MSE of each learner is outlined in Table 2. The meta-learner, combining base models, provided robust parameter estimation that enables effective antenna optimization. An R² score of 0.91 confirms that the ensemble model explains 91% of the variance in antenna design parameters relevant to IoT devices.
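For reference, the two metrics quoted here can be computed as follows. This is a minimal sketch, and the four value pairs are made up for illustration; they are not the paper's data.

```python
def mse(y_true, y_pred):
    """Mean squared error between paired sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
example_mse = mse(y_true, y_pred)
example_r2 = r2_score(y_true, y_pred)
print(f"MSE = {example_mse:.3f}, R^2 = {example_r2:.3f}")
```

An $R^{2}$ of 0.91 therefore means the residual sum of squares is 9% of the total variance of the true parameters around their mean.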

3.2. Reinforcement Learning-Based Optimization for IoT Beam Steering

Predicted antenna parameters, together with the RL dataset, were input to the RL optimization stage, where a Batch Deep Q-Network (DQN) agent adjusted antenna array characteristics such as element spacing and orientation to maximize directional gain and signal quality in typical IoT operating bands (e.g., 2.4 GHz ISM band) [45]. The offline learning approach minimizes the need for costly real-time experimentation, making it suitable for resource-constrained IoT environments. Figure 5 shows the rate of convergence of the reinforcement learning model, highlighting the decrease in the training and testing loss over multiple epochs.

Optimization results were benchmarked against a baseline rectangular patch antenna and a traditional particle swarm optimization (PSO) algorithm [46,47].

The proposed DQN model achieved the highest gain (12.5 dB) and best impedance matching ($S_{11}$ of −17 dB), outperforming both the baseline and PSO methods. These results, outlined in Table 3, demonstrate improved antenna performance critical for IoT devices, where enhanced beam steering improves communication range and reduces interference.

3.3. Generalization and Robustness

We tested the optimization model on a variety of IoT antenna array configurations with different element sizes and operating frequencies to assess robustness. The model consistently improved gain and reflection coefficient values across all configurations, validating its adaptability to diverse IoT hardware constraints.

Figure 6 presents the gain values of the synthesized antenna array obtained from CST simulations, which serve to cross-validate the predictions generated by the proposed deep learning model. Each data point corresponds to a specific antenna configuration, and the results demonstrate strong agreement between the simulated gains and the model predictions.

This validation confirms that the stacking ensemble model accurately captures the underlying physical behavior of the antenna array, including its performance in conformal geometries. By comparing the predicted and simulated gain patterns, Figure 6 highlights the model’s reliability and provides confidence in its applicability for designing antennas across diverse configurations.

The findings from Table 4 highlight the model’s capability to generalize antenna optimization across multiple IoT device form factors and environmental conditions.

3.4. Limitations and Future Directions

Despite promising results, the Batch DQN approach relies on pre-collected offline datasets, limiting adaptability to dynamic IoT environments with fluctuating signal conditions. Future work will integrate online learning and explore policy-gradient methods such as Proximal Policy Optimization (PPO) to improve real-time adaptability. Additionally, extending the model to multi-band and multi-antenna (MIMO) IoT systems can further enhance communication reliability and throughput.

4. Discussion

This paper presents a novel approach to antenna array synthesis for IoT applications by integrating machine learning ensemble methods with reinforcement learning to predict and optimize antenna array geometric parameters for conformal and resource-constrained devices. Our method addresses traditional challenges in IoT antenna design, such as high-dimensional, nonconvex, and discontinuous optimization problems, which are often exacerbated by IoT devices’ size, power, and deployment constraints.

4.1. Strengths of the Approach

The multi-model stacking ensemble used for predicting antenna parameters significantly improves prediction accuracy while reducing the need for extensive labeled datasets, which are difficult to obtain for diverse IoT antenna designs. By coupling this ensemble with offline reinforcement learning powered by a Deep Q-Network (DQN), the model iteratively refines antenna parameters to optimize beam-steering performance without requiring costly real-time experimentation or physical prototyping.

Offline reinforcement learning is especially advantageous in IoT contexts, where devices are often deployed in environments where on-the-fly training or feedback collection is infeasible. Leveraging simulation data from tools like CST and COMSOL, our approach enables rapid and scalable training cycles while accommodating complex antenna geometries relevant to conformal and compact IoT antennas.
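The offline setting described above can be illustrated with a minimal sketch. It is not the authors' Batch DQN: a small Q-table stands in for the neural Q-network, and the gain values, discount factor, learning rate, and all function names are hypothetical. What it does share with the description is that training only replays a fixed, pre-collected set of transitions and that the temporal-difference error is passed through a Huber-style clipped gradient.

```python
import random

# Hypothetical gain (dB) for five discrete beam positions; not data from the paper.
GAINS = [8.0, 9.5, 11.0, 12.5, 11.0]
ACTIONS = (-1, 0, 1)              # move beam index down, hold, move up
GAMMA, LR, DELTA = 0.5, 0.5, 1.0  # assumed discount, step size, Huber threshold


def step(state, action):
    """Simulated transition, used once to log the dataset and never during training."""
    nxt = min(max(state + action, 0), len(GAINS) - 1)
    return nxt, GAINS[nxt]


# Fixed offline dataset of (state, action, next_state, reward) tuples.
DATASET = [(s, a, *step(s, a)) for s in range(len(GAINS)) for a in ACTIONS]


def huber_grad(td_error, delta=DELTA):
    """Gradient of the Huber loss with respect to the error: the error clipped to +/- delta."""
    return max(-delta, min(delta, td_error))


def train(dataset, epochs=2000, batch_size=8, seed=0):
    """Batch Q-learning over the logged transitions only (tabular stand-in for a DQN)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(len(GAINS)) for a in ACTIONS}
    for _ in range(epochs):
        for s, a, s2, r in rng.sample(dataset, k=batch_size):
            target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += LR * huber_grad(target - q[(s, a)])
    return q


def greedy_action(q, state):
    """Action with the highest learned Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = train(DATASET)
policy = [greedy_action(q_table, s) for s in range(len(GAINS))]
```

In this toy problem the learned policy steers every starting position toward the index with the highest logged gain and then holds, without ever querying `step` during training.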

The results confirm that the predicted antenna parameters correlate well with simulation benchmarks, underscoring the model’s reliability and applicability for practical IoT antenna design tasks. This reduces reliance on time-consuming manual tuning and enables the development of optimized antenna arrays that improve communication range, energy efficiency, and interference mitigation in IoT networks.

4.2. Comparison with Traditional Methods

Conventional IoT antenna design methods, including manual optimization, population-based metaheuristics (e.g., PSO, genetic algorithms), and gradient-based techniques, typically require real-time interaction with physical devices or extensive iterative testing. These approaches are often computationally expensive and poorly suited to the high-dimensional, nonconvex design spaces typical of conformal IoT antennas [11,49].

Recent AI-based methods, including deep learning, actor–critic reinforcement learning, and hybrid optimization approaches, have demonstrated improved efficiency and performance in antenna design tasks [50,51,52]. However, most of these approaches either rely on online interaction with the environment, require repeated calls to electromagnetic solvers, or are limited to specific array types and geometries.

Our integrated ensemble and offline reinforcement learning framework provides a scalable and efficient alternative. By combining ensemble ML for rapid prediction of near-optimal antenna parameters with offline RL for adaptive beam steering, our approach eliminates the need for repeated simulations during training and generalizes across multiple antenna types and conformal geometries. This decoupling of training from real-time experimentation, along with the global optimization capabilities of reinforcement learning, overcomes limitations of local optima entrapment and high computational cost. Consequently, our framework achieves faster convergence to optimal antenna configurations and better adaptability to diverse IoT device constraints [49,50,51,52].

4.3. Robustness and Assumptions

The proposed framework relies on a synthetic dataset generated using an aperture-based model, which provides a general description of antenna behavior without explicitly requiring physical dimensions. To account for practical effects such as scattering, edge losses, and material characteristics, efficiency and aperture efficiency corrections are incorporated. This ensures that the synthetic data closely approximates the true electromagnetic behavior of the antennas while maintaining flexibility across diverse geometries.
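As a hedged illustration of what an aperture-based gain model with efficiency corrections can look like, the sketch below uses the standard textbook relation G = η · 4πA/λ², with the total efficiency split into an aperture term and a radiation term. The exact expression and parameter ranges used to generate the dataset may differ; the function name and the default efficiencies are assumptions made for this example.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def aperture_gain_dbi(area_m2, freq_hz, aperture_eff=1.0, radiation_eff=1.0):
    """Gain in dBi of an aperture of the given physical area.

    aperture_eff and radiation_eff are lumped correction factors in (0, 1];
    which physical losses go into which factor is an assumption of this sketch.
    """
    wavelength = C / freq_hz
    gain_linear = radiation_eff * aperture_eff * 4 * math.pi * area_m2 / wavelength**2
    return 10 * math.log10(gain_linear)


# Hypothetical usage: a 0.01 m^2 aperture at 2.45 GHz with assumed efficiencies.
example_gain = aperture_gain_dbi(0.01, 2.45e9, aperture_eff=0.7, radiation_eff=0.9)
```

Because the efficiencies enter as multiplicative factors, each one shifts the gain by a fixed number of decibels regardless of geometry, which is what lets a single aperture description cover different antenna types.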

To validate the accuracy and robustness of the models, the predictions from the deep learning and offline RL framework are compared against CST simulations, as shown in Figure 6 and Figure 7. The ensemble ML model accurately captures key parameters such as gain and reflection coefficient, even for conformal configurations realized with a flexible 1 × 4 microstrip patch array using NinjaFlex substrate and Electrifi conductive traces. Conformal bending, different element sizes, and array types (patch, horn, spiral) were explicitly included in the dataset to test the generalization capability of the models.

Although the synthetic data rely on uniform sampling and empirical formulas, the comparison against rigorous numerical EM simulations indicates that the framework compensates for these approximations and maintains consistent performance across the design space. Furthermore, a basic sensitivity analysis was performed by perturbing key parameters (e.g., element spacing, curvature radius) within realistic ranges, confirming that model outputs remain stable. This approach allows optimization across a wide variety of antenna types, sizes, array configurations, and conformal geometries without the need for on-board satellite testing at this stage.
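A perturbation-based sensitivity check of the kind mentioned above could be sketched as follows. The procedure shown (a central difference with a ±5% relative step) is one common choice and is not confirmed as the authors' method. `toy_surrogate` is a purely illustrative function with no physical meaning, standing in for the trained model, and the 100 mm curvature radius is a hypothetical value.

```python
def sensitivity(model, nominal, rel_step=0.05):
    """Normalized sensitivity of model output to each parameter.

    Returns, per parameter, the relative output change divided by the relative
    input change, estimated with a central difference around the nominal point.
    """
    base = model(**nominal)
    result = {}
    for name, value in nominal.items():
        upper = model(**{**nominal, name: value * (1 + rel_step)})
        lower = model(**{**nominal, name: value * (1 - rel_step)})
        result[name] = ((upper - lower) / base) / (2 * rel_step)
    return result


def toy_surrogate(spacing_mm, curvature_radius_mm):
    """Illustrative stand-in for a trained predictor; not an antenna model."""
    return spacing_mm**2 * curvature_radius_mm**0.5


# Element spacing taken as d = 26.9 mm from Figure 7; the radius is hypothetical.
scores = sensitivity(toy_surrogate, {"spacing_mm": 26.9, "curvature_radius_mm": 100.0})
```

A score near zero means the output barely reacts to that parameter, while a large magnitude flags a parameter whose tolerance deserves attention during fabrication.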

4.4. Practical Implications and Future Work

The proposed method has significant implications for IoT industries, including smart home devices, wearable sensors, and industrial IoT, where antenna performance directly impacts device reliability and network connectivity. The ability to autonomously optimize antenna arrays without physical trial and error expedites device prototyping and facilitates adaptive antenna designs that can adjust to dynamic deployment scenarios.

Nonetheless, some limitations warrant further investigation. The current reliance on simulation-generated offline datasets may not capture all environmental variables encountered by IoT devices in real-world deployments, such as multipath effects, interference, or device orientation variability. Incorporating real-world measurement data and online learning mechanisms will be essential to improve model robustness and adaptability.

Future work could also explore reinforcement learning algorithms beyond the DQN, such as Proximal Policy Optimization (PPO) or Advantage Actor–Critic (A2C), to enhance learning efficiency and adaptability in complex IoT antenna optimization problems. Expanding the approach to support multi-band antennas and MIMO (Multiple-Input Multiple-Output) systems would further broaden its applicability and impact across diverse IoT communication standards.

Overall, this research lays the groundwork for fully autonomous, efficient, and scalable antenna design frameworks tailored to the evolving demands of IoT technology ecosystems.

5. Conclusions

In this work, a novel, data-driven framework for antenna array synthesis in IoT applications was developed by integrating ensemble learning with offline reinforcement learning. The proposed method overcomes the spatial and hardware constraints of conventional design approaches by automatically learning the relationships among geometric parameters and optimizing beam-steering performance without relying on time-consuming physical prototyping. Experimental results demonstrate substantial gains: for small-element arrays, antenna gain increased from 8.0 dB to 10.8 dB and the reflection coefficient $S_{11}$ improved from −11 dB to −15 dB; for larger arrays, gain rose from 8.3 dB to 11.3 dB with $S_{11}$ enhancements from −11 dB to −16 dB. These findings confirm that the integrated learning-based approach significantly elevates array performance in resource-limited settings. By automating design and minimizing manual tuning, this framework establishes a scalable pathway toward adaptive, conformal antenna systems capable of meeting the dynamic demands of emerging IoT networks. Future work will focus on incorporating real-world deployment data and exploring advanced reinforcement-learning strategies to further improve robustness and generalization across diverse operational environments.

6. Data and Code Availability

The related code and the synthetic data used for training the initial reinforcement learning model are available in the GitHub repository (https://github.com/valli685/A-Multi-Stage-Deep-Learning-Framework-for-Antenna-Array-Synthesis-in-Satellite-IoT-Networks, accessed on 20 September 2025).

Author Contributions

Conceptualization, V.A.; methodology, V.A.; software, V.A.; validation, D.M., L.R. and V.A.; formal analysis, V.A.; investigation, V.A. and L.R.; resources, D.M. and R.G.; data curation, L.R.; writing—original draft preparation, V.A.; writing—review and editing, D.M., R.G., M.R.A. and S.D.; visualization, D.M., V.A. and L.R.; supervision, D.M.; project administration, D.M.; funding acquisition, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by WiSys and the Universities of Wisconsin applied research funding (Ignite Grant for Applied Research) under grant no. FY25-106-068000-4.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors acknowledge the support received from the Computer Science and Computer Engineering Department at the University of Wisconsin-La Crosse (UW-L) for this research. They also thank the Dean’s Distinguished Fellowship Committee at UW-L for providing the Fellowship and supporting this work. AI-assisted tools were used in the preparation of this manuscript for language editing.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

AI	Artificial Intelligence
DE	Differential Evolution
DNN	Deep Neural Network
DQN	Deep Q-Network
GB	Gradient Boosting
GA	Genetic Algorithm
IoT	Internet of Things
LEO	Low Earth Orbit
LR	Linear Regression
MDP	Markov Decision Process
MIMO	Multiple-Input and Multiple-Output
ML	Machine Learning
MSE	Mean Squared Error
PPO	Proximal Policy Optimization
PSO	Particle Swarm Optimization
RL	Reinforcement Learning
SA	Simulated Annealing
SVR	Support Vector Regression
UAV	Unmanned Aerial Vehicle
XGBoost	Extreme Gradient Boosting

References

1. Balanis, C.A. Antenna Theory: Analysis and Design; John Wiley & Sons: Hoboken, NJ, USA, 2016.
2. Ferreira, D.B.; de Paula, C.B.; Nascimento, D.C. Design Techniques for Conformal Microstrip Antennas and Their Arrays. In Advancement in Microstrip Antennas with Recent Applications; InTech: London, UK, 2013.
3. Veera, S.A.; Suganthi, J.; Kavitha, T. Conformal Antenna for Aircraft Applications. In Proceedings of the 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bangalore, India, 2–4 November 2023; pp. 1–7.
4. Jensen, N.S.; Christiansen, L.H. Real-time Antenna Array Synthesis Using Machine Learning. TICRA News, 27 May 2024.
5. Usmani, W.U.; Chietera, F.P.; Mescia, L. Flexible Phased Antenna Arrays: A Review. Sensors 2025, 25, 4690.
6. Goudos, S. Swarm intelligence algorithms for antenna design and wireless communications. In Swarm Intelligence—Volume 3: Applications; The Institution of Engineering and Technology: Stevenage, UK, 2018; pp. 755–784.
7. Valdez-Cervantes, L.; Núñez, C.; Ripoll, L.; Guerrero-Granados, B. Optimizing Linear Antenna Arrays with Genetic Algorithms. In Proceedings of the 2024 IEEE Colombian Conference on Communications and Computing (COLCOM), Barranquilla, Colombia, 21–23 August 2024; pp. 1–4.
8. Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by Simulated Annealing. Science 1983, 220, 671–680.
9. Suman, B.; Kumar, P. A survey of simulated annealing as a tool for single and multiobjective optimization. J. Oper. Res. Soc. 2006, 57, 1143–1160.
10. El Misilmani, H.; Naous, T. Machine Learning in Antenna Design: An Overview on Machine Learning Concept and Algorithms. In Proceedings of the 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 15–19 July 2019.
11. Gajbhiye, P.; Singh, S.; Kumar Sharma, M. A comprehensive review of AI and machine learning techniques in antenna design optimization and measurement. Discov. Electron. 2025, 2, 46.
12. Ramasamy, R.; Bennet, M.A. An Efficient Antenna Parameters Estimation Using Machine Learning Algorithms. Prog. Electromagn. Res. C 2023, 130, 169–181.
13. Benoni, A.; Poli, L. Pattern Matching Approach for the Synthesis of Sub-Arrayed Linear Antenna Arrays. In Proceedings of the 2022 IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting (AP-S/URSI), Denver, CO, USA, 10–15 July 2022; pp. 1620–1621.
14. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
15. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
16. Zhang, B.; Jin, C.; Cao, K.; Lv, Q.; Mittra, R. Cognitive Conformal Antenna Array Exploiting Deep Reinforcement Learning Method. IEEE Trans. Antennas Propag. 2022, 70, 5094–5104.
17. Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; Silver, D. Rainbow: Combining Improvements in Deep Reinforcement Learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 2–7 February 2018; pp. 3216–3224.
18. Zhang, S.; Huang, D.; Niu, B.; Bai, M. High-efficient Optimisation Method of Antenna Array Radiation Pattern Synthesis Based on Multi-layer Perceptron Network. IET Microwaves Antennas Propag. 2022, 16, 763–770.
19. Rengarajan, D.; Ragothaman, N.; Kalathil, D.; Shakkottai, S. Federated Ensemble-Directed Offline Reinforcement Learning. In Proceedings of the NeurIPS Proceedings, Vancouver, BC, Canada, 10–15 December 2024.
20. Zhao, K.; Hao, J.; Ma, Y.; Liu, J.; Zheng, Y.; Meng, Z. ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles. arXiv 2024, arXiv:2306.06871.
21. Fakharian, M.M. Machine Learning Approach for Evaluation of Beam-String in a Metasurface-Based Terahertz Antenna for 6G Networks. Mater. Today Commun. 2025, 43, 111671.
22. Gao, P.; Chen, Z. Hybrid Sparse Array Design Based on Pseudo-Random Algorithm and Convex Optimization with Wide Beam Steering. Electronics 2024, 13, 4422.
23. Huang, Z.; Sun, X.; Wang, Y.; Wei, Z.; Wang, C.; Fan, Y.; Zhao, J. A soft actor–critic reinforcement learning approach for over the air active beamforming with reconfigurable intelligent surface. Phys. Commun. 2024, 66, 102474.
24. Yuan, Y.; Zhang, L.; Wang, J. Actor-Critic Learning-Based Energy Optimization for UAV-Assisted Networks. J. Wirel. Commun. Netw. 2021, 2021, 78.
25. Sadiq, M.; Sulaiman, N.; Isa, M.; Hamidon, M.N. A Review on Machine Learning in Smart Antenna: Methods and Techniques. TEM J. 2022, 11, 695–705.
26. Rao, S.C.; McAllister, P.E.; Kelsall, T. Antenna Engineering Handbook, 4th ed.; McGraw-Hill: New York, NY, USA, 1999.
27. Lu, Y.; Chen, L.; Zhang, Y.; Shen, M.; Wang, H.; Wang, X.; van Rechem, C.; Fu, T.; Wei, W. Machine Learning for Synthetic Data Generation: A Review. arXiv 2025, arXiv:2302.04062v10.
28. Kraus, J.D.; Marhefka, R.J. Antennas: For All Applications, 3rd ed.; McGraw-Hill: New York, NY, USA, 2002.
29. Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
30. Rana, M.; Rahman, M. Study of Microstrip Patch Antenna for Wireless Communication System. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; pp. 1–4.
31. Pozar, D.M. Microwave Engineering, 4th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012.
32. Landron, O.; Feuerstein, M.; Rappaport, T. A comparison of theoretical and empirical reflection coefficients for typical exterior wall surfaces in a mobile radio environment. IEEE Trans. Antennas Propag. 1996, 44, 341–351.
33. Rappaport, T.S. Wireless Communications: Principles and Practice, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2014.
34. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
35. Vapnik, V. The Nature of Statistical Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995.
36. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
37. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
38. Wolpert, D.H. Stacked Generalization. Neural Netw. 1992, 5, 241–259.
39. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
40. Bellman, R. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957.
41. Fujimoto, S.; Meger, D.; Precup, D. Off-Policy Deep Reinforcement Learning without Exploration. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 3, pp. 2051–2060.
42. Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964, 35, 73–101.
43. Mattar, S.E.; Baghdad, A. Design and optimization of a rectangular microstrip patch antenna for dual-band 2.45 GHz/5.8 GHz RFID application. Int. J. Electr. Comput. Eng. (IJECE) 2022, 12, 5114–5122.
44. Cullen, A. Microstrip Antenna Theory and Design. Electron. Power 1982, 28, 193.
45. Shah, R.; Haque, M.J.; Samsuzzaman, M.; Masud, M.A.; Azim, R.; Hossain, I. Patch Antenna Design and Optimization Using Machine Learning Techniques. In Proceedings of the 2024 6th International Conference on Sustainable Technologies for Industry 5.0 (STI), Narayanganj, Bangladesh, 14–15 December 2024; pp. 1–6.
46. Jin, N.; Rahmat-Samii, Y. Particle Swarm Optimization for Antenna Designs in Engineering Electromagnetics. J. Artif. Evol. Appl. 2008, 2008, 728929.
47. Schlosser, E.R.; Tolfo, S.M.; Heckler, M.V.T. Particle Swarm Optimization for antenna arrays synthesis. In Proceedings of the 2015 SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference (IMOC), Porto de Galinhas, Brazil, 3–6 November 2015; pp. 1–6.
48. Anchidin, L.; Lavric, A.; Mutescu, P.-M.; Petrariu, A.I.; Popa, V. The Design and Development of a Microstrip Antenna for Internet of Things Applications. Sensors 2023, 23, 1062.
49. Singh, S.; Singh, H.; Mittal, N.; Kaur Punj, G.; Kumar, L.; Fante, K.A. A hybrid swarm intelligent optimization algorithm for antenna design problems. Sci. Rep. 2025, 15, 4444.
50. Ye, X.; Mao, Y.; Yu, X.; Sun, S.; Fu, L.; Xu, J. Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach. arXiv 2024, arXiv:2412.04074.
51. Xie, C.; Xiu, Y.; Yang, S.; Miao, Q.; Chen, L.; Gao, Y.; Zhang, Z. Deep Reinforcement Learning for Robust Beamforming in Integrated Sensing, Communication and Power Transmission Systems. Sensors 2025, 25, 388.
52. Zhou, X.; Chen, X.; Tong, L.; Wang, Y. Attention-deep reinforcement learning jointly beamforming based on tensor decomposition for RIS-assisted V2X mmWave massive MIMO system. Complex Intell. Syst. 2024, 10, 145–160.

Figure 1. Conceptual diagram outlining the proposed multi-stage deep learning framework, comprising an ensemble model, a neural network, and an offline reinforcement learning model.

Figure 2. Heatmap of gain distribution across the first 50 samples. The x-axis represents gain values; the y-axis denotes sample indices from 1 to 50.

Figure 3. Two-dimensional radiation patterns showing the state and next state values of 3 samples from the reinforcement learning dataset with gain values measured in dBi.

Figure 4. Plot of true vs. predicted geometric parameters showing the results of the proposed ensemble model.

Figure 5. The rate of convergence of the proposed reinforcement learning model showing the decrease in the training and testing loss over epochs.

Figure 6. Gain values of the synthesized antenna array from CST simulations, used to validate the results of the proposed deep learning model.

Figure 7. An electromagnetic simulation of a 1 × 4 patch antenna array from CST validating the results of the proposed deep learning model, where A = 250 mm, B = 55.5 mm, L = W = 35.6 mm, FL = 9.7 mm, In = 1 mm, g = 1 mm, FW = 4.5 mm, d = 26.9 mm, a = 13.45 mm, and b = 10.2 mm.

Table 1. Structure of each data sample used for training.

Component	Description
State	Phase distribution of a $1 \times 4$ patch antenna array. Discrete values represent the state.
Action	Phase change applied to each element: $-\pi/8$, 0, or $+\pi/8$.
Next State	Resulting state after applying the action to the current state.
Gain Array	Output from the stacking ensemble model: 360-length array, each index representing gain at a specific angle.
Max Gain Direction	Angle corresponding to the maximum value in the gain array for the given configuration.
Reward	Maximum gain in the direction computed from the next state.
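The record layout in Table 1 can be made concrete with a short sketch. It makes several assumptions that are not taken from the paper: the gain array comes from a simple uniform-linear-array factor with isotropic elements and half-wavelength spacing, used here as a stand-in for the stacking ensemble output, and the names `gain_array` and `make_sample` are hypothetical.

```python
import cmath
import math

N_ELEMENTS = 4              # 1 x 4 array, as in Table 1
PHASE_STEP = math.pi / 8    # action magnitude from Table 1
SPACING_WAVELENGTHS = 0.5   # assumed element spacing; not taken from the paper


def gain_array(phases):
    """Relative array-factor gain (dB) at 0..359 degrees.

    Hypothetical stand-in for the stacking ensemble output; assumes
    isotropic elements placed on a straight line.
    """
    kd = 2 * math.pi * SPACING_WAVELENGTHS
    gains = []
    for deg in range(360):
        theta = math.radians(deg)
        af = sum(cmath.exp(1j * (n * kd * math.cos(theta) + phases[n]))
                 for n in range(N_ELEMENTS))
        power = max(abs(af) ** 2 / N_ELEMENTS, 1e-12)  # floor avoids log(0) in nulls
        gains.append(10 * math.log10(power))
    return gains


def make_sample(state, action):
    """Build one Table 1 style record; each action entry is -1, 0, or +1 phase steps."""
    next_state = tuple(p + a * PHASE_STEP for p, a in zip(state, action))
    gains = gain_array(next_state)
    max_dir = max(range(360), key=gains.__getitem__)
    return {
        "state": tuple(state),
        "action": tuple(action),
        "next_state": next_state,
        "gain_array": gains,
        "max_gain_direction": max_dir,
        "reward": gains[max_dir],
    }


sample = make_sample(state=(0.0, 0.0, 0.0, 0.0), action=(0, 1, 0, -1))
```

Storing the full 360-entry gain array alongside the reward keeps each record self-describing, so a different reward definition can be tried later without regenerating the data.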

Table 2. Mean squared error (MSE) of ensemble learners for IoT antenna parameter prediction.

Model	MSE
Base Model (Linear Regression)	0.48
Ensemble Model	0.20
Meta-Learner	0.22
Overall Model (IoT Antenna Prediction)	0.06

Table 3. Beamforming performance for IoT antenna array optimization.

Model	Gain (dB)	Reflection Coefficient ($S_{11}$, dB)
Baseline patch antenna array [43,44]	8.5	−11
PSO [46,47]	11.0	−14
DQN optimization (proposed)	12.5	−17

Table 4. Optimization results on various IoT antenna array configurations.

Configuration	Gain (dB)	$S_{11}$ (dB)
Small Element Size (Baseline) [1,48]	8.0	−11
Large Element Size (Baseline) [1,48]	8.3	−11
Small Element Size (Optimized)	10.8	−15
Large Element Size (Optimized)	11.3	−16
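To put the decibel figures of Table 4 on a linear scale, the sketch below converts the gain differences to power ratios and the $S_{11}$ values to reflected power fractions, using the standard relation that the reflected fraction at a single port is $10^{S_{11}/10}$. The dictionary layout and function names are illustrative; the numbers are copied from Table 4.

```python
def db_to_ratio(db):
    """Convert a power quantity in decibels to a linear ratio."""
    return 10 ** (db / 10)


# (gain in dB, S11 in dB) pairs copied from Table 4.
TABLE_4 = {
    "small": {"baseline": (8.0, -11.0), "optimized": (10.8, -15.0)},
    "large": {"baseline": (8.3, -11.0), "optimized": (11.3, -16.0)},
}


def summarize(table):
    """Linear gain ratio and reflected power fractions for each configuration."""
    out = {}
    for name, rows in table.items():
        gain_before, s11_before = rows["baseline"]
        gain_after, s11_after = rows["optimized"]
        out[name] = {
            "gain_ratio": db_to_ratio(gain_after - gain_before),
            "reflected_before": db_to_ratio(s11_before),
            "reflected_after": db_to_ratio(s11_after),
        }
    return out


summary = summarize(TABLE_4)
for name, row in summary.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

Read this way, the 2.8 dB and 3.0 dB improvements correspond to roughly a doubling of radiated power density in the main-beam direction, and the reflected power fraction falls from about 8% to about 3% or less.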

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
