Physics-Informed Reinforcement Learning for Multi-Band Octagonal Fractal Frequency-Selective Surface Optimization

Dong, Gaoya; Liu, Ming; He, Xin

doi:10.3390/electronics14234656

by Gaoya Dong ¹,²,*, Ming Liu ¹ and Xin He ¹

¹ School of Computer and Communication Engineering, Beijing University of Science and Technology, Beijing 100083, China

² Shunde Innovation School, University of Science and Technology Beijing, Foshan 528000, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(23), 4656; https://doi.org/10.3390/electronics14234656

Submission received: 27 October 2025 / Revised: 23 November 2025 / Accepted: 24 November 2025 / Published: 26 November 2025

(This article belongs to the Special Issue Reinforcement Learning: Emerging Techniques and Future Prospects)

Abstract

Diverse application scenarios demand frequency-selective surfaces (FSSs) with tailored center frequencies and bandwidths. However, their design traditionally relies on iterative full-wave simulations using tools such as the High-Frequency Structure Simulator (HFSS) and Computer Simulation Technology (CST), which are time-consuming and labor-intensive. To overcome these limitations, this work proposes an octagonal fractal frequency-selective surface (OF-FSS) composed of a square ring resonator and an octagonal fractal geometry, where the fractal configuration supports single-band and multi-band resonance. A physics-informed reinforcement learning (PIRL) algorithm is developed, enabling the RL agent to directly interact with CST and autonomously optimize key structural parameters. Using the proposed PIRL framework, the OF-FSS achieves both single-band and dual-band responses with desired frequency responses. Full-wave simulations validate that the integration of OF-FSS and PIRL provides an efficient and physically interpretable strategy for designing advanced multi-band FSSs.

Keywords:

octagonal fractal frequency-selective surface (OF-FSS); physics-informed reinforcement learning (PIRL); frequency-selective surfaces (FSSs)

1. Introduction

Electromagnetic metamaterials are artificial structures composed of subwavelength unit cells, whose ability to manipulate electromagnetic waves is governed by the geometry and arrangement of these cells. Depending on their wave-modulation functionalities, metamaterials can be classified into frequency-selective surfaces (FSSs) [1,2], absorbers [3], frequency-selective absorbers (FSAs) [4], and polarization converters [5,6,7,8]. Among these, FSSs and their derivative structures have been widely studied for diverse applications. For instance, Li et al. proposed a multi-passband metasurface with ultra-high angular stability based on complementary FSSs [9], an angle-selective surface designed for narrow-angle filtering across variable frequencies [10], and a miniaturized frequency-selective rasorber (FSR) based on complementary bandpass FSSs, tailored for 5G applications [11]. The design and optimization of FSSs and their derivatives typically rely on professional electromagnetic simulation tools, such as High-Frequency Structure Simulator (HFSS) or Computer Simulation Technology (CST). However, these usually require hundreds or even thousands of full-wave simulations, making the process time-consuming and highly dependent on manual intervention. Therefore, developing efficient and automated optimization strategies for metamaterials has become imperative.

To overcome this limitation, data-driven surrogate models based on machine learning (ML) have been developed [12,13,14,15,16,17,18,19,20,21,22,23,24] to replace full-wave simulations, thereby simplifying the design and optimization of metamaterials. Specifically, existing ML approaches can generally be categorized into semi-automated [12,13,14,15,16,17,18,19,20,21] and fully automated methods [22,23,24]. In semi-automated approaches, experts first define the topology of the metamaterial structure, after which ML algorithms are employed to optimize its geometric parameters. For instance, pre-trained support vector machines (SVMs) have been utilized to accelerate reflectarray optimization [12,13], while neural networks have been applied to predict the reflection phase of Minkowski fractal reflectarray elements [15]. Supervised learning has also been employed to optimize the Jerusalem-cross topology, effectively reducing the radar cross-section (RCS) [14]. Beyond direct parameter optimization, some semi-automated approaches use ML to enhance conventional optimization algorithms. For example, Liu et al. proposed Neuro Crossover, a reinforcement learning (RL)-based genetic locus selection strategy for genetic algorithms, providing a novel approach for integrating RL with heuristic optimization [25]. Notably, targeted semi-automated methods for FSS inverse design have also been developed. Zhu et al. introduced a Fourier subspace-based deep learning method (FS-BDLM) to reduce the input dimensionality of S-parameters, enabling compact and noise-robust inverse modeling for dual-passband FSSs, with significantly improved efficiency over genetic algorithms (GA) and quasi-Newton (QN) methods [26]. Similarly, Cong et al. combined an equivalent circuit model (ECM) with a cascade neural network to design a wideband FSS. In this approach, the ECM ensures high-quality training data and accurate target description, achieving 98.9% passband and 95.3% stopband coverage [27].

In contrast, fully automatic methods [22,23,24] aim to achieve the end-to-end automated metamaterial design. In [22], a metasurface design algorithm was developed by integrating a variational autoencoder (VAE) with a predictor–optimizer framework, requiring 17,500 and 16,500 training samples for bilayer and trilayer structures, respectively. In [21], a bianisotropic metasurface optimization technique was developed by combining the method of moments (MoM) for parameter pre-screening with a surrogate model trained on 700,000 samples. In [24], a 12-layer deep neural network (DNN) trained on 21,000 samples was used for the design of an all-dielectric metasurface. For FSS design, Zhu et al. proposed an adversarial-network-regularized inverse-design scheme with frequency-temporal deep learning (AR-FTDL). This approach integrates generative adversarial networks (GANs) for demand-to-data mapping, frequency-temporal inversion networks for data-to-geometry mapping, and a pseudo-twin network for fabrication constraint verification. The AR-FTDL method enables end-to-end design with improved generalization and avoids local optima, outperforming traditional optimization techniques [28]. Additionally, deep learning has been successfully applied to other fields, such as fault diagnosis in machine health monitoring. For instance, Liu et al. introduced an LSTM-GAN-AE framework to tackle fault detection challenges [29], highlighting the adaptability of deep learning models for tasks such as metamaterial performance prediction or defect detection. Nevertheless, surrogate model-based approaches typically require tens of thousands of training samples, leading to high computational costs and limited generalization beyond the training frequency range.

To further improve efficiency, several reinforcement learning (RL)-based optimization frameworks have been proposed. In [30], an RL model was integrated with a DNN-based surrogate to replace time-consuming CST simulations, reducing the total design time for 1 × 2 arrays from 168 to 40.4 h. A hybrid strategy combining differential evolution (DE), decision trees, and deep Q-networks was introduced in [31] for antenna optimization, achieving faster convergence than conventional RL. In [32], knowledge-based deep reinforcement learning (KBDRL) was employed to enhance antenna bandwidth, optimizing a patch antenna with 39.19% bandwidth within 48 h, while the conventional trust-region method achieved only 12.62% bandwidth. RL has demonstrated significant effectiveness in large-scale network optimization. For example, Liu et al. applied digital twin and multi-agent RL for optimizing mobile network coverage [33,34,35] and proposed a distributed deep RL approach for efficient task offloading in vehicular edge networks with dependency guarantees [34]. These studies highlight the scalability and adaptability of RL in complex engineering optimization problems, offering valuable insights for applying RL-based frameworks to metamaterial design, particularly in large-scale or multi-objective optimization tasks where traditional methods face challenges.

Beyond efficiency, several studies have also aimed to improve generalization. In [36], a relational induction neural network (RINN) combining clustering and deep RL was proposed to optimize antenna arrays and filters, showing enhanced generalization. In [37], an automatic antenna optimization framework was introduced by combining the imitation learning pre-training, the deep deterministic policy gradient (DDPG) algorithm and antenna-specific knowledge. This framework effectively addresses the inefficiencies of data utilization and time-consuming training in traditional RL. Using this approach, antennas operating in the 3.2–3.8 GHz and 3.0–5.0 GHz bands were successfully optimized, further verifying the generalization capability of the method.

In this study, we propose a physics-informed reinforcement learning (PIRL) framework for the efficient design and optimization of octagonal fractal frequency-selective surfaces (OF-FSSs). The proposed approach integrates RL with the intrinsic physical principles of fractal resonance to accelerate convergence and enhance generalization under limited training data. Within the PIRL framework, the frequency response and geometric parameter vector of the OF-FSS are formulated as continuous state and action spaces, respectively. Embedding the operating mechanisms of the fractal geometry into the action space design enables efficient policy learning. Full-wave simulations validate the proposed approach, demonstrating its ability to achieve both single-band and dual-band resonances with desired center frequencies and bandwidths. These results highlight the potential of physics-informed RL as a powerful and generalizable tool for the intelligent design of advanced single-band and dual-band FSSs.

2. Problem Formulation and Methodology

2.1. Octagonal Fractal Frequency-Selective Metasurface

2.1.1. The Structure of the Designed OF-FSS

To meet the diverse requirements for center frequency and bandwidth across different application scenarios, a novel OF-FSS is proposed by combining a square ring resonator with an octagonal fractal geometry. The octagonal fractal unit is selected primarily for its eight-fold rotational symmetry, which ensures polarization robustness and supports multi-modal resonances, enabling flexible switching between single- and dual-band responses by varying the fractal order. In comparison, other common fractal structures have notable drawbacks: triangular fractals exhibit polarization sensitivity, hexagonal configurations suffer from strong inter-element coupling, and Sierpiński-based fractals face fabrication challenges and severe parameter coupling. Therefore, the octagonal structure provides an optimal balance between electromagnetic performance, structural simplicity, and compatibility with the proposed knowledge-guided optimization framework.

Specifically, the proposed fractal geometry is derived from a regular octagon with a side length of ‘D’, and successive iterations are generated by using a scaling factor of 1/2, resulting in first-, second-, and third-order structures (k = 1, 2, 3). The fractal order k = 1, 2, 3 was chosen because it provides a sufficient number of resonances to realize the targeted single- and dual-band frequency-selective surfaces (FSS) analyzed in this study. While higher fractal orders can introduce additional resonant frequencies, the present study focuses specifically on single- and dual-band designs. The effects of higher fractal orders and the associated multi-resonant behavior will be systematically investigated in our future research. The corresponding OF-FSS configurations are illustrated in Figure 1.

The initial geometric parameters of the OF-FSS are determined based on fundamental electromagnetic resonance principles. For a frequency-selective surface, the relationship between the resonant frequency $f_{0}$ and the characteristic dimension can be approximated by $f_{0} = c / (2 l_{eff} \sqrt{ε_{eff}})$, where $c$ is the speed of light, $l_{eff}$ is the effective electrical length of the resonator, and $ε_{eff}$ is the effective permittivity. Based on this relationship and the target frequency specifications, initial values for the outer square ring dimension L and inner octagon dimension D are estimated.
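As a rough illustration of how this relationship can seed the geometry, the sketch below inverts the formula to obtain the effective length for a target frequency. The effective permittivity of 2.5 is a placeholder chosen for the example, not a value from this work, and the mapping from the effective length onto L or D depends on the resonator shape, which is not specified here.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def resonant_frequency_hz(l_eff_m: float, eps_eff: float) -> float:
    """f0 = c / (2 * l_eff * sqrt(eps_eff))."""
    return C0 / (2.0 * l_eff_m * math.sqrt(eps_eff))


def effective_length_m(f0_hz: float, eps_eff: float) -> float:
    """Invert the same relation to get l_eff for a target f0."""
    return C0 / (2.0 * f0_hz * math.sqrt(eps_eff))


if __name__ == "__main__":
    EPS_EFF = 2.5  # assumed placeholder, not a value from the paper
    for f0_ghz in (1.8, 2.4, 3.5):
        l_eff = effective_length_m(f0_ghz * 1e9, EPS_EFF)
        print(f"{f0_ghz} GHz -> l_eff = {l_eff * 1e3:.1f} mm")
```

The estimate is only a starting point; the optimizer is expected to correct it against full-wave results.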

Owing to their inherent self-similarity, fractal geometries can produce either single-band or dual-band resonances. Specifically, the first- and second-order OF-FSSs exhibit single-band resonance modes, while the third-order OF-FSS exhibits dual-band resonance modes, so the same family of structures covers both single-band and dual-band FSS designs. In this study, the fractal order is selected according to the specific optimization objective.

2.1.2. The Operating Mechanism of the Designed OF-FSS

Compared with full-wave numerical simulations, the lumped equivalent circuit (LEC) offers a clear and efficient method to reveal the operating mechanisms, estimate resonant frequencies, and guide initial design optimization without excessive computational cost. Thus, the LEC method is employed to analyze the operating principles of the proposed OF-FSS. In this model, the metallization patterns are modeled as inductors (L) and capacitors (C), while the dielectric substrate is modeled as a transmission line ($Z_{0} / \sqrt{ε_{r}}$, $h$). The combination of the octagonal fractal patch and the outer square ring can then be represented by the lumped equivalent circuit shown in Figure 2. Specifically, the dual-mode resonances associated with the octagonal fractal patch are represented by $L_{1}$, $C_{1}$, $L_{2}$ and $C_{2}$, while the outer square ring structure is represented by $L_{3}$ and $C_{3}$.

Based on the lumped equivalent circuit shown in Figure 2, the ABCD transmission matrix of the OF-FSS is expressed in Equation (1). The corresponding scattering parameters derived from this matrix are presented in Equation (2).

\begin{aligned} \begin{bmatrix} A & B \\ C & D \end{bmatrix} &= \begin{bmatrix} 1 & 0 \\ 1 / Z_{A} & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 / Z_{B} & 1 \end{bmatrix} \begin{bmatrix} \cos θ & j \sin θ \cdot Z_{0} / \sqrt{ε_{r}} \\ j \frac{\sin θ}{Z_{0} / \sqrt{ε_{r}}} & \cos θ \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \infty & 1 \end{bmatrix} \\ &= \begin{bmatrix} j \sin θ \cdot Z_{0} / \sqrt{ε_{r}} & 0 \\ j \frac{Z_{0}}{\sqrt{ε_{r}}} \cdot \sin θ \left( \frac{1}{Z_{A}} + \frac{1}{Z_{B}} \right) + \cos θ & 0 \end{bmatrix} \end{aligned}

(1)

where

Z_{A} = \frac{(1 - ω^{2} L_{1} C_{1}) (1 - ω^{2} L_{2} C_{2})}{j ω C_{2} (1 - ω^{2} L_{1} C_{1}) + j ω C_{1} (1 - ω^{2} L_{2} C_{2})}, \quad Z_{B} = \frac{j ω L_{3}}{1 - ω^{2} C_{3} L_{3}}

S_{11} = \frac{- \cos θ + j \sin θ / \sqrt{ε_{r}} \cdot [1 - Z_{0} \cdot (1 / Z_{A} + 1 / Z_{B})]}{\cos θ + j \sin θ / \sqrt{ε_{r}} \cdot [1 + Z_{0} \cdot (1 / Z_{A} + 1 / Z_{B})]}

(2a)

S_{21} = \frac{2}{\cos θ + j \sin θ / \sqrt{ε_{r}} \cdot [1 + Z_{0} \cdot (1 / Z_{A} + 1 / Z_{B})]}

(2b)

\frac{\sqrt{ε_{r}}}{Z_{0}} \cot θ = \frac{ω [C_{2} (1 - ω^{2} L_{1} C_{1}) + C_{1} (1 - ω^{2} L_{2} C_{2})]}{(1 - ω^{2} L_{1} C_{1}) (1 - ω^{2} L_{2} C_{2})} - \frac{1 - ω^{2} L_{3} C_{3}}{ω L_{3}}

(3)

where $S_{11}$ and $S_{21}$ denote the reflection and transmission coefficients of the FSS, respectively; $θ$ is the incidence angle of the incoming wave; $ε_{r}$ is the relative permittivity of the dielectric substrate; $Z_{0}$ is the intrinsic impedance of free space; $Z_{A}$ and $Z_{B}$ represent the characteristic impedances of the equivalent lumped elements corresponding to different parts of the FSS unit cell; $L_{1}$, $L_{2}$, $L_{3}$, $C_{1}$, $C_{2}$, and $C_{3}$ are the equivalent inductances and capacitances in the lumped circuit model; and $ω$ is the angular frequency of the incident wave.
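The closed forms of $Z_{A}$ and $Z_{B}$ can be checked numerically. The sketch below is a minimal illustration with placeholder element values that are not taken from this work, and it takes the last capacitance in the denominator of $Z_{A}$ to be $C_{2}$. It confirms two algebraic observations: $Z_{A}$ equals two series LC branches connected in parallel, and the right-hand side of Equation (3) equals the imaginary part of $1 / Z_{A} + 1 / Z_{B}$.

```python
import math


def z_a(omega, l1, c1, l2, c2):
    """Z_A in the closed form given with Equation (1)."""
    n1 = 1.0 - omega**2 * l1 * c1
    n2 = 1.0 - omega**2 * l2 * c2
    return (n1 * n2) / (1j * omega * c2 * n1 + 1j * omega * c1 * n2)


def z_b(omega, l3, c3):
    """Z_B in the closed form given with Equation (1)."""
    return (1j * omega * l3) / (1.0 - omega**2 * c3 * l3)


def series_lc(omega, l, c):
    """Impedance of one series LC branch: j*omega*L + 1/(j*omega*C)."""
    return 1j * omega * l + 1.0 / (1j * omega * c)


def eq3_rhs(omega, l1, c1, l2, c2, l3, c3):
    """Right-hand side of Equation (3)."""
    n1 = 1.0 - omega**2 * l1 * c1
    n2 = 1.0 - omega**2 * l2 * c2
    fractal = omega * (c2 * n1 + c1 * n2) / (n1 * n2)
    ring = (1.0 - omega**2 * l3 * c3) / (omega * l3)
    return fractal - ring


# Placeholder element values in henries and farads (assumed for illustration).
VALS = dict(l1=2e-9, c1=1e-12, l2=1e-9, c2=0.5e-12, l3=3e-9, c3=0.8e-12)
OMEGA = 2.0 * math.pi * 3.0e9  # evaluate at 3 GHz

za = z_a(OMEGA, VALS["l1"], VALS["c1"], VALS["l2"], VALS["c2"])
zb = z_b(OMEGA, VALS["l3"], VALS["c3"])
z1 = series_lc(OMEGA, VALS["l1"], VALS["c1"])
z2 = series_lc(OMEGA, VALS["l2"], VALS["c2"])
parallel = z1 * z2 / (z1 + z2)
susceptance = (1.0 / za + 1.0 / zb).imag

print("Z_A =", za, " parallel branches =", parallel)
print("Im(1/Z_A + 1/Z_B) =", susceptance, " Eq. (3) RHS =", eq3_rhs(OMEGA, **VALS))
```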

As observed from Equation (2), the designed OF-FSS operates in transmission mode when $S_{11} = 0$ and $S_{21} = 1$. Substituting these conditions into Equation (2) yields the relationship between the angular frequency ($ω$) and the lumped inductance and capacitance values given in Equation (3). Accordingly, the resonant frequency is determined by the values of $L_{1}$, $L_{2}$, $L_{3}$, $C_{1}$, $C_{2}$ and $C_{3}$. Furthermore, by adjusting the parameters ‘L’ and ‘D’ in the layout (as illustrated in Figure 1), the resonant frequency and bandwidth of the OF-FSS can be flexibly tuned, which provides the design flexibility needed to realize the desired frequency responses.

2.2. Physics-Informed Reinforcement Learning for OF-FSS

RL is typically formalized as a Markov decision process (MDP), in which an agent learns optimal strategies through trial-and-error interactions with an environment. Building on this foundation, we propose a physics-informed reinforcement learning (PIRL) framework for designing and optimizing the octagonal fractal FSS (OF-FSS) by embedding electromagnetic domain knowledge into the learning process. The proposed PIRL achieves high optimization efficiency and strong generalization across different specification targets.

To operationalize this framework within a high-fidelity electromagnetic environment, we implement a Python-driven “CST-in-the-loop” workflow that tightly integrates CST Studio Suite with the RL agent, enabling direct interaction between the agent and full-wave simulations. A Python 3.9.13 script interfaces with CST via the official cst.interface API and supplementary VBA macros to perform geometry transformations and ASCII data export. In each RL step, a discrete multiplicative action is applied to two design variables: the octagonal side length (using Transform.Scale) and the parameter L (via StoreParameter followed by Rebuild). The CST solver is then executed from Python, after which the S-parameters (S11/S21) are exported using ASCIIExport. The bandwidth (Bw) and transmission (Ft) metrics are subsequently computed from the resulting CSV files using a −10 dB grid-based criterion.
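The post-processing step can be pictured with the following sketch, which extracts center frequency and bandwidth from a sampled response. It assumes that a passband is a contiguous run of grid points where |S11| is at or below −10 dB and that band edges are taken at grid points without interpolation. Both are readings of "−10 dB grid-based criterion" rather than details confirmed here. CSV parsing is omitted because the export layout is not specified, and the response data are synthetic.

```python
from typing import List, Sequence, Tuple


def passbands_from_s11(
    freqs_ghz: Sequence[float],
    s11_db: Sequence[float],
    threshold_db: float = -10.0,
) -> List[Tuple[float, float]]:
    """Return (center frequency, bandwidth) for each contiguous run of
    samples whose |S11| in dB is at or below the threshold."""
    bands = []
    start = None
    last = len(s11_db) - 1
    for i, level in enumerate(s11_db):
        inside = level <= threshold_db
        if inside and start is None:
            start = i  # first grid point of a new passband
        if start is not None and (not inside or i == last):
            end = i if inside else i - 1  # last grid point still in band
            f_lo, f_hi = freqs_ghz[start], freqs_ghz[end]
            bands.append(((f_lo + f_hi) / 2.0, f_hi - f_lo))
            start = None
    return bands


# Synthetic response on a 0.1 GHz grid from 1.0 to 4.0 GHz:
# well matched (-20 dB) from 2.3 to 2.5 GHz, poorly matched (-3 dB) elsewhere.
grid_index = range(10, 41)
freqs = [i / 10 for i in grid_index]
s11 = [-20.0 if 23 <= i <= 25 else -3.0 for i in grid_index]

for center, bw in passbands_from_s11(freqs, s11):
    print(f"center = {center:.2f} GHz, bandwidth = {bw * 1e3:.0f} MHz")
```

Because the edges sit on grid points, the bandwidth resolution is limited by the frequency step of the exported data.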

The framework of the proposed PIRL is shown in Figure 3. The OF-FSS reaches the specified frequency response through iterative interaction between the agent and the CST simulation environment. At each time step t, the agent selects an action ($a_{t}$) based on the policy $π_{s}$, which causes the frequency response to transition from the current state ($s_{t}$) to the next state ($s_{t + 1}$). The reward ($r_{t}$) is then used to evaluate the quality of the chosen action. Detailed descriptions of the process are provided below.

2.2.1. Environment (CSTEnv)

The CST Studio Suite (CST) is adopted as the environment. Once the agent selects an action, CST automatically updates the OF-FSS configuration, runs a full-wave simulation, and returns the resulting frequency response.

2.2.2. State Space

The state space represents the frequency responses of the OF-FSS under different actions. Specifically, at time step t, the state $s_{t}$ consists of the resonant center frequency $f_{t}$ and the passband bandwidth $BW_{t}$ extracted from the simulated response. Accordingly, the state of the agent is defined in Equation (4).

s_{t} = (f_{t}, BW_{t})

(4)

2.2.3. Action Space

The action space defines the optimization capability of the agent. In this work, the action space is defined in Equation (5), where $D_{i}$ and $L_{i}$ denote the side lengths of the inner octagon and the outer square ring, respectively.

a_{i} = \langle D_{i}, L_{i} \rangle

(5)
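One way to realize such an action space with a tabular agent is a small set of multiplicative scale factors applied to D and L, in line with the discrete multiplicative actions described for the CST-in-the-loop workflow. The sketch below is illustrative only: the factor values and the helper name `apply_action` are assumptions, not settings reported in this work.

```python
from itertools import product

# Hypothetical scale factors: shrink by 5 %, keep, or grow by 5 %.
SCALE_FACTORS = (0.95, 1.0, 1.05)

# Each action is a pair (scale for D, scale for L), giving 3 x 3 = 9 actions.
ACTIONS = list(product(SCALE_FACTORS, repeat=2))


def apply_action(d_mm: float, l_mm: float, action):
    """Return the updated (D, L) after one multiplicative action."""
    scale_d, scale_l = action
    return d_mm * scale_d, l_mm * scale_l


if __name__ == "__main__":
    d, l = apply_action(10.0, 20.0, (1.05, 0.95))
    print(f"{len(ACTIONS)} actions; D = {d:.2f} mm, L = {l:.2f} mm")
```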

2.2.4. Reward Function

Owing to their self-similar nature, fractal geometries inherently generate multiple resonant modes, enabling the OF-FSS to support both single-band and dual-band responses. To effectively guide the agent toward these multi-resonant optimization objectives, the reward function simultaneously evaluates the deviations between the target and the simulated frequencies and bandwidths. This reward function is carefully designed to provide informative learning signals, encouraging the agent to achieve the desired center frequencies and bandwidths within the specified frequency range. The reward for the i-th target band is defined in formulation (6), where (6a) corresponds to the reward function for the single-band optimization objective, and (6b) represents that for the dual-band optimization objective.

R_{i} = - (| f_{i} - f_{obj} | + | BW_{i} - BW_{obj} |)

(6a)

R_{i} = - (| f_{i,1} - f_{obj,1} | + | BW_{i,1} - BW_{obj,1} |) - (| f_{i,2} - f_{obj,2} | + | BW_{i,2} - BW_{obj,2} |)

(6b)

R_{total} = (3 \cdot R_{i}) + 1

(6c)

R_{i} = \begin{cases} -0.3, & R_{i} < -0.3 \\ -0.03, & R_{i} > -0.3 \end{cases}

(6d)
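The reward computation can be sketched as follows. This is one reading of Equation (6), not a confirmed implementation: it assumes that the quantization in (6d) is applied to the raw reward before the rescaling in (6c), that frequencies and bandwidths are expressed in GHz, and that the boundary case of exactly −0.3 (which (6d) leaves open) maps to the lower level.

```python
def raw_reward_single(f, bw, f_obj, bw_obj):
    """Equation (6a): negative absolute deviation for one band."""
    return -(abs(f - f_obj) + abs(bw - bw_obj))


def raw_reward_dual(bands, targets):
    """Equation (6b): sum of the (6a) terms over the two bands.

    bands and targets are sequences of (f, BW) pairs.
    """
    return sum(
        raw_reward_single(f, bw, f_obj, bw_obj)
        for (f, bw), (f_obj, bw_obj) in zip(bands, targets)
    )


def quantize(raw):
    """Equation (6d); the equality case is an assumption."""
    return -0.3 if raw <= -0.3 else -0.03


def total_reward(raw):
    """Equation (6c) applied to the quantized reward."""
    return 3.0 * quantize(raw) + 1.0


if __name__ == "__main__":
    near = raw_reward_single(2.41, 0.106, 2.4, 0.1)  # small deviation
    far = raw_reward_single(3.0, 0.106, 2.4, 0.1)    # large deviation
    print(round(total_reward(near), 2), round(total_reward(far), 2))
```

Under this reading the total reward takes only two values, about 0.91 for designs within the deviation budget and about 0.1 otherwise.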

2.2.5. Agent Implementation and Training Process

An off-policy Q-learning algorithm is employed to optimize the OF-FSS by iteratively updating the state–action value function toward the optimal policy. The state–action value function is defined in Equation (7):

Q (s_{t}, a_{t}) = E [U_{t} | s_{t}, a_{t}] = E [\sum_{k = 0}^{\infty} γ^{k} r_{t + k + 1} | s_{t}, a_{t}] = E [r_{t + 1} + γ r_{t + 2} + γ^{2} r_{t + 3} + \dots | s_{t}, a_{t}]

(7)

where E denotes the expectation operation, $U_{t}$ is the discounted return, and $γ$ is the discount factor balancing immediate and future rewards. In complex electromagnetic design scenarios, directly computing long-term returns is often computationally expensive. Therefore, temporal-difference (TD) learning is employed to iteratively update the Q-values according to Equation (8):

Q (s_{t}, a_{t}) \leftarrow Q (s_{t}, a_{t}) + α \cdot [r (s_{t}, a_{t}) + γ \cdot \max_{a_{t + 1} \in A} Q (s_{t + 1}, a_{t + 1}) - Q (s_{t}, a_{t})]

(8)

where $α \in (0, 1]$ is the learning rate controlling the update magnitude. All state-action pairs and their corresponding Q-values are stored in a Q-table, which gradually converges to the optimal values through iterative training. Once the Q-table has converged, the optimal policy is derived by selecting the action with the highest Q-value for each state.
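A single TD update of Equation (8) can be traced with a small worked example. The sketch uses a dictionary as the Q-table; the learning rate 0.1, the discount factor 0.9, and the state and action labels are placeholders for illustration, not the settings of Table 1.

```python
from collections import defaultdict


def td_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply Equation (8) in place and return the new Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


ACTIONS = ("shrink", "keep", "grow")  # hypothetical action labels
q_table = defaultdict(float)          # unseen pairs default to 0.0
q_table[("s1", "keep")] = 0.5         # pretend the next state has some value

# 0 + 0.1 * (0.91 + 0.9 * 0.5 - 0) = 0.136
new_value = td_update(q_table, "s0", "grow", 0.91, "s1", ACTIONS, alpha=0.1, gamma=0.9)
print(round(new_value, 3))
```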

The pseudo-code of the proposed PIRL algorithm is summarized in Algorithm 1, and the parameter settings are listed in Table 1.

Algorithm 1: PIRL for OF-FSS Optimization

Initialization: learning rate $α$, discount factor $γ$, $ε$-greedy parameters $ε$ (Epsilon), Epsilon_min, Epsilon_decay; initialize the Q-table $Q (s, a)$ arbitrarily for all state-action pairs.
Process:
For episode = 1 to M do:
    Reset the environment CSTEnv and get the initial state $s_{1}$
    For step t = 1 to T do:
        With probability $ε$: choose a random action $a_{t}$ from the full action space A.
        Otherwise: choose action $a_{t} = \arg \max_{a} Q (s_{t}, a)$
        Execute action $a_{t}$; obtain reward $r_{t}$ and next state $s_{t + 1}$
        Store {$s_{t}, a_{t}, r_{t}, s_{t + 1}$} in the Q-table and update $Q (s_{t}, a_{t})$ using Equation (8)
        If $s_{t + 1}$ meets $| f_{t + 1} − f_{obj} | < Δf$ and $| BW_{t + 1} − BW_{obj} | < ΔBW$ then
            break (Exit step loop for this episode)
        End if
    End for
End for
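The loop of Algorithm 1 can be exercised without CST by swapping in a toy environment. The sketch below is a stand-in, not the implementation used in this work: `ToyEnv`, its closed-form response, the scale factors, the binning of the state to one decimal, the reward reading of Equation (6), the multiplicative per-episode decay of ε, and all hyperparameter values are assumptions made so the example runs on its own.

```python
import random
from collections import defaultdict

FACTORS = (0.95, 1.0, 1.05)  # hypothetical multiplicative steps
ACTIONS = [(fd, fl) for fd in FACTORS for fl in FACTORS]


class ToyEnv:
    """Stand-in for CSTEnv; a made-up closed-form response replaces CST."""

    def __init__(self, f_obj=2.4, bw_obj=0.3, tol_f=0.05, tol_bw=0.05):
        self.f_obj, self.bw_obj = f_obj, bw_obj
        self.tol_f, self.tol_bw = tol_f, tol_bw
        self.d, self.l = 6.0, 36.0

    def _response(self):
        # Invented toy model (not from the paper): f ~ 1/L, BW ~ D/L.
        return 72.0 / self.l, 1.5 * self.d / self.l

    def _state(self):
        f, bw = self._response()
        return (round(f, 1), round(bw, 1))  # bin (f, BW) to index the Q-table

    def reset(self):
        self.d, self.l = 6.0, 36.0
        return self._state()

    def step(self, action):
        # Apply the multiplicative action, clamped to a plausible range.
        self.d = min(max(self.d * action[0], 2.0), 15.0)
        self.l = min(max(self.l * action[1], 15.0), 60.0)
        f, bw = self._response()
        raw = -(abs(f - self.f_obj) + abs(bw - self.bw_obj))   # Eq. (6a)
        reward = 3.0 * (-0.3 if raw <= -0.3 else -0.03) + 1.0  # (6d), then (6c)
        done = abs(f - self.f_obj) < self.tol_f and abs(bw - self.bw_obj) < self.tol_bw
        return self._state(), reward, done


def train(env, episodes=50, max_steps=40, alpha=0.1, gamma=0.9,
          eps=1.0, eps_min=0.05, eps_decay=0.95, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table, zero-initialized
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            if rng.random() < eps:  # explore
                a = rng.choice(ACTIONS)
            else:                   # exploit
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s_next, r, done = env.step(a)
            best_next = max(q[(s_next, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])  # Eq. (8)
            s = s_next
            if done:
                break
        eps = max(eps_min, eps * eps_decay)  # assumed decay schedule
    return q


q_table = train(ToyEnv())
print(len(q_table), "state-action entries in the Q-table")
```

In the real workflow, `step` would trigger a full-wave simulation, so the number of calls to it dominates the total run time.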

The learning parameters (α, γ, ε) were selected based on established reinforcement learning practices and empirical tuning to ensure the stable and efficient convergence of the PIRL-based FSS optimization framework. The final values are provided in Table 1. A small learning rate ensures stable Q-value updates, while the chosen discount factor (γ) prioritizes long-term rewards, which are crucial for multi-step optimization. The exploration rate (ε) strikes a balance between exploring diverse design configurations and exploiting previously acquired knowledge.

3. Optimization Results

To validate the proposed PIRL framework, optimization studies were carried out on the OF-FSS for two representative cases: (i) single-band operating at 2.4 GHz or 3.5 GHz with different bandwidth objectives, and (ii) dual-band operating at 1.8 GHz and 3.5 GHz with various bandwidth requirements. The optimization was performed using the PIRL algorithm described in Section 2, where the RL agent interacts directly with CST to autonomously adjust the parameters (L, D) and achieve the desired frequency responses.

While this study is based on CST simulations, the proposed PIRL-driven optimization framework can be readily extended to experimental verification. The optimized octagonal FSS units consist of standard metal–dielectric–metal layers, which can be easily fabricated using conventional printed circuit board (PCB) processes. Subsequently, the fabricated prototypes can be characterized in a microwave anechoic chamber to validate the simulated transmission and reflection responses of the designed OF-FSS. Due to time constraints, experimental fabrication and measurement are not included in this paper. However, in previous studies [4,5,6,10,11], FSS structures with similar metal-dielectric-metal configurations were both simulated (using CST/HFSS) and experimentally measured, demonstrating excellent agreement between simulation and measurement results. Therefore, the simulation results in this work can be considered highly reliable and experimentally feasible.

3.1. Single Band Optimization at 2.4 GHz and 3.5 GHz

For single-band operation, the PIRL agent autonomously selects from the k = 1, 2 fractal configurations shown in Figure 1 and then optimizes the design for each bandwidth target.

Figure 4 shows the optimized frequency responses of the OF-FSSs, while the corresponding parameter values, center frequencies, and bandwidths are listed in Table 2 and Table 3. For the single-band OF-FSS operating at 2.4 GHz, the PIRL agent achieves bandwidths of 106 MHz, 292 MHz, and 588 MHz, corresponding to the target values of 100 MHz, 300 MHz, and 600 MHz, with relative errors of 6.0%, 2.7%, and 2.0%, respectively. For the center frequency target of 3.5 GHz and bandwidth targets of 150 MHz, 300 MHz, and 600 MHz, the optimized designs achieve actual bandwidths of 146 MHz, 300 MHz, and 599 MHz, yielding relative errors of 2.7%, 0%, and 0.17%, respectively. These results demonstrate that the PIRL algorithm can efficiently design single-band FSSs with diverse bandwidth specifications, achieving high accuracy.
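The relative errors quoted above follow from |achieved − target| / target. A short check, using only the bandwidth values stated in the text:

```python
def relative_error_pct(achieved_mhz: float, target_mhz: float) -> float:
    """Relative bandwidth error in percent."""
    return abs(achieved_mhz - target_mhz) / target_mhz * 100.0


# (achieved, target) bandwidth pairs in MHz, as reported for each case.
CASES = {
    "2.4 GHz": [(106, 100), (292, 300), (588, 600)],
    "3.5 GHz": [(146, 150), (300, 300), (599, 600)],
}

for band, pairs in CASES.items():
    errors = [relative_error_pct(a, t) for a, t in pairs]
    print(band, [f"{e:.2f}%" for e in errors])
```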

3.2. Dual-Band Optimization at 1.8 GHz/3.5 GHz

For dual-band operation, the PIRL agent autonomously selects the third-order fractal configuration, which inherently supports two independent resonant modes. Five sets of dual-band bandwidth targets were considered and optimized, as illustrated in Figure 5, while the corresponding geometric parameters, center frequencies, and passband bandwidths are summarized in Table 4. In these cases, the design objective was to realize dual-band OF-FSSs operating at 1.8 GHz and 3.5 GHz, with target bandwidth pairs of 250/550 MHz, 300/600 MHz, 300/650 MHz, 300/700 MHz, and 350/750 MHz. Utilizing the proposed PIRL framework, the optimized designs yielded actual bandwidths of 245/566 MHz, 273/609 MHz, 297/668 MHz, 312/694 MHz, and 367/758 MHz, corresponding to average relative errors of 5.8% for the 1.8 GHz band and 2.4% for the 3.5 GHz band. These results demonstrate that the PIRL algorithm can efficiently achieve dual-band responses while maintaining high accuracy across bandwidths.

The proposed method provides a PIRL framework that efficiently generates multiple FSS configurations with specific center frequencies, bandwidths, and resonance modes (single-band or dual-band). These designs are particularly useful for communication and radar systems, where FSSs with different frequency responses can be selectively integrated into subsystems operating at various frequency bands. Additionally, the framework supports engineering standardization, as the learned design knowledge enables the rapid adaptation of parameters to produce a family of FSS units covering specific frequency ranges.

Therefore, even though this work is based on simulation results, the proposed approach is both experimentally feasible and highly practical for developing an adaptable library of FSS structures tailored to diverse electromagnetic applications.

3.3. Analysis of Simulation Time and Convergence Characteristics

All simulations were run on a workstation with an AMD Ryzen 9 9950X 16-core processor and 32 GB of random-access memory (RAM), running Windows Server 2019 Datacenter. Electromagnetic simulations were performed using CST Studio Suite 2024, a finite integration technique (FIT)-based software package. With this configuration, simulations of the OF-FSS for k = 1, 2, and 3 across the 0.5–5 GHz band require 0.6 min, 0.7 min, and 3.0 min, respectively. The total simulation time can be estimated using Equation (9), where $t_{e}$ denotes the duration of a single simulation in CST and $i$ represents the total number of simulation runs.

T_{t} = t_{e} \cdot i

(9)
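As a worked instance of Equation (9), the sketch below multiplies the per-simulation times reported for k = 1, 2, and 3 by a run count. The count of 130 is used purely as an example and assumes one CST run per iteration, which is not stated explicitly.

```python
def total_time_min(t_e_min: float, runs: int) -> float:
    """Equation (9): T_t = t_e * i."""
    return t_e_min * runs


# Per-simulation time in minutes for each fractal order k.
TIME_PER_RUN_MIN = {1: 0.6, 2: 0.7, 3: 3.0}
RUNS = 130  # illustrative run count (assumption)

for k, t_e in TIME_PER_RUN_MIN.items():
    total = total_time_min(t_e, RUNS)
    print(f"k = {k}: {total:.0f} min ({total / 60.0:.1f} h)")
```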

Figure 6 illustrates the reward evolution over iterations. After approximately 130 iterations, the reward converges and stabilizes around 0.9. This plateau is consistent with Equation (6): taking the upper level $R_{i} = -0.03$ from (6d), Equation (6c) gives $R_{total} = 3 \cdot (-0.03) + 1 = 0.91$.

4. Comparison

Machine learning (ML) and reinforcement learning (RL) are increasingly applied to the optimization of frequency-selective surfaces (FSSs). Table 5 compares the proposed method with existing state-of-the-art approaches across five key dimensions: core technology, operating frequency, frequency transferability, polarization sensitivity, and physical interpretability.

Table 6 compares different algorithms applied to the OF-FSS operating at 3.5 GHz. The proposed PIRL achieves a bandwidth closer to the target with fewer iterations and less computation time than the particle swarm optimization (PSO) methods.

As demonstrated by the comparative tables above, the proposed method in this paper possesses multiple advantages. It can not only flexibly realize the response design of single-band and dual-band FSS to meet application requirements in different scenarios, but also effectively reduce the consumption of computing resources and time costs by deeply integrating electromagnetic physical information into the optimization process. For the optimization scenario of the 3.5 GHz frequency band, compared with traditional algorithms (such as PSO), the proposed method can quickly converge to the target frequency point with fewer iterations, significantly improving optimization efficiency. In addition, the octagonal fractal structure designed in this paper, relying on its eight-fold rotational symmetry, exhibits excellent polarization insensitivity. Even in complex electromagnetic environments with arbitrarily polarized incident waves, it can still maintain stable performance. This characteristic greatly enhances the practicality and environmental adaptability of the frequency-selective surface (FSS) in practical engineering scenarios such as communications and radar, laying a foundation for its large-scale application.

5. Conclusions

In conclusion, this study presents a PIRL framework for the intelligent design and optimization of OF-FSSs. By embedding electromagnetic prior knowledge into the reinforcement learning process, the proposed approach bridges the gap between data-driven optimization and physical interpretability. The presented OF-FSS, which integrates a square ring resonator with self-similar fractal geometry, inherently supports single-band and dual-band resonances. Within the PIRL framework, the OF-FSS design process is formulated over continuous state and action spaces, enabling autonomous exploration and optimization through direct interaction with full-wave simulations. This physics-informed strategy accelerates convergence, reduces data dependency, and ensures stable optimization across different frequency bands. Full-wave simulation results demonstrate that the proposed OF-FSS agrees closely with the target frequency responses in both single-band and dual-band cases. Overall, the integration of OF-FSS design and PIRL optimization provides an efficient, generalizable, and physically interpretable pathway for developing advanced single-band and dual-band FSSs.

Author Contributions

Conceptualization, G.D. and M.L.; methodology, G.D.; software, M.L.; validation, M.L. and X.H.; formal analysis, X.H.; investigation, M.L.; resources, G.D.; data curation, X.H.; writing—original draft preparation, G.D.; writing—review and editing, X.H.; visualization, X.H.; supervision, G.D.; project administration, G.D.; funding acquisition, G.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62501045, in part by the Beijing Natural Science Foundation under Grant 4232016, in part by the Basic and Applied Basic Research Foundation of Guangdong Province under Grant 2022A1515110565, and in part by the Fundamental Research Funds for the Central Universities under Grant FRF-GF-25-005.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CST	Computer Simulation Technology
HFSS	High-Frequency Structure Simulator
OF-FSS	Octagonal fractal frequency-selective surface
PIRL	Physics-informed reinforcement learning
FSSs	Frequency-selective surfaces
FSAs	Frequency-selective absorbers
ML	Machine learning
SVMs	Support vector machines
VAE	Variational autoencoder
DNN	Deep neural network
RL	Reinforcement learning
KBDRL	Knowledge-based deep reinforcement learning
RINN	Relational induction neural network
DDPG	Deep deterministic policy gradient

References

1. Xiao, T.; Liao, Q.; Tang, G.; Huang, L.; Wang, H.; Liu, C.; Lin, F. A Novel Frequency-Selective Polarization Converter and Application in RCS Reduction. Electronics 2025, 14, 1280.
2. Zhang, H.; Si, L.; Ma, T.; Dong, L.; Niu, R.; Bao, X.; Sun, H.; Ding, J. Triple-band terahertz chiral metasurface for spin-selective absorption and reflection phase manipulation. Electronics 2022, 11, 4195.
3. Ouyang, Y.; Zeng, Y.; Liu, X. Explainable Encoder–Prediction–Reconstruction Framework for the Prediction of Metasurface Absorption Spectra. Nanomaterials 2024, 14, 1497.
4. Sun, Z.; Yan, L.; Zhao, X.; Gao, R.X.-K. An ultrawideband frequency selective surface absorber with high polarization-independent angular stability. IEEE Antennas Wirel. Propag. Lett. 2022, 22, 789–793.
5. Guo, C.; Li, J. Polarization-Multiplexed Transmissive Metasurfaces for Multifunctional Focusing at 5.8 GHz. Electronics 2025, 14, 1774.
6. Ren, X.; Liu, Y.; Ji, Z.; Zhang, Q.; Cao, W. Ultra-Wideband Passive Polarization Conversion Metasurface for Radar Cross-Section Reduction Across C-, X-, Ku-, and K-Bands. Micromachines 2025, 16, 292.
7. Xu, J.; Liu, J.; Hao, R.; Chen, G.; Wang, W.; Li, H.; Sheng, P.; Li, Y.; Kong, J.; Zhao, J. Simulation of Circular Dichroism in a Three-Layer Complementary Chiral Metasurface. Photonics 2025, 12, 228.
8. Zhang, S.; Qin, Q.; Hua, M. A Wideband Polarization-Insensitive Bistatic Radar Cross-Section Reduction Design Based on Hybrid Spherical Phase-Chessboard Metasurfaces. Coatings 2024, 14, 1130.
9. Li, Y.; Ma, Y.; Xu, R.; Ren, P.; Xiang, Z. Design of a multi-passband metasurface with ultra-high angular stability based on complementary frequency selective surfaces. Opt. Express 2025, 33, 32862–32880.
10. Li, Y.; Ma, Y.; Ren, P.; Xu, B.; Xu, R.; Xiang, Z. Design of angle-selective surface with narrow-angle filtering for variable frequency. IEEE Antennas Wirel. Propag. Lett. 2025, 24, 1487–1491.
11. Li, Y.; Ren, P.; Chen, R.; Xu, B.; Xiang, Z.; Wang, M. Design of miniaturized frequency selective rasorber based on complementary bandpass FSS for 5G applications. IEEE Trans. Electromagn. Compat. 2024, 66, 636–639.
12. Prado, D.R.; Lopez-Fernandez, J.A.; Arrebola, M.; Goussetis, G. Support vector regression to accelerate design and crosspolar optimization of shaped-beam reflectarray antennas for space applications. IEEE Trans. Antennas Propag. 2018, 67, 1659–1668.
13. Prado, D.R.; López-Fernández, J.A.; Arrebola, M.; Pino, M.R.; Goussetis, G. Wideband shaped-beam reflectarray design using support vector regression analysis. IEEE Antennas Wirel. Propag. Lett. 2019, 18, 2287–2291.
14. Abdullah, M.; Koziel, S. Supervised-learning-based development of multibit RCS-reduced coding metasurfaces. IEEE Trans. Microw. Theory Tech. 2021, 70, 264–274.
15. Koziel, S.; Abdullah, M. Machine-learning-powered EM-based framework for efficient and reliable design of low scattering metasurfaces. IEEE Trans. Microw. Theory Tech. 2021, 69, 2028–2041.
16. Hodge, J.A.; Mishra, K.V.; Zaghloul, A.I. RF metasurface array design using deep convolutional generative adversarial networks. In Proceedings of the 2019 IEEE International Symposium on Phased Array System & Technology (PAST), Waltham, MA, USA, 15–18 October 2019; pp. 1–6.
17. Fan, J.A. Generating high performance, topologically-complex metasurfaces with neural networks. In Proceedings of the 2019 Conference on Lasers and Electro-Optics (CLEO), San Jose, CA, USA, 5–10 May 2019; pp. 1–2.
18. Naseri, P.; Goussetis, G.; Fonseca, N.J.G.; Hum, S.V. Inverse design of a dual-band reflective polarizing surface using generative machine learning. In Proceedings of the 2022 16th European Conference on Antennas and Propagation (EuCAP), Madrid, Spain, 27 March–1 April 2022; pp. 1–5.
19. Xiao, L.Y.; Jin, F.L.; Wang, B.Z.; Liu, Q.H. Efficient inverse extreme learning machine for parametric design of metasurfaces. IEEE Antennas Wirel. Propag. Lett. 2020, 19, 992–996.
20. Zhu, R.; Wang, J.; Han, Y.; Sui, S.; Qiu, T.; Jia, Y.; Feng, M.; Wang, X.; Zheng, L.; Qu, S. Design of aperture-multiplexing metasurfaces via back-propagation neural network: Independent control of orthogonally-polarized waves. IEEE Trans. Antennas Propag. 2022, 70, 4569–4575.
21. Wei, Z.; Zhou, Z.; Wang, P.; Ren, J.; Yin, Y.; Pedersen, G.F.; Shen, M. Equivalent circuit theory-assisted deep learning for accelerated generative design of metasurfaces. IEEE Trans. Antennas Propag. 2022, 70, 5120–5129.
22. Naseri, P.; Hum, S.V. A generative machine learning-based approach for inverse design of multilayer metasurfaces. IEEE Trans. Antennas Propag. 2021, 69, 5725–5739.
23. Naseri, P.; Pearson, S.; Wang, Z.; Hum, S.V. A combined machine-learning/optimization-based approach for inverse design of nonuniform bianisotropic metasurfaces. IEEE Trans. Antennas Propag. 2021, 70, 5105–5119.
24. Nadell, C.C.; Huang, B.; Malof, J.M.; Padilla, W.J. Deep learning for accelerated all-dielectric metasurface design. Opt. Express 2019, 27, 27523–27535.
25. Liu, H.; Zong, Z.; Li, Y.; Jin, D. NeuroCrossover: An intelligent genetic locus selection scheme for genetic algorithm using reinforcement learning. Appl. Soft Comput. 2023, 146, 110680.
26. Zhu, E.; Wei, Z.; Xu, X.; Yin, W.-Y. Fourier subspace-based deep learning method for inverse design of frequency selective surface. IEEE Trans. Antennas Propag. 2021, 70, 5130–5143.
27. Cong, R.; Liu, N.; Li, X.; Wang, H.; Sheng, X. Design of wideband frequency selective surface based on the combination of the equivalent circuit model and deep learning. IEEE Antennas Wirel. Propag. Lett. 2023, 22, 2110–2114.
28. Zhu, E.; Li, E.; Wei, Z.; Yin, W.-Y. Adversarial-network regularized inverse design of frequency-selective surface with frequency-temporal deep learning. IEEE Trans. Antennas Propag. 2022, 70, 9460–9469.
29. Liu, H.; Zhao, H.; Wang, J.; Yuan, S.; Feng, W. LSTM-GAN-AE: A promising approach for fault diagnosis in machine health monitoring. IEEE Trans. Instrum. Meas. 2021, 71, 1–13.
30. Wei, Z.; Zhou, Z.; Wang, P.; Ren, J.; Yin, Y.; Pedersen, G.F.; Shen, M. Fully automated design method based on reinforcement learning and surrogate modeling for antenna array decoupling. IEEE Trans. Antennas Propag. 2022, 71, 660–671.
31. Peng, F.; Chen, X. An antenna optimization framework based on deep reinforcement learning. IEEE Trans. Antennas Propag. 2024, 72, 7594–7605.
32. Wei, Z.; Zhou, Z.; Wang, P.; Ren, J.; Yin, Y.; Pedersen, G.F.; Shen, M. Automated antenna design via domain knowledge-informed reinforcement learning and imitation learning. IEEE Trans. Antennas Propag. 2023, 71, 5549–5557.
33. Liu, H.; Li, T.; Jiang, F.; Su, W.; Wang, Z. Coverage optimization for large-scale mobile networks with digital twin and multi-agent reinforcement learning. IEEE Trans. Wirel. Commun. 2024, 23, 18316–18330.
34. Liu, H.; Huang, W.; Kim, D.I.; Sun, S.; Zeng, Y.; Feng, S. Towards efficient task offloading with dependency guarantees in vehicular edge networks through distributed deep reinforcement learning. IEEE Trans. Veh. Technol. 2024, 73, 13665–13681.
35. Liu, H.; Su, W.; Li, T.; Huang, W.; Li, Y. Digital twin enhanced multi-agent reinforcement learning for large-scale mobile network coverage optimization. ACM Trans. Knowl. Discov. Data 2024, 19, 1–23.
36. Liu, J.; Chen, Z.X.; Dong, W.H.; Wang, X.; Shi, J.; Teng, H.-L.; Dai, X.-W.; Yau, S.S.-T.; Liang, C.-H.; Feng, P.-F. Microwave integrated circuits design with relational induction neural network. arXiv 2019, arXiv:1901.02069.
37. Su, Y.; Yin, Y.; Li, S.; Zhao, H.; Yin, X. Bandwidth Improvement for Patch Antenna via Knowledge-Based Deep Reinforcement Learning. IEEE Antennas Wirel. Propag. Lett. 2024, 23, 4094–4098.

Figure 1. The meta-unit of the designed FSSs: (1) k = 1, (2) k = 2, (3) k = 3.

Figure 2. The lumped-equivalent circuit of the OF-FSS unit.

Figure 3. The framework of the proposed PIRL.

Figure 4. The single-band optimization results with multiple bandwidth targets (a) at 2.4 GHz, (b) at 3.5 GHz.

Figure 5. The dual-band optimization results with multiple bandwidth targets. Three different bandwidth optimization goals for the low and high frequency bands: (a) 250 MHz/550 MHz, 300 MHz/650 MHz, 350 MHz/750 MHz; (b) 300 MHz/600 MHz, 300 MHz/650 MHz, 300 MHz/700 MHz.

Figure 6. The reward under different iterations (optimization objective: $f_{0}$ = 2.4 GHz, $Bw$ = 300 MHz).

Table 1. Detailed parameters of the PIRL algorithm.

Variable	Value
Learning rate $α$	0.0001
Discount factor $γ$	0.99
$ε$-greedy exploration rate $ε$ (Epsilon)	0.1
Epsilon_min	0.05
Epsilon_decay	0.92
Maximum number of episodes M	1
Maximum steps per episode T	700
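To show how the three exploration parameters in Table 1 could interact, the following sketch evaluates an $ε$ schedule. It assumes a multiplicative decay applied once per step and clipped at Epsilon_min, which is a common convention but is not confirmed by the table itself; the function names are hypothetical.

```python
import math

# Values from Table 1.
EPS_START = 0.1
EPS_MIN = 0.05
EPS_DECAY = 0.92


def epsilon_at(step: int) -> float:
    """Exploration rate after `step` decays (assumed: multiplicative, per step)."""
    return max(EPS_MIN, EPS_START * EPS_DECAY ** step)


def steps_to_floor() -> int:
    """Smallest number of decays after which epsilon is clipped to EPS_MIN."""
    return math.ceil(math.log(EPS_MIN / EPS_START) / math.log(EPS_DECAY))


if __name__ == "__main__":
    for step in (0, 4, 8, 9, 700):
        print(step, round(epsilon_at(step), 4))
    print("floor reached after", steps_to_floor(), "decays")
```

Under this assumption the exploration rate reaches its floor after 9 decays, so almost all of the T = 700 steps would run at $ε$ = 0.05. If the decay were applied on a different schedule, the numbers would change accordingly.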

Table 2. Detailed optimization results of the 2.4 GHz single-band task under different bandwidth objectives.

Target Bandwidth	100 MHz	300 MHz	600 MHz
Actual Bandwidth	106 MHz	301 MHz	598 MHz
L (mm)	22.44	28.94	32.80
D (mm)	7.39	8.49	7.88

Table 3. Detailed optimization results of the 3.5 GHz single-band task under different bandwidth objectives.

Target Bandwidth	150 MHz	300 MHz	600 MHz
Actual Bandwidth	146 MHz	300 MHz	599 MHz
L (mm)	17.30	21.11	25.42
D (mm)	5.41	6.63	6.58

Table 4. Detailed optimization results of the dual-band (1.8/3.5 GHz) task under paired bandwidth objectives.

Target Bandwidth	250/550 MHz	300/600 MHz	300/650 MHz	300/700 MHz	350/750 MHz
Actual Bandwidth	245/566 MHz	273/609 MHz	297/668 MHz	312/694 MHz	367/758 MHz
L (mm)	51.29	52.32	55.05	55.51	56.19
D (mm)	10.18	10.01	10.68	10.66	9.84
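The agreement between target and actual bandwidths in Table 4 can be quantified as a relative deviation per band. The short sketch below does this arithmetic on the tabulated values; the relative-error metric and the variable names are illustrative choices and are not a metric defined in the paper.

```python
# (low-band, high-band) bandwidths in MHz, copied from Table 4.
targets = [(250, 550), (300, 600), (300, 650), (300, 700), (350, 750)]
actuals = [(245, 566), (273, 609), (297, 668), (312, 694), (367, 758)]


def relative_errors(target, actual):
    """Signed relative deviation (actual - target) / target for each band."""
    return tuple((a - t) / t for t, a in zip(target, actual))


for target, actual in zip(targets, actuals):
    low_err, high_err = relative_errors(target, actual)
    print(f"{target[0]}/{target[1]} MHz: low band {low_err:+.1%}, "
          f"high band {high_err:+.1%}")

# Largest absolute relative deviation over all ten band targets.
worst = max(
    abs(err)
    for target, actual in zip(targets, actuals)
    for err in relative_errors(target, actual)
)
print(f"worst-case deviation: {worst:.1%}")
```

The largest deviation occurs for the low band of the 300/600 MHz objective (273 MHz against a 300 MHz target); the other nine band targets are matched within 5%.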

Table 5. Detailed performance comparison.

Reference	Core Technology	Operating Frequency	Frequency Transferability	Polarization Sensitive	Physical Interpretability
[26]	FS-BDLM + FSS	Dual Band	N	N	N
[27]	ECM + CNN + FSS	Three Band	N	N	Y
[28]	AR-FTDL + FSS	Single Band	N	N	N
Baseline	PSO	Single Band	N	N	N
This work	PIRL + OF-FSS	Single and Dual Band	Y	N	Y

CNN: Convolutional Neural Networks. ECM: Equivalent Circuit Model. FS-BDLM: Fourier Subspace-Based Deep Learning Method. AR-FTDL: Adversarial-network Regularized Inverse-Design Scheme with Frequency-Temporal Deep Learning. PSO: Particle Swarm Optimization.

Table 6. Comparison of different algorithms for the OF-FSS operating at 3.5 GHz.

Algorithm	Objective BW	150 MHz	300 MHz	600 MHz
PSO	ΔBW	66 MHz	4 MHz	6 MHz
PSO	Iterations	700	533	430
PIRL	ΔBW	4 MHz	0 MHz	1 MHz
PIRL	Iterations	42	169	73

$\Delta BW = BW_{o} - BW_{i}$, where $BW_{o}$ denotes the objective BW and $BW_{i}$ denotes the actual simulated BW.
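As a worked check of the ΔBW definition and of the iteration counts in Table 6, the sketch below recomputes the PIRL ΔBW values from the actual bandwidths in Table 3 and derives the PSO-to-PIRL iteration ratio for each objective. The dictionary layout and the helper name `iteration_ratio` are illustrative assumptions; only the numbers come from the tables.

```python
# Table 6: objective BW (MHz) -> algorithm -> (delta BW in MHz, iterations).
table6 = {
    150: {"PSO": (66, 700), "PIRL": (4, 42)},
    300: {"PSO": (4, 533), "PIRL": (0, 169)},
    600: {"PSO": (6, 430), "PIRL": (1, 73)},
}

# Table 3: objective BW (MHz) -> actual BW (MHz) reached by PIRL at 3.5 GHz.
table3_actual = {150: 146, 300: 300, 600: 599}


def iteration_ratio(row):
    """How many PSO iterations were used per PIRL iteration."""
    return row["PSO"][1] / row["PIRL"][1]


for bw_o, row in table6.items():
    # Delta BW = BW_o - BW_i, using the Table 3 value as BW_i.
    delta_bw = bw_o - table3_actual[bw_o]
    print(f"BW_o = {bw_o} MHz: PIRL delta BW = {delta_bw} MHz, "
          f"PSO/PIRL iterations = {iteration_ratio(row):.1f}x")
```

The recomputed PIRL ΔBW values (4, 0, and 1 MHz) match Table 6, and the iteration ratios are roughly 16.7, 3.2, and 5.9 for the 150, 300, and 600 MHz objectives.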

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dong, G.; Liu, M.; He, X. Physics-Informed Reinforcement Learning for Multi-Band Octagonal Fractal Frequency-Selective Surface Optimization. Electronics 2025, 14, 4656. https://doi.org/10.3390/electronics14234656
