1. Introduction
Waveguide gratings are well known for their ability to couple free-space modes to guided waveguide modes forming guided-mode resonances (GMRs). The spectral position and quality factor (Q-factor) of the GMRs are dependent on the geometrical parameters of the structure, such as the grating period, waveguide thickness, and refractive index contrast [
1]. Such resonances are also highly sensitive to changes in the surrounding environment and can be used to detect changes in the refractive index of the surrounding medium, which is useful in sensing applications such as sensors for chemical and biological detections [
2], biosensing applications (e.g., [
3,
4,
5]), bio-imaging (e.g., [
6,
7]), strain sensing, and strain measurement applications (e.g., [
8,
9]). GMRs are also highly sensitive to the wavelength and angle of incident light [
1], making them suitable for filtering applications (e.g., [
10,
11,
12,
13,
14,
15]) and optical communications (e.g., [
16]).
Waveguide-based structures can support different types of mode coupling phenomena depending on the materials and geometries involved. In purely dielectric systems, photonic-photonic mode coupling can occur, which shows the highest Q-factors and is widely used in integrated photonics [
17,
18]. Multilayer dielectric structures, e.g., 1D photonic crystals, can also support Bloch surface waves, which exhibit strong field confinement and surface-localized propagation [
19]. Additionally, in waveguides containing metal films, photonic-plasmonic coupling can arise. The guided modes—surface plasmon polaritons (SPPs)—enable strong field localization, which is particularly promising for sensing applications due to a high sensitivity to refractive index changes close to the metal surface. At very thin metal films and gratings two SPPs can hybridize [
20,
21], while in general SPPs and photonic transverse magnetic (TM) modes can hybridize as well [
22,
23]. The resulting modes can combine the benefits of high Q-factors and high sensitivity.
In this work, we focus on hybrid plasmonic photonic waveguide gratings, where a dielectric slab waveguide is combined with a thin metallic grating. The geometry enables the maximum complexity of hybrid TM modes within the waveguide, so the GMRs prediction is difficult. On the other hand, it provides sufficient propagation losses to limit the sharpness of the GMRs. This way, a suitable spectral resolution with regard to energy and momentum can be defined that ensures no GMRs are missed by the simulations.
For material selection, silver is chosen due to its well-known strong support of surface plasmon polaritons in the visible spectrum [
20,
24]. As the dielectric layer, we chose the transparent polymer OrmoCore. OrmoCore is a commercially available polymer suitable for low-loss waveguides. The combination of silver and OrmoCore layers forms a robust platform for hybrid waveguide gratings [
23,
25,
26]. The OrmoCore–silver grating stack was placed on top of a glass substrate.
While the effective refractive index of purely dielectric modes cannot exceed the index of the dielectric core [1
1.58], the results presented in
Supporting Information reveal multiple mode solutions that exhibit high real parts of the effective refractive index, reaching values as large as 1.954 at 580 nm, indicating the hybrid nature of the associated modes.
In our study, we concentrate on three primary geometrical parameters: the silver thickness, the dielectric (OrmoCore) thickness, and the grating period. These parameters were selected based on their dominant influence on the position, sharpness, and number of GMRs observed in the reflection spectra. Although other design spaces may involve different materials or additional parameters, such as grating dimensions or multilayer configurations, our chosen parameter set strikes a balance between physical interpretability and complexity, making it ideal for training and evaluating inverse design models.
Machine Learning (ML) has emerged as a tool in photonics, supporting a wide range of tasks from material characterization and real-time process monitoring to image-based diagnostics and design optimization. A particularly promising application area is inverse design, where the goal is to identify structural parameters that yield desired optical properties (e.g., [
27,
28,
29]). Instead of relying solely on computationally expensive parameter sweeps, ML models can learn the complex mapping between optical responses and structural configurations, thereby accelerating the design process. This strategy has proven effective in various domains, such as photonic crystals [
30], metamaterials [
31], resonators [
32], and quantum nanophotonics [
33], with applications ranging from optical computing and light manipulation to sensing and optical communication systems.
State-of-the-Art and Motivation
The core idea behind ML-based inverse design is to decrease the expenses associated with generating new data by substituting simulations or experiments with AI models. However, inverse design comes with intrinsic challenges. A key issue is that most inverse design problems are considered ill-posed, meaning that different structures may lead to the same output, and multiple viable solutions exist [
34,
35].
As a result, a direct inverse neural network that attempts to learn a one-to-many mapping from desired properties to design parameters can struggle to produce stable or representative predictions. This is due to the inherent ambiguity in the inverse mapping, which can lead the model to average over multiple valid solutions or to become biased toward certain regions of the design space [
34,
35,
36].
Numerous studies to date have endeavored to address the problem of non-uniqueness inherent in the input-output relationship of the inverse design models. For instance, it has been suggested to introduce appropriate constraints, such as limiting the design space or projecting it to a low-dimensional space, to establish a well-defined problem [
37,
38].
One approach involved segmenting the training dataset into distinct groups. They partitioned the overall training data into groups based on the derivative of the forward model’s output with respect to its input, to ensure each group exhibited a one-to-one mapping from response to design [
34]. This segmentation aimed to reduce ambiguity in the inverse problem. For each of these distinct groups of data, an individual inverse neural network model was trained. The final prediction was obtained by feeding the input into all trained inverse models, passing their outputs through the trained forward model, and selecting the model whose output closely matched the input. This strategy resulted in higher prediction accuracy compared to training a single inverse model on the entire dataset. However, the challenge remains in properly clustering the data, especially for more complex or high-dimensional design spaces.
To address the ill-posed nature of the inverse design problem, a tandem architecture was proposed that combines a pre-trained forward model with a trainable inverse model. Hence, the model learns by minimizing the error between the predicted and input optical responses [
35]. Nevertheless, using a tandem network, only one of the several possible design configurations is predicted for the desired output, and the remaining alternatives are intentionally dismissed to minimize the training loss. Consequently, the tandem network potentially neglects other alternatives that could be more feasible, cost-effective, or convenient in terms of fabrication.
Generative AI models, such as a conditional Variational Autoencoder (cVAE) [
39] enable tackling the ill-posed issue and can generate multiple valid design candidates for a given target response (e.g., [
40]). Instead of converging to a single solution, the cVAE maps the same desired optical spectrum to a distribution of structurally distinct yet physically consistent parameter sets. However, due to the single-modal normal distribution assumption in standard cVAE frameworks, the diversity of generated structural parameters is often limited in practice (e.g., [
28]).
Alternatively, conditional Generative Adversarial Network (cGAN) [
41] have been shown to provide higher diversity and flexibility in inverse design tasks. Nevertheless, this increased diversity often comes at the cost of reduced accuracy, making cGAN less suitable for applications where precise prediction of structural parameters is critical (e.g., [
28]).
In this work, we address the inverse design of hybrid waveguide gratings based on their reflection spectra using two different ML approaches, Tandem Network and cVAE. The remainder of this paper is organized as follows:
Section 2 describes the simulation setup and the overall methodology, including the dimensionality reduction of data and the applied ML approaches.
Section 3 outlines the evaluation procedure and explains how prediction accuracy, robustness, and errors are assessed. In
Section 4 and
Section 5, we present and discuss the detailed results and conclusions based on our findings.
2. Materials and Methods
To investigate the optical properties of the proposed hybrid waveguide gratings, accurate modeling of light–matter interactions at the nanoscale is essential to predict device performance and guide the inverse design process. For this project, Rigorous Coupled-Wave Analysis (RCWA) simulations were utilized to compute the reflection of the zero-order TM reflected wave for 1D periodic nanostructures under non-conical incidence [
42,
43].
In our study, the inputs to the simulations include the structural parameters of the device, such as the material of the dielectric or metal layers, their corresponding thicknesses, the periodic distances in the x-direction, and the size of the grating. A schematic of the structure used in this research is presented in
Figure 1a. As can be seen, the structure used in this study is a one-dimensional grating made of silver, placed atop a dielectric slab waveguide.
OrmoCore with a refractive index in the range of 1.54 to 1.58 and normal dispersion in the visible range given by the Cauchy formula:
with
A = 1.53,
B = 8000 nm
2, and
. For silver, we used the optical constants reported by Palik [
44], which account for both the real and imaginary parts of the refractive index.
The values for the silver thickness, , the dielectric slab thickness, , and the grating period, , are randomly selected within specified ranges: 10–100 nm for , 0–1500 nm for the , and 280–550 nm for the . The structure is supported by a glass substrate. The thickness of the glass substrate is considered infinitely thick. Each combination of these parameters is unique and non-repetitive. In total, 15,000 simulations were performed to sufficiently cover the 3D design space while balancing computational cost and model training requirements. In each case, the filling factor of the grating is 0.5, due to easy fabrication. To balance the computational efficiency and precision of the RCWA simulations, we retained 10 Fourier orders per side (21 in total) in all calculations.
Although these parameter ranges were chosen to broadly explore the design space, relevant fabrication constraints are considered. In particular, silver layers thinner than 10 nm are known to form discontinuous, island-like structures rather than uniform films, making them impractical to fabricate reliably [
45,
46]. Therefore, ultra-thin silver layers were excluded from the simulation range. Concerning OrmoCore layers, layer thicknesses less than 80 nm are challenging to produce using standard fabrication processes. Nevertheless, this lower range was included in the simulations in order to fully investigate the influence of dielectric thickness on device performance.
The incident light is a TM-polarized plane wave, and the output, as it is shown in
Figure 1b, represents the zero-order TM reflection for different angles,
, and wavelengths,
, of the incident light, where
is the angle between the incident wave and the z-axis (perpendicular to the plane), varied between 0 and 89 degrees. The simulations were performed for wavelengths from the spectral range of interest, 400–800 nm.
To address the one-to-many nature of the inverse design problem with a focus on accuracy, we have implemented two approaches: a Tandem network [
35] and a cVAE [
39].
Before training the models, an autoencoder is used to extract low-dimensional latent representations from the reflection data, simplifying the learning process and reducing computational complexity.
2.1. Dimensionality Reduction of Reflection Data Using Autoencoders
In this work, a convolutional autoencoder (AE) is trained to compress high dimensional reflection spectra of 2D matrices of size 401 × 90 into a compact latent representation of size 128 without losing important information. The encoder maps each spectrum into a lower dimensional representation. Simultaneously, the decoder is trained to reconstruct the original reflection spectra from this latent space, ensuring minimal information loss during compression [
47].
After training, only the encoder is used to convert each reflection matrix into its corresponding latent vector. This conversion reduces the dimensionality of the data, which simplifies the input for the subsequent forward and tandem networks, as well as the cVAE, reduces computational complexity, and improves learning efficiency, while still preserving critical spectral features.
We trained the autoencoder as a denoising model by adding Additive Gaussian Noise (AGN) to each input reflection matrix during training. AGN was chosen as it closely approximates the type of noise typically observed in reflection measurements from real samples. The standard deviation,
, of the noise was randomly sampled from a uniform range [0, 0.1] for each sample in a batch and regenerated at every training iteration. The network was trained to reconstruct the original clean input from noisy data, forming a denoising autoencoder (e.g., [
48]). As a result, the latent vectors extracted from the denoising autoencoder, which are later used as inputs to subsequent models, ensure that subsequent models are resilient to the noise and fluctuations typical of real-world measurements. Additional details of the autoencoder model architecture are provided in
Appendix A.1.
While AGN adequately represents random detector fluctuations, real measurement data can also be affected by colored noises. Colored noises are the stochastic noise characterized by its power spectrum caused by systematic effects such as source drifts or environmental instabilities. In this study, white noise was considered during training. However, the trained models were additionally evaluated with colored noise applied to the reflection spectra as well. For more information, see
Section S3 of the Supplementary Information.
2.2. Tandem Network
The Tandem network combines Forward and Inverse Neural Networks, as shown schematically in
Figure 2a.
Prior to training the Tandem Network, the high-dimensional reflection spectra are compressed using the pre-trained AE (see
Section 2.1). This step transforms each reflection matrix into a lower-dimensional latent vector, which retains essential spectral features. These latent vectors are then used as the target outputs for training the forward model and as the input data for the inverse model in the Tandem Network.
The forward model, which maps structural parameters to their corresponding optical responses, is first trained independently using supervised learning on a large dataset of simulated design-response pairs. Once trained, its weights are frozen. Here, we train a Forward Neural Network (FNN) to predict the lower-dimensional latent representations of the reflection spectra obtained from the AE, from given structural parameters using the mean squared error (MSE) as the loss function for the FNN:
where
N is the number of training samples,
is the latent representations of the reflection spectra for the
i-th data point, and
is the FNN’s predicted latent representation of reflection spectra for the structure
.
The FNN is then integrated into a joint architecture with an Inverse Neural Network (INN), which maps the latent representation of the target spectra to candidate structural parameters. During training, the predicted structures from the INN are passed through the fixed FNN, and the loss is computed between the resulting and target latent representation of the spectra. This Tandem loss is used to update only the weights of the INN, ensuring physically consistent inverse predictions [
35].
where
is the predicted structure for the latent vector
, and
is the latent representation of the reflection spectra predicted by the pre-trained FNN. As can be seen, the tandem loss directly depends on the output of the pretrained forward model
g. Since only the inverse model
f is updated during the training, the accuracy of the INN’s predictions is inherently constrained by the quality of
g. This approach also ensures that the INN converges to a solution consistent with the FNN’s predictions, thereby tackling the ill-posed problem. Further details of the model architecture are provided in
Appendix A.2.
2.3. Conditional Variational Autoencoder (cVAE)
In order to predict the three structural parameters from reflection spectra, we trained a cVAE. In this model, both the encoder and decoder are conditioned on the reflection data to ensure that the generated structural parameters comply with the physical constraints dictated by the input spectrum [
39].
As an auxiliary condition,
, the model uses low-dimensional latent vectors extracted from the reflection data by the pre-trained autoencoder (see
Section 2.1). By incorporating this condition into both the encoder and decoder, the cVAE learns a latent space,
, that captures the variability in the structural parameters while remaining consistent with the latent representation of the reflection spectrum.
The overall architecture, illustrated in
Figure 2b, consists of an encoder, a latent space, and a decoder. The encoder processes the structural parameters along with the condition
and maps them into a probabilistic latent space characterized by a mean and variance. The decoder then reconstructs the structural parameters by combining a sampled latent vector with the same condition
, ensuring that the predicted structures are physically consistent with the input reflection data. Additional model architecture details are provided in
Appendix A.3.
The model is trained by optimizing a loss function that consists of two terms: a reconstruction loss, which ensures the predicted structural parameters closely match the ground truth, and a Kullback–Leibler (KL) divergence term, which regularizes the latent space to follow a standard normal distribution. Formally, the loss is:
where
is the encoder that approximates the posterior distribution of the latent variable
given both the input
and the condition
.
is the decoder that reconstructs
from both
and
.
is the prior distribution of the latent variable conditioned on
.
Figure 2.
Schematic structure of the combination of the autoencoder for dimensionality reduction with (a) the Tandem Network and (b) the cVAE. The reflection spectra are shown as color maps and the colored blocks denote different neural-network components.
Figure 2.
Schematic structure of the combination of the autoencoder for dimensionality reduction with (a) the Tandem Network and (b) the cVAE. The reflection spectra are shown as color maps and the colored blocks denote different neural-network components.
3. Results
To evaluate our proposed approaches, we first assess the accuracy of the dimensionality reduction using the AE. Since the AE forms the foundation for subsequent models by compressing complex reflection spectra into compact latent representations, it is essential to ensure that this compression preserves the essential spectral information. In particular, the denoising capability of the AE plays a crucial role in enhancing model robustness against measurement noise.
To quantitatively assess the performance of the trained AE, we computed mean squared error (MSE) and mean absolute error (MAE) between the original and reconstructed reflection spectra for the test dataset. For the noise-free test dataset, the AE achieved an average MSE of
and MAE of
, with only a negligible increase to
and
under noisy inputs with constant
. Given that the reflection values range between 0 and 1, these error values confirm that the AE reliably compresses and reconstructs the high-dimensional data for noise-free and noisy inputs. In addition, a visual comparison of a reflection spectra from test dataset and its reconstruction using the trained AE is shown in
Figure 3. The reconstruction of a noisy reflection spectrum is shown in the second row of
Figure 3.
After training the AE, we trained a forward model that maps structural parameters to the AE’s latent space. The predicted latent vectors are then passed through the decoder part of the pre-trained AE to reconstruct the corresponding reflection spectra.
Figure 4 illustrates this sequence of predictions by the forward model and AE for a test dataset and compares the reconstructed reflection spectra to the ground truth spectra. As shown, the model closely replicates the physical behavior of the system.
The predicted reflection spectra, obtained using the same sequence of models (forward model followed by the AE decoder), and the ground truth reflection spectra for the entire test dataset result in a MSE of 0.000136 and a MAE of 0.0056. Since reflection values are confined to the range [0, 1], the low MAE and MSE confirm the model’s ability to accurately reconstruct the system’s physical response.
To test the generalization of our models, we reserved 20% of the total 15,000 data as a test set, fully withheld during model development. The remaining 80% was used in a 5-fold cross-validation to evaluate the model. For each fold, we trained the model on 80% of the fold’s data and validated it on the remaining 20%.
All reported metrics written in
Table 1 and
Table 2, including MSE, MAE, the coefficient of determination (
), sensitivity, and other related measures and figures, for both the Tandem network and the cVAE, are calculated on either normalized or original scale (unnormalized) withheld test data and averaged across the five folds. This approach provides a statistically reliable assessment of the model’s generalization and reduces the risk of biased evaluation due to a single train-test split.
The MAE for each structural parameter’s prediction on values in original scale test data is separately summarized in
Table 3 for both the Tandem model and the cVAE. To probe the reason behind the differences between the MAEs across parameters, the sensitivity of the reflection spectra to each of the structural parameters is quantified via a central finite-difference Jacobian. The results show that the grating period has the strongest influence, followed by the silver thickness, while the OrmoCore thickness has only a minor effect. As it is shown in
Table 4, the Root Mean Square (RMS) absolute sensitivity with respect to the OrmoCore thickness is
nm
−1. In comparison, the RMS absolute sensitivity for silver thickness and grating period are about 31 and 221 times larger, respectively. This explains why the models consistently predict the period and silver thickness more accurately than the OrmoCore thickness. Step-size convergence tests confirmed that the chosen finite-difference steps yield reliable sensitivity values. Further details are provided in the
Appendix B.
Despite training the model with noisy data, we assessed the model’s robustness by introducing noise to the reflection data and measuring how much the prediction accuracy of the structures degrades. First, the structural parameters are predicted from the noise-free test dataset, and the MAE between the prediction and the actual structural parameter is calculated. Subsequently, AGN is added to the same test dataset. The standard deviation of the noise was fixed to a constant value of
, which was applied uniformly to all samples in the dataset. The MAE is computed again between the predicted structures from the noisy dataset and ground truth.
Figure 5 illustrates this robustness analysis, showing how the MAE varies with the standard deviation,
, of the AGN.
In addition to the normalized-scale robustness analysis, we further evaluated the effect of AGN on the absolute prediction accuracy of each structural parameter. The standard deviation of the noise was fixed to
for each sample in the test dataset.
Figure 6 presents the MAE in real units (nm) for the silver thickness, OrmoCore thickness, and grating period, separately for both the Tandem and cVAE models.
Besides, to measure the sensitivity of the model’s predictions to small errors in the input, two predictions are obtained for each sample: one from a clean reflection spectrum and one from a noisy reflection spectrum. The standard deviation of the noise was fixed to 0.1 for each sample in the test dataset. The relative absolute difference between the two corresponding predictions is then calculated as a direct measure of sensitivity to noise,
:
where
and
are the predicted denormalized, real-valued structural parameters of the clean and noisy data, respectively. The sensitivity to noise,
, is computed for each sample in the test set, and the average
over all samples is calculated for each fold. Finally, the reported sensitivity to noise is the average of these values across all five folds. Sensitivity to noise is reported separately for each structural parameter, silver thickness, dielectric thickness, and period, as each may respond differently to the noise in the input reflection data. The result is shown in
Table 5.
To evaluate the influence of training data availability on cVAE and Tandem models performances, we used the same architectures that were used to train each of the cVAE and Tandem but using different fractions of the full training dataset. First, the same 20% of the dataset that had been excluded during the training of the main model on the entire dataset in order to serve as an independent test set was set aside again for an unbiased evaluation. Then, 10%, 30%, 50%, 70% of the remaining data were used to train the models. For each data fraction, we performed 5-fold cross-validation. Each trained model was evaluated on the same held-out test set, and the performance was assessed using the MAE and MSE on normalized data. The reported values represent the average performance across all five folds. The results are illustrated in
Figure 7.
4. Discussion
The inverse design has been extensively studied, leading to a significant corpus of research and developments. In this paper, we primarily focus on the inverse design of one-dimensional hybrid waveguide gratings composed of a silver grating on a dielectric using the reflection spectrum of the device.
Researchers who have a specific optical response in mind, which is the most suitable for their applications, can leverage our trained model to determine the corresponding structural parameters that would produce that response. This approach reduces fabrication costs and accelerates the design cycle as there is less need for trial-and-error fabrication processes.
Our model here can also assist researchers in determining the precise structural parameters of a plasmon grating structure from its measured optical response. This is particularly useful when the type of structural material is known and matches the materials used in this study, and when the grating is one-dimensional. In this case, our models can infer the structural parameters without the need for costly and time-consuming physical characterization techniques like Atomic Force Microscopy (AFM) or Scanning Electron Microscopy (SEM).
The results of this study demonstrate the effectiveness and robustness of two distinct machine learning approaches for the inverse design of hybrid waveguide gratings: the Tandem network, a discriminative model, and the cVAE, a generative model.
Our comparative analysis in
Table 1 shows that the Tandem model achieves slightly better overall predictive accuracy, as evidenced by its lower MSE and MAE compared to the cVAE. Both models, however, are reliable due to the coefficient of determination (
) values in
Table 2, which for both approaches are very close to unity across all structural parameters.
An evaluation of the MAE of each structural parameter in not normalized scale in
Table 3 reveals that both models predict each of the structural parameters with high accuracy. Despite the larger absolute error values for the OrmoCore thickness, the relative accuracies with respect to each parameter’s design range across all three structural parameters remain within ∼1%. The sensitivity analysis, shown in
Appendix B, provides a physical explanation for these differences in accuracy. Reflection spectra are significantly more sensitive to variations in grating period and silver thickness than to OrmoCore thickness. Consequently, these parameters are predicted with higher accuracy. Within the considered
window under TM polarization, the period
primarily governs the resonance positions, while the silver thickness
affects resonance coupling and damping. In contrast, the OrmoCore thickness
mainly modifies effective indices and guided-mode dispersion, which lead to comparatively weaker variations in the reflectance. This sensitivity-driven accuracy is particularly relevant for applications such as optical sensing, where resonance positions must be predicted precisely. Since the grating period dominantly governs resonance positions, its higher sensitivity enabled the AI models to predict it most accurately. From a fabrication perspective, this implies that parameters with higher spectral sensitivity must be controlled with greater precision, as even small deviations can strongly alter the reflection response. In addition, the MAE for the OrmoCore thickness is comparable to typical fabrication error encountered for instance, in spin-coating processes, particularly for thicker OrmoCore layers. Thus, the model’s prediction error for the OrmoCore thickness remains within practical fabrication tolerance.
The robustness evaluation of the models highlighted some differences between these models.
Figure 5 shows how the MAE changes as a function of
for both the cVAE and the Tandem model. As illustrated in the plot, for low noise levels (
), both models do not show significant degradation in prediction accuracy, with the Tandem model showing slightly lower MAE. At higher noise levels (
), however, the cVAE demonstrates greater robustness to noise compared to the Tandem model.
The parameter-wise robustness analysis, shown in
Figure 6, provides additional insights into the practical reliability of the models. As shown in
Figure 6, the MAE of all structural parameters increases gradually with
, indicating the sensitivity of the models to the input noise. In
Figure 6a,c, both Tandem and cVAE models maintain low mean absolute errors for silver thickness and grating period at low to moderate
. Even under strong noise levels (up to
), the MAE for silver thickness and period remains below 3 nm. As can be seen in
Figure 6a,c, at
, the MAE of the cVAE model outperforms the Tandem model, achieving errors of only 5.6 nm for the silver thickness and 5.5 nm for the grating period.
Figure 6b demonstrates that the dominant error source in both models is the OrmoCore thickness, which can be attributed to the intrinsically lower sensitivity of the reflection response to changes in this parameter. Up to
the MAE increases by only about 3 nm, from approximately 13–15 nm to 16–17 nm for Tandem and cVAE models, respectively; however, beyond this point, the error rises rapidly. A crossover in performance between the two models can also be seen. The Tandem network is preferable in scenarios with low to moderate noise, while the cVAE provides superior stability under high noise conditions, where measurement data may be significantly degraded and noisy.
We also quantified their sensitivity to noise using relative absolute differences between the predictions on clean and noisy reflection spectra under low-noise regime of
. Based on the results shown in
Table 5, the average SN values for all three structural parameters in the cVAE are higher than the corresponding values obtained from the Tandem network. It demonstrates that the Tandem network offers optimal precision for low-noise data and cVAE is more responsive to small perturbations.
To assess the performance of our cVAE and Tandem networks in the event of varying data availability,
Figure 7 shows how the MAE and MSE decreases as the fraction of training data increases. Errors decrease monotonically with more data, and the Tandem model consistently yields lower MAE and MSE than the cVAE across all fractions. The performance gap is most pronounced at 10% of the data and gradually narrows at higher fractions. Beyond 70% of the training data, both models approach a saturation point, indicating that further increases in dataset size provide only marginal accuracy improvements.
In general, each method tackles the inverse design problem differently. An inspection of
Figure 8 reveals each model’s capacity to tackle the one-to-many nature of the inverse problem. Each sub-panel illustrates 100 structural predictions from the cVAE model for a single target reflection spectrum, as well as Tandem model’s prediction and the ground truth. The Tandem network provides a single, deterministic solution, which is desirable when an unambiguous, reproducible design is required. In contrast, the cVAE generates multiple valid design candidates, introducing limited diversity into the predictions. As can be seen the predictions from cVAE model yields a distribution of solutions that cluster closely around the ground truth. This indicates that the cVAE reproduces the most probable structural configurations as it captures the natural ambiguity in the inverse mapping owing to the ill-posed problem. However, this diversity remains relatively narrow due to the single-modal Gaussian assumption of the standard cVAE. Nevertheless, both model achieves high accuracy and effectively addresses the ill-posed nature of the inverse design task.
During training we added additive Gaussian white noise to the reflection spectra. AGN was used to emulate random experimental errors. Real measurements include colored noises as well. Colored noise refers to a type of stochastic noise characterized by its power spectrum that exhibits correlations across neighbouring data points, unlike white noise, which is entirely uncorrelated. Applying colored noise to reflection spectra models systematic variations such as source drifts or environmental fluctuations. Although we did not use colored noise during training, we evaluated both models under colored noises. Further information for definitions, parameters and evaluation results are provided in
Supplementary Information.
In future work, we aim to explore the application of Physics-Informed Neural networks (PINNs) [
49] to incorporate physical constraints directly into the learning process. This approach can improve model performances particularly for complex optical systems with less available data. Furthermore, expanding these methods to more complex structures or multi-dimensional gratings could further enhance their practical utility and scope.
5. Conclusions
In conclusion, this study presents a robust and efficient inverse design strategy for 1D hybrid waveguide gratings using machine learning models trained on reflection spectra. Both the deterministic Tandem network and the generative cVAE demonstrated high accuracy in predicting structural parameters, meeting strict design requirements for silver and dielectric thicknesses, and grating period. Notably, even under measurement conditions with substantial noise, the cVAE maintained low and reliable MAE values for silver thickness and grating period, underscoring its suitability for experimental data where noise levels may be significant.
Overall, this work highlights the potential of data-driven approaches to accelerate the design cycle, reduce fabrication costs, and eliminate the need for extensive spatial measurements, paving the way for practical applications in photonics, sensing, and integrated optical devices. Beyond hybrid gratings, the same framework can be applied to other nanophotonic structures, facilitating access to design spaces that are difficult to probe through traditional simulation approaches. Looking ahead, integrating physics-informed neural networks (PINNs) or hybrid simulation–ML workflows may further boost accuracy while reducing data requirements, strengthening the role of machine learning as a core tool for photonic inverse design.