An Inverse Design Framework for Isotropic Metasurfaces Based on Representation Learning

Abstract: A hybrid framework for solving the non-uniqueness problem in the inverse design of isotropic metasurfaces is proposed. The framework consists of a representation learning (RL) module and a variational autoencoder-particle swarm optimization (VAE-PSO) algorithm module. The RL module reduces the complex high-dimensional space to a low-dimensional space with obvious features, with the purpose of eliminating the many-to-one relationship between the original design space and response space. The VAE-PSO algorithm first encodes all meta-atoms into a continuous latent space through the VAE and then applies PSO to search for an optimized latent vector whose corresponding metasurface fulfills the target response. This framework provides a solution paradigm for the non-uniqueness problem, simplifies the complexity of the network, improves the running speed of the PSO algorithm, and obtains the global optimal solution with 94% accuracy on the test set.


Introduction
Metasurfaces are synthetic composites composed of subwavelength structures arranged in different geometric shapes and distribution functions, and they have been successfully applied to spectrum filtering, focusing, holographic imaging, polarization conversion, and other fields in recent years [1]. The subwavelength scatterers that make up metasurfaces are called meta-atoms, which resemble the atoms or molecules of natural materials. When a beam of electromagnetic (EM) waves hits a metasurface, the strong resonances or spatial orientations of the meta-atoms cause an abrupt phase change, so metasurfaces can control the phase, amplitude, and polarization of reflected/transmitted waves in space [2]. Cui et al. proposed the concept of the digital metasurface in 2014, in which the meta-atom pattern is discretized and encoded, greatly expanding the design space of metasurfaces [3]. Various coding sequences have been proposed on this basis to achieve different EM responses, but the difficulty of inverse design has also increased dramatically as the number of codes in a digital metasurface grows. Although evolutionary algorithms (EAs) have the advantage of fast random search without requiring problem-domain knowledge [4], many of their initial parameter settings have a dramatic impact on the evolution procedure and convergence result, which may cause them to fall into locally optimal solutions.
Fortunately, the development of deep learning, which uses artificial neural networks to create computer reasoning by simulating human learning patterns on massive data, has brought new solutions [5]. Deep learning is a completely data-driven approach that mines implicit rules and relationships by learning from sufficiently large data sets, and it has shown great advantages in computer vision, natural language processing, knowledge graphs, and other fields. The basic idea is to design an algorithm that finds rules between input data and output data from a set of given data. The mined rules are stored as the parameters of the deep learning model. If the rules between input and output remain unchanged, the trained model can automatically and quickly predict the output as soon as another input is given. Since deep learning has a natural advantage in automatically mining undefined rules, we combine deep learning theory with metasurface inverse design to mine the relationship between massive metasurface geometric parameters and response parameters. This could lead to a disruptive breakthrough in the field of metasurface design, skipping the physical interpretation while being faster and more efficient. Thus, no professional theoretical background is required of designers; engineers only need to pay attention to their practical demands instead of the complicated design process and obscure physical theory [6]. The most difficult aspect of employing deep learning in the inverse design of metasurfaces is the non-uniqueness challenge [7]: the expected EM response may correspond to multiple design patterns. Another challenge in using deep learning to design a complex meta-atom is that the large size of the response and design spaces requires training large neural networks [8], which means more parameters are needed and the networks are difficult to train.
Adequate sampling points are required to achieve the desired EM response over a wide band, which typically results in thousands of data points in the response space. How to deal with the large design space and response space is of great importance.
In this article, we propose a new framework for meta-atom structure inverse design based on representation learning that addresses both the non-uniqueness issue and the network-size issue. The framework consists of a representation learning (RL) module and a variational autoencoder-particle swarm optimization (VAE-PSO) algorithm module, and it achieves high accuracy on the test data set. We use a sufficient number of samples to train two different kinds of autoencoders, embedding the high-dimensional spaces into low-dimensional spaces with a low loss rate, and the trained final network outputs a reduced design vector (RDV) according to the target response. The VAE-PSO algorithm is then used to search the continuous space generated by the VAE for the optimal solution according to the RDV output by the RL module. Transforming the non-uniqueness issue into an optimization problem can greatly reduce the verification time and design cost. In contrast to earlier work, the most notable technique in this work is that the complex high-dimensional space is transformed into a characteristic low-dimensional space by the RL module, which gives the optimal solution to the non-uniqueness problem, greatly reduces the network complexity, and enormously improves the running speed of the optimization algorithm.

Theory and Design
Since different metasurfaces can achieve the same EM response, the question of which metasurface the framework should return for a given target response is the non-uniqueness problem. In practical applications, although different metasurfaces can generally produce a similar response, it is almost impossible for them to produce an identical response at every sampling point. Therefore, in the global design space, there must be a metasurface that realizes the response closest to the target response, which means the framework is expected to find the optimal solution. The overall framework flow chart is shown in Figure 1. The task of the RL module is to translate the target EM response into an RDV through the representation learning network. Any target EM response corresponds to a unique RDV, while one RDV might correspond to more than one metasurface. To solve this many-to-one problem, the VAE-PSO module generates a latent design space, where the PSO algorithm is applied to search globally for the optimal latent vector. The VAE decoder is used to decode the latent vector into the optimal metasurface. Figure 2 illustrates the detailed structure of a meta-atom. The top pattern layer (Layer 1), with a side length L1, is evenly divided into 16 × 16 discrete cells, where '1' means copper and '0' means vacuum in the digital coding. Thus, the digital metasurface can be modeled as a 16 × 16 binary design matrix, which has 2^(8×8) possible patterns. Each discrete copper piece is a square patch with a thickness of h1 and a side length of L'1. The second layer is a dielectric layer (Layer 2) with a dielectric constant of εr = 2.65 and a loss tangent of tan δ = 0.001. The back layer (Layer 3) is the backing copper sheet. The thicknesses and side lengths of Layer 2 and Layer 3 are h2, h3, L2, and L3, respectively.
To reduce the influence of polarization, the object of our study is an isotropic metasurface composed of an 8 × 8 coding sequence as a subblock, which forms a 16 × 16 matrix through four-fold symmetry [9]. In the simulation, boundary conditions and excitations are added to the metasurface model. The simulation software solves Maxwell's equations on a mesh, so the calculation time surges as the model complexity increases. The design matrices of meta-atoms are selected as the input features and the S-parameters as the EM responses, and data sets are randomly collected to train the deep learning models. MATLAB is used to control CST STUDIO to generate digital metasurfaces, calculate S-parameters, and save the data sets automatically.
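The four-fold-symmetric construction above can be sketched in a few lines of numpy. This is a minimal illustration, assuming the 8 × 8 subblock occupies the top-left quadrant and is mirrored into the other three quadrants (the exact mirroring convention used in the paper is not stated, so this is an assumption):

```python
import numpy as np

def make_isotropic_matrix(subblock: np.ndarray) -> np.ndarray:
    """Expand an 8x8 binary subblock into a 16x16 four-fold-symmetric
    design matrix ('1' = copper, '0' = vacuum)."""
    assert subblock.shape == (8, 8)
    top = np.hstack([subblock, np.fliplr(subblock)])   # mirror left-right
    return np.vstack([top, np.flipud(top)])            # mirror top-bottom

rng = np.random.default_rng(0)
sub = rng.integers(0, 2, size=(8, 8))
m = make_isotropic_matrix(sub)

# The matrix is invariant under horizontal and vertical mirroring, so only
# the 8x8 subblock is free to vary, giving the 2^(8x8) = 2^64 patterns.
assert np.array_equal(m, np.fliplr(m)) and np.array_equal(m, np.flipud(m))
print(m.shape)  # (16, 16)
```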

Representation Learning Module
The key to our framework is transforming the design space and response space into low-dimensional spaces through the RL module to solve the non-uniqueness problem and reduce network complexity [10]. First, the different spaces and their corresponding vectors need to be clearly defined. The original space includes the original design space (ODS) and the original response space (ORS), and the corresponding vectors are the original design vector (ODV) and the original response vector (ORV). The reduced space includes the reduced design space (RDS) and the reduced response space (RRS), and the corresponding vectors are the reduced design vector (RDV) and the reduced response vector (RRV). Since multiple sets of meta-atoms can produce the same EM response, they should be mapped to the same vector in the reduced space, so that the RDS and RRS form a one-to-one mapping controlled by a nonlinear function, and this process is invertible. In this way, we transfer the many-to-one relationship between the ODS and ORS into a many-to-one relationship between the ODS and RDS, which can be solved by the VAE-PSO module. Figure 3 illustrates the mapping relationship between the different spaces, with the red line representing one-to-one mapping and the blue line representing many-to-one mapping.
Representation learning is a machine learning technique that extracts effective features from raw data. Its essence is dimensionality reduction, which can be realized by random forests, principal component analysis, autoencoders (AE), etc.
In this article, the RL module uses an AE, which is composed of an encoder and a decoder that can be used separately. The back-propagation algorithm is used to train the AE to minimize the cost function. Equations (1)-(3) give the features h, the reconstructed data x̂, and the cost function L:

h = σ(W1 x + b1)  (1)

x̂ = σ(W2 h + b2)  (2)

L = (1/N) Σᵢ₌₁ᴺ ‖xᵢ − x̂ᵢ‖²  (3)
where x represents the input data, W and b represent the transformation matrices and biases of the encoder and decoder, and N represents the amount of training data. σ is the activation function. The dimension of h should be less than that of x in order to achieve feature dimension reduction. Figure 4 illustrates the design process of the AEs. The dimension of the ORS is reduced by training the autoencoder shown in Figure 4a. S11 between 0 and 20 GHz is selected as the EM response of this study, with a sampling interval of 0.02 GHz and thus a length of 1000 dimensions. Subsequently, a pseudo-autoencoder (whose input space differs from its output space), shown in Figure 4b, is used to jointly train the mapping between the ODS and RDS and the mapping between the RDS and RRS. The former mapping is many-to-one, and the latter is one-to-one. The one-to-one relationship between the RDS and RRS is trained by a multi-layer perceptron (MLP). The decoder uses the "decoder1" trained in Figure 4a. When the two reduced spaces are connected by the pseudo-autoencoder, an effective forward prediction model is formed. It maps different design matrices that achieve the same EM response to the same RDV, then maps the RDV to the RRS one-to-one through the MLP, and finally decodes the 1000-dimensional ORV. Subsequently, the inverse design model is built, for which the inverse of the MLP in Figure 4b must be found. As demonstrated in Figure 4c, an inverse design network composed of encoder1 from Figure 4a and the inverse structure of the MLP from Figure 4b is established. It generates a unique RDV from an input 1000-dimensional ORV.
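The autoencoder of Equations (1)-(3) can be sketched in a few lines of numpy. This is an untrained forward pass only; the layer sizes follow the 1000-dimensional ORV and 10-dimensional reduced vector described in the text, while the batch size and random weights are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hidden, N = 1000, 10, 32                    # 1000-dim ORV -> 10-dim features

W1, b1 = rng.normal(0, 0.01, (n_hidden, n_in)), np.zeros(n_hidden)   # encoder
W2, b2 = rng.normal(0, 0.01, (n_in, n_hidden)), np.zeros(n_in)       # decoder

X = rng.normal(size=(N, n_in))                      # a batch of N training vectors

H = sigmoid(X @ W1.T + b1)                          # Eq. (1): features h
X_hat = sigmoid(H @ W2.T + b2)                      # Eq. (2): reconstruction x-hat
L = np.mean(np.sum((X - X_hat) ** 2, axis=1))       # Eq. (3): cost averaged over N

print(H.shape, X_hat.shape)                         # (32, 10) (32, 1000)
```

Training would then use back-propagation to minimize L, as stated above.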
The internal structures of the encoders and decoders vary. A convolutional neural network (CNN) and fully connected layers (FCL) are utilized in this work. Network 1 in Figure 4a is an autoencoder that reduces the 1000-dimensional ORV. If FCLs were adopted throughout, the network scale and parameter count would increase greatly, making the network difficult to train. Thus, a hybrid network of one-dimensional convolution layers (Conv1D), pooling layers, and FCLs is adopted in encoder1, and FCLs are used in decoder1, as shown in Figure 5a. Network 2 in Figure 4b is a pseudo-autoencoder, which transforms the design matrix into a reduced design vector and establishes a one-to-one mapping with the reduced response vector using the MLP. The aim of network 2 is to train an encoder that can effectively compress an original design matrix into the RDV and to train an MLP that connects the two reduced spaces. The well-trained network 2 is a forward prediction network. CNNs show good feature extraction on 2D raster data [11], so our encoder2 uses two-dimensional convolution layers (Conv2D) as the main network. The specific network structure is shown in Figure 5b. In this article, we collected 30,480 samples; 21,366 were used for the training set, 6096 for the validation set, and 3018 for the test set. The training of these networks must strictly follow the order shown in Figure 5.
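The resulting forward prediction pipeline is simply a composition of the trained pieces. The sketch below shows the data flow only; the random linear maps stand in for the trained encoder2, MLP, and decoder1, and the 16-dimensional RDV size is a placeholder assumption (the paper does not state it):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins for the trained sub-networks (random linear maps, shapes only).
def encoder2(design_matrix):          # 16x16 design matrix -> RDV
    return design_matrix.reshape(-1) @ rng.normal(0, 0.1, (256, 16))

def mlp(rdv):                         # RDV -> RRV (the one-to-one mapping)
    return rdv @ rng.normal(0, 0.1, (16, 10))

def decoder1(rrv):                    # RRV -> 1000-dim ORV (S11 curve)
    return rrv @ rng.normal(0, 0.1, (10, 1000))

def forward_predict(design_matrix):
    """Forward prediction: encoder2 -> MLP -> decoder1."""
    return decoder1(mlp(encoder2(design_matrix)))

s11_pred = forward_predict(rng.integers(0, 2, (16, 16)))
print(s11_pred.shape)  # (1000,)
```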

Variational Autoencoder-Particle Swarm Optimization Module
Although the final inverse design network can generate the required RDV, we still cannot find all the design matrices from the RDV because of the one-to-many relationship. Theoretically, an RDV may correspond to multiple design matrices due to the non-uniqueness problem, and no suitable deep learning model can carry out a one-to-many mapping. In recent years, optimization algorithms have been widely used in the design of metasurfaces, but their running speed is slow due to the huge computational cost of the original space [12]. Since the RL module has mapped the inverse design problem to a reduced space, particle swarm optimization (PSO) is applied in this low-dimensional space to improve the speed of the algorithm.
First, the global vector space of the 16 × 16 meta-atom space is generated so that the target vector can be identified effectively and expeditiously in a global manner. A generative model can complete this task well [13], and a VAE is chosen to generate the global vector space. A VAE can convert high-dimensional discrete data into a low-dimensional continuous space known as the latent space, in which any sampling point can be decoded into a meaningful output. The VAE training process requires continuous learning of the conditional probability distributions of the inputs given the latent variables. When the VAE is trained with the available binary matrices of 30,000 different isotropic metasurfaces, a continuous low-dimensional design space is obtained, whose reduced dimension is K [14]. Figure 6a illustrates the architecture of a VAE. Since the design matrix of the metasurface is a two-dimensional grid, the V-encoder and V-decoder of the VAE are implemented by convolutional neural networks, as illustrated in Figure 6b,c, respectively. The encoder first converts the input binary matrix into mean vectors µ and standard deviation vectors σ. The latent vector Z is sampled from the Gaussian distribution Z ∼ N(µ, σ²) during the training process. After that, the decoder G reconstructs Z into the design matrix. Pattern topologies of meta-atoms with comparable features are mapped to the same region of the latent space in this process, and similar decoded meta-atoms are continually refined by perturbing the latent vectors. A well-trained decoder can recover any latent vector sampled in the Z space into a binary matrix resembling the training set, which accomplishes the purpose of generating new samples. Equations (4)-(6) give the loss function of a VAE, denoted by L_VAE:

L_VAE = L_recon + L_KL  (4)

L_recon = ‖x − x̂‖²  (5)

L_KL = KL(N(µ, σ²) ‖ N(0, I))  (6)

where x and x̂ represent the input data and the reconstructed data, and KL refers to the Kullback-Leibler divergence.
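The sampling step Z ∼ N(µ, σ²) is usually implemented with the reparameterization trick so that gradients can flow through it. A minimal numpy sketch of that step and the KL term of Equation (6); the 2-dimensional latent size is illustrative, not the K used in this work:

```python
import numpy as np

rng = np.random.default_rng(3)

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.array([0.5, -0.2])
log_var = np.array([0.0, 0.0])       # sigma = 1 in both latent dimensions
z = reparameterize(mu, log_var)

assert z.shape == mu.shape
# KL vanishes when the encoder output equals the standard normal prior:
assert kl_to_standard_normal(np.zeros(2), np.zeros(2)) == 0.0
```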
After training the VAE, a new network ϕ can be trained to find the many-to-one mapping between the continuous design space V and the RDV D in Figure 7a. As shown in Figure 7b, the many-to-one relation can be converted into a multi-solution problem, which can be expressed in Equation (7):

ϕ(v1) = ϕ(v2) = ⋯ = ϕ(vm) = D,  v1, v2, …, vm ∈ V  (7)

Therefore, if the objective RDV and the function expression of the inverse network in Figure 5a were known, the non-uniqueness problem could be solved directly. Practically, it is impossible to express the function fitted by a neural network in closed form. Simultaneously, it is difficult for different meta-atoms to achieve the same response curve at every sample point. Therefore, it is necessary to introduce an optimization algorithm to find the optimal latent vectors with the smallest distance from the target RDV D_tar. Searching for the optimal solution, instead of verifying candidates in the huge ODS, saves a lot of verification work and greatly improves the inverse design efficiency of the metasurface. In this work, PSO was chosen as the optimization algorithm, implemented in Python using the PySwarms toolkit [15]. Figure 8 illustrates the optimization process.
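The optimization loop can be sketched from scratch as follows (PySwarms wraps the same global-best velocity/position update). Here the network ϕ is replaced by a hypothetical linear map, and the latent dimension K = 4, swarm size, and coefficients are placeholders, not the settings used in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

K, n_particles, iters = 4, 30, 200
A = rng.normal(size=(K, K))              # stand-in for the trained network phi
D_tar = A @ np.ones(K)                   # target RDV, so z = [1, 1, 1, 1] is optimal

def fitness(z):
    """|phi(z) - D_tar| for one latent vector z."""
    return np.linalg.norm(A @ z - D_tar)

# Standard PSO update: inertia + cognitive pull + social pull.
w, c1, c2 = 0.7, 1.5, 1.5
pos = rng.uniform(-3, 3, (n_particles, K))
vel = np.zeros((n_particles, K))
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()
init_best = pbest_val.min()

for _ in range(iters):
    r1, r2 = rng.random((n_particles, K)), rng.random((n_particles, K))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print(fitness(gbest))   # best fitness found by the swarm
```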
The continuous latent vector space is mapped to the RDS through the network ϕ. The fitness function is obtained by calculating the distance between the mapped vector D_ϕ and the target vector D_tar in the RDS, which can be written as Equation (8):

Fitness = |∆D| = |D_ϕ − D_tar|  (8)

The fitness function is the objective function of the optimization. The condition for ending the optimization iteration is |∆D| < ε, where ε is an arbitrarily small value chosen based on the training data. After optimization, the optimal latent design vector is returned, and the binary matrix can be generated by the V-decoder. To evaluate the performance of our framework quantitatively [16], the accuracy for each target EM response is defined in Equation (9):

Accuracy = 1 − ( ∫_{f1}^{f2} |R_tar(f) − R_gen(f)| df ) / ( ∫_{f1}^{f2} |R_tar(f)| df )  (9)

where f1 and f2 are the frequency bounds of the input spectra, R_tar is the target EM response, and R_gen is the generated response calculated from the design matrix produced by the VAE-PSO framework, which can be obtained directly using the trained network 2 in Figure 4b. The accuracy measures the degree of matching between R_tar and R_gen. Applying Equation (9) to the whole test set and averaging gives the average accuracy of the entire framework. By transforming the non-uniqueness problem into a global optimization problem in a lower-dimensional space, our framework achieves a high average accuracy of 94% on the test set.
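This accuracy metric can be computed numerically. A minimal sketch, assuming the metric is one minus the integrated absolute error normalized by the integrated target magnitude over [f1, f2] (the exact integrand is an assumption here), with toy response curves:

```python
import numpy as np

def accuracy(r_tar, r_gen, freqs):
    """1 - integral(|R_tar - R_gen|) / integral(|R_tar|) over a uniform grid."""
    df = freqs[1] - freqs[0]                  # uniform frequency spacing
    err = np.sum(np.abs(r_tar - r_gen)) * df
    norm = np.sum(np.abs(r_tar)) * df
    return 1.0 - err / norm

freqs = np.linspace(0.0, 20.0, 1000)          # 0-20 GHz, 1000 sample points
r_tar = -10.0 + 5.0 * np.sin(freqs)           # toy target S11 curve (dB)
r_gen = r_tar + 0.2 * np.cos(3.0 * freqs)     # generated response, small deviation

acc = accuracy(r_tar, r_gen, freqs)
print(round(acc, 3))
```

A perfect match gives an accuracy of exactly 1; larger deviations push it toward 0.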

Design Result and Analysis
Initially, it is necessary to discuss whether network 1 learns the low-dimensional representation of the ORS effectively. Network 1 is a typical AE that squeezes the 1000-dimensional ORV down to 10 dimensions. Equation (10) gives the relative response error (RRE) used to measure the quality of the autoencoder on the test set:

RRE = (1/n) Σ_{i=1}^{n} |g_i − r_i| / |r_i|   (10)
where r_i is the discretized value of the ORV, g_i is the discretized value of the reconstructed EM response, and n is the number of discrete points in the spectrum. We input the 1000 dimensions of the S11 curve into network 1, and the average RRE on the test set is 3.72%. The results of two randomly selected examples are shown in Figure 9a,b. These results show that a close consistency between the input response and the output response has been established, and that it is feasible to use the RRV extracted by the AE to represent the high-dimensional ORV.
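The RRE of Equation (10) can be sketched directly from its definition, assuming the mean pointwise relative error over the n discretized spectrum samples:

```python
def rre(r, g):
    """Relative response error (Equation (10)): the mean of
    |g_i - r_i| / |r_i| over the n discretized points of the spectrum,
    where r is the original ORV and g the reconstructed response."""
    n = len(r)
    return sum(abs(gi - ri) / abs(ri) for ri, gi in zip(r, g)) / n
```

A reported average RRE of 3.72% corresponds to `rre` returning about 0.0372 when averaged over the test set.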

Then, the performance of network 2 should be discussed. Network 2 is in essence a forward prediction network, which takes the design matrix of the metasurface as input and outputs the predicted ORV. Its main role in the overall framework is to quickly predict the response curve (such as S11) of the inverse-designed metasurface instead of using a traditional physical simulator, so that the accuracy of the framework can be calculated rapidly. Prediction accuracy is usually used to describe the performance of a forward prediction network and can be expressed as 1 − RRE.
Figure 10a,b illustrate the S11 curves of two randomly selected metasurfaces, calculated by the physical simulator and by the forward prediction network, respectively. The average prediction accuracy of network 2 on the test set is up to 96%. The training of network 1 and network 2 is based on the back-propagation algorithm, and the loss function value decreases rapidly with the number of epochs. Figure 10c shows the loss curves of the two networks on the validation set. After about 25 epochs, the loss values of both networks drop below 0.5, which indicates a good degree of training for an autoencoder network. The overall loss of network 1 is greater than that of network 2 because network 1 has higher input and output dimensions and is therefore more difficult to train. To deal with the overfitting problem, L2 regularization is necessary.
The VAE-PSO module applies the PSO algorithm in the VAE-generated latent space and selects the optimal latent vector. The algorithm initializes 20 particles and sets the upper limit of iterations to 100. If the number of particles is too small, PSO falls into a local optimal solution, while if it is too large, the running time increases greatly. It should be noted that the number of iterations should be set large enough to ensure convergence of the PSO algorithm, with the possibility of early stopping if the fitness function stops changing (after about 15 iterations here). Figure 10d illustrates how the fitness function changes during iteration. After fifteen iterations, the fitness function reaches its minimum value and remains almost unchanged in subsequent iterations, which shows the algorithm has converged.
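The PSO loop with plateau-based early stopping can be sketched in plain Python. The particle count, iteration cap, and patience mirror the settings above; the inertia and acceleration coefficients (w, c1, c2) are typical textbook values, not the paper's (unstated) configuration, and the actual implementation uses the PySwarms toolkit.

```python
import random

random.seed(1)

def pso_minimize(f, dim, n_particles=20, max_iters=100,
                 w=0.7, c1=1.5, c2=1.5, patience=15, tol=1e-8):
    """Global-best PSO with early stopping once the best fitness has not
    improved for `patience` consecutive iterations."""
    pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    stall = 0
    for _ in range(max_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val - tol:
                    gbest, gbest_val = pos[i][:], val
                    stall = -1          # improvement: reset stall counter
        stall += 1
        if stall >= patience:           # fitness plateaued: stop early
            break
    return gbest, gbest_val
```

In the framework, `f` would be the Equation (8) fitness over the 10-D latent space, and the returned `gbest` is decoded by the V-decoder into a binary design matrix.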
Next, we introduce how to use the framework. Firstly, the target response is applied to network 3 to obtain the RDV, which is then transmitted to the VAE-PSO algorithm framework and optimized iteratively. Finally, the design matrix that produces the most similar EM response is output. As a verification, we select S11 curves from the test set as target responses to feed into the framework and obtain an average accuracy of up to 94%. Figure 11a illustrates a concrete example selected from the test set, with the target response in blue and the generative response in red. For practical applications, perfectly ideal response curves made up of straight lines are usually input. We expect to design a metasurface implementing a −2 dB S11 parameter at 12 GHz, so an ideal EM response without transition bands is fed into the framework, which returns the most accurate design matrix in the global latent space in Figure 11b. An artificial neural network (ANN) can learn complex nonlinear input-output relationships through training and data adaptation, which is its main characteristic [17,18]. A fully connected layer (FCL) network is the most common type of ANN, which can directly learn the mapping between ORV and ODV. For performance comparison, the responses generated by the FCL method, whose matching degree with the target response is far lower than that of our framework, are also illustrated in Figure 11a,b. The phase response of the metasurface is also a significant property, illustrated in Figure 11c,d. Abrupt phase changes can be obtained in planar metasurface structures at the subwavelength scale, which provides a new avenue for a variety of applications, including large-scale planar imaging, electromagnetic virtual shaping, and holographic display with a large field of view [19].
Finally, this framework is named "STARRY", and the performance differences among STARRY, the conventional design method, and the FCL design method are compared in Figure 12. Genetic algorithms are taken as an example of the conventional design method, while the FCL design method directly trains the mapping from the response curve to the design matrix with fully connected layers. The design time and area of the deep learning methods are much smaller than those of the conventional design method. The STARRY framework consumes more time and space resources than FCL but shows the highest accuracy. At the same time, STARRY solves the problems of non-uniqueness and network complexity.

Conclusions
In conclusion, an inverse design framework for isomorphic metasurfaces based on representation learning has been proposed and demonstrated. By using AEs of various architectures, the high-dimensional original space is mapped to a low-dimensional space with little loss, and the non-uniqueness problem is transformed into a global optimization problem in the low-dimensional space. The VAE-PSO algorithm is used to swiftly search the global latent space and deliver the binary matrix with the best response accuracy. This hybrid framework, which gives the optimal solution to the non-uniqueness problem and reduces network complexity, achieves a high average accuracy of 94% on the test set, and it can deliver a design matrix in just a few seconds, saving considerable resources and time compared with traditional inverse design methods.
