Deep Learning for The Inverse Design of Mid-infrared Graphene Plasmons

We theoretically investigate the plasmonic properties of mid-infrared graphene-based metamaterials and apply deep neural networks to their inverse design. These artificial structures consist of square periodic arrays of graphene plasmonic resonators deposited on dielectric thin films. Their optical spectra vary significantly with changes in the structural parameters. Our numerical results are in accordance with previous experiments. The theoretical approach is then employed to generate data for training and testing deep neural networks. By merging the pre-trained neural network with the inverse network, we implement calculations for the inverse design of the graphene-based metamaterials. We also discuss the limitations of the data-driven approach.


Introduction
Metamaterials are artificial composites engineered to possess desired features not found in nature. Metals and dielectrics in metamaterials are periodically organized at the subwavelength scale. Incident light excites surface plasmons, the collective oscillations of quasi-free electrons, which cause strong light-matter interactions. The subwavelength confinement of electromagnetic waves allows us to obtain a negative refractive index [1,2] as well as perfect absorption and transmission [3,4]. These properties have been applied to various areas including superlenses [5], invisibility cloaks [6], sensing [7,8], and photothermal heating [9–12]. However, the plasmon lifetime in metal nanostructures is limited by the large inelastic losses of noble metals. The large Ohmic loss also reduces the service life of optical confinement.
Graphene is a novel plasmonic material [7,8,13]. It can strongly confine electromagnetic fields, particularly in the infrared regime, while dissipating only a small amount of energy [14]. The significant reduction of heat dissipation in graphene compared to metals is caused by its small number of free electrons. The optical and electrical properties of graphene can easily be tuned via doping, applying external fields, and injecting charge carriers [13]. Consequently, graphene-based metamaterials are expected to exhibit a variety of interesting behaviors.
Graphene-based metamaterials can be investigated using different approaches. While experiments and simulations are expensive and time-consuming, theoretical approaches provide good insight into the underlying mechanisms of metamaterials. The rapid accumulation of data from theoretical calculations, simulations, and experiments has made data-driven approaches an effective way to investigate these systems. Data-driven approaches using machine learning and deep learning have revolutionized plasmonics and photonics because they speed up calculations and incur a one-time cost: once data have been collected from expensive sources, predictions are cheap. Reliable theoretical models can generate huge amounts of systematic data for machine learning and deep learning analyses. Thus, combining theory and deep learning would pave the way for better scientific understanding and for futuristic applications.
Inverse design problems have attracted the scientific community since they allow us to accelerate the design process targeting desired properties [15–20]. Inverse design techniques have recently been applied to several physical systems such as quantum scattering theory [15,16], photonic devices [17,18], and thin film photovoltaic materials [19]. Typically, inverse design problems are solved using optimization in a high-dimensional space, the genetic algorithm [21], or the adjoint method [22]. However, these approaches require lengthy calculations and long computation times. In this context, artificial neural networks provide faster calculations with higher precision.
In this work, we present a theoretical approach to investigate the plasmonic properties of graphene-based nanostructures and use deep neural networks for the inverse design of the structure from a known optical spectrum. Our modeled systems mimic the mid-infrared graphene detectors fabricated in Ref. [7]. The validity of our theoretical approach is verified by comparisons with finite difference time domain (FDTD) simulations and the previous experimental work [7]. Then, the theoretical calculations are used to generate training and testing data sets to train a neural network for forward prediction and an inverse network. We also discuss the limitations of these methods.
Figure 1a,b show top-down and side views of our graphene-based systems in air (ε_1 = 1). The systems consist of a square lattice of graphene nanodisks on a diamond-like carbon thin film placed on a silicon substrate. The optical phonon energy of the diamond-like carbon thin film is expected to be of the same order as that of diamond (165 meV) [23]. In contrast, the phonon energy of SiO_2 is approximately 55 meV [24], and other conventional substrates have lower phonon energies. Compared to other materials, the surface of diamond-like carbon films is non-polar and chemically inert due to very low trap density. The diamond-like carbon layer has a thickness of h = 60 nm and a dielectric function ε_2 = 6.25, while the dielectric function of silicon is ε_3 = 11.56. The square lattice of graphene nanodisks has a lattice period a = 270 nm, a resonator size D = 210 nm, and a number of graphene resonator layers N. The width between two adjacent graphene resonators is (a − D). Under the quasi-static approximation (the size is much smaller than the incident wavelengths) associated with the dipole model, the polarizability of the graphene resonators as a function of frequency ω is analytically expressed as Equation (1) [12,13,25,26]

Optical Response of Graphene-Based Metamaterials
where ζ and η are geometric parameters and σ(ω) is the optical conductivity of graphene. In Ref. [25], the authors found ζ = 0.03801 exp(−8.569 N d_g/D) − 0.1108 and η = −0.01267 exp(−45.34 N d_g/D) + 0.8635; here, d_g = 0.334 nm is the thickness of a graphene monolayer. The conductivity of N-layer graphene in the mid-infrared regime can be calculated using the random-phase approximation with zero parallel wave vector [13]. In general, both interband and intraband transitions contribute to σ(ω). However, in the mid-infrared regime, the interband conductivity is ignored and σ(ω) is given by Equation (2), where e is the electron charge, ħ is the reduced Planck constant, and τ is the carrier relaxation time. In our calculations, ħτ^-1 = 0.03 eV. Note that Equation (2) is the in-plane optical conductivity.
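As an illustration of the quantities defined above, the following minimal Python sketch evaluates the fitted geometric parameters quoted from Ref. [25] and an intraband conductivity. The expressions for ζ and η are taken from the text. The conductivity, however, is an assumption: the extracted text does not show Equation (2), so the sketch uses the standard Drude-like intraband form σ(ω) = i N e² E_F / [π ħ² (ω + iτ^-1)], expressed in units of e²/ħ with all energies in eV. The function names are hypothetical.

```python
import math

D_G_NM = 0.334  # thickness of a graphene monolayer in nm (value from the text)


def geometric_parameters(n_layers, diameter_nm):
    """Return (zeta, eta) from the fitted expressions quoted from Ref. [25]."""
    ratio = n_layers * D_G_NM / diameter_nm  # dimensionless N * d_g / D
    zeta = 0.03801 * math.exp(-8.569 * ratio) - 0.1108
    eta = -0.01267 * math.exp(-45.34 * ratio) + 0.8635
    return zeta, eta


def drude_conductivity(energy_ev, fermi_ev, n_layers=1, gamma_ev=0.03):
    """Intraband conductivity in units of e^2/hbar.

    ASSUMPTION: standard Drude-like form, not copied from the paper.
    energy_ev is hbar*omega, gamma_ev is hbar/tau (0.03 eV in the text),
    and the N-layer result is a simple sum over identical layers.
    """
    return 1j * n_layers * abs(fermi_ev) / (math.pi * (energy_ev + 1j * gamma_ev))


# Parameters of the reference structure: N = 3, D = 210 nm, E_F = 0.45 eV.
zeta, eta = geometric_parameters(3, 210.0)
sigma = drude_conductivity(0.1, 0.45, n_layers=3)
print(f"zeta = {zeta:.4f}, eta = {eta:.4f}, sigma(0.1 eV) = {sigma:.3f} e^2/hbar")
```

Under this assumed form the conductivity scales linearly with both E_F and N, which matches the statements in the text that σ(ω) ∼ E_F and that the in-plane conductivity is a simple addition of layers.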
The analytical expression is valid only in the low-frequency regime [13]. Combining Equations (1) and (2) effectively requires a strong coupling condition [27,28]: the ratio of the spacing between graphene disks to the diameter D has to be very small. In our systems, graphene disk layers are separated by a dielectric layer [28]. Thus, for simplicity, we ignore the vertical optical conductivity. The stacking of graphene layers does not change the chemical potential, and the horizontal optical conductivity is a simple sum over layers. The reflection and transmission coefficients of the graphene-based nanostructure are given by Equations (3) and (4), where r_pq and t_pq are the bulk reflection and transmission coefficients, respectively, when electromagnetic fields travel from medium p to medium q, c is the speed of light, and g ≈ 4.52 is the net dipolar interaction over the whole square lattice. From Equations (3) and (4), we calculate the transmission |t_13|^2 ≡ |t_13(N)|^2 for N > 0 and N = 0, corresponding to systems with and without graphene plasmonic resonators. Experiments measure the relative difference between these transmissions, 1 − |t_13(N)|^2/|t_13(N = 0)|^2, which is called the extinction spectrum. Variations in this spectrum characterize the graphene-plasmon-induced confinement of electromagnetic fields.

Simulation
To validate our theoretical calculations, we use the FDTD solver in the Computer Simulation Technology (CST) Microwave Studio software [29] to investigate the transmission properties and electromagnetic responses of the proposed metamaterial. The model of graphene in CST is described according to the Kubo formula. The dielectric function of multilayer graphene can be expressed as Equation (5) [30–32], where ε_0 is the vacuum permittivity. Equation (5) suggests that ε_g(ω) is insensitive to the number of graphene layers. This means that the dielectric function of stacked multilayer graphene systems is approximately equal to that of the monolayer counterpart. This assumption is reasonable since, if we consider the stacked N-layer graphene disks as a film, its dielectric function is independent of the thickness. In the CST simulations, the incident electromagnetic wave is perpendicular to the surface, with the electric and magnetic components along the y-axis and x-axis, respectively. We apply the open boundary condition along the z-axis, while periodic boundary conditions are employed along the x- and y-axes. Transmitters and receivers are located on both sides of the structure along the z-axis to measure the transmission scattering parameter S_21(ω) of the electromagnetic wave interacting with the graphene/diamond-like carbon (DLC)/silicon medium. The transmittance is then obtained as T(ω) = |S_21(ω)|^2, and the extinction spectrum as 1 − T(N)/T_0, where T_0 is the transmittance of the structure without graphene disks on top.
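The post-processing step described above (scattering parameter to transmittance to extinction) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' workflow: the function names are hypothetical, S_21 is treated as a complex amplitude so that the transmittance is its squared modulus, and the sample values are made up purely to show the arithmetic.

```python
import numpy as np


def transmittance(s21):
    """Transmittance T = |S21|^2 for a (possibly complex) S-parameter array."""
    return np.abs(np.asarray(s21)) ** 2


def extinction(s21_with_graphene, s21_without_graphene):
    """Extinction 1 - T(N)/T_0, evaluated point by point over the spectrum."""
    return 1.0 - transmittance(s21_with_graphene) / transmittance(s21_without_graphene)


# Made-up S21 samples at two frequencies (NOT simulation output).
s21_bare = np.array([0.8 + 0.0j, 0.8 + 0.0j])  # structure without graphene disks
s21_disk = np.array([0.8 + 0.0j, 0.0 + 0.4j])  # structure with graphene disks
print(extinction(s21_disk, s21_bare))
```

The same function applies unchanged to the theoretical coefficients, with t_13(N) and t_13(N = 0) in place of the two S_21 arrays.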

Deep Neural Network
Although the theoretical method has many advantages in understanding physical properties of graphene nanostructures, it is difficult to predict the structural parameters for a desired spectrum. We employ the tandem network introduced in Ref. [18] for the inverse design of our graphene-based metamaterials having N = 3 in Figure 1a.
Inverse design is an ill-posed problem: many designs can satisfy a given set of performance criteria. This is a research problem in itself. There are two basic approaches.
(1) The forward simulation or prediction is combined with a search technique (for example, genetic algorithms or Bayesian optimization). If the simulation is slow, this can be expensive and time-consuming. Replacing the simulation with a machine learning model speeds up the calculations, but since the model only approximates the accurate simulation, the accuracy is reduced.
(2) Machine learning models (mostly neural networks, due to their flexibility) are used to map backward from the performance criteria to the design. Finding multiple reasonable solutions is the most challenging problem with this method. However, deep neural networks work very well when trained on a large amount of sequential and systematic data, and the calculations run very fast. In this work, the second method is employed.
First, we use Equations (3) and (4) to generate roughly 11316 and 2860 extinction spectra for the training and testing data sets, respectively. Each calculation takes three input parameters. The diameter D of a graphene nanodisk is varied between 100 nm and 300 nm with a step size of 5 nm, while the width is varied from 10 nm to 120 nm with a step size of 5 nm. The chemical potential E_F is increased from 0.05 eV to 0.6 eV with a step size of 0.05 eV. In practice, E_F of graphene-based biosensors and photodetectors varies from 0.17 eV to 0.45 eV, and the value can be controlled by an external electric field [7,8,13]. Each optical spectrum has 400 spectral points spanning the frequency range from 0.001 to 0.4 eV.
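The parameter sweep described above can be enumerated directly, as in the short Python sketch below. The ranges and step sizes come from the text; the variable names are hypothetical, and the even spacing of the 400 spectral points is an assumption. The full grid contains 41 × 23 × 12 = 11316 combinations, which coincides with the quoted size of the training set. How the 2860 testing spectra were selected is not stated, so the sketch does not attempt to reproduce that set.

```python
import itertools

import numpy as np

# Integer ranges avoid floating-point drift in the nanometre grids.
diameters_nm = list(range(100, 301, 5))  # D: 100 to 300 nm, step 5 nm
widths_nm = list(range(10, 121, 5))      # width: 10 to 120 nm, step 5 nm
fermi_ev = [round(0.05 * k, 2) for k in range(1, 13)]  # E_F: 0.05 to 0.6 eV

# Every (D, width, E_F) combination is one input to the theoretical model.
grid = list(itertools.product(diameters_nm, widths_nm, fermi_ev))

# 400 spectral points between 0.001 and 0.4 eV (even spacing assumed).
energies_ev = np.linspace(0.001, 0.4, 400)

print(len(diameters_nm), len(widths_nm), len(fermi_ev), len(grid))
```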
Second, we use the training data set to train our neural network, which is depicted in Figure 1c. There are six hidden layers in the forward-modeling neural network. There is currently no specific formula for choosing the number of hidden layers. The running time grows with the number of hidden layers, while the accuracy remains nearly unchanged beyond three layers [33]. In many studies, six layers are preferred empirically. These layers sequentially have 1024-512-512-256-256-128 hidden units [18]. Again, there is no universal rule for estimating the number of units in each layer. From the computer-science point of view, the number of units is typically chosen as a power of two, and each layer has either the same number of units as the previous one or half as many. Increasing the number of nodes gives more accuracy but lengthens the calculations [18]. The learning rate is initialized at 0.0005. Once training is finished, feeding a new set of structural parameters into the forward network gives a predicted optical spectrum.
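To make the layer sizes concrete, the following NumPy sketch builds a fully connected network with the hidden widths quoted above and counts its trainable parameters. Only the hidden-layer widths come from the text. The three inputs (D, width, E_F) and 400 outputs (one per spectral point) are inferred from the data description, while the ReLU activation, the He-style random initialization, and the function names are assumptions made for illustration. The network here is untrained, so its output is not a physical spectrum.

```python
import numpy as np

# 3 inputs -> six hidden layers (widths from the text) -> 400 spectral points.
LAYER_SIZES = [3, 1024, 512, 512, 256, 256, 128, 400]


def init_params(sizes, seed=0):
    """Random weights and zero biases for each dense layer (assumed init)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        params.append((weights, np.zeros(n_out)))
    return params


def forward(params, x):
    """Map a batch of design parameters to a batch of predicted spectra."""
    h = np.asarray(x, dtype=float)
    for weights, bias in params[:-1]:
        h = np.maximum(h @ weights + bias, 0.0)  # ReLU activation (assumed)
    weights, bias = params[-1]
    return h @ weights + bias  # linear output layer


params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)
spectra = forward(params, np.array([[210.0, 60.0, 0.45]]))
print(n_params, spectra.shape)
```

With these sizes the network has about 1.07 million weights and biases, most of them in the first three hidden layers, which is consistent with the remark that wider layers lengthen the calculations.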
Finally, a tandem neural network is formed by connecting the trained forward-modeling neural network to an inverse-design network with two hidden layers, which have 512 and 256 hidden units. The learning rate in the inverse design is set to 0.0001. This tandem neural network takes data points from a new or desired spectrum as input and proposes possible design parameters. These design parameters are then fed into the trained forward-modeling neural network to generate the corresponding spectrum. The algorithm adjusts the weights in the inverse network to minimize the mean square error between the real and predicted spectra. This process is repeated iteratively.
Figure 2 shows theoretical and simulated extinction spectra for several numbers of graphene layers (N = 1, 3, 5 and 10). Due to the absorption of electromagnetic fields by surface plasmons in the graphene nanodisks, the transmission is reduced. The reduction is enhanced as the number of graphene layers increases, and the plasmonic resonance is also blue-shifted. The theoretical calculations agree quantitatively with simulations, particularly for N = 3 and 5. The spectral positions of the optical resonances predicted by the two approaches are very close. In particular, the system with a square lattice of three-layer graphene disks has its plasmonic peak at 0.1 eV (∼806 cm^-1). This value is fully consistent with the experiment in Ref. [7]. The excellent agreement at N = 3 indicates that optical spectra calculated by our theoretical approach are reliable enough to generate data for deep/machine learning studies. Thus, in the following calculations, we focus on graphene-based metamaterials with N = 3.
The main panel of Figure 3 shows theoretical infrared extinction spectra of graphene-based systems with D = 210 nm and a width of 60 nm at several values of the chemical potential. The calculations are carried out using Equations (3) and (4).
For E_F = 0.45 eV, all structural parameters are identical to those of the fabricated detector in Ref. [7]. The plasmonic peak is located at roughly 0.1 eV, in quantitative accordance with the prior experimental result [7]. A decrease of E_F not only red-shifts the surface plasmon resonance but also significantly reduces the amplitude of the optical signal, because σ(ω) ∼ E_F. The metal-like, or plasmonic, properties are lost as the chemical potential decreases, since the number of free electrons is reduced.
The inset of Figure 3 shows the sensitivity of the optical spectra to the diameter of the graphene nanodisks. The nearest distance between two plasmonic resonators is fixed at 60 nm. Since α(ω) ∼ D^3, the extinction cross section of a graphene nanodisk, approximated by 4π Im(α)ω/c, is proportional to D^3. Thus, a decrease of the diameter weakens the plasmonic coupling among resonators and lowers the optical peak. In addition, the spectrum blue-shifts when the size of the nanodisks is reduced. These behaviors suggest that increasing the size of the graphene nanodisks enables the confinement of more mid-infrared optical energy. The trapped energy is highly localized in the plasmonic resonators and dissipates thermally through the system.

Numerical Results and Discussion
Once the neural network has been trained and tested with our data sets, it can very quickly calculate a new extinction spectrum from a new set of structural parameters: D, the width, and E_F. The trained network is then connected to the inverse network to construct the tandem network, which improves the accuracy of the inverse design. To demonstrate the validity of our deep neural network, we examine it by predicting structural parameters for two desired spectra and show the results in Figure 4. First, Equations (3) and (4)
Again, one of the most challenging problems in inverse design is nonuniqueness: many designs may have an identical performance. It is difficult to overcome this issue. In a recent work [18], Liu and coworkers proposed the tandem architecture to handle nonunique designs. However, our calculations above show numerically that this remedy is not universal: the real and deep-learning-predicted structures are clearly different. Although this result does not negate the validity of the tandem neural network for the inverse design problem, it indicates that the nonuniqueness issue is still not handled. Furthermore, since the structure-property relationship is highly nonlinear, calculations using this tandem neural network may work well with high-dimensional data features, as stated in Ref. [18], but this does not imply the same success with low-dimensional ones.
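The tandem training scheme discussed in this work (a frozen forward network, with only the inverse network updated to minimize the mean square error between the target and regenerated spectra) can be illustrated with a toy NumPy sketch. Everything below is a scaled-down assumption, not the configuration used in the paper: a small random network stands in for the pre-trained forward model, each network has a single tanh hidden layer, and plain full-batch gradient descent with an arbitrary learning rate replaces the unspecified optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DESIGN, N_SPEC, HID_F, HID_I = 3, 50, 16, 16  # toy sizes (assumed)

# Frozen stand-in for the pre-trained forward network (never updated).
A1 = rng.normal(0.0, 0.5, (N_DESIGN, HID_F))
A2 = rng.normal(0.0, 0.25, (HID_F, N_SPEC))


def forward_net(design):
    """Return hidden activations and spectra predicted from designs."""
    hidden = np.tanh(design @ A1)
    return hidden, hidden @ A2


# Target spectra produced by known designs, so a good inverse exists.
true_designs = rng.uniform(-1.0, 1.0, (256, N_DESIGN))
targets = forward_net(true_designs)[1]

# Trainable inverse network: spectrum -> proposed design.
W1 = rng.normal(0.0, 0.1, (N_SPEC, HID_I))
b1 = np.zeros(HID_I)
W2 = rng.normal(0.0, 0.1, (HID_I, N_DESIGN))
b2 = np.zeros(N_DESIGN)


def tandem_step(lr):
    """One gradient-descent step on the inverse network; returns the loss."""
    global W1, b1, W2, b2
    h = np.tanh(targets @ W1 + b1)
    design = h @ W2 + b2                # proposed design parameters
    u, predicted = forward_net(design)  # spectrum regenerated by frozen net
    err = predicted - targets
    loss = np.mean(err ** 2)
    # Backpropagate through the frozen forward net into the inverse net.
    d_pred = 2.0 * err / err.size
    d_design = ((d_pred @ A2.T) * (1.0 - u ** 2)) @ A1.T
    d_h = (d_design @ W2.T) * (1.0 - h ** 2)
    grad_W2, grad_b2 = h.T @ d_design, d_design.sum(axis=0)
    grad_W1, grad_b1 = targets.T @ d_h, d_h.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
    return loss


losses = [tandem_step(lr=0.02) for _ in range(500)]
print(f"spectrum MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Note that the loss compares spectra, not designs. The inverse network is therefore free to converge to any design that reproduces the target spectrum, which is exactly why a low spectral error does not guarantee recovery of the original structure.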

Conclusions
We have investigated the extinction spectra of graphene-based metamaterials using theory, simulations, and deep learning. The nanostructures consist of a square array of three-layer graphene nanodisks placed on a diamond-like carbon thin film on a semi-infinite silicon substrate. The polarizability of the array of graphene nanodisks is calculated using the dipole model associated with the random-phase approximation. Based on this analysis, we analytically express the transmission coefficient and calculate the extinction spectra as a function of the structural parameters. The numerical results agree quantitatively with CST simulations. Since the FDTD simulations require very large computational effort and running time, the theoretical calculations are employed to generate reliable data for the tandem neural network. After training, the artificial neural network has been used to solve the inverse design problem. Our deep learning calculations have shown that a design can be accurately predicted for a target performance. However, calculations based on the tandem neural network do not handle the issue of nonunique designs.