Leveraging AI in Photonics and Beyond

Artificial intelligence (AI) techniques have spread to most scientific areas and have become a major focus of photonics research in recent years. Forward modeling and inverse design using AI can achieve high efficiency and accuracy for photonics components. With AI-assisted electronic circuit design for photonics components, more advanced photonics applications have emerged. Photonics benefits a great deal from AI, and AI, in turn, benefits from photonics, which can carry out AI algorithms such as complicated deep neural networks on components that use photons rather than electrons. Beyond the photonics domain, other related research areas governed by Maxwell's equations share remarkable similarities in how they use AI. Studies in computational electromagnetics, the design of microwave devices, and their various applications greatly benefit from AI. This article reviews leveraging AI in photonics modeling, simulation, and inverse design; leveraging photonics computing for implementing AI algorithms; and leveraging AI beyond photonics, in topics such as microwaves and quantum-related topics.


Introduction
Electromagnetics is a fundamental branch of physics that arises from the interaction of charged particles, with ever-expanding technological applications and scientific discoveries [1][2][3][4][5]. It has enormous implications in our daily life, from medical applications to mobile phones. The electromagnetic spectrum encapsulates wavelengths from thousands of kilometers down to a fraction of the size of an atomic nucleus. Remarkably, the behavior of electromagnetic waves across this entire spectrum can be succinctly described by the golden set of Maxwell equations. Table 1 summarizes the spectrum distribution from radiofrequency (RF) to optics and the corresponding typical applications/techniques. As given in the table, the wide spectrum of optics (where photonics is mainly located) from far-infrared to ultraviolet yields numerous photonics applications [6][7][8].
[Table 1. Only the final row survives in this copy: Ultraviolet | 750 THz-10 PHz | 320-100 eV | UV-C LED [19]. The preceding row ends with references [17,18].]
Unlike other domains, the physics within the Maxwell equations is complete and self-consistent in nature. However, one must employ hard computing to seek numerical solutions to these equations, which is computationally expensive and laborious. The modeling complexity of hard computing methods scales directly with the domain size and the required precision, limiting full exploration of the parameter space. Thus, augmenting the traditional Maxwell-equation-based methods with artificial intelligence (AI) is vital to advance the state of the art in the design and modeling of electromagnetic problems [20][21][22]. AI methods, such as machine learning (ML) techniques, are proven methodologies for the capture, interpolation, and optimization of highly complex phenomena in many fields. They are widely used in image classification [23,24], image/video processing [17,25,26], natural language processing (NLP) [27,28], and robotics [29,30].
Thus, coupling AI techniques with traditional physics-based methods could potentially discover pseudo-random designs with performance excellence that is beyond physical intuitions. This entire cycle of design, modeling, and simulation carried by soft computing algorithms will accelerate execution speeds by two to three orders of magnitude. Importantly, the intrinsic nature of ML algorithms, which is data-driven, allows the incorporation of many of the uncertainties in material parameters, fabrication, and manufacturing. Therefore, when viewed as a whole, the methodology will reduce fabrication tape-outs and cycles and increase manufacturing yield. As illustrated in Table 1, all the applications/techniques listed from RF to optics can leverage AI. For example, within the optics domain, infrared with AI techniques can help to detect the human body [14], analyze material composition [15], and colorize NIR images [16]. For the fields beyond optics, the related applications leveraging AI will be further discussed in Section 4.
With the rapid development of AI in recent years, especially deep learning methods, how to leverage AI for photonics has attracted significant interest from researchers worldwide. Figure 1 shows the research trend on AI and photonics since 1996. The number of publications on AI and photonics was obtained from Web of Science with the search condition: AI (topic) or deep learning (topic) or machine learning (topic) and photonics (topic), from 1996 to 2021. It can be seen from Figure 1 that the number of publications has increased dramatically since 2005, from 5000 to almost 50,000, and this increasing trend is expected to last for years. A network visualization, also shown in Figure 1, is based on the highly cited papers from 2021, searched in Web of Science and plotted using VOSviewer [31].
This article reviews the applications of ML methods in electromagnetics with a particular emphasis on photonics. Photonics covers a wide range of the electromagnetic spectrum from visible to mid-infrared wavelengths. This encompasses vast applications that include data transport, telecommunication, quantum information technologies, biology, and chemical sensing. The rest of this review paper is organized as follows. Section 2 describes the applications of AI to photonics modeling, simulation, and inverse design. Recent studies on soft computing using ML and on inverse design for photonics using generative adversarial networks (GANs) are summarized. Section 3 reviews studies on how photonics contributes to AI by implementing advanced neural networks on photonics hardware for acceleration. In Section 4, we review AI applications beyond photonics, including AI for computational electromagnetics (forward and inverse solvers), microwave device optimization and design, electromagnetic compatibility and interference (EMC/EMI), and quantum-computing-related topics.
Figure 1. The trend of publications on AI and photonics and the network visualization of the highly cited papers from 2021 (searched in Web of Science and plotted with VOSviewer).

AI for Photonics: Modeling and Simulation
As indicated by Figure 1, publications on photonics related to AI/deep learning have increased drastically in recent years. Several articles [32][33][34] review applications of AI techniques in photonics. Refs. [32,33] target applications of AI methods for general device modeling in photonics, while [34] targets applications of AI methods exclusively in integrated quantum photonics. Device modeling in photonics can be broadly divided into forward and inverse modeling.
In forward modeling, the input parameter space consists of device geometrical parameters, and the output parameter space comprises the performance vectors. AI methods for this task typically utilize discriminative neural networks, which model the deterministic mapping between the input and output variables via multilayer feedforward networks or CNNs. These have been successfully applied in forward modeling of the spectral response of plasmonic scatterers [35], effective refractive indices of waveguides [36], electric field profiles of photonics systems [37,38], dielectric metasurfaces [39][40][41], dielectric metagratings [42], beam combiners [43], beam steering devices [39], and photonics topological insulators [44]. In inverse modeling, on the other hand, the inputs are the performance vectors, and the outputs are the corresponding device geometry. The AI methods in inverse modeling can be further divided into three classes. The first class uses gradient descent-based algorithms coupled with a forward model. Here, the gradients of the forward model are evaluated either by adjoint methods [45,46] or by automatic differentiation [40,47]. Automatic differentiation applies the chain rule of derivatives and, in reverse mode, is mathematically equivalent to the backpropagation algorithm. Starting from an educated guess of the device geometry, the error between the response of the guessed geometry and the desired response is calculated. This error is backpropagated, and a modified geometry is obtained. The process is repeated until the error between the guessed and desired responses is acceptably small. The second class of inverse methods combines conventional optimization techniques, such as genetic and particle swarm algorithms, with forward models. 
In this class, the optimization algorithm conducts an algorithmic search for the device geometry with the desired performance and accesses the forward model multiple times (depending on the problem, hundreds or thousands of times). The third class of inverse algorithms uses generative neural networks to predict the device geometry for the desired performance targets [48][49][50]; a detailed review is given in Section 2.3. As explained above, the application of AI techniques in photonics has thus far concentrated largely on device design. Readers can refer to [32][33][34] for more details. In the following sections, we discuss two specific areas in detail. The first is the application of AI methods to optical mode solving, a forward modeling problem, and the second is the application of generative neural networks to inverse design. Before we move to the AI applications in photonics, the next subsection gives a brief introduction to some fundamentals of neural networks.
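As an illustration of the first (gradient-based) class of inverse methods described above, the following minimal sketch iterates from an initial guess of a single geometry parameter until the response of a forward model matches a target. The forward model `forward` and its derivative `forward_grad` are hypothetical stand-ins chosen only so the example runs; in practice the forward model would be a trained network or a physics solver, and the gradient would come from an adjoint method or automatic differentiation.

```python
import math


def forward(w):
    """Hypothetical smooth forward model: geometry parameter w -> scalar response."""
    return 1.45 + 0.5 * (1.0 - math.exp(-2.0 * w))


def forward_grad(w):
    """Analytic derivative of the toy model (stands in for adjoint/autodiff gradients)."""
    return math.exp(-2.0 * w)


def inverse_design(target, w0=0.2, lr=0.5, tol=1e-10, max_iter=10_000):
    """Gradient descent on the squared error between the model response and the target."""
    w = w0
    for step in range(max_iter):
        err = forward(w) - target
        if err * err < tol:
            return w, step
        # d/dw (err^2) = 2 * err * forward'(w)
        w -= lr * 2.0 * err * forward_grad(w)
    return w, max_iter


if __name__ == "__main__":
    target = forward(0.8)  # response of a known geometry, used as the design target
    w_found, n_steps = inverse_design(target)
    print(f"recovered w = {w_found:.5f} after {n_steps} iterations")
```

The toy model is monotonic, so the recovered geometry is unique; real photonic inverse problems are generally non-unique, which is one motivation for the generative approaches of the third class.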

Neural Networks
Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs) are the three most commonly used neural networks. An ANN usually consists of three types of layers (input, hidden, and output) connected in the forward direction. Each node of the input and hidden layers has weight and bias parameters for its connections to the nodes of the next layer. An ANN is capable of learning complex nonlinear functions that map an input to the desired output. For example, ref. [51] predicted hourly global, diffuse, and direct solar irradiance using an ANN based on various measured data. However, an ANN is only practical when the input data size (the number of nodes in the input layer) and the number of nodes in the hidden layers are relatively small, because the number of trainable parameters (weights and biases) increases drastically as the numbers of input nodes and hidden layers increase. In addition, an ANN loses the spatial/temporal features of input data such as images and video, and it cannot capture sequential information in input data such as time series. CNNs are prevalent in the deep learning community, especially in computer vision. Unlike in an ANN, the trainable parameters lie in the filters, also called kernels. Instead of fully connecting all the nodes in each layer, the filters extract the relevant features from the input using the convolution operation. For instance, ref. [52] used a CNN to extract high-level features of computed tomography data to diagnose COVID-19 symptoms. RNNs are widely used in natural language processing with sequential input data. An RNN has recurrent connections on the hidden layers and shares parameters across different time steps. This results in fewer parameters to train and decreases the computational cost. In [38], an RNN is employed to establish the field continuity between adjacent pixels for the calculation of optical mode profiles. 
Based on these three essential NNs, many machine learning frameworks/models, such as ResNet [23], GAN [53,54], LSTM [55,56], and Transformer [57], have been built to solve various complicated problems.
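To make the parameter-count argument above concrete, the short sketch below compares the number of trainable parameters of one fully connected layer with that of one convolutional layer. The layer sizes (a 64 × 64 single-channel input, 128 hidden nodes or 128 filters of size 3 × 3) are illustrative assumptions, not values taken from the cited works.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def conv2d_params(k_h, k_w, c_in, c_out):
    """Convolutional layer: each filter has k_h*k_w*c_in weights plus one bias."""
    return (k_h * k_w * c_in + 1) * c_out


if __name__ == "__main__":
    # Illustrative sizes: 64x64 grayscale image, 128 hidden nodes / 128 filters.
    n_dense = dense_params(64 * 64, 128)
    n_conv = conv2d_params(3, 3, 1, 128)
    print("fully connected layer:", n_dense)  # 524416
    print("3x3 convolutional layer:", n_conv)  # 1280
```

The convolutional count is independent of the image size because the same filters are reused at every position, which is the weight sharing referred to above.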

Optical Mode Solving
Traditionally, design tasks in photonics are carried out using physical principles and intuition. However, this does not explore the full input parameter space constrained by the available materials and fabrication techniques. The modeling and simulation in photonics are governed by the golden set of Maxwell equations. The physics within the Maxwell equations is complete and self-consistent in nature. However, one must employ hard computing to seek numerical solutions to these equations, and this is computationally expensive and laborious. The modeling complexity of hard computing methods scales directly with the computational domain size and the required precision, limiting the full exploration of the parameter space. Thus, augmenting the traditional Maxwell-equation-based methods with a new set of soft-computing algorithms is critical to advance the state of the art in the design and modeling of electromagnetic problems. Soft computing relies on machine learning techniques and could potentially auto-discover pseudo-random designs with excellent performance that lies beyond physical intuition. The entire cycle of design, modeling, and simulation carried out by soft computing algorithms could accelerate execution speeds by two to three orders of magnitude. Additionally, the data-driven nature of soft computing algorithms allows the incorporation of many of the uncertainties in material parameters, fabrication, and manufacturing. Therefore, when viewed as a whole, the methodology can reduce fabrication tape-outs and cycles and increase photonics manufacturing yield. Optical mode solving represents an area where soft computing techniques have been successfully applied. The task of mode solving is of fundamental importance in photonics integrated circuit design. Engineers usually spend much of their design time in this step, extracting and optimizing mode profiles, field confinements, and effective and group refractive indices. 
Optical mode solving allows one to monitor the fundamental properties of optical waveguides. It plays a key role in the design of more complicated components, such as directional couplers, resonators, arrayed waveguide gratings, and modulators. Therefore, accurate and rapid mode solving is essential. Traditionally, the modes of an optical waveguide are found by seeking solutions to the time-independent Maxwell's equations. Analytical solutions exist for simple one-dimensional slab waveguides. For two-dimensional geometries, such as channel and strip waveguides, numerical simulations are required for accurate solutions. Usually, to tackle the numerical problem, one applies matrix-diagonalization-based methods, such as finite difference and finite element methods. Although these methods are well established, they consume a certain amount of computational resources, and this consumption especially matters when performing a large number of geometrical sweeps, optimizations, and group index calculations. The group index is calculated by repeating the effective index calculations over a range of wavelengths. The parameter space for the waveguide geometry in photonics problems is usually well defined: it is limited by the choice of available materials and the dimensions that can be fabricated using existing fabrication capabilities. Researchers often explore this well-defined parameter space repeatedly with brute-force numerical methods. Such repeated exploration is an inefficient use of computational resources. Many valuable patterns can be learned in each of these repetitions and then used as an effective representation for future calculations without the use of numerical methods. The soft-computing-based optical mode solver is an exceptionally good complement to the physics-based solver, as the user can solve some popular predetermined waveguide geometries quasi-instantaneously without any hard computations. 
This is immensely helpful when the user is performing parameter sweeps and optimizations. The user still needs to use the physics-based solver for non-traditional and complicated geometries, or to double-check one of the instances in a sweep (although the AI solver delivers high-precision results).
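The remark above that the group index requires repeated effective index calculations can be sketched with the standard relation n_g = n_eff − λ · dn_eff/dλ, approximated here by a central finite difference. The function `n_eff` below is a hypothetical linear dispersion used only as a placeholder; in a real workflow it would be a call to a numerical mode solver or to a trained surrogate model.

```python
def n_eff(wavelength_um):
    """Hypothetical effective index vs. wavelength (placeholder for a solver or surrogate)."""
    return 1.60 - 0.10 * (wavelength_um - 1.55)


def group_index(n_eff_fn, wavelength_um, d_lambda=1e-3):
    """n_g = n_eff - lambda * d(n_eff)/d(lambda), using a central finite difference."""
    slope = (n_eff_fn(wavelength_um + d_lambda)
             - n_eff_fn(wavelength_um - d_lambda)) / (2.0 * d_lambda)
    return n_eff_fn(wavelength_um) - wavelength_um * slope


if __name__ == "__main__":
    print(f"n_g at 1.55 um = {group_index(n_eff, 1.55):.4f}")
```

Each group index value costs three effective index evaluations in this sketch, so a sweep over geometry and wavelength multiplies the number of mode solves accordingly; this is where a fast surrogate for `n_eff` pays off.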
In the following, we describe the recent works of soft computing techniques in optical mode solving. Specifically, we illustrate the modal classification, effective refractive index calculations, and optical mode profile predictions via applications of deep-learning models.

Modal Classifications
Most devices in integrated optics require the underlying waveguide to operate with a single mode. Thus, quick identification of single-mode waveguide geometries helps to accelerate the design cycle of on-chip photonics components. Traditionally, modal classification is done by finding effective index curves as a function of the waveguide geometrical parameters. In buried channel waveguides, there are two geometrical parameters: the waveguide width (w) and height (h). For a given h, the effective refractive index versus w curves are generated for modes of multiple orders. Here, the effective refractive index at every single w is obtained by solving the time-independent Maxwell equations using numerical methods, such as finite difference or finite element. The higher-order modes emerge only beyond some critical w. By monitoring this critical value, single-mode waveguides for a given h can be designed. For a different h, the entire procedure has to be repeated. In [58], a deep-learning model for waveguide modal classification is presented. Silicon nitride channel waveguides with silica cladding are considered. Figure 2a depicts the modal classification based on a densely connected feedforward architecture with four hidden layers. The horizontal and vertical axes represent w and h, respectively. The solid black line B(w,h) represents the exact single-mode curve calculated using the conventional finite difference method. This curve splits the input parameter space into regions of single-mode and multimode geometries. Geometries with (w,h) lying below B(w,h) have a single mode, and those lying above B(w,h) have multiple modes. The solid green curves are the single-mode curves generated by deep learning. There are four panels in Figure 2a, each depicting a deep-learning model with a different number of learning (training) points (dark yellow circles show the coordinates of the learning points). 
We can see that, when the number of training points is 25 (second row, first panel), the predicted single-mode curve matches the exact curve. The number of data points required by the deep-learning model to estimate B(w,h) is far smaller than the number needed by the conventional calculation, where more than 500 points were used. The blue and red crosses in Figure 2a represent random test points. The red crosses represent single-mode classifications, and the blue crosses represent multimode classifications. Figure 2b shows the mean square error between the predicted and exact B(w,h) as a function of the number of learning points, while Figure 2c shows the percentage of misclassifications as a function of the number of learning points. Please refer to [58] for more details.
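The classification rule and the misclassification metric described above can be sketched as follows. This is not the deep-learning model of [58]: the exact boundary `exact_boundary` is a hypothetical analytic curve, and the "learned" boundary is a plain piecewise-linear interpolation through a few sample points, used only to show how a boundary estimate from few points is scored against the exact one.

```python
def exact_boundary(w):
    """Hypothetical single-mode cutoff height h_c(w); placeholder for a finite-difference result."""
    return 0.6 / w


def make_interpolated_boundary(sample_widths):
    """Build a piecewise-linear boundary estimate from a few exact samples."""
    pts = [(w, exact_boundary(w)) for w in sorted(sample_widths)]

    def boundary(w):
        if w <= pts[0][0]:
            return pts[0][1]
        if w >= pts[-1][0]:
            return pts[-1][1]
        for (w0, h0), (w1, h1) in zip(pts, pts[1:]):
            if w0 <= w <= w1:
                t = (w - w0) / (w1 - w0)
                return h0 + t * (h1 - h0)

    return boundary


def is_single_mode(w, h, boundary):
    """A geometry below the boundary curve is classified as single mode."""
    return h < boundary(w)


def misclassification_percent(boundary, n=50):
    """Percentage of grid points whose class differs from the exact classification."""
    wrong = 0
    for i in range(n):
        w = 0.5 + 1.5 * i / (n - 1)
        for j in range(n):
            h = 0.2 + 1.0 * j / (n - 1)
            if is_single_mode(w, h, boundary) != is_single_mode(w, h, exact_boundary):
                wrong += 1
    return 100.0 * wrong / (n * n)


if __name__ == "__main__":
    coarse = make_interpolated_boundary([0.5, 1.25, 2.0])
    fine = make_interpolated_boundary([0.5 + 0.1875 * k for k in range(9)])
    print("3 samples:", misclassification_percent(coarse), "% misclassified")
    print("9 samples:", misclassification_percent(fine), "% misclassified")
```

As in Figure 2c, the misclassification percentage drops as more learning points are used to estimate the boundary.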

Effective Refractive Indices
The effective refractive index is an optical quantity that describes how well the mode is confined to the waveguide core. It is a critical parameter in the design of many functional devices in photonics. For example, in an arrayed waveguide grating, the effective index and the group refractive index (calculated by sweeping effective index calculations over the wavelength of light) determine the phase difference and free spectral range. In directional couplers, the beating length is essentially determined by the computed effective refractive indices. A deep-learning model can accelerate effective refractive index calculations. Trained deep-learning models are able to predict the effective refractive index quasi-instantaneously. This in turn enables ultra-fast prediction of other derived quantities, such as the group refractive index and beating lengths. In [59], deep-learning models for effective refractive index prediction were shown for silicon nitride buried channel waveguides with silica cladding. Figure 3 describes the developed deep-learning model, which uses a feedforward architecture with three hidden layers, and summarizes the results when the model is trained with 4, 9, and 16 points. The left panels show the input parameter space. The blue circles in the panels represent the coordinates (w,h) used in the training. The right panels display the predicted effective refractive indices along the lines A-F interpolated in the input parameter space. The plots show how the patterns in the effective refractive indices of the fundamental waveguide modes, for both polarizations of light, can be uncovered for the entire parameter space with only 4 to 16 learning points. In [36], the authors developed a universal deep-learning model for the effective refractive index of a buried channel waveguide. The cladding material is fixed to silica, while the core refractive index can vary. 
The deep-learning model is able to make precise predictions over a wide spectrum of optical wavelengths, core materials with refractive indices varying from 1.45 to 3.8, and a wide range of feasible geometrical parameters of the waveguides. The authors explore single- and multi-layer neural architectures with a minimal number of learning points and demonstrate the superior precision of the multilayer deep-learning models over the single-layer deep-learning model and conventional interpolation techniques. With only 27 learning data points, the model achieves an MSE of 1.56 × 10^−2. The MSE reduces to 3.94 × 10^−5 when the number of learning points is increased to 64. Figure 4a,b show the training and prediction durations, respectively, as a function of the number of learning (training) points. Figure 4d compares the precision and calculation time offered by the single- and multi-layer optimized neural networks, interpolation techniques, and the exact finite difference method (for more details on the figure and the architecture, please refer to [36]). As is clear from this figure, the time taken by the exact method (orange bar) is enormous compared with the neural network and interpolation methods. In addition, the neural networks with two and three layers offer a good compromise between calculation time and accuracy. The precision of the multilayer neural networks is one to two orders of magnitude higher than that of the interpolation techniques, although they consume slightly more calculation time.
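As an example of a derived quantity mentioned above, the beating length of a directional coupler follows from the effective indices of its even and odd supermodes through the standard relations L_B = λ / |n_even − n_odd| and L_π = L_B / 2 (the length for full power transfer). The sketch below evaluates both; the index values are illustrative assumptions, and in practice they would come from a mode solver or a trained effective-index model.

```python
def beat_length(wavelength_um, n_even, n_odd):
    """Beat length L_B = lambda / |n_even - n_odd| (same length unit as the wavelength)."""
    return wavelength_um / abs(n_even - n_odd)


def coupling_length(wavelength_um, n_even, n_odd):
    """Half-beat length L_pi = L_B / 2, i.e. the length for full power transfer."""
    return 0.5 * beat_length(wavelength_um, n_even, n_odd)


if __name__ == "__main__":
    # Illustrative supermode indices at a wavelength of 1.55 um.
    print(f"L_pi = {coupling_length(1.55, 1.60, 1.59):.2f} um")
    # Effect of a 1e-3 error in one predicted index on the resulting length.
    print(f"L_pi = {coupling_length(1.55, 1.601, 1.59):.2f} um")
```

Because the length depends on a small difference of two indices, an index error of 10^−3 in this example changes L_π by roughly 9%, which illustrates why the precision of the predicted effective indices matters.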

Optical Mode Profile
Apart from modal classifications and effective refractive index calculations, another essential task in integrated photonics is the calculation of the optical mode profiles. Unlike effective refractive index prediction, where the task is to predict a single value for a given waveguide geometry and polarization, here the deep-learning model must be able to predict a two-dimensional array of values corresponding to the distribution of the electric field. In [38], a recurrent neural network (RNN) was employed to accomplish this task.
The input to the model is the geometrical parameters of the waveguide, and the output is the field values (array). The recurrent connection helps to establish the field continuity between the adjacent pixels. Figure 5 summarizes the key results.

Inverse Design of Photonic Structures: Deep Generative Models
In recent years, deep learning (DL) has gained momentum in the design of photonics structures. Deep neural networks (DNNs) can outperform established methods in discovering new photonics structures from massive data to achieve optimal optical performance. In this part, we review deep generative models (DGMs) for the inverse design of photonics structures, providing some insight into how these models can solve inverse design problems. We then discuss the current limitations and future directions of using DGMs for the inverse design of photonics structures.
DGMs combine generative models with DNNs, and they have achieved great success in only a few years. DGMs leverage DNNs to learn a function whose model distribution approximates the true data distribution of the training set, so that new data points can be generated with some variations. Generative adversarial nets (GANs) [53], variational autoencoders (VAEs) [60], and auto-regressive models [61] are popular DGMs, with the former two most commonly used.
GANs have been applied to the inverse design of metasurface nanostructures [62] and dielectric metasurfaces [63]. VAEs have been employed for the inverse design of double-layered chiral metamaterials [64]. GANs and VAEs mainly differ in the way the generative model is trained, and GANs are often reported to produce better results than VAEs. GANs introduce a novel way to train generative models. They consist of a generator G and a discriminator D, which are trained in an adversarial manner. G generates a structure pattern x = G(z) from a random noise vector z, and D classifies the pattern x as synthesized (from G) or real (from the training data). G attempts to fool D by producing patterns that cannot be distinguished from the ones in the training data. D is discarded once the networks are trained. Conditional GANs (cGANs), the most widely used variation of GANs, are constructed by simply supplying a conditional vector along with the noise vector. The inverse design of photonics structures requires that G output a structure pattern with the desired optical response rather than a pattern generated from an arbitrary sample of the noise z. cGANs achieve this by conditioning G on the target response y, so that G outputs a reconstruction x = G(y). An ANN-based metamodel was trained to approximate the optical and chromatic response of a hybrid subwavelength grating (HSWG) structure [65]. It can serve as a surrogate model for fast spectral performance prediction in cGANs for inverse design. Deep convolutional GANs (DCGANs) are GANs that employ a convolutional neural network (CNN) architecture. They comprise many convolutional, deconvolutional, and fully connected layers.
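The adversarial training described above can be summarized by the two losses that G and D minimize in alternation. The sketch below evaluates them from discriminator outputs, assuming the binary cross-entropy formulation of the original GAN paper [53] with the commonly used non-saturating generator loss; the function names are illustrative. In a cGAN, both networks would additionally receive the target response y as input, which does not change the form of these losses.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for D: push D(x_real) toward 1 and D(G(z)) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss for G: push D(G(z)) toward 1, i.e. fool the discriminator."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # D outputs are probabilities that a pattern is real.
    print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))  # D doing well: small loss
    print(generator_loss([0.2, 0.1]))                  # G being caught: large loss
    print(generator_loss([0.5, 0.5]))                  # D undecided: loss equals ln 2
```

When D can no longer tell synthesized patterns from real ones, its outputs approach 0.5 and the discriminator loss approaches 2 ln 2.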
GANs have known problems, one of which is the difficulty of stabilizing their training. The accuracy of the generative model depends on both the amount and the quality of the training data. Nevertheless, GANs are powerful and are used in a variety of tasks. Unifying GANs and VAEs may allow obtaining the best of both models. In the next few years, DGMs are expected to be very helpful for the inverse design of various photonics devices.

Photonics for AI: Using Photonics Computing to Implement AI Algorithms
In the previous sections, we investigated how AI algorithms, such as deep learning, can be utilized in the design and optimization of photonics devices [33,66]. Here, we will review some of the recent research work in photonics computing that demonstrates how photonics systems are used to implement the AI algorithms. There is, thus, a strong synergy between photonics and AI: AI can be used to accelerate the design process of photonics devices, and photonics, in turn, can be used to implement the AI algorithms and to boost the performance of the AI systems due to some of the inherent advantages of photonics, such as lower latency compared with electronics.
Artificial intelligence-in particular, deep learning [67]-brought about a step-function improvement in image classification in the early 2010s and has since made a much deeper and broader influence on how businesses and organizations operate [68]. Studies project that AI will massively impact the global economy in the 2020s as a result of increased productivity in almost all industry sectors [69]. We can expect to see major investment from both the public and private sector and a shift towards an "AI first" strategy [70], where AI techniques will be adopted to improve outcomes beyond traditional techniques.
The success of deep learning has been attributed to the convergence of three important factors: (1) advances in deep learning research, (2) access to large high quality datasets, and (3) the advent of hardware deep learning accelerators. GPUs and ASICs (e.g., TPUs [71]), which are special purpose hardware for machine learning applications, have been deployed in data centers for nearly 5 years. Worryingly, a study in 2018 by OpenAI highlighted the astonishing growth of compute resources used by AI, showing a 3.5 month doubling time, which was faster than Moore's law [72]. Moreover, Moore's law scaling is expected to end within the next decade due to fundamental limits to transistor scaling as well as related bottlenecks in power dissipation and interconnect bandwidth [73]. The study called into question the economic and environmental sustainability of the exponential growth of compute demand and underscored the need to consider non-conventional computing architectures to continue to drive future innovations in AI.
At the same time, integrated silicon photonics has made significant advances over the past decade. The performance of silicon photonics devices and systems continues to improve, driven mainly by high bandwidth requirements from telecommunications and data center interconnect applications [74,75]. Silicon photonics foundry fabrication processes are also becoming more mature and reliable [76,77], such that the community is looking to develop applications beyond data interconnects, for example LIDAR [78], quantum computing [79], and, in particular, machine learning and neuromorphic computing [80,81].
Using photonics for deep learning and neuromorphic computing has many potential advantages over conventional electronics, namely, large bandwidth, low latency, and nearly lossless interconnects [82]. Moreover, spatial and wavelength multiplexing enables high speed high throughput information processing, e.g., the multiplication and summation of signals, which is a critical operation in deep learning.

Current Developments in Photonics Computing
Most hardware machine learning accelerators currently in deployment are high-throughput parallel processors, like GPUs and TPUs. Research into non-von Neumann architectures like neuromorphic computing, in which aspects of the design mimic principles present in biological neural networks, is ongoing. Among these, the memristor crossbar array stands out as a promising candidate [83]. Multiplication and summation are implemented following Ohm's law at each cross-point and Kirchhoff's current law at each column. However, engineering challenges remain, as the variability and non-ideal characteristics of memristive devices make large-scale arrays difficult to realize. Nonetheless, several impressive recent demonstrations indicate progress towards scalability [84,85]. On the other hand, several studies have projected that the fundamental limits of analog optical computing are on an equal footing with, or even slightly advantageous compared to, those of memristors [86,87].
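The way a crossbar array performs multiplication and summation can be sketched in a few lines: each cross-point contributes a current G·V by Ohm's law, and Kirchhoff's current law sums the contributions along each column, so the column currents form a matrix-vector product. The sketch assumes ideal devices (no wire resistance, variability, or nonlinearity), and the conductance and voltage values are illustrative.

```python
def crossbar_mvm(conductances, voltages):
    """Ideal crossbar: conductances[i][j] (siemens) joins row i to column j,
    voltages[i] (volts) drives row i; returns the column currents (amperes)."""
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductances, voltages):
        for j, g in enumerate(g_row):
            # Ohm's law at the cross-point; Kirchhoff's current law sums along column j.
            currents[j] += g * v
    return currents


if __name__ == "__main__":
    G = [[1e-6, 2e-6],
         [3e-6, 4e-6]]
    V = [0.1, 0.2]
    print(crossbar_mvm(G, V))  # approximately [7e-07, 1e-06]
```

The weights of a neural network layer are stored as conductances, so the multiply-accumulate happens in place without moving the weights to a separate processor.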
Early studies on optical computing and optical neural networks date back to the late 1980s [88,89], and the technology was touted to enable a new generation of faster computers [90]. Unfortunately, the first optical neural networks were implemented with bulk optical components, which are large, slow, and unstable and, hence, were infeasible to scale to large and densely connected networks. Decades of major breakthroughs in integrated electronics manufacturing have led to its dominance in computing today and to a circumspect view of the prospects of computing with optics [91]. However, the limits of continued transistor scaling and recent developments in integrated silicon photonics have breathed new life into the field of optical information processing. Advances in silicon photonics foundry processes have enabled demonstrations of high-speed silicon photonics integrated circuits with thousands of elements with reconfigurable functions [92,93]. Leveraging the silicon photonics technology platform, several research groups, start-ups, and large companies have begun work on optical neural networks over the latter part of the last decade. Table 2 shows a summary of proposed and demonstrated photonics neural network architectures from recently published works. We give an overview of these different architectures in the next section.

Photonic Accelerator
Matrix multiplication is one of the most common operations in high-performance computing. In fact, current deep-learning models rely heavily on computing matrix multiplications, but conventional computers using the von Neumann architecture are not optimized for such calculations. Data and instructions need to be moved over metal line interconnects from the memory cache to the CPU (or matrix multiplication units) and back. Moreover, metal lines have to be charged and discharged at an energy cost of CV^2 [82]. Thus, conventional electronic computers suffer from high communication overheads and high latencies. On the other hand, linear transformations, including matrix computations [112] and Fourier transforms [113], can be performed with high speed and high throughput using photonics, typically orders of magnitude better than with electronics [86,87]. For example, photonics devices typically have a bandwidth of 20 GHz, and the propagation delay through a photonics chip is on the order of 100 ps. In view of these advantages, there have been several published works on using photonics as co-processors to accelerate expensive and time-consuming computations [94][95][96][97][98]. Indeed, several high-profile startups are currently working on photonics chips as drop-in replacements for electronic deep learning accelerators like TPUs [114][115][116]. The great interest from industry indicates the near-term promise of photonics accelerators. Ironically, the main bottlenecks for photonics accelerators are the peripheral electronics. For example, the analog nature of the photonics computation requires low-power and high-bandwidth digital-analog and analog-digital conversion circuits, as well as amplifier circuits for modulation and detection. Another challenge is that deep-learning models have to be adapted to tolerate the low-precision and potentially noisy analog computations [117][118][119].
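To make the last point concrete, the following minimal Python sketch models a hybrid digital-analog matrix-vector product: inputs pass through a D/A converter, the weighted sums pick up additive noise, and the results pass through an A/D converter. Everything here is an assumption made for illustration (uniform quantizers, Gaussian noise, weights and inputs bounded by 1, and the names `quantize` and `analog_matvec`); it is not a model of any specific accelerator in [94][95][96][97][98].

```python
import random


def quantize(x, bits, full_scale=1.0):
    """Clip x to [-full_scale, full_scale] and round to one of 2**bits levels."""
    steps = 2 ** bits - 1                      # intervals between levels
    step = 2.0 * full_scale / steps
    x = max(-full_scale, min(full_scale, x))
    return round((x + full_scale) / step) * step - full_scale


def analog_matvec(weights, x, dac_bits=8, adc_bits=8, noise_std=0.0, seed=0):
    """Hypothetical model: DAC -> noisy analog dot products -> ADC."""
    rng = random.Random(seed)
    x_analog = [quantize(v, dac_bits) for v in x]          # D/A conversion
    outputs = []
    for row in weights:
        acc = sum(w * v for w, v in zip(row, x_analog))    # analog weighted sum
        acc += rng.gauss(0.0, noise_std)                   # additive analog noise
        # ADC range sized for the worst case when |w| <= 1 and |x| <= 1.
        outputs.append(quantize(acc, adc_bits, full_scale=float(len(row))))
    return outputs
```

Running a trained model through a function like `analog_matvec` with reduced `dac_bits`, `adc_bits`, or a non-zero `noise_std` is one simple way to probe how much precision loss and noise the model can tolerate.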
Photonic accelerators are hybrid digital-analog systems, which makes sense when working with digital data. However, if the incoming data is inherently analog, for example in sensing applications, then all-optical neural network analog information processing that bypasses D/A and A/D conversions could have advantages.

Coherent Feed-Forward Neural Network
Implementations of optical neural networks can be generally classified as coherent or incoherent. Coherent circuits make use of constructive or destructive interference in multi-port interferometers to implement linear transformations like matrix multiplications. A very general interferometric device is a regular mesh of Mach-Zehnder interferometers (MZIs), which has been shown to be able to implement any unitary matrix [120]. Then, using singular value decomposition, the real-valued matrices that are used in most neural networks can be implemented [99]. Additionally, there are proposals to use these Mach-Zehnder interferometers to implement novel architectures, such as unitary neural networks [121] and quantum neural networks [122]. A major challenge for such coherent circuits is the sensitivity of the performance to fabrication imprecision and additional loss, with several groups proposing more robust designs [100,101]. Moreover, such a mesh requires O(N^2) MZIs to implement an N × N square matrix, which limits the maximum possible network size on a chip. Furthermore, integrating all-optical nonlinear activations into the coherent circuit remains an area of active research [96].
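As an illustration of the building block, the sketch below composes the 2 × 2 transfer matrix of a single MZI from two ideal 50:50 couplers and two phase shifters, checks that it is unitary, and counts the MZIs in a mesh. The convention (phase shifters on the upper arm, lossless couplers) is an assumption, and other conventions differ by phase factors. The count N(N-1)/2 applies to the well-known triangular and rectangular mesh layouts and is consistent with the O(N^2) scaling quoted above.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mzi(theta, phi):
    """Transfer matrix of an ideal MZI (assumed convention, see lead-in)."""
    s = 1.0 / math.sqrt(2.0)
    coupler = [[s, 1j * s], [1j * s, s]]                 # ideal 50:50 coupler
    internal = [[cmath.exp(1j * theta), 0], [0, 1]]      # phase between couplers
    external = [[cmath.exp(1j * phi), 0], [0, 1]]        # phase at the input
    return matmul2(coupler, matmul2(internal, matmul2(coupler, external)))


def is_unitary(t, tol=1e-12):
    """Check that the columns of t are orthonormal."""
    for i in range(2):
        for j in range(2):
            dot = sum(t[k][i].conjugate() * t[k][j] for k in range(2))
            if abs(dot - (1 if i == j else 0)) > tol:
                return False
    return True


def mzi_count_unitary(n):
    """MZIs in a triangular or rectangular mesh for an n x n unitary."""
    return n * (n - 1) // 2
```

With this convention the internal phase `theta` sets the power splitting ratio (the bar-port power is sin^2(theta/2)), while `phi` sets a relative input phase. An SVD-based real matrix uses two such unitary meshes plus a diagonal stage.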

Continuous-Time Recurrent Neural Network
Incoherent circuits use non-interfering signal carriers (e.g., wavelengths, polarizations, modes) to perform weighted summations. One prominent architecture uses wavelength division multiplexing to collect and distribute signals weighted by micro-ring resonators [103]. The collected wavelengths are summed by photodetectors measuring the total power. The photodetector current then drives a modulator acting as a nonlinear node whose output is fed back into the network [104]. This kind of feedback network implements a continuous-time recurrent neural network. Implementing a densely connected layer with N input and N output nodes requires N^2 micro-ring resonators, which have to be individually thermally controlled as they are highly sensitive to inevitable fabrication imprecision. Additionally, N independent wavelength channels have to fit within one free-spectral range and have to be sufficiently spaced to inhibit inter-channel cross-talk. These strict component requirements mean that scaling up such an architecture faces severe challenges.
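The feedback dynamics can be sketched with the textbook continuous-time recurrent neural network equation, tau * ds/dt = -s + W f(s) + u, integrated with forward-Euler steps. This is a generic mathematical sketch rather than a model of the hardware in [103,104]: the tanh nonlinearity is a stand-in for the modulator response, and the names `ctrnn_step` and `simulate` are hypothetical.

```python
import math


def ctrnn_step(state, weights, inputs, dt=0.01, tau=1.0):
    """One forward-Euler step of tau * ds/dt = -s + W f(s) + u."""
    activations = [math.tanh(s) for s in state]      # stand-in nonlinearity
    new_state = []
    for i, s in enumerate(state):
        # Weighted sum over all nodes: one weight per (input, output) pair,
        # which is why a dense N-node layer needs N*N weighting elements.
        weighted_sum = sum(w * a for w, a in zip(weights[i], activations))
        ds = (-s + weighted_sum + inputs[i]) / tau
        new_state.append(s + dt * ds)
    return new_state


def simulate(state, weights, inputs, steps, dt=0.01, tau=1.0):
    """Integrate the network for a fixed number of Euler steps."""
    for _ in range(steps):
        state = ctrnn_step(state, weights, inputs, dt, tau)
    return state
```

With all weights set to zero, each node simply relaxes towards its external input, which gives a quick sanity check of the integrator before recurrent weights are added.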

Spiking Neural Network with Phase-Change Materials
Another architecture uses waveguides embedded with phase-change materials, which can be switched between non-volatile multi-level transmission states by optical pulses [105]. The phase-change material thus stores the weight values of the neural network and also acts as a nonlinear activation function. Additionally, researchers were able to demonstrate a simplified spike-timing-dependent plasticity by overlapping pulses in time, to show unsupervised learning [106]. Recent demonstrations use wavelength division multiplexing micro-rings for signal collection, distribution and summation and hence face challenges similar to those described above. Phase-change materials like GST-PCMs have a limited number of programmable states (3 bits) and hence are better suited to low-precision neural networks. Engineering phase-change materials with low switching powers and fast response times is the subject of ongoing research. Furthermore, phase-change materials, although technically "CMOS compatible", are not currently standard in the photonics foundry process. Existing proposals advocate for the integration of phase-change materials into the photonics platform [123], but this is unlikely to be a short-term endeavor.

Reservoir Computing
Reservoir computing is a framework that is closely related to recurrent neural networks. The reservoir is a tunable nonlinear dynamical system with randomly interconnected nodes that takes input data and projects it into a high-dimensional feature space. Predictions from the reservoir are a linear combination of the observed reservoir states, with the combination weights obtained by linear regression. Recent demonstrations of photonics reservoirs include a network of spiral waveguides [107], and several proposals exist that use networks of micro-resonators [108,109]. Another proposed reservoir computer consists of a multimodal optical waveguide, with the large number of optical modes acting as reservoir nodes and random coupling due to light scattering behaving as the random connectivity in the network [104,105]. Several challenges exist for scaling up photonics reservoirs, for example, loss accumulation in the reservoir. It is also difficult to observe the node states efficiently in order to train the readout weights.
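A software analogue of this framework is sketched below: a fixed random recurrent network is driven by the input, its states are recorded, and only a linear readout is fitted by (ridge-regularized) linear regression. All names, the tanh node nonlinearity, the weight scaling, and the small ridge term are assumptions made for illustration; the photonic reservoirs in [107][108][109] realize the recurrent part physically.

```python
import math
import random


def make_reservoir(n_nodes, rng, scale=0.5):
    """Random input and recurrent weights; kept small so past inputs fade."""
    w_in = [rng.uniform(-1, 1) for _ in range(n_nodes)]
    w_res = [[rng.uniform(-1, 1) * scale / math.sqrt(n_nodes)
              for _ in range(n_nodes)] for _ in range(n_nodes)]
    return w_in, w_res


def run_reservoir(w_in, w_res, inputs):
    """Drive the reservoir and record its state (plus a bias) at each step."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [math.tanh(w_in[i] * u +
                           sum(w_res[i][j] * state[j] for j in range(n)))
                 for i in range(n)]
        states.append(state + [1.0])
    return states


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_readout(states, targets, ridge=1e-6):
    """Fit readout weights from the normal equations (S^T S + ridge I) w = S^T y."""
    k = len(states[0])
    gram = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(k)]
    return solve_linear(gram, rhs)


def predict(states, readout):
    """Linear combination of the observed reservoir states."""
    return [sum(w * s for w, s in zip(readout, row)) for row in states]
```

Only `train_readout` involves learning. The reservoir weights stay fixed, which is what makes a passive physical system usable as the recurrent part.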

On-Chip Fourier Transform and Convolutions Using Star Couplers
In convolutional neural networks (CNNs), the convolution layer uses far fewer learned parameters than a fully connected layer, due to the shared-weights architecture, while showing equal or better performance in classification tasks. CNNs are a promising approach for optical neural networks, as convolutions can be implemented with Fourier optics [113]. Additionally, the shared weights allow high performance while reducing the total number of required components. The generic model of a CNN and the implementation of such a network in photonics are illustrated in Figure 6. We recently proposed the use of an N × N star coupler [124,125], as shown in Figure 7, which is a diffractive component, to perform the Fourier transform [113] and (by the convolution theorem) the convolution operation for use in CNNs [126]. To implement convolution, (1) a first star coupler transforms the input data to Fourier space, (2) phase and/or amplitude modulators apply the kernel filters, and (3) a second star coupler transforms the data back to configuration space. Compared to a typical MZI implementation of the (unitary) Fourier transform, our simulations predict a footprint reduction by tens of times when using the star coupler. This considerable reduction in footprint not only saves physical space on the chip but also reduces accumulated propagation loss, as light has to travel a shorter distance, making deep neural networks feasible to implement using coherent photonics integrated circuits. Details of our work can be found in [126].
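The three steps above rest on the convolution theorem, which can be checked numerically. The sketch below is a purely numerical analogue using a discrete Fourier transform, not a model of star coupler physics; the function names are hypothetical, and the result is compared against a direct circular convolution.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def fourier_convolve(signal, kernel):
    """Circular convolution computed in the Fourier domain."""
    spectrum = dft(signal)                        # (1) transform the input
    kernel_spectrum = dft(kernel)                 # kernel expressed as a filter
    product = [s * k for s, k in zip(spectrum, kernel_spectrum)]  # (2) apply filter
    return [v.real for v in dft(product, inverse=True)]           # (3) transform back


def circular_convolve(signal, kernel):
    """Direct circular convolution, used here only as a reference."""
    n = len(signal)
    return [sum(signal[m] * kernel[(i - m) % n] for m in range(n))
            for i in range(n)]
```

In the Fourier domain the kernel acts as an element-wise multiplication, which is the role played by the phase and/or amplitude modulators in step (2).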

AI Beyond Photonics
Sections 2 and 3 introduce how AI methods can help in photonics simulation and design, and how photonics, in turn, can help accelerate AI algorithms. In addition to photonics, AI techniques have been widely applied in many other photonics-related fields, such as electromagnetics and quantum technologies. This section briefly summarizes recent AI algorithms that have been applied to computational electromagnetics (CEM), RF components, electromagnetic compatibility (EMC) and electromagnetic interference (EMI), and quantum-related topics.

AI for Computational Electromagnetic Solvers: Forward and Inverse
The three most popular computational electromagnetic methods for solving forward electromagnetic problems are the finite-difference time-domain (FDTD) method [127], the finite element method (FEM) [128], and the method of moments (MoM) [129]. No matter which method is chosen, the differential and/or integral form of Maxwell's equations, along with the computational domain, is discretized into a linear matrix system. As EM models become larger and more complicated, considerable computational resources and more computational time are required. AI techniques can improve the computational efficiency at some cost in accuracy, especially when the forward EM problem requires repetitive simulations.
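The discretization step can be illustrated on a much simpler problem than the full Maxwell system. The sketch below is an assumption-laden stand-in: it discretizes the 1-D Poisson equation -u'' = f on (0, 1) with zero boundary values using second-order finite differences, which yields a tridiagonal linear system that is then solved directly. The function names are hypothetical.

```python
def assemble_poisson_1d(n_interior, source):
    """Finite-difference system for -u'' = f on (0, 1) with u(0) = u(1) = 0."""
    h = 1.0 / (n_interior + 1)                    # uniform grid spacing
    diag = [2.0 / h ** 2] * n_interior            # main diagonal of the matrix
    off = [-1.0 / h ** 2] * (n_interior - 1)      # sub- and super-diagonal
    rhs = [source((i + 1) * h) for i in range(n_interior)]
    return diag, off, rhs


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system (needs n >= 2)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                         # forward elimination
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):                # back substitution
        u[i] = d[i] - c[i] * u[i + 1]
    return u
```

For a constant source f = 1 the exact solution is u(x) = x(1 - x)/2, which the scheme reproduces at the grid nodes. In 2-D and 3-D the same procedure gives much larger sparse systems, which is where the cost discussed above arises.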
For frequency-domain forward EM solvers, a machine-learning-based MoM was reported to solve static parasitic capacitance extraction and dynamic electromagnetic scattering problems [130,131]. Barmada et al. used a trained CNN to output the boundary conditions (BCs) of a reduced finite-element method (FEM) model [132]. CNN-based solutions of 1-D to 3-D electrostatic problems governed by the Poisson equation, used in place of FEM, have also been reported [133][134][135][136]. Additionally, U-Net is a popular neural network for image processing tasks, such as image segmentation [137] and colorization [16]. A 2D U-Net-based finite-difference frequency-domain (FDFD) solver has been reported to evaluate scattered EM fields with a 2000-fold efficiency gain compared with the conventional FDFD method [138].
For time-domain forward EM solvers, a recurrent convolutional neural network (RCNN), a neural network widely used for time series data, is used to replace the FDTD solver for solving scattering problems [139], as the time-marching scheme is naturally suited to RNNs. In [140,141], ML-based forward solvers are used to simulate ground penetrating radar (GPR) in real time; they are trained on data generated using an FDTD solver with tuning parameters including the fractal dimension of the water fraction, the height of the antenna, etc. Absorbing boundaries, such as Mur and PML (CPML, UPML), are essential for most CEM methods. Without high-performance absorbing boundaries, CEM methods are unable to solve various EM problems with a truncated computational domain. However, the additional artificial boundary layers usually enlarge the computational domain. Deep-learning-based methods are used to improve the efficiency of FDTD methods with absorbing boundaries [142][143][144].
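The time-marching structure that makes FDTD resemble a recurrent network can be seen in a minimal 1-D sketch. The code below is illustrative only: it uses normalized units (unit permittivity and permeability, `courant` equal to c * dt / dx), perfectly conducting end walls rather than an absorbing boundary, and hypothetical function names.

```python
def fdtd_step(ez, hy, courant=0.5):
    """Advance normalized 1-D fields by one leapfrog time step (in place)."""
    n = len(ez)
    for k in range(n - 1):
        # Update H from the spatial difference of E.
        hy[k] += courant * (ez[k + 1] - ez[k])
    for k in range(1, n - 1):
        # Update E from the spatial difference of H; ez[0] and ez[-1]
        # are never updated, which models perfectly conducting walls.
        ez[k] += courant * (hy[k] - hy[k - 1])
    return ez, hy


def run_fdtd(ez_initial, steps, courant=0.5):
    """March the fields forward in time from an initial E-field profile."""
    ez = list(ez_initial)
    hy = [0.0] * (len(ez) - 1)
    for _ in range(steps):
        fdtd_step(ez, hy, courant)
    return ez, hy
```

Each call to `fdtd_step` maps the current field state to the next one with the same local update rule, which is the property that recurrent and convolutional surrogates exploit.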
However, most forward EM solvers using AI cannot solve general EM problems because (1) most solvers are trained on limited data generated from a limited range of parameters; (2) the trained AI models are restricted to specific applications or problems; and (3) the accuracy is low compared with full-wave solvers, especially when the training data is limited. On the other hand, many AI techniques for EM inverse scattering problems (ISPs) [145] have been reported and show great improvement in both accuracy and efficiency [22]. As with inverse problems in image processing, AI methods are naturally suited to various ISPs. Chen et al. summarize the state-of-the-art methods for solving ISPs with deep learning in their review paper [22] and discuss how to combine neural networks with knowledge of the underlying physics as well as with traditional non-learning techniques.

AI for Microwave Devices: Design, Optimization, and Applications
Microwave devices, such as antennas, filters, and amplifiers, are the most common RF components used in wireless communication systems. With the development of advanced semiconductor techniques, increasingly compact microwave devices are required for integration with other wireless communication devices. Thus, microwave devices are designed with more complicated structures, smaller sizes and optimal performance, and the difficulty of simulation, optimization, and design has increased dramatically. To this end, many researchers leverage AI to assist in the optimization and design of various microwave devices.
Optimization is usually the last phase of microwave device design. Using AI techniques, such as ANNs, for various EM-based designs has drawn much attention since 2004 [146]. By combining time-domain solvers, such as FDTD and Transmission Line Matrix (TLM) methods [147,148], with neural networks, the design of microwave devices can be optimized more efficiently. In order to optimize microwave devices efficiently, a fast forward solver is usually required to avoid extensive simulations. To this end, AI can be applied to learn from a collection of simulated data to replace a time-consuming full-wave solver. In [149], an ANN is adopted as a surrogate for the time-consuming electromagnetic model to speed up the homotopy filter optimization process. A dual bandpass filter is designed using an ANN to extract the filter transfer function [150]. In [151], a DNN with a smooth rectified linear unit (ReLU) activation function is proposed to extract the coupling matrix of a multi-coupled cavity filter based on the desired S-parameters. The smooth ReLU function avoids the discontinuity in the derivative of the DL model that arises with the conventional ReLU function. In [152], a DNN is used to obtain the S-parameters from the geometrical variables of filters and the operating frequency. Unlike the methods using DNNs, Chen et al. extract the coupling matrix of fourth-order and sixth-order coupling filters using a manifold Gaussian process (MGP) ML method based on the differential evolution (DE) algorithm [153]. Similarly, DNN-based alternatives to the forward simulation of a slotted waveguide antenna [154], a patch antenna [155], and a planar inverted-F antenna (PIFA) [156] have been reported, at an acceptable cost in accuracy.
In [157], a semi-supervised co-training algorithm based on a Gaussian process (GP) and a support vector machine (SVM) is proposed for the optimal design of a Yagi microstrip antenna (MSA) and a GPS Beidou dual-mode MSA; it uses a few labeled samples to obtain a relatively high-precision surrogate model.
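The point about derivative continuity can be illustrated numerically. The exact smooth ReLU used in [151] is not specified here, so the sketch below uses the softplus function, log(1 + exp(x)), as an assumed stand-in and compares its derivative with that of the conventional ReLU on either side of zero.

```python
import math


def relu(x):
    """Conventional ReLU."""
    return max(0.0, x)


def relu_grad(x):
    """ReLU derivative: jumps from 0 to 1 at x = 0."""
    return 1.0 if x > 0 else 0.0


def softplus(x):
    """Softplus, log(1 + exp(x)), written in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x):
    """Softplus derivative (the logistic sigmoid): continuous everywhere."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Derivatives just left and right of zero.
eps = 1e-9
relu_jump = relu_grad(eps) - relu_grad(-eps)              # jump of size 1
softplus_jump = softplus_grad(eps) - softplus_grad(-eps)  # vanishingly small
```

A continuous derivative matters when the trained network is itself differentiated, for example when gradients with respect to the inputs are used inside an optimization loop.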
Deep reinforcement learning (DRL) methods, one of the most significant branches of deep learning in recent years, have been applied to the automatic design of some microwave devices. Ohira et al. propose a deep Q-learning network (DQN)-based fine-tuning method for bandpass filter design using two NNs trained by supervised learning. One of the two NNs is the forward model, also called the environment, which calculates the coupling matrix from the filter structural parameters [158]. The other, called the agent, is the inverse model based on DQN, which tunes the filter by choosing the optimal actions according to the reward for a given state of the environment. The action space of the method is discrete: each action increases or decreases a structural parameter by 0.05 mm. Similarly, in [159], a double deep Q-learning (DDQN) approach is proposed to fine-tune microwave cavity filters. DQN is one of the value-based methods in DRL. In addition to value-based DRL, another class of DRL methods is the policy gradient method. Wang et al. [160] present a framework based on deep deterministic policy gradient for tuning cavity filters, in which a continuous action space is valid. The Experience Replay and the Target Network of DQN are preserved to ensure the stability of the algorithm, based on their previous work [161]. Rather than tuning a limited set of filter parameters, such as the width and length of resonators, Liu et al. [12] developed a relational induction neural network (RINN) as the agent of the DRL method. With this approach, microwave components, such as filters and antennas, can be designed with curved shapes while achieving design goals such as S-parameters and antenna gain. The structure of the microwave device is defined as a set of parameterized meshes, and whenever a mesh changes, the simulation result is calculated by EM solvers, such as ADS or Ansys EM.
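The value-based tuning loop with a discrete action space can be sketched on a toy problem. The code below is not the method of [158]: a small Q-table trained by tabular Q-learning stands in for the DQN, a single structural parameter on a 0.05 mm grid stands in for the filter geometry, and the reward (negative distance from a hypothetical target dimension) stands in for a reward computed from the filter response. All names are hypothetical.

```python
import random

STEP_MM = 0.05            # action size quoted in the text
ACTIONS = (-1, +1)        # decrease / increase the parameter by one step


def reward(index, target_index):
    """Hypothetical reward: negative distance (mm) from the target dimension."""
    return -abs(index - target_index) * STEP_MM


def move(index, action, n_states):
    """Apply an action and keep the parameter inside its allowed range."""
    return min(n_states - 1, max(0, index + action))


def train_q_table(n_states, target_index, updates=50000,
                  alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on uniformly sampled (state, action) pairs."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(updates):
        s = rng.randrange(n_states)
        a = rng.randrange(len(ACTIONS))
        s_next = move(s, ACTIONS[a], n_states)
        # Standard Q-learning target: reward plus discounted best next value.
        target = reward(s_next, target_index) + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
    return q


def tune(q, start_index, target_index, max_steps=100):
    """Follow the greedy policy until the target dimension is reached."""
    index = start_index
    for _ in range(max_steps):
        if index == target_index:
            break
        a = 0 if q[index][0] > q[index][1] else 1
        index = move(index, ACTIONS[a], len(q))
    return index * STEP_MM
```

A usage example: `tune(train_q_table(21, 12), 0, 12)` walks the parameter from 0 mm towards the 0.6 mm target in 0.05 mm steps. In a DQN, the table `q` is replaced by a neural network that generalizes across a much larger state space.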

AI for Electromagnetic Compatibility (EMC) and Electromagnetic Interference (EMI) Applications
The sky-rocketing increase in device complexity and operating frequency creates demand for machine learning (ML)-based techniques that offer groundbreaking efficiency improvements. The complex electromagnetic behavior in signal integrity (SI), power integrity (PI), and EMC makes it difficult to categorize and describe the nature of the coupling directly with machine learning. Nevertheless, machine learning has been successfully applied to EMC problems in many respects with superior performance [20].
Machine learning is utilized in the performance evaluation of large-scale integrated circuits [177,178] and semiconductors [179]. Recently, a DNN-based macro modeling approach for the "black-box" problem [180] was proposed with the partial element equivalent circuit (PEEC) model [181]. ML also shows capability in solving optimization problems in EMI and SI/PI analysis [9,10]. In [182,183], the electromagnetic interference of PCBs is analyzed by neural networks. The behavior of nonlinear circuit elements is predicted with a radial basis function neural network in [184]. Echo state networks and SVMs were proposed in [185,186] to model the electromagnetic immunity of integrated circuits. A knowledge-based CNN is applied to establish and classify entities in order to organize EMC management [187]. In [188,189], channel modeling and optimization are carried out with an ANN to extract the lumped circuit element matrix. Random-fuzzy uncertainty quantification is modeled with Bayesian optimization to propagate the uncertainty to the performance of the system [190]. Source reconstruction is essential yet challenging for interference prediction in EMC, emission, and immunity analysis, and is formulated as a regression problem governed by field integral equations. DNN, ANN, or generic [10,191,192] based algorithms have been proposed with superior accuracy or efficiency.

AI for Quantum Related Topics
The development of AI, especially its subfield of ML, has revolutionized many areas in science and engineering. Research on AI in quantum photonics and beyond can be categorized into three parts: (1) quantum machine learning [193], which utilizes dedicated quantum computers as a new computing model to speed up machine learning; (2) utilization of machine learning to solve quantum physics problems [34,194]; (3) application of machine learning in the development of quantum devices and quantum computers. This review focuses on the third part. The methodology for utilizing machine learning in the design and optimization of photonics devices [195] has been reviewed in the previous sections. This subsection further discusses the application of machine learning in quantum control and quantum information processing.
Quantum systems are controlled by unitary operations engineered from a set of physical operations on the quantum systems. Quantum control enables quantum devices to perform the physical operations for information processing and quantum computation. To efficiently manipulate quantum devices and programmable quantum computers, a large set of unitary operations has to be designed to control the devices so that they perform the intended operations or stay in the intended quantum states. This is a non-trivial task, as the superposition of quantum states spans a continuous space, while the available driving options are generally limited. Haug et al. demonstrate an approach to prepare a continuous set of quantum states using deep reinforcement learning [196]. In order to achieve robust control of qubits or gate operations, the designed control protocol has to account for potential noise coming from the qubit environment or from the control and readout processes. Wise et al. utilize deep learning to extract the noise spectrum associated with a qubit surrounded by an arbitrary bath and to mitigate the impact of the qubit noise environment in the development of dynamical decoupling protocols [197]. For given noise models, August and Ni train a recurrent neural network to optimize dynamical decoupling sequences to suppress errors in quantum memory [198]. Kim et al. use neural networks to infer the amount of probability adjustment on the measurement and improve the accuracy of noisy intermediate-scale quantum (NISQ) algorithms without relying on extensive error characterization [199].
With the development of cloud-based quantum computing hardware [200,201], variational quantum algorithms (VQAs) [202], such as the variational quantum eigensolver (VQE), have emerged as promising candidates to achieve a practical quantum advantage over classical algorithms [203,204]. However, it is still too challenging to run advanced deep neural networks on existing quantum computing platforms due to the intractability of deep quantum circuits. Chen et al. [205] showed a proof-of-principle demonstration of variational quantum circuits approximating the deep Q-value function in DQN-based deep reinforcement learning. Lockwood and Si [206] explored pure and hybrid quantum algorithms for DQN and Double DQN, and found that both hybrid and pure quantum variational circuits can solve reinforcement learning tasks with a smaller parameter space. With the development of quantum techniques, it is expected that more and more AI methods could be run on super-fast quantum computers.
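The variational idea behind VQE can be shown on a toy problem that is simulated classically. The sketch below is an illustration under stated assumptions, not any of the cited implementations: it uses a single qubit, the one-parameter ansatz Ry(theta)|0>, a hypothetical Hamiltonian H = a*Z + b*X, the parameter-shift rule for the gradient, and plain gradient descent.

```python
import math


def energy(theta, a=1.0, b=0.5):
    """<psi|H|psi> for H = a*Z + b*X and |psi> = Ry(theta)|0>."""
    psi0, psi1 = math.cos(theta / 2), math.sin(theta / 2)   # state amplitudes
    expect_z = psi0 ** 2 - psi1 ** 2
    expect_x = 2 * psi0 * psi1
    return a * expect_z + b * expect_x


def parameter_shift_gradient(theta, a=1.0, b=0.5):
    """Gradient from two energy evaluations at theta +/- pi/2."""
    return 0.5 * (energy(theta + math.pi / 2, a, b)
                  - energy(theta - math.pi / 2, a, b))


def minimize_energy(theta=0.1, learning_rate=0.2, steps=200, a=1.0, b=0.5):
    """Gradient descent on the circuit parameter."""
    for _ in range(steps):
        theta -= learning_rate * parameter_shift_gradient(theta, a, b)
    return theta, energy(theta, a, b)
```

For this Hamiltonian the exact ground-state energy is -sqrt(a^2 + b^2), so the result of `minimize_energy` can be checked directly. On hardware, `energy` would be estimated from repeated measurements rather than computed from the amplitudes.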

Conclusions
AI has become a heated focus in and beyond photonics research in the past few years. As we have discussed in this review article, photonics benefits a great deal from AI for efficient soft computing and inverse design. Meanwhile, AI, in turn, benefits from photonics by carrying out AI algorithms, such as complicated deep neural networks using photonics components that use photons rather than electrons. We introduced the applications of AI on photonics modeling, simulation using soft computing, and inverse design based on a GAN model. Beyond the photonics domain, the other related research areas or topics governed by Maxwell's equations share remarkable similarities in using the help of AI. The studies in computational electromagnetics, the design of microwave devices, EMC/EMI, and quantum computing greatly benefit from AI. We investigated the applications of AI for the forward and inverse CEM methods, the modeling and simulation of RF components (antennas, filters, etc.) and EMC/EMI problems using deep-learning models, inverse RF component design based on deep reinforcement learning, and a brief introduction on the recent AI techniques for quantum fields. We believe the relationship between AI and physics will continue to flourish in a mutually advancing manner both in photonics and beyond.

Acknowledgments: This work is supported by RGANS1901, "AI-Enabled Electronic-Photonic IC Design". This work is also supported by the A*STAR RIE2020 Advanced Manufacturing and Engineering (AME) Programmatic Fund [A20H5g2142].

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: