Key Technologies of Photonic Artificial Intelligence Chip Structure and Algorithm

Abstract: Artificial intelligence chips (AICs) lie at the intersection of integrated circuits and artificial intelligence (AI), involving structure design, algorithm analysis, chip fabrication and application scenarios. Owing to their excellent data processing ability, AICs show long-term industrial prospects in big data services, cloud centers, etc. However, with the foreseeable end of Moore's Law, the feature size of traditional electronic AICs (EAICs) is gradually approaching its limit, and an architectural update is required. Photonic artificial intelligence chips (PAICs) use light propagating in silicon waveguides, which enables highly parallel configurations, fast calculation and low latency. Because they manipulate light rather than electrical signals, PAICs also perform well in immunity to electromagnetic interference and in energy conservation. This invited paper summarizes recent research on PAICs. The characteristics of different hardware structures are discussed. The currently most widely used training algorithm is given, and the Photonic Design Automation (PDA) simulation platform is introduced. In addition, the authors' related work on PAICs is presented. We believe that PAICs may play a critical role in the deployment of data processing technology.


Introduction
The capacity of computing systems is in an arms race with the massively growing amount of data. The AIC is considered an effective way to cope with the data explosion era and provides excellent data computing ability. However, the traditional EAIC is based on electronic computing, which is gradually reaching a bottleneck as Moore's law approaches its limit [1,2]. Limits on integration density and high power consumption pose major challenges to the calculation ability and data processing capacity of EAICs. Unlike the electrical interconnects in EAICs, the PAIC is built from optical waveguide interconnects, which form optical neural networks (ONNs) or optical elemental devices that realize optical linear or nonlinear computing. It offers the potential for orders-of-magnitude improvement in energy conservation and computing capacity owing to its natural parallel processing, low susceptibility to interference, free superposition, etc. [3]. In addition, with the growth of nanophotonics technology, the PAIC has been applied in various practical scenarios, supported by advances in algorithms and materials [4][5][6].
A PAIC comprises the hardware structure and the matching optimization algorithm. The hardware structure performs the computation by changing the phase and intensity of the light signal through optical devices. The matching algorithm handles data training and module control. In PAIC design, the compatibility of the structure and the algorithm is also significant. Before fabrication, a system-level simulation platform is necessary to simulate the overall performance and thus obtain a workable PAIC.

Hardware Structures for Linear Operation
An ONN is formed by passive optical waveguide interconnects and is naturally suited to linear operations. According to the computing task, ONNs are mainly classified into three types: the feedforward neural network (FNN), the convolutional neural network (CNN), and the recurrent neural network (RNN). These networks can be implemented with optical waveguide devices such as the Mach-Zehnder interferometer (MZI), the semiconductor optical amplifier (SOA), the micro-ring resonator (MRR) and 3D-routed waveguides. The response function of a single MZI can be expressed as a Jones matrix [9]; thus, the MZI array is an excellent candidate for linear matrix multiplication [10][11][12]. With SOAs, modulation and amplification control the intensity or phase of the input signal to complete the matrix multiplication [13,14]. MRRs are wavelength selective: optical signals at different wavelengths are modulated by different rings to complete the calculation [15][16][17]. Furthermore, complex 3D-routed waveguides can be created by two-photon polymerization [18,19], which efficiently connects many IO channels. When the channels are merged by a combiner, the 3D waveguide works as a summator. All of these waveguide devices are deployed in ONNs to realize linear operation. The schemes for the three types of ONN are shown in Table 1.
Appl. Sci. 2021, 11, 5719

Photonic Feedforward Neural Network (PFNN)
The feedforward neural network (FNN) is the simplest neural network: a one-way network comprising an input layer, a hidden layer and an output layer. The signal propagates unidirectionally from the input layer to the output layer. The essence of realizing matrix multiplication is to complete the multiply-accumulate (MAC) operation. The photonic FNN (PFNN) can be realized by the MZI array, SOA, MRR and 3D waveguide.

A. MZI array
In the PFNN, the MZIs are cascaded without reflection, so the light stays in forward propagation. In PAICs, linear computing is implemented in the MZI array as shown in Figure 1 [20]. The multi-channel input optical signal propagates through the parallel MZI waveguides, and its phase changes as it passes through the array. In this way, the output matrix is obtained and the linear operation is completed. In 2017, Shen et al. proposed and demonstrated the first photonic interference computing unit chip based on the MZI array [21]. The whole network uses an array of 56 MZIs and 213 phase shifters to complete the matrix operation through phase changes.
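The way phase settings in an MZI array turn into a matrix can be illustrated with a small numerical sketch. It assumes ideal, lossless 50:50 couplers and a hypothetical mesh layout; the function names (`mzi`, `embed`, `mesh`) and the phase values are illustrative and are not taken from [20] or [21].

```python
import numpy as np

def mzi(theta, phi):
    """2x2 transfer matrix of one idealized MZI: external phase phi,
    50:50 coupler, internal phase theta, second 50:50 coupler."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)   # ideal 50:50 coupler (assumed)
    inner = np.diag([np.exp(1j * theta), 1.0])       # internal phase shifter
    outer = np.diag([np.exp(1j * phi), 1.0])         # external phase shifter
    return bs @ inner @ bs @ outer

def embed(t2, n, k):
    """Place a 2x2 block on waveguides k and k+1 of an n-port identity."""
    full = np.eye(n, dtype=complex)
    full[k:k + 2, k:k + 2] = t2
    return full

def mesh(n, phases):
    """Multiply the embedded MZIs in order; phases holds (k, theta, phi) tuples."""
    u = np.eye(n, dtype=complex)
    for k, theta, phi in phases:
        u = embed(mzi(theta, phi), n, k) @ u
    return u

rng = np.random.default_rng(0)
# Hypothetical 4-port mesh with six MZIs and random phase settings.
phases = [(k, *rng.uniform(0, 2 * np.pi, 2)) for k in (0, 2, 1, 0, 2, 1)]
U = mesh(4, phases)

x = np.array([1.0, 0.5, 0.0, 0.25], dtype=complex)   # input field amplitudes
y = U @ x                                            # output field amplitudes
```

In this idealized model the mesh is always unitary, so optical power is conserved; a single MZI switches between the cross state (theta = 0) and the bar state (theta = pi).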

B. SOA
In the SOA scheme, the MAC operation is realized by a single amplifier, and cascading amplifiers enables large-scale MAC operations. Generally, the weight is expressed by the attenuation or gain. At the output, the light of the different channels, each with its own weight, is summed through the wavelength division multiplexer. Thus, the SOA scheme can complete an N × M matrix multiplication. The FNN based on SOAs is illustrated in Figure 2. Adjusting the drive current of an SOA changes its gain coefficient, and this variation corresponds to a change in a coefficient of the transmission matrix. Then, through the arrayed waveguide grating (AWG) or multimode interference optical coupler (MMI), the weighted results of multiple SOAs are summed. In this way, the weighted sum of each row of the matrix is obtained, which realizes the matrix multiplication [22].
Figure 2. [22] The chip integrates 8 weighted addition circuits, which take 8 WDM input vectors and provide 8 WDM outputs. SOAs assist the weighted sum of each layer. The blocks in use are highlighted in gray.
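The weighted summation described above can be sketched numerically. The linear current-to-gain map below is a hypothetical placeholder (real SOA gain curves are nonlinear and device specific), and the combiner is modeled as a lossless sum of optical powers; the names `soa_gain` and `soa_layer` are illustrative only.

```python
import numpy as np

def soa_gain(current_ma, g_per_ma=0.02, threshold_ma=20.0):
    """Hypothetical linear map from drive current (mA) to power gain.
    Below the assumed transparency current the SOA attenuates (gain < 1)."""
    gain = 1.0 + g_per_ma * (np.asarray(current_ma, dtype=float) - threshold_ma)
    return np.clip(gain, 0.0, None)

def soa_layer(currents_ma, input_powers):
    """Each row of currents_ma drives the SOAs feeding one output port.
    The combiner (AWG or MMI) is modeled as a lossless sum of powers."""
    weights = soa_gain(currents_ma)      # N x M matrix of gains
    return weights @ input_powers        # N weighted sums, one MAC per row

currents = np.array([[20.0, 70.0, 0.0],
                     [45.0, 20.0, 20.0]])   # 2 outputs, 3 WDM input channels
x = np.array([1.0, 2.0, 4.0])               # optical power per channel
y = soa_layer(currents, x)
```

Because the weights here are power gains, they are non-negative; representing signed weights would need an additional mechanism that this sketch does not model.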

C. Micro-ring resonator (MRR)
The implementation scheme based on MRRs is similar to that of SOAs. The resonant wavelength of an MRR shifts when its circumference or radius is adjusted, so the MRR acts as a wavelength-selective element. Light passing through different MRRs forms different channels, all in forward propagation. Then, through the arrayed waveguide grating (AWG) or multimode interference optical coupler (MMI), the light of the multiple channels is summed, which completes a matrix operation. The transmittance of an MRR can be adjusted by thermal tuning [23,24], ESC [25], phase-adjustable materials, etc.
The PFNN based on MRRs is illustrated in Figure 3 [26]. The input signals are coupled into different MRRs, so that each channel carries a signal at a specific wavelength. After nonlinear modulation, the output matrix is obtained. In this structure, the output signals of the photonic neurons are fixed to certain wavelengths. With the WDM signal fed into the network, each connection between a pair of neurons is independently configured by an MRR weight, and each channel has a signal monitor. This completes the operation.
Figure 3. MRR wavelength division multiplexer [26]. The MRRs are mainly responsible for the weight configuration of the neural network; the red parts indicate weights that can be changed through the external environment. The whole network is integrated except for bit pumped lasers.
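How tuning a ring sets a weight can be sketched with a simple model. It assumes an idealized, lossless ring whose drop-port response is Lorentzian and takes the weight to be the drop-port power transmission; the linewidth, the target weights and the function names are assumptions for illustration and are not taken from [26].

```python
import numpy as np

def drop_transmission(detuning, fwhm):
    """Lorentzian drop-port power transmission of an idealized, lossless ring."""
    return 1.0 / (1.0 + (2.0 * detuning / fwhm) ** 2)

def detuning_for_weight(weight, fwhm):
    """Invert the Lorentzian: detuning that yields a target weight in (0, 1]."""
    weight = np.asarray(weight, dtype=float)
    return 0.5 * fwhm * np.sqrt(1.0 / weight - 1.0)

fwhm = 0.1                                   # nm, assumed resonance linewidth
target = np.array([1.0, 0.5, 0.2, 0.8])      # one weight per WDM channel
shift = detuning_for_weight(target, fwhm)    # tuning applied to each ring (nm)
weights = drop_transmission(shift, fwhm)     # weights actually realized

x = np.array([0.3, 1.0, 0.5, 0.25])          # optical power on each wavelength
y = np.sum(weights * x)                      # a photodetector sums all channels
```

A ring tuned exactly onto its channel gives weight 1, and a detuning of half the linewidth gives weight 0.5, which is why precise (for example thermal) control of the resonance matters for weight accuracy.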

D. 3D waveguide
In addition, Yu et al. [27] and Moughames et al. [28] constructed a feedforward neural network from 3D waveguides written directly by a laser, as shown in Figure 4. At present, the network has only two layers, so the signal transmission is one-way. The three-dimensional waveguide achieves the goal of dimension expansion, although the signal is still transmitted in one direction. The multiple IO channels are finally combined into one output port: the input signals are summed by an N × 1 beam combiner. This completes the interconnection between different layers, and the waveguides work as a summator.
Figure 4. [28] A small network hosting simple couplers. Chirality of the connections avoids the intersection of individual waveguides between the input and output ports.

Photonic Convolutional Neural Network (PCNN)
A. MZI
The basic principle is similar to that of the corresponding photonic feedforward neural network, because convolution is in essence a matrix operation. The MZI array implements the matrix multiplication. After matrix decomposition, the signal is input into the MZI array segment by segment, and successive operations are implemented with cascaded MZIs. This completes a convolution operation. In 2018, Bagherian et al. used such chips to extend the original simple fully connected neural network to high-dimensional image recognition, using time division multiplexing to complete the convolution computing. The main steps are as follows: matrix multiplication based on the MZI array, image convolution by time division multiplexing, and construction of the convolutional neural network layers (five convolutional layers and one fully connected layer). This realizes the recognition of colored digits 0-9 [29].
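The reduction of convolution to repeated matrix multiplication can be sketched as follows. This is a generic patch-by-patch formulation, not the specific decomposition of [29]: each image patch is flattened and fed through the same weight matrix in its own time slot, and the plain NumPy product stands in for the MZI array. The function name and test data are illustrative.

```python
import numpy as np

def conv2d_as_matmul(image, kernels):
    """Valid 2D convolution (cross-correlation form, as used in CNNs) computed
    as a sequence of matrix-vector products, one image patch per time slot."""
    kh, kw = kernels.shape[1:]
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    weight_matrix = kernels.reshape(len(kernels), -1)   # programmed once
    out = np.empty((len(kernels), oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw].ravel()   # input for this time slot
            out[:, i, j] = weight_matrix @ patch        # stand-in for the MZI array
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernels = np.stack([np.ones((2, 2)),                    # local sum
                    np.array([[1.0, 0.0], [0.0, -1.0]])])  # diagonal difference
features = conv2d_as_matmul(image, kernels)
```

The weight matrix is set once and reused for every patch, so only the input changes from slot to slot; this is the sense in which time division multiplexing trades hardware size for processing time.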

B. MRR
In the CNN [34], the MRR completes the matrix operation through the weight of each micro-ring. The multiplexed wavelengths enter the MRR array, the amplitude of each wavelength is multiplied by its corresponding micro-ring weight, and the result is output. The multiplication is realized by tuning the resonant peaks of the micro-rings relative to their respective laser wavelengths [31]. Figure 5 shows the convolution operation with input feature mapping, which greatly facilitates the calculation.

Photonic Recurrent Neural Network (PRNN)
The photonic recurrent neural network is commonly implemented as a reservoir computing system. The information is transmitted forward and backward to form a loop structure. The reservoir network is mainly composed of an input layer, a middle layer and an output layer, and only the output weights are trained. For photonic reservoir networks, there are two structures: the parallel scheme and the serial scheme, as shown in Figure 6. Different from the previous two neural networks, the reservoir network is mainly used for dimension transformation of data.
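The statement that only the output weights are trained can be made concrete with a software reservoir. The sketch below is a generic echo state network, not a model of any photonic device: the node count, the tanh node nonlinearity, the spectral radius of 0.9 and the sine-prediction task are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_steps, washout = 50, 600, 100

# Fixed random input and recurrent weights; these are never trained.
w_in = rng.uniform(-0.5, 0.5, n_nodes)
w_res = rng.normal(size=(n_nodes, n_nodes))
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))   # spectral radius 0.9

u = np.sin(0.2 * np.arange(n_steps + 1))    # input time series
target = u[1:]                              # task: predict the next sample

# Run the reservoir and record its state at every step.
states = np.zeros((n_steps, n_nodes))
x = np.zeros(n_nodes)
for t in range(n_steps):
    x = np.tanh(w_res @ x + w_in * u[t])
    states[t] = x

# Train only the linear readout, here by ridge regression.
s, y = states[washout:], target[washout:]
w_out = np.linalg.solve(s.T @ s + 1e-6 * np.eye(n_nodes), s.T @ y)
nmse = np.mean((s @ w_out - y) ** 2) / np.var(y)   # training error only
```

The reservoir expands one input into 50 state variables, and the readout is a single linear solve. This is why a reservoir tolerates fixed, imperfect internal connections: nothing inside it has to be tuned.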

A. Parallel scheme
The nodes in the reservoir of the parallel structure can be implemented by SOAs [35], silicon-based micro-ring resonators [36], silicon-based waveguide delay lines [37], etc. Optical reservoirs based on SOAs and MRRs utilize, respectively, the gain saturation of the SOA and the nonlinear effects of the MRR (free carrier dispersion and thermo-optical effects). Whatever the device, the simplest way to realize interconnection is to return the output signal to the input node, so that feedback is achieved within the network. When there are multiple inputs, parallel operation is realized by inputting different signals.

B. Serial scheme
The nodes in the reservoir of the serial structure can be implemented using modulators, SOAs, etc. In recent years, great progress has been made on the delay of serial-loop RNNs based on MRRs or MMIs. At the same time, this type of photonic RNN also uses multi-stage or more complex time division multiplexing to further accelerate computing.
Figure 6. Optical reservoir schemes. (a) MRR structure [36]. The data matrix is input into the reservoir network composed of MRRs and, after circulating through the network, is output. (b) Silicon-based delay line [37]. The data are input into a network composed of delay lines, which makes earlier and later input signals interact with each other. A PD detects the optical signal and converts it into an electrical signal, which is processed by a microprocessor.

Comparison of Three Linear ONNs
The structural characteristics of the three neural networks have been analyzed above. The PFNN is the simplest of the three networks and the easiest to implement. However, because of its simple structure, the matrix computation it can perform is limited.
The PCNN can complete the convolution operation on account of its complex hidden layers, which are the core of convolution. However, they also make the structure more complex. Furthermore, because the hidden layers participate in the operation, the operation time is longer than that of the PFNN.
The PFNN and PCNN mostly use MZIs and MRRs to realize matrix operations. An MZI-based system is relatively simple and more versatile. However, due to excessive loss during cascading, it is not suitable for large-scale integration. The MRR is relatively small, which makes large-scale integration easier. However, it is more sensitive to temperature, so achieving precise control is a great challenge.
Different from the PFNN and PCNN, the PRNN has the structural characteristics of a recurrent network. It is used to enrich or compress data dimensions and can also handle tasks related to time series. Its structure uses a reservoir, which relies on random projection to transform the dimensionality of the data. Therefore, there is no need for complicated control, and it has strong tolerance to faults in integrated technology. However, because of this special principle, it is difficult to apply in most scenarios.

Hardware Structures for Nonlinear Operation
The linear operation of a neural network can solve only relatively simple problems. The nonlinear activation function is the root of the artificial neural network's powerful expressive ability, and it affects both the speed of network convergence and the accuracy of recognition. As shown in Figure 7, the nonlinearity is placed after the linear neural network. There is a lack of practical photonic devices that express nonlinear functions. Currently, photonic computing uses an optoelectronic hybrid architecture in which the nonlinear calculation is performed entirely in the electrical domain. As a result, multiple optical-to-electrical and electrical-to-optical conversions are involved in the network, which not only limits the speed of the photonic neural network but also brings additional energy consumption. Ideally, optical nonlinear devices would realize the nonlinear calculation while offering a low threshold, reconfigurability, easy integration and fast response. Current approaches are divided into two types: special materials (saturable absorbers and two-dimensional graphene materials) and optical modulator structures (MZI, MRR and SOA).

Nonlinear Operation Based on Special Material
The main special materials used to realize nonlinear operation are saturable absorbers and two-dimensional graphene materials. Nonlinearity can be realized by placing the special material at the dotted line in Figure 7.
The saturable absorber mainly relies on its transmission characteristics. Its absorption coefficient decreases as the intensity of the incident light increases, so the transmission coefficient grows with the optical power. Thus, a ReLU-like function can be realized [21]. Two-dimensional graphene materials work analogously to the saturable absorber. However, compared with the saturable absorber, graphene has the advantages of a low threshold and easily excited nonlinear effects.
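The intensity-dependent transmission can be illustrated with a textbook-style saturable absorption model. The formula alpha(I) = alpha0 / (1 + I / I_sat), the thin-sample approximation and all parameter values below are assumptions for illustration; they are not taken from [21].

```python
import numpy as np

def saturable_absorber(intensity, alpha0_l=3.0, i_sat=1.0):
    """Thin-sample model: absorbance alpha0*L / (1 + I/I_sat), so the
    absorption falls and the transmission rises as the intensity grows."""
    intensity = np.asarray(intensity, dtype=float)
    absorbance = alpha0_l / (1.0 + intensity / i_sat)
    transmission = np.exp(-absorbance)
    return intensity * transmission, transmission

i_in = np.linspace(0.0, 20.0, 201)          # input intensity, units of I_sat
i_out, t = saturable_absorber(i_in)         # output intensity and transmission
```

Weak inputs are strongly attenuated while strong inputs pass almost unchanged, which gives the output curve its soft-threshold, ReLU-like shape.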

Nonlinear Operation Based on Optical Modulator
Another way to realize the nonlinear structure is to use an optical modulator, placed at the dotted line in Figure 7. The basic principle is that the weighted-sum light signal is converted into a voltage by a photodetector, and this voltage is then applied to the optical modulator, where it shifts the transmission spectrum. The optical nonlinear function is realized through the resulting change in the transmittance seen by the optical signal passing through the modulator. The modulator can be implemented as an MZI modulator [38], an electro-absorption modulator [39] or a micro-ring resonator modulator [40], as shown in Figure 8. Different nonlinear functions can be realized by changing the bias voltage of the modulator.

Another way to realize a nonlinear structure is to use an optical modulator. The optical modulators that realize nonlinearity are placed at the position marked by the dotted line in Figure 7. The basic principle is that the optical signal carrying the weighted sum is converted into a voltage by a photodetector and then applied to the optical modulator, which changes its transmission spectrum. The optical nonlinear function is thus realized by changing the transmittance of the optical signal through the modulator. The modulator can be implemented as an MZI modulator [38], an electro-absorption modulator [39], or a micro-ring resonator modulator [40], as shown in Figure 8. Different nonlinear functions can be realized by changing the bias voltage of the modulator.
Figure 8. Optical modulator structures: (a) MZI modulator [38], (b) electro-absorption modulator [39], (c) micro-ring cavity modulator [40].
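The detect-then-modulate principle can be sketched with a toy model. The example below assumes that a fraction `tap` of the input power reaches the photodetector, that the resulting voltage drives an ideal MZI modulator with a cos² transfer curve, and that the remaining light passes through that modulator. All names and parameter values are hypothetical and are not taken from [38].

```python
import math


def mzm_activation(p_in, bias=0.0, tap=0.1, responsivity=1.0,
                   gain=10.0, v_pi=1.0):
    """Optical-electrical-optical activation with an ideal MZI modulator."""
    p_tap = tap * p_in                       # power sent to the photodetector
    voltage = gain * responsivity * p_tap    # detected and amplified drive voltage
    phase = math.pi * voltage / v_pi + bias  # bias sets the operating point
    transmittance = math.cos(phase / 2.0) ** 2
    return (1.0 - tap) * p_in * transmittance


if __name__ == "__main__":
    for bias in (0.0, math.pi / 2.0, math.pi):
        row = [round(mzm_activation(0.1 * k, bias), 3) for k in range(11)]
        print(f"bias={bias:.2f} rad ->", row)
```

With `bias = pi` the modulator blocks weak inputs and opens as the power grows, while `bias = 0` passes weak inputs and closes for strong ones, so a single device yields different nonlinear curves depending on the bias point.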
In 2019, Alexandris et al. realized the sigmoid nonlinear function with a serial structure consisting of an MZI containing two SOAs followed by a single SOA [41]. The schematic diagram is shown in Figure 9. This structure is mainly based on the cross-phase modulation and cross-gain modulation effects of the SOA. When the input pulse width is very small, the arrangement and the number of input light pulses within the integration window affect the pulse width of the output signal. The SOA has a high nonlinear coefficient and, at the same time, provides gain for optical signals.
Figure 9. SOA-based implementation of the nonlinear function. SOAs are placed in the two arms of the MZI, where they mainly perform phase modulation. The input signal and the reference signal pass through the 3 dB coupler and are then fed into the two SOAs for phase modulation. Cross-modulation is completed in SOA3.

Comparison of Two Nonlinear Types
From the above analysis, we can see that the two types of nonlinear structures have their own advantages and disadvantages.
The nonlinear structure based on special materials is easier to integrate into the chip. However, the materials are difficult to obtain, and they place higher requirements on the integration environment.
The nonlinear structure based on the optical modulator is easier to tune to obtain the desired nonlinearity. However, the structure is larger and harder to integrate.

Training Algorithm
Currently, the simulation model of the photonic computing network is trained on an electronic computer, and the trained model parameters are then loaded onto the photonic chip. However, even when training takes place in the electrical domain, its effect is still restricted by two aspects: the accuracy with which the simulation model describes the chip and the computing speed. Training algorithms include forward propagation [42], finite-difference gradient calculation (MIT), in situ backpropagation [43] and gradient measurement (Stanford). These attempts have essentially been completed at the level of computer simulation and have not been used for training on actual physical chips.
The training algorithm of the photonic network is a limiting factor for expanding the application of the PAIC. Photons cannot be stored like electrons, and we cannot directly record the state of photons. Therefore, the backpropagation algorithms that are widely used in electrical neural network training are difficult to transplant directly to photonic artificial intelligence network training. To solve this problem, Hughes et al. proposed an on-chip training algorithm in 2018 [43]. By recording the light field distribution and the phase distribution of the phase shifters, we can obtain the gradient and hence the direction that leads toward convergence. We then calculate the phase configuration of the on-chip phase shifters for the next iteration so as to gradually converge to a better result. Hughes et al. trained, in simulation, a specific neural network with two optical interference units (OIUs) on the chip. It implements exclusive OR logic to verify the effectiveness of the algorithm.
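The idea of tuning phase shifters along a gradient can be illustrated on a single MZI. The sketch below uses the finite-difference approach (perturb each phase, observe the change in output power); it is not the in situ method of Hughes et al. [43]. The device model, target split, learning rate and iteration count are all assumptions chosen for illustration.

```python
import cmath
import math


def mzi_output_powers(theta, phi):
    """Output powers of one MZI for unit light entering the first input port."""
    s = 1.0 / math.sqrt(2.0)
    a0, a1 = cmath.exp(1j * phi), 0.0                 # external phase shifter
    b0, b1 = s * (a0 + 1j * a1), s * (1j * a0 + a1)   # first 50:50 coupler
    b0 *= cmath.exp(1j * theta)                       # internal phase shifter
    c0, c1 = s * (b0 + 1j * b1), s * (1j * b0 + b1)   # second 50:50 coupler
    return abs(c0) ** 2, abs(c1) ** 2


def loss(phases, target):
    """Squared error between measured and target output powers."""
    powers = mzi_output_powers(*phases)
    return sum((p - t) ** 2 for p, t in zip(powers, target))


def finite_difference_gradient(phases, target, step=1e-4):
    """Central-difference estimate of d(loss)/d(phase) for each shifter."""
    grad = []
    for k in range(len(phases)):
        plus, minus = list(phases), list(phases)
        plus[k] += step
        minus[k] -= step
        grad.append((loss(plus, target) - loss(minus, target)) / (2.0 * step))
    return grad


def train(phases, target, learning_rate=0.5, iterations=200):
    """Plain gradient descent on the phase settings."""
    phases = list(phases)
    for _ in range(iterations):
        grad = finite_difference_gradient(phases, target)
        phases = [p - learning_rate * g for p, g in zip(phases, grad)]
    return phases


if __name__ == "__main__":
    trained = train([1.5, 0.3], target=(0.25, 0.75))
    print("phases:", trained, "powers:", mzi_output_powers(*trained))
```

Each gradient estimate costs two loss evaluations per phase shifter, which is why finite differences scale poorly with network size and why methods that obtain all gradients from a few field measurements are attractive.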
Zhang et al. proposed an effective training algorithm based on a neuroevolution strategy in 2019 [44]. It uses genetic algorithms (GA) and particle swarm optimization (PSO) to determine the hyperparameters of ONNs and optimize the connection weights. The trained ONNs are evaluated on classification tasks. The results show that its accuracy and stability are sufficient to compete with traditional learning algorithms. The system also uses the photonic artificial intelligence network to classify the modulation format of communication signals. In the future, this algorithm can be extended and transplanted to larger-scale PAICs, where on-chip training can gradually find the best chip configuration for a specific function. Figure 10 shows the results of autonomous learning using a gradient descent algorithm. It can be seen that signal recovery improves as iterative learning proceeds [42].
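As an illustration of gradient-free training, the following is a minimal particle swarm optimization sketch. It is not the algorithm of Zhang et al. [44]: it only tunes a few phases so that idealized MZI transmissions sin²(θ/2) match assumed target weights, and the swarm parameters (`inertia`, `c_personal`, `c_global`) are common textbook choices rather than values from the paper.

```python
import math
import random


def weight_loss(phases, targets):
    """Squared error between MZI transmissions sin^2(theta/2) and target weights."""
    return sum((math.sin(t / 2.0) ** 2 - w) ** 2 for t, w in zip(phases, targets))


def particle_swarm(targets, n_particles=30, iterations=200, seed=0,
                   inertia=0.7, c_personal=1.5, c_global=1.5):
    """Search for phases that minimize weight_loss, using no gradients."""
    rng = random.Random(seed)
    dim = len(targets)
    pos = [[rng.uniform(0.0, math.pi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                      # personal bests
    best_val = [weight_loss(p, targets) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]          # global best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = weight_loss(pos[i], targets)
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    phases, value = particle_swarm([0.2, 0.7, 0.5])
    print("phases:", phases, "loss:", value)
```

Because the swarm only needs loss values, the same loop could in principle be driven by power readings from a chip instead of a simulated model, which is what makes such methods candidates for on-chip training.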
Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 13

Figure 10. Results of algorithm training [42]. Results are shown for three numbers of training iterations: 50, 100 and 200.

Software Simulation Platform
Efficient algorithms and a variety of network models are important cornerstones supporting the continued development of photonic chips. At present, photonic computing networks are mainly simulated and trained on electronic computers, and the trained model parameters are then loaded onto the photonic chip. So far, two main software simulation platforms are in use: IPKISS [45] and INTERCONNECT. In addition to these two commonly used simulation tools, the Institute of Microelectronics of the Chinese Academy of Sciences has designed a system-level simulation and verification tool named PDA.
PDA designers use Python to package various models of optical devices. Users can modify the parameters and interfaces of the devices and connect devices with a simple function statement. PDA can handle extremely complex network structures at the link level as well as framework-level simulation tasks. Once the system is built, each optical path can be monitored and time-domain simulation plots can be output, which greatly facilitates the observation and analysis of experimental results. Because of the flexibility of Python, PDA can also interact with MATLAB to complete simulations according to the operators' different needs. PDA can also be used in conjunction with the open-source layout tool KLayout to realize a complete chip design flow that is both layout-driven and schematic-driven. This provides great convenience for the overall development and design of the chip, and the tool is easy to operate.
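To give a feel for what packaging device models in Python and connecting them with a function call might look like, here is a toy sketch. It is not the PDA API: the classes `Attenuator` and `PhaseShifter` and the function `connect` are hypothetical names invented for this illustration, and the signal is simply a list of complex field samples in the time domain.

```python
import cmath
import math


class Attenuator:
    """Hypothetical device model: constant power loss in dB."""

    def __init__(self, loss_db):
        self.amplitude = 10.0 ** (-loss_db / 20.0)

    def __call__(self, field):
        return [self.amplitude * e for e in field]


class PhaseShifter:
    """Hypothetical device model: constant phase shift in radians."""

    def __init__(self, phase):
        self.factor = cmath.exp(1j * phase)

    def __call__(self, field):
        return [self.factor * e for e in field]


def connect(*devices):
    """Chain devices in series and record the power after each stage."""
    def link(field):
        monitors = []
        for device in devices:
            field = device(field)
            monitors.append([abs(e) ** 2 for e in field])
        return field, monitors
    return link


if __name__ == "__main__":
    link = connect(Attenuator(3.0), PhaseShifter(math.pi))
    output, monitors = link([1.0 + 0j, 0.5 + 0j, 0.0 + 0j])
    for stage, powers in enumerate(monitors):
        print(f"stage {stage}: power per sample = {powers}")
```

Treating every device as a callable keeps the connection step to a single statement and makes it straightforward to tap the signal after any stage, which mirrors the per-path monitoring described above.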

Our Work in Lab
Although the PAIC has great potential, there is still no mature system-level technology for it. To seek a new breakthrough, we have carried out a considerable amount of research on the system-level functions of the PAIC.
First, we have designed and optimized the hardware structure. It is composed of MZI and a reservoir, which complete the calculation and transform the dimension of the data. This is an important part of the preprocessing of input data. Based on this design, an integrated optical operation module was built, as shown in Figure 11a.
Second, we have written a training algorithm to realize the cooperation of software and hardware. A gradient descent algorithm is designed to train the output data. This training algorithm acts on both the control module and the operation module, as shown in Figure 11b. The working principle is shown in Figure 11c.
Lastly, based on a comprehensive analysis of devices, circuits and algorithms, a PAIC is formed, as shown in Figure 12a. It is mainly composed of a control module and an optical operation module, and it includes the data preprocessing of the reservoir network.
The self-developed PAIC is applied to intelligent computing. The results show that it can effectively complete image segmentation, image recognition and other functions. The processed image is shown in Figure 12b; the task is mainly to monitor whether workers are wearing safety helmets. In a specific test, 6057 labeled images are divided into five groups and input into the algorithm model. The average test accuracy is 89.7%, and the energy efficiency of the deployed algorithm is 1.23 TOPS/W.
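The two figures quoted above can be unpacked with a little arithmetic. The sketch below assumes the five groups are as equal in size as possible (the text does not state how the 6057 images were split) and converts 1.23 TOPS/W into energy per operation; the function names are illustrative.

```python
def group_sizes(n_items, n_groups):
    """Sizes of n_groups near-equal groups that together hold n_items."""
    base, extra = divmod(n_items, n_groups)
    return [base + 1 if g < extra else base for g in range(n_groups)]


def energy_per_operation_pj(tops_per_watt):
    """Energy per operation in picojoules.

    1 TOPS/W equals 1e12 operations per joule, and 1 J equals 1e12 pJ,
    so the energy per operation in pJ is simply 1 / (TOPS/W).
    """
    return 1.0 / tops_per_watt


if __name__ == "__main__":
    print(group_sizes(6057, 5))                     # [1212, 1212, 1211, 1211, 1211]
    print(round(energy_per_operation_pj(1.23), 3))  # 0.813
```

Under these assumptions each group holds about 1211 images, and 1.23 TOPS/W corresponds to roughly 0.81 pJ per operation.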

Conclusions
In conclusion, PAICs have great advantages in power consumption and size, and they have attracted considerable attention from researchers. This paper summarizes the structural design and algorithm matching of PAICs, as well as the software platforms that can be used for large-scale simulation. Additionally, we present the related work of our laboratory, an integrated information processing system (the self-developed PAIC), which provides one useful solution for in-depth research on PAICs. PAICs hold great significance and many possibilities. We believe that, with the help of PAICs, the optical computer is in sight.
Author Contributions: Conceptualization, L.P. and B.B.; data curation, J.W. and X.Z.; writing-original draft preparation, Z.X.; writing-review and editing, L.P., T.N., J.Z. and J.L. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.