A Convolution Neural Network Implemented by Three 3 × 3 Photonic Integrated Reconfigurable Linear Processors

Abstract: The convolutional neural network (CNN) is a classical neural network with advantages in image processing. The use of multiport optical interferometric linear structures in neural networks has recently attracted a great deal of attention. Here, we use three 3 × 3 reconfigurable optical processors, based on Mach-Zehnder interferometers (MZIs), to implement a two-layer CNN. To circumvent the random phase errors originating from the fabrication process, the MZIs are calibrated before the classification experiment. The MNIST and Fashion-MNIST datasets are used to verify the classification accuracy. The optical processor achieves 86.9% accuracy on the MNIST dataset and 79.3% accuracy on the Fashion-MNIST dataset. Experiments show that the classification accuracy can be improved by reducing the phase errors of the MZIs and the photodetector (PD) noise. Our work provides a way to embed the optical processor in a CNN to compute matrix multiplications.


Introduction
The computing power of state-of-the-art artificial intelligence (AI) equipment is growing rapidly (doubling every 3.5 months on average). The CNN is used broadly in decision-making for image classification, speech recognition, and self-driving cars. However, the computational complexity of the CNN is high, and throughput and energy efficiency may soon become bottlenecks for electronic devices. Matrix multiplication is an essential and computationally intensive step in a CNN. Owing to the inherent parallelism of optics, the silicon photonic device is a promising platform for accelerating multiply-accumulate (MAC) operations, reducing the computation time from O(N²) to O(1) [1,2].
Implementing arbitrary linear transformation matrices with on-chip reconfigurable multiport interferometers has been studied extensively for neural networks. Optical neural networks have proven to be promising in computational speed and power efficiency, allowing for increasingly large neural networks. In 2017, Shen et al. proposed an integrated and programmable MZI-based nanophotonic circuit to realize the MAC operations of an electrical fully connected neural network, achieving an accuracy of 76.7% [3]. In 2018, Bagherian et al. proposed the concept of an all-optical CNN based on an MZI-based nanophotonic circuit, which uses a fraction of the energy of state-of-the-art electronic devices [4]. In 2019, Shokraneh et al. implemented a 4 × 4 MZI-based optical processor used in a single-layer neural network [5]. The experimental results show that the optical processor achieves 72% classification accuracy. In 2020, Nahmias et al. investigated the limits of analog electronic crossbar arrays and on-chip photonic linear computing systems, providing a

Convolution Neural Network
The CNN was first proposed by LeCun et al. to handle complex tasks such as image classification [10]. A CNN consists of successive convolution layers, pooling, nonlinearities, and a final fully connected layer (FCL). The input image is stored across w channels (w = 1 represents a grayscale image; w = 3 represents a color image with red, green, and blue intensities). There are N convolutional layers, and every convolutional layer contains multiple channels. The node values in each channel of convolutional layer $L + 1$ ($L \in N$) are computed using the information from all channels in convolutional layer $L$. Thus, the value of a node $Z_{L+1,B}$ on channel $B$ in layer $L + 1$ is computed as

$$Z_{L+1,B} = \mathrm{Act}\big(\mathcal{F}(Z_{L,1}, Z_{L,2}, Z_{L,3}, \cdots, Z_{L,N_L};\, K_{L+1,B,1}, K_{L+1,B,2}, K_{L+1,B,3}, \cdots, K_{L+1,B,N_L}) + b_{L+1,B}\big) \tag{1}$$

where $\mathrm{Act}(\cdot)$ is an activation function, $b_{L+1,B} \in \mathbb{R}$ is a bias associated with the output node, and

$$\mathcal{F}(Z_{L,1}, Z_{L,2}, Z_{L,3}, \cdots, Z_{L,N_L};\, K_{L+1,B,1}, K_{L+1,B,2}, K_{L+1,B,3}, \cdots, K_{L+1,B,N_L}) = Z_{L,1} \otimes K_{L+1,B,1} + Z_{L,2} \otimes K_{L+1,B,2} + Z_{L,3} \otimes K_{L+1,B,3} + \cdots + Z_{L,N_L} \otimes K_{L+1,B,N_L} \tag{2}$$

where $Z_{L,N_L}$ is the r × r matrix on channel $N_L$ in layer $L$, and $K_{L+1,B,N_L}$ is the $N_L$-th n × n kernel on channel $B$ ($B = 1, 2, 3, \ldots, N_{L+1}$) in layer $L + 1$. The n × n kernel $K_{L+1,B,N_L}$ slides vertically and horizontally over the r × r input $Z_{L,N_L}$. With a stride of 1 and no padding, the kernel divides the input into $q = (r - n + 1)^2$ subparts. Convolution can therefore be realized by linear combinations (e.g., matrix multiplication). The FCL maps the convolution output to a set of classification outputs. Stacking several small kernels reduces computational complexity while the connectivity remains unchanged. However, overly small kernels cannot represent the characteristics of the feature map. Thus, multiple kernels of a suitable size are chosen for the convolution.
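The statement that convolution reduces to matrix multiplication can be illustrated with a minimal NumPy sketch. It is an illustration only, not the authors' code: the function name `conv_as_matmul` is hypothetical, and the sliding-window product is computed without flipping the kernel, which is the usual convention in CNN software and an assumption here.

```python
import numpy as np

def conv_as_matmul(Z, K):
    """Stride-1, no-padding sliding-window product of an r x r input Z
    with an n x n kernel K, written as one matrix-vector product."""
    r, n = Z.shape[0], K.shape[0]
    out = r - n + 1
    # Each row is one flattened n x n subpart of Z; there are q = (r - n + 1)^2 rows.
    patches = np.array([Z[i:i + n, j:j + n].ravel()
                        for i in range(out) for j in range(out)])
    # One matrix-vector product evaluates the kernel on every subpart at once.
    return (patches @ K.ravel()).reshape(out, out), patches

Z = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 input (r = 5)
K = np.ones((3, 3))                            # toy 3 x 3 kernel (n = 3)
out, patches = conv_as_matmul(Z, K)
print(patches.shape)  # (9, 9): q = (5 - 3 + 1)^2 = 9 subparts of 9 pixels each
print(out.shape)      # (3, 3)
```

For a 28 × 28 image and a 3 × 3 kernel, the same construction gives q = 26² = 676 subparts, each of which is a length-9 vector that a 3 × 3 linear processor could multiply.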
The kernel size and kernel number of the neural network depend on the dimension of the input features. The MNIST handwritten digit dataset and the Fashion-MNIST dataset are used as inputs to the CNN model. MNIST is a handwritten digit dataset composed of the numbers 0-9. It contains a training set of 60,000 samples and a test set of 10,000 samples. Each MNIST image contains 28 × 28 pixels, and the digits are normalized and fixed in the center. Fashion-MNIST is a ten-category clothing dataset. It matches MNIST in the number of training samples, the number of test samples, and the image resolution. However, unlike MNIST, its images depict specific clothing types rather than abstract number symbols. Tables 1 and 2 show the classification accuracy on the MNIST and Fashion-MNIST datasets for different kernel numbers and kernel sizes. As can be seen from Tables 1 and 2, the classification accuracy increases with the kernel number when the kernel number is less than 3. When the kernel number is 3 with a 3 × 3 kernel size, the classification accuracy reaches its maximum value. Thus, three 3 × 3 convolution kernels are chosen for each layer to construct the CNN.

Reconfigurable Linear Optical Processors
To understand the structure of MZI-based reconfigurable optical processors, we describe a single MZI and the MZI-based matrix decomposition.
Each phase-modulated MZI consists of two 50:50 beam-splitter operators B (blue) and two phase-shift operators $R_\theta$, $R_\phi$ (orange) (depicted in Figure 1), with required ranges of 0 ≤ θ ≤ π and 0 ≤ ϕ < 2π, respectively. $R_\theta$ is an internal phase shifter between the two arms of the MZI, which controls the splitting ratio of the output modes. $R_\phi$, located between two MZIs, controls the relative phase of the output modes. The MZI's transfer matrix T(θ, ϕ) is the product of these operators. We define the transmissivity (trans) and reflectivity (refle) of the MZI such that when θ = π, refle = 1 and trans = 0 (the MZI is in the "bar state" (BS)), and when θ = 0, refle = 0 and trans = 1 (the MZI is in the "cross state" (CS)). This MZI can implement any matrix in the special unitary group of degree two (i.e., SU(2)), composed of all complex square matrices whose conjugate transpose is equal to their inverse (unitary) and whose determinant is equal to 1 (special unitary) [11].

The convolution kernel is a real-valued matrix. A real-valued matrix M may be decomposed by singular value decomposition (SVD) as

$$M = U \Sigma V^{T}$$

where U is an m × m unitary matrix, $V^{T}$ is the conjugate transpose of the n × n unitary matrix V, and Σ is an m × n diagonal matrix with non-negative real numbers on the diagonal. A universal N-dimensional unitary matrix can be implemented using a mesh of N(N − 1)/2 MZIs, as proposed by Reck et al. [12] or by Clements et al. [13]. Optical attenuators or optical amplification materials can be used to implement Σ [14]. Any N-dimensional unitary matrix U can be decomposed into

$$U = D(\gamma_1, \gamma_2, \ldots, \gamma_N) \prod_{(m,n) \in S} T^{(k)}_{m,n}(\theta_k, \phi_k)$$

where (m, n) is a position and $T^{(k)}_{m,n}$ represents the rotation and translation operation on rows (or columns) m and n of the matrix U. Thus, S defines a specific ordered sequence, k is the serial number of the MZI, and $D(\gamma_1, \gamma_2, \ldots, \gamma_N)$ is a diagonal matrix whose diagonal elements are complex with a modulus equal to one. The matrix product over a sequence $\{O^{(k)}\}$ follows the order defined by S, and $T^{(k)}_{m,n}(\theta_k, \phi_k)$ acts as the 2 × 2 MZI transfer matrix $T(\theta_k, \phi_k)$ on modes m and n and as the identity on all other modes.

Photonics 2022, 9, 80 5 of 11

We choose three 3 × 3 convolution kernels for each layer to implement the CNN. Any 3 × 3 real-valued kernel $M^{j}_{3\times3}$ (j = 1, 2, 3) can be written as

$$M^{j}_{3\times3} = U_{3\times3}\, \Sigma_{3\times3}\, V^{T}_{3\times3}$$

Figure 1 depicts the schematic of the three designed 3 × 3 integrated reconfigurable linear processors. The structure comprises six SU(3) meshes and three diagonal matrix multiplications for implementing three 3 × 3 kernels. As shown in Figure 1, each SU(3) contains the MZIs labeled 1 to 3 (or 1′ to 3′), constructing the unitary transformation matrix $U_{3\times3}$ (or $V^{T}_{3\times3}$), while the middle section consists of the MZIs labeled 4 to 6, implementing the non-unitary diagonal matrix $\Sigma_{3\times3}$. Of the eleven grating couplers (GCs) on the left side, those labeled 2 to 10 provide the optical input ports, and those labeled 1 and 11 are used for alignment testing during packaging. Fifteen PDs on the right side extract electrical signals from the chip. PD 1, PD 5, PD 6, PD 10, PD 11, and PD 15 are used to calibrate the voltages of the internal phase shifters ($\theta_i$) of the MZIs.
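The two building blocks described above, the MZI transfer matrix T(θ, ϕ) and the SVD of a real-valued kernel, can be checked numerically with a short sketch. The operator ordering B·R_θ·B·R_ϕ, the matrix forms of B, R_θ, and R_ϕ, and the function names are assumptions made for illustration; they are chosen so that θ = π gives the bar state and θ = 0 gives the cross state, as defined in the text. Rescaling the singular values to at most 1, so that Σ could be realized by attenuation alone, is likewise an assumption of this sketch.

```python
import numpy as np

def mzi_transfer(theta, phi):
    """2 x 2 MZI transfer matrix, assumed here to be B @ R_theta @ B @ R_phi."""
    B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)    # 50:50 beam splitter
    R_theta = np.diag([np.exp(1j * theta), 1.0])     # internal phase shifter
    R_phi = np.diag([np.exp(1j * phi), 1.0])         # external phase shifter
    return B @ R_theta @ B @ R_phi

def split_powers(theta):
    """Output powers (same port, opposite port) for light entering input port 0."""
    T = mzi_transfer(theta, 0.0)
    refle, trans = np.abs(T[:, 0]) ** 2
    return refle, trans

def decompose_kernel(M):
    """SVD of a real kernel, with singular values rescaled to be at most 1."""
    U, s, Vh = np.linalg.svd(M)
    scale = s.max()
    return U, s / scale, Vh, scale

# Illustrative 3 x 3 kernel (hypothetical values, not taken from the paper).
M = np.array([[0.2, -0.5, 0.1],
              [0.4, 0.3, -0.2],
              [-0.1, 0.6, 0.5]])
U, s_norm, Vh, scale = decompose_kernel(M)
M_rebuilt = scale * U @ np.diag(s_norm) @ Vh
print(np.allclose(M_rebuilt, M))   # True: the three factors reproduce the kernel
print(split_powers(np.pi))         # bar state: all power stays in the same port
print(split_powers(0.0))           # cross state: all power moves to the other port
```

Under these assumptions the output powers follow sin²(θ/2) and cos²(θ/2), so sweeping θ over [0, π] covers every splitting ratio between the cross and bar states.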

Training and Simulation
The two-layer CNN is trained on a computer. The training process is based on the abstract model of a CNN. The kernels obtained by training are then converted into the MZI phases of the photonic network. The abstract model's input matrix, output matrix, and transmission matrix (kernel) correspond to optical intensity signals in the photonic network (the domain of optical powers). The transmission matrix (kernel) is realized by controlling the output optical intensity of each MZI, which is set by the modulation voltage of its phase shifters. We therefore need to find the relationship between the modulation voltage and the output optical intensity of the MZI.
We used INTERCONNECT (Ansys. Lumerical. Cor, V2020) to simulate the optical circuit. The basic parameters are set as follows. Optical source: wavelength = 1550 nm, power = 27 mW, half-height width = 20 pm. Detector: response wavelength = 1550 nm, dark current = 20 nA.
As shown in Figure 2a, the transmission of an MZI can be tuned by the voltage of the MZI's internal phase shifter. The output optical intensity of the MZI is nearly linear in the modulation voltage for voltages between $V_{\pi}$ and $V_{2\pi}$. Thus, the whole system works in this range. The phase errors ($\sigma_\theta$) and the PD noise ($\sigma_D$) affect the classification accuracy of the two-layer CNN implemented by the optical processor. Here, the classification accuracy of the network is simulated for different values of $\sigma_\theta$ and $\sigma_D$. As shown in Figure 2b, the classification accuracy of the network reaches 98% when $\sigma_\theta \le 0.01$ (yellow region) and $\sigma_D \le 0.025$. The phase error $\sigma_\theta = 0.01$ corresponds to an 8-bit modulation accuracy of the phase modulation voltage, which is affected by the half-wave voltage $V_{\mathrm{Half}}$ ($= V_{2\pi} - V_{\pi}$). The PD error $\sigma_D$ generally originates from the dark current. Here, $\sigma_D = 0.025$ corresponds to a dark current of 25 nA.
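The influence of $\sigma_\theta$ and $\sigma_D$ on a single MZI output can be sketched with a small Monte Carlo experiment. This is not the INTERCONNECT simulation used in the paper. It assumes an ideal transmission of cos²(θ/2), zero-mean Gaussian phase error, and additive Gaussian detector noise expressed as a fraction of the full-scale optical power; the function name is hypothetical.

```python
import numpy as np

def noisy_mzi_output(theta, sigma_theta, sigma_d, rng, trials=10000):
    """Normalized output power of one MZI under phase error and detector noise."""
    theta_noisy = theta + rng.normal(0.0, sigma_theta, trials)  # phase error (rad)
    power = np.cos(theta_noisy / 2) ** 2        # assumed ideal transmission
    return power + rng.normal(0.0, sigma_d, trials)  # additive detector noise

rng = np.random.default_rng(0)
theta_set = np.pi / 2                            # mid-range operating point
ideal = np.cos(theta_set / 2) ** 2
samples = noisy_mzi_output(theta_set, 0.01, 0.025, rng)
rms_error = np.sqrt(np.mean((samples - ideal) ** 2))
print(f"RMS deviation from the ideal output: {rms_error:.4f}")
```

In this simplified model the slope of the transmission at θ = π/2 is 0.5 per radian, so a phase error of 0.01 rad contributes about 0.005 to the output deviation, and the detector term of 0.025 dominates.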

Experimental
The device in this work operates at a wavelength of 1550 nm and is fabricated on an SOI substrate with a 220 nm × 450 nm waveguide cross-section. As mentioned above, the reconfigurable MZI-based optical processor is a mesh of tunable MZIs with 2 × 2 ports. Each MZI has two heater-based phase shifters, $R_\theta$ and $R_\phi$, which control the output power and the relative phase of the two MZI outputs, respectively. To reduce the loss and enhance the robustness of the device against processing errors, the MZI's beam splitter adopts a 3 dB multimode interference (MMI) coupler. The relationship between the split ratio and the MMI structure is simulated by FDTD (Ansys. Lumerical. Cor, V2020) by adjusting the coupler length, the coupler width, and the taper width. As shown in Figure 3b, the 3 dB coupler is achieved (the optical intensities of the two output ports are equal and maximal) when the coupler width equals 5.1 µm, the taper width equals 1.3 µm, and the coupling length equals 45 µm or 90 µm. Considering the processing error, the coupler length is set to 90 µm. Every MZI is 66 µm wide and 672 µm long, with two identical 150 µm heaters. The device can be reconfigured by applying voltages to the phase shifters, and electrical pads connect the DC voltages to the heaters. A fiber array with eleven ports couples vertically to the grating couplers. The fifteen PDs on the right side are silicon-doped lateral PIN Ge PDs. Three 3 × 3 kernels have been realized on a silicon photonic platform through a multi-project wafer (MPW) run at CUMEC (China Cor) [15]. Figure 3a depicts a microscope image of the three $V^{T}_{3\times3}$ parts and three $\Sigma_{3\times3}$ parts of the three 3 × 3 reconfigurable linear optical processors. This device has a total of 27 thermo-optic phase shifters connected to electrical pads, of which 18 belong to $V^{T}_{3\times3}$ and 9 realize $\Sigma_{3\times3}$. The chip has been packaged, and all electrical pads have been wire bonded.
To program the device experimentally, it is essential to determine the required DC voltage of every phase shifter. Precise control of the phase shift is challenging in the experiment because of several error sources, including voltage fluctuations, thermal crosstalk between MZIs, and fabrication process variations. Here, we mainly analyze the internal phase shifters ($\theta_i$) of the MZIs, which control the optical power splitting ratio (transmission) at the MZI outputs. For the external phase shifters ($\phi_i$), an optical vector analyzer can be used to determine the required DC voltages directly [5]. Thus, before programming the device for a given application, $\theta_i$ of all MZIs must first be calibrated [16,17].
Figure 4 illustrates the schematic of the experimental setup. A tunable laser generates continuous-wave light (1550 nm, 27 mW). The optical signal is split into nine channels using four 1 × 3 optical splitters. The optical signal in each channel passes through a variable optical attenuator (VOA), which controls the presence or absence of the input optical signal. A DC voltage control unit (VCU) supplies the modulation voltages of the MZIs. Each PD is connected to a semiconductor analyzer (SA) with a constant bias voltage of 1 V.
The calibration scheme is based on the topology of the SU(3), choosing the simplest path for each MZI. It starts from MZI (2′) and MZI (6) on the path GC4-PD5 of the structure shown in Figure 4. The configuration of MZI (2′) in its BS and MZI (6) in its CS allows for calibrating MZIs (2)
In the classification experiment, the VOAs are used to adjust the input optical signal amplitudes according to the MNIST handwritten digit test dataset and the Fashion-MNIST test dataset. Simultaneously, the convolutional kernels are deployed in the experiment based on the parameters in Tables 3-5.

Results and Discussion
The calibration process is carried out from the first MZI to the last MZI. Figure 5 shows an extinction ratio $ER_1$ of 10.54 dB, a bias voltage ($V_{CS,1}$) of 2.21 V, and a half-wave voltage ($V_{\mathrm{Half},1}$) of 0.88 V for MZI (1′) in $M^{1}_{3\times3}$. Here, $ER_i$ ($= P_{BS,i} - P_{CS,i}$) represents the extinction ratio of MZI (i), where $P_{BS,i}$ is the transmitted optical power of MZI (i) in the BS and $P_{CS,i}$ is the transmitted optical power of MZI (i) in the CS. The heater exhibited a $P_{\pi,1}$ (i.e., the power for a π phase shift of MZI (1′)) of 8.43 mW with a resistance of 553 Ω. Tables 3-5 list the bias voltages ($V_{CS,i}$), half-wave voltages ($V_{\mathrm{Half},i}$), $P_{BS,i}$, $P_{CS,i}$, $ER_i$, and $P_{\pi,i}$ of the phase shifters $\theta_i$ of the MZIs in $M^{j}_{3\times3}$. These MZI parameters are used in the following experiment.
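The reported $P_{\pi,1}$ can be cross-checked against the reported voltages and heater resistance with a few lines of arithmetic. The sketch below assumes that the heater is purely resistive (P = V²/R) and that the π phase shift spans the voltage interval from $V_{CS,1}$ to $V_{CS,1} + V_{\mathrm{Half},1}$; both are assumptions of this example rather than statements from the paper, and the function names are hypothetical.

```python
def heater_power_mw(voltage_v, resistance_ohm):
    """Electrical power dissipated in a resistive heater, in milliwatts."""
    return 1e3 * voltage_v ** 2 / resistance_ohm

def p_pi_mw(v_cs, v_half, resistance_ohm):
    """Extra heater power needed to go from V_CS to V_CS + V_Half."""
    high = heater_power_mw(v_cs + v_half, resistance_ohm)
    low = heater_power_mw(v_cs, resistance_ohm)
    return high - low

# Values reported for MZI (1') : V_CS = 2.21 V, V_Half = 0.88 V, R = 553 ohm.
p_pi = p_pi_mw(2.21, 0.88, 553.0)
print(f"P_pi = {p_pi:.2f} mW")   # P_pi = 8.43 mW
```

Under these assumptions the estimate agrees with the reported 8.43 mW, which suggests that the tabulated voltages and heater powers are mutually consistent.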

Figure 6 shows the classification results of the optical processor implementing the two-layer CNN. The classification accuracy reaches 86.9% on the MNIST dataset and 79.3% on the Fashion-MNIST dataset. The accuracy differs from label to label. For the MNIST dataset (Figure 6a), the accuracies of labels 1, 3, 8, and 9 are higher than 90%. Label 5 is classified slightly worse, with an accuracy of 71%. Because labels 3, 5, and 7 are similar, the model misclassifies label 5 as other labels. For the Fashion-MNIST dataset (Figure 6b), the accuracies of labels 1, 5, and 8 are higher than 90%, while labels 4 and 6 are classified slightly worse.
As shown in Figure 2b, the simulated classification accuracy on the MNIST dataset remains about 94.7% when $\sigma_\theta \le 0.01$ rad and $\sigma_D \le 0.01$. In this work, the precision of the VCU is 10 mV and the PD's dark current is 20 nA. A phase-shifter voltage inaccuracy of up to 10 mV corresponds to a phase deviation of approximately 0.037 rad. To reach $\sigma_\theta \le 0.01$ rad, the precision of the voltage regulators must be better than 3.24 mV. The PD's dark current of 20 nA corresponds to $\sigma_D \approx 0.02$. Moreover, the degradation in classification accuracy is also attributed to thermal crosstalk between the MZIs. An external temperature control (ETC) platform is used to minimize the error caused by thermal crosstalk in the experiment. As reported in Reference [18], the error originating from thermal crosstalk can be eliminated by using an n-doped cross-section.
We compare devices in terms of several features of photonic analog processors: network type, platform, footprint, dataset, method, classification accuracy, central wavelength, and phase shifter power consumption (estimated through the average $P_\pi$). Table 6 lists these features for the reconfigurable linear optical processors. For a proper comparison, the table also lists values taken from other MZI-based reconfigurable chips.

Conclusions
In this paper, three 3 × 3 MZI-based optical processors are investigated. It is shown that this optical processor can realize an arbitrary unitary matrix. We obtain the experimental modulation voltages corresponding to the required phase shifts in the calibration process. The MNIST and Fashion-MNIST datasets are used to verify the classification performance of the optical processors. The classification accuracy is 86.9% on the MNIST dataset and 79.3% on the Fashion-MNIST dataset. The experimental results show that experimental and fabrication imperfections degrade the classification accuracy of the optical processor. In the future, the optical processor can be embedded in a computer architecture as an accelerator for matrix multiplication.