Training and Inference of Optical Neural Networks with Noise and Low-Bits Control

Abstract: Optical neural networks (ONNs) are attracting increasing attention due to advantages such as high speed and low power consumption. However, in a non-ideal environment, noise and low-bit control can substantially reduce the accuracy of ONNs. Since there is AD/DA conversion in an analog neural network, the data need to be quantized in the model. In this paper, we propose a quantization method to adapt ONNs to a non-ideal environment with fixed-point transmission, based on the chip structure we designed previously. The MNIST handwritten data set was used to test and simulate the model we established. The experimental results showed that the quantization-noise model performs well, with an accuracy of up to about 96%. Compared with the electrical method, the proposed quantization method can effectively solve the non-ideal ONN problem.


Introduction
With the explosion of information, more data need to be processed. The neural network is considered to be a promising candidate for bulk information processing [1]. Thus far, we have many optimization methods and non-iterative linear supervised learning predictors that can improve computing power, such as multilayer perceptron, support vector machines, and neural-like structures of the successive geometric transformations model (SGTM) [2]. In recent years, optical neural networks (ONNs) have gained a large amount of attention due to their high-speed, low power consumption, and low delay [3][4][5][6]. It has been shown that matrix multiplication and parameterization can be obtained on an optical neural network made by Mach-Zehnder interferometer (MZI) arrays [7][8][9].
With the development of ONNs, some important issues have emerged in the non-ideal case that may reduce the accuracy of the optical neural network, and these issues need to be understood in depth. In practice, two conditions may increase the error of the optical device. One is the phase shift generated by optical devices, which cannot be set with arbitrary precision [3,4,10,11]. The other is quantum-limit noise in optical devices [10][11][12]. Up to now, the goal of large-scale, rapidly reprogrammable photonic neural networks has not been realized, and there are still plenty of opportunities for improving ONNs [13,14]. Processing large amounts of data for real-life computer vision remains challenging for ONNs.
Most research on ONNs has focused on different types of devices and novel architectures, while limited work has been undertaken on the impact of noise on accuracy in photonic chips. Several groups have begun to study these issues. In 2017, Yichen Shen's chip implemented a neural network that can recognize four basic vowels [3]. In 2019, Ryan Hamerly presented a new type of photonic accelerator based on coherent detection [4] and also simulated noise in this device. Other work proposes a noise-aware quantization scheme to help design a robust ONN model [10]. The above work illustrates that an ONN architecture requires special hardware implementation and, ideally, low-bit control.
In this paper, based on the new chip structure we designed previously [15], a noise quantization model is proposed to analyze the influence of quantization on the accuracy of ONNs, so as to make it closer to reality. We also optimize the algorithm so that the chips that are run in real conditions can achieve high precision. A method that solves ONNs' low-bit control and simulation on devices is proposed for the first time.

Neural Networks and ONNs
A fully connected neural network consists of an input layer, hidden layers, and an output layer, as shown in Figure 1 [16]. Images from the handwritten data set (MNIST [17]) are input into the network for simulation [18]. Since the handwritten data set is composed of the digits 0-9, the output layer has 10 outputs. In our previous work [15], we designed an image classification and recognition model based on a fully connected neural network (FCNN) and mapped it to a silicon-based photonic integrated circuit, as shown in Figure 2 [15]. Previous simulation experiments showed that the optical modulator enabled the chip to perform fast and high-precision classification of handwritten numbers with an accuracy greater than 97%. Speeds of up to 80 Gbps can be achieved at the currently reported rates of silicon-based modulators.
A silicon ONN chip is mainly composed of five parts. In the first part of Figure 2a, the input port can be realized by a side coupler or grating coupler, and the fan-out structure can be realized by cascading 1 × 2 multi-mode interference (MMI) couplers. In the second part, the input light is divided into 128 parts, with one Mach-Zehnder modulator (MZM) for every two channels. The third part consists of 256 MZMs, half of which are used to encode the target file and the other half to load the weight signal. Then, two waveguides, a silicon waveguide and a sinusoidal waveguide, are used to realize the matching structure of the target file signal and the weight signal. In the fourth part, there are 128 balanced detectors, each of which multiplies the one-way target file signal with the weight signal. The final addition and activation functions can be implemented via circuits, which make up the fifth part. The ONN chip is designed to recognize 10 complete (0 to 9) handwritten digit images simultaneously. We resize the digit images into 11 × 11 grayscale matrices with 8-bit resolution and flatten them into vectors. These vectors are fed through parts one and two and are multiplexed in the time domain. Next, in the third and fourth parts, the encoded target file is paired with the loaded weight signal and the multiply-accumulate operations are performed. Finally, different node/neuron outputs are obtained by sampling the results of the previous step. The final output of the ONN is represented by the intensity of the output neurons, with the highest intensity for each test image corresponding to the predicted category. The peripheral system, including signal sampling, nonlinear functions, and merging, is implemented electronically by means of digital signal processing hardware. Figure 3 shows the schematic diagram of the optoelectronic system.
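The preprocessing described above (resize to 11 × 11, keep 8-bit gray levels, flatten) can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the nearest-neighbor resizing method, the helper names `resize_nearest` and `to_input_vector`, and the random stand-in image are assumptions, since the text does not specify how the images are resized.

```python
import numpy as np

def resize_nearest(image, out_h=11, out_w=11):
    """Nearest-neighbor resize of a 2-D array (assumed method, not from the paper)."""
    in_h, in_w = image.shape
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return image[np.ix_(rows, cols)]

def to_input_vector(image):
    """Resize to 11 x 11, keep 8-bit gray levels, and flatten to a length-121 vector."""
    small = resize_nearest(np.asarray(image, dtype=np.uint8))
    return small.reshape(-1)

# Random 28 x 28 array used as a stand-in for one MNIST image.
rng = np.random.default_rng(0)
digit = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
vec = to_input_vector(digit)
```

Each image therefore becomes a vector of 121 values in the range 0 to 255, which is what is multiplexed in the time domain.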
After entering through the input port, the signal is multiplied and accumulated by the optical neural network chip and output through the balanced detector, where shot noise is generated. The balanced detector output enters the circuit through AD/DA conversion, and a computer processes the data in the nonlinear part. This step requires quantization of the data. Since the optical neural network we designed has three layers, this process is repeated three times to get the final output. We elaborate on the specific models for noise and quantization in the following sections. The power consumption of this ONN chip comes mainly from the modulator, a PIN-structure electro-optical phase modulator with a power loss of 1 mW per modulator and a speed of 100 MHz; a CMOS (complementary metal-oxide-semiconductor) process is used so that the chip is suitable for mass production. The cost depends on production volume: the higher the volume, the lower the cost. The production process is a conventional silicon photonic chip processing technology, which is available on all major international foundry platforms.

Noise
Several factors can affect the accuracy of the chip in practice. Among them, quantum-limit noise sets the fundamental limit of optical devices [3,14]. As mentioned in Section 2.1, shot noise is produced in the balanced detector during transmission.
In a neural network, each layer of neurons $x_i$ is transmitted to the next layer of neurons $x_{i+1}$. Each neuron is a homodyne detector that interferes the broadcast signal with the weight signal $A_{ij}$ [16,19]. $A_{ij}$ and $x_i$ are multiplied and accumulated (MAC), as described by Equation (1), where $x_i$ is the input of the current layer and $x_{i+1}$ is its output. As noted in Section 2.1, the input vector $x_i$ is encoded temporally as pulses. The weights enter the channels in the same time-encoded form as the input vectors. These data are then processed optically by MAC calculations, and the nonlinear activation function is implemented electrically to give the output. For the power consumption calculation, assuming that the input signal and the weight signal have a perfect spatiotemporal mode match, the signals can be normalized so that $|x_i|^2$ and $|A_{ij}|^2$ correspond to the number of photons per pulse. When a pulse with an amplitude of $u$ enters, the output current can be described by a Poisson distribution: $Q_e \sim \mathrm{Poisson}(|u|^2)$. Each photocurrent $Q^{(\pm)}$ is the sum of many Poisson random variables. In the useful limit of many photons per neuron (although not necessarily per MAC), this approximately leads to the Gaussian random variable of Equation (2), where $w_i^{(k)} \sim N(0, 1)$ are Gaussian random variables. Then, the next layer of neurons $x_{i+1}$ under the influence of noise can be represented by Equation (3) [3].
In Equation (3), $\|\cdot\|$ is the 2-norm, $n_{MAC}$ is the number of photons per MAC, $N$ is the number of input neurons, and $N'$ is the number of output neurons. $n_{MAC}$ is related to the total energy consumption of the layer, which is given by $E_{tot} = N N' n_{MAC}$. We can figure out that the total energy consumption between computation layers is 1.64 × 10 7 J.
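The Gaussian limit of the Poisson photocurrent statistics described above can be checked numerically. The sketch below is only an illustration under stated assumptions: the mean photon number `mean_photons` (standing in for $|u|^2$) and the sample count are hypothetical values chosen for the demonstration, not figures from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_photons = 1000.0   # |u|^2, hypothetical value in the many-photon limit
samples = rng.poisson(mean_photons, size=200_000)

# For a Poisson variable, mean and variance are both |u|^2.
sample_mean = samples.mean()
sample_var = samples.var()

# A Gaussian N(|u|^2, |u|^2) puts roughly 68% of its mass within one standard deviation.
within_one_sigma = np.mean(np.abs(samples - mean_photons) <= np.sqrt(mean_photons))

print(sample_mean, sample_var, within_one_sigma)
```

With many photons per pulse, the sampled mean and variance both sit close to $|u|^2$ and the one-sigma fraction is close to the Gaussian value, which is why the shot noise can be modeled by the random variables $w_i^{(k)} \sim N(0, 1)$.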
In the previous work, we simulated single-layer and multi-layer models with granular noise [15]. To get closer to reality, the noise effect expressed by Equation (3) was added to the simulation process, exactly following the procedure described in Section 2.1. The result in Figure 4 shows that when the number of photons per MAC, $n_{MAC}$, is large enough, the error rate is not affected.
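A noisy layer of this kind can be sketched as below. This is a hedged illustration, not the authors' simulation code: the noise amplitude $\|A\|\,\|x\| / \sqrt{N^2 N'} \cdot \sqrt{N} / \sqrt{n_{MAC}}$ is an assumed form built from the symbols defined above, the ReLU default activation is an assumption, and `noisy_layer` with its arguments is a hypothetical name.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def noisy_layer(A, x, n_mac, rng, activation=relu):
    """One fully connected layer with Gaussian noise that shrinks as 1/sqrt(n_mac)."""
    n_out, n_in = A.shape                      # N' outputs, N inputs
    signal = A @ x                             # ideal multiply-accumulate result
    # Assumed noise amplitude; np.linalg.norm(A) is the Frobenius norm of the matrix.
    sigma = (np.linalg.norm(A) * np.linalg.norm(x)
             / np.sqrt(n_in**2 * n_out) * np.sqrt(n_in) / np.sqrt(n_mac))
    w = rng.standard_normal(n_out)             # w_i ~ N(0, 1)
    return activation(signal + sigma * w)

# Hypothetical layer: 121 inputs (11 x 11 image), 10 outputs.
A = np.random.default_rng(1).normal(size=(10, 121))
x = np.random.default_rng(2).random(121)
identity = lambda z: z
clean = A @ x

# Same noise draw (same seed) at two photon budgets, so only the scale differs.
dev_low = noisy_layer(A, x, 1.0, np.random.default_rng(0), identity) - clean
dev_high = noisy_layer(A, x, 100.0, np.random.default_rng(0), identity) - clean
```

Raising `n_mac` by a factor of 100 reduces the deviation from the ideal output by a factor of 10, which matches the qualitative observation that the error rate stops being affected once $n_{MAC}$ is large enough.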

Quantization
As mentioned in Section 3.2, quantization is needed to preserve accuracy when converting between analog and digital signals. Integer quantization is an optimization strategy that converts 32-bit floating-point numbers (FP32), such as weights and activation outputs, to the nearest 8-bit fixed-point numbers (INT8). This leads to smaller models and faster inference, which is valuable for low-power devices such as microcontrollers [18].
Two main methods are used in the quantization process. (1) Post-training integer quantization: the model is trained using FP32 weights and inputs, and the weights are then quantized [20]. The main advantage of this method is that it is easy to use; the drawback is a decrease in accuracy. (2) Quantization-aware training: the weights are quantized during the training process, and the quantization is taken into account in the calculation [21]. This gives the best result when using INT8 quantization, but is much more complex than other methods.
A large amount of work has shown that a more efficient deep neural network (DNN) can be achieved through low-bit quantization [22,23]. Experimental results using low-precision numerical representations indicate that more than eight bits of precision are required to handle backward propagation and gradients [2,24,25], which makes training more complicated to implement. Therefore, it is reasonable to train the model first and then use the quantized weights only for inference [21].
Equation (4) gives the quantization from FP to INT, and Equation (5) gives the inverse quantization from INT to FP. Here, $R$ represents the real FP value, $Q$ represents the quantized INT value, $Z$ represents the quantized INT value corresponding to the FP value 0 (the zero point), and $S$ is the smallest scale step that can be represented after INT quantization. $S$ and $Z$ are evaluated by Equations (6) and (7), respectively, where $R_{max}$ represents the maximum FP value, $R_{min}$ the minimum FP value, $Q_{max}$ the maximum INT value, and $Q_{min}$ the minimum INT value.
Here, $S$ and $Z$ are the quantization parameters, while $Q$ and $R$ can be evaluated from Equations (4) and (5). Truncation (clamping) is needed where the quantized value $Q$, or the FP value $R$ obtained by inverse quantization, falls outside its representable range.
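The quantities defined above can be illustrated with the standard affine (scale and zero-point) quantization scheme. The sketch below assumes the common forms $S = (R_{max} - R_{min})/(Q_{max} - Q_{min})$, $Z = \mathrm{round}(Q_{max} - R_{max}/S)$, $Q = \mathrm{round}(R/S) + Z$, and $R = S\,(Q - Z)$; these are assumed to correspond to the paper's equations but are not copied from them, and the function names and example values are hypothetical.

```python
import numpy as np

def quant_params(r_min, r_max, q_min=-128, q_max=127):
    """Scale S and zero point Z for mapping [r_min, r_max] onto [q_min, q_max]."""
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = int(round(q_max - r_max / scale))
    return scale, zero_point

def quantize(r, scale, zero_point, q_min=-128, q_max=127):
    """FP -> INT8, with clamping for values outside the representable range."""
    q = np.round(np.asarray(r) / scale) + zero_point
    return np.clip(q, q_min, q_max).astype(np.int8)

def dequantize(q, scale, zero_point):
    """INT8 -> FP (inverse quantization)."""
    return scale * (q.astype(np.float32) - zero_point)

# Hypothetical weights spanning the range [-1, 3].
weights = np.array([-1.0, -0.5, 0.0, 0.25, 3.0])
scale, zero_point = quant_params(weights.min(), weights.max())
q = quantize(weights, scale, zero_point)
recovered = dequantize(q, scale, zero_point)
max_error = float(np.max(np.abs(recovered - weights)))
```

For in-range values the round-trip error is bounded by half a scale step, while out-of-range values are clamped to the nearest representable integer.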

Simulation and Results
In conventional electrical neural networks, floating-point numbers are generally used in the model. In the analog neural network, due to the AD/DA conversion, we need to quantize FP32 into INT8 in the model. Quantization is common in deep learning; it is faster because fewer bits are used, and it makes models lighter. To verify the influence of quantization on optical neural networks, we proceeded in two steps. First, we quantized the model to INT8 after training. Then, we added a noise model for inference. The Python language, the TensorFlow framework, and the MNIST data set were used for simulation.

Evaluation Criteria
Different classification algorithms use different variants, and a suitable algorithm must be selected according to the specific task. Accuracy is the most common evaluation index for classification algorithms, as shown in Equation (8): $\mathrm{Accuracy} = (TP + TN)/(P + N)$, where TP is the number of cases correctly detected as positive and TN is the number of cases correctly classified as negative. P and N are the numbers of positive and negative cases, respectively. 'TP + TN' represents all digits that have been recognized correctly, and 'P + N' represents all digits taken from MNIST. Accuracy is thus the proportion of correctly predicted samples among all samples. Generally speaking, the higher the accuracy, the better the classifier.
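For a 10-class problem such as MNIST, this accuracy reduces to the number of correctly predicted samples divided by the total number of samples. A minimal sketch follows; the function name `accuracy` and the example labels are hypothetical.

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted == actual))

# Five hypothetical predictions, four of which are correct.
predicted_labels = [7, 2, 1, 0, 4]
true_labels = [7, 2, 1, 0, 9]
acc = accuracy(predicted_labels, true_labels)
```

Here four of the five predictions match, so the accuracy is 4/5.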

Model Establishment
The model was established to classify and identify MNIST digits with a common fully connected network and was evaluated with the criterion of Section 3.1, giving an accuracy of up to 98%. Then, the model was frozen to obtain a protocol buffer (PB) model file. Data can be viewed from each node in Neuron, as shown in Figure 5, where each node is FP. FP32 is converted to INT8 using TensorFlow Graph, and the result can also be viewed in Neuron.
The models we established are shown in Figure 5, where M represents MatMul and QM represents quantization of the MatMul results. AF represents the activation function, and QAF represents the quantized activation function. S represents Softmax, which is a classification function.

Model Training
We conducted a quantized inference test on the FCNN mentioned in Section 2.1, using the evaluation method of Section 3.1. The results are shown in Table 1 and Figure 6. The results in Table 1 show that the optimized quantized model is effective: a high prediction accuracy was obtained while the INT8 model parameter size was 1/4 that of FP32. The neural network is overparameterized and contains a large amount of redundant information, so cutting out such information does not result in a significant reduction in accuracy. For a given quantization method, there is no significant accuracy gap between the overparameterized large FP32 network and the INT8 network. Figure 6 shows the reduction in training time when the INT model is used instead of FP. Since the inference of the network takes time, the reduction in training time is not proportional to the reduction in model size.
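The 1/4 size ratio follows directly from the storage width of each parameter (4 bytes for FP32, 1 byte for INT8). The short calculation below illustrates this; the 121 inputs and 10 outputs come from the text, while the hidden-layer widths in `layer_sizes` are hypothetical, since they are not stated here.

```python
import numpy as np

# 121 inputs and 10 outputs from the text; hidden widths 64 and 32 are hypothetical.
layer_sizes = [121, 64, 32, 10]

# Weights plus biases for each of the three fully connected layers.
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

fp32_bytes = n_params * np.dtype(np.float32).itemsize   # 4 bytes per parameter
int8_bytes = n_params * np.dtype(np.int8).itemsize      # 1 byte per parameter
ratio = int8_bytes / fp32_bytes
```

The ratio is independent of the layer widths, so the same reduction applies to any network whose parameters are all converted from FP32 to INT8.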

Noise
The basic components of the FCNN layer are MAC operations, which can be easily parallelized. The model structure is shown in Figure 7. Figure 7a shows the model with noise before quantization; the model we designed is based on Equation (3). Figure 7b shows the model with noise after quantization, in which all the parameters are quantized. To achieve a high-performance, highly parallel computing paradigm, the method we used is post-training integer quantization. In Figure 7, M, QM, and S have the same meanings as in Figure 5, and Q+ represents quantization plus. We found that when the number of photons per MAC reached a certain level, the noise had little impact on accuracy. The result is shown in Figure 8. We quantized the noise model and carried out the inference test. The results are shown in Table 2 and indicate that our model has high accuracy and stability. When the model was shrunk to 1/4 of its size, the evaluation index was reduced by only 0.3-1%. According to the results, there is no significant increase in accuracy when the number of layers is increased beyond two. This is caused by the excessive redundant information due to the insufficient amount of data. Compared with the model in Section 3.3, the neural network with noise has more parameters. The search space of the model is therefore larger, so the spatial distribution of the model can be described better only if there are enough data. As a result, the model with noise has higher accuracy.

Conclusions
In this paper, we propose a quantization method for adapting ONNs to a non-ideal environment with INT transmission, based on the fully connected neural network image classification and recognition model proposed in previous work [15]. The comparison before and after quantization shows that the optimized quantization model in this paper is effective and has sufficiently good prediction accuracy: an accuracy of up to about 96% can be achieved. The experimental results show that, compared with the electrical method, the proposed quantization method can effectively solve the non-ideal ONN problem. We believe that the quantization model established in this paper can be of great help to optical chips in the near future. However, it is still difficult to implement large-scale photonic neural networks based on current technology. Besides, ONNs have a limited number of neurons. In future research, we will extend the model and address photon limitations. Larger data sets such as ImageNet will also be used for training experiments.

Data Availability Statement: All data and models generated during the study appear in the submitted article.

Conflicts of Interest:
The authors declare no conflict of interest.