An Analog Multilayer Perceptron Neural Network for a Portable Electronic Nose

This study presents an analog circuit implementation of a multilayer perceptron neural network (MLPNN). The proposed low-power, small-area analog MLP circuit serves as the classifier in an electronic nose (E-nose), making the E-nose relatively small, power-efficient, and portable. The analog MLP circuit has only four input neurons, four hidden neurons, and one output neuron. The circuit was designed and fabricated using a 0.18 μm standard CMOS process with a 1.8 V supply. The power consumption was 0.553 mW, and the area was approximately 1.36 × 1.36 mm². Chip measurements showed that this MLPNN successfully identified the fruit odors of bananas, lemons, and lychees with 91.7% accuracy.


Introduction
The artificial olfactory system, also referred to as the electronic nose (E-nose) system, has been used in numerous applications. These include air quality monitoring, food quality control [1], hazardous gas detection, medical treatment and health care [2], and diagnostics [3]. An E-nose system comprises a sensor array, a signal processing unit, and a pattern recognition system. During recent decades, a substantial amount of research and development has been reported on E-nose systems. Because of the complex classification algorithms embedded in the pattern recognition system, a central processing unit (CPU) is usually required [1,2]. Consequently, the majority of E-nose systems are large and consume considerable power. However, heavy and power-hungry equipment is inconvenient to use, and designing a low-power, small device would be preferable. Some researchers and companies [4][5][6] have used microprocessors or field-programmable gate arrays (FPGAs) as the computational cell to develop portable E-noses, but these systems are still too power-intensive and large. To further reduce the power consumption and device area, analog VLSI implementation of the learning algorithm for E-nose applications has been proposed [7][8][9][10][11][12][13][14].
The multilayer perceptron neural network (MLPNN) is an algorithm that has been continuously developed for many years. Consequently, when VLSI implementation of a learning algorithm is necessary, the MLPNN is a common choice. In 1986, Hopfield and Tank proposed the first analog MLPNN circuit [10]. Since then, several analog VLSI implementations of the MLPNN have been proposed [11][12][13][14]. Some have focused on improving the multiplier [14][15][16][17][18], and some have attempted to design a nonlinear synapse to remove the analog multiplier [19,20]. However, the power consumption of most MLPNN circuits ranges from a few milliwatts to a few hundred milliwatts [11][12][13][14], which is still too high for portable applications. Consequently, an MLPNN circuit with considerably lower power consumption (below 1 mW) is required when an MLPNN is to be implemented in a portable E-nose.
This study implemented a low-power MLPNN as an analog VLSI circuit to serve as the classification unit in an E-nose. The neural network is one of the most popular algorithms used in E-nose systems [2,5], because it can recognize and identify odor signal patterns. A typical MLPNN contains one input layer; one or more hidden layers, depending on the application; and an output layer. Apart from the input layer, both the hidden and output layers contain several neurons with nonlinear activation functions, which constitute the signal processing unit. Synapses connect the neurons of different layers, and a weight unit is included in each synapse. The weight units and the outputs of the neurons determine the inputs of the neurons in subsequent layers. An analog circuit realizes a nonlinear function with a simple structure [15]. Thus, implementing an analog MLPNN circuit reduces the power and size of the pattern recognition unit required to build an E-nose.
The weight adaptation algorithm used in this study was the back propagation (BP) learning algorithm [21]. This algorithm allows the weights to be adjusted so that the MLP network can learn the target function; that is, pattern recognition. The details of the MLPNN and BP algorithms are provided below.
The input of a neuron can be represented as:

$a_j^H = \sum_{i=1}^{n_I} W_{ji}^H X_i^I$  (1)

where $a_j^H$ is the input of the $j$-th hidden neuron; $W_{ji}^H$ represents the weight of the synapse that connects the $j$-th hidden neuron and the $i$-th input neuron; $X_i^I$ represents the output of the $i$-th input neuron; and $n_I$ is the number of input neurons. The neuron output is determined by its input and its activation function. The hyperbolic tangent function is one of the most commonly used activation functions, and it can be easily implemented in analog VLSI with small chip area and low power consumption. Consequently, the hyperbolic tangent activation was chosen for this work. With this choice, the neuron output is:

$X_j^H = \tanh(a_j^H + b_j)$  (2)

where $X_j^H$ is the output of the $j$-th hidden neuron, and $b_j$ represents the bias value.
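As a purely numerical illustration of Equations (1) and (2) (a software sketch, not the analog circuit; the function name and the use of NumPy are our own choices):

```python
import numpy as np

def hidden_forward(X, W_H, b):
    """Hidden-layer forward pass of Equations (1) and (2).

    X   : outputs of the n_I input neurons, shape (n_I,)
    W_H : synapse weights, shape (n_H, n_I); W_H[j, i] connects
          input neuron i to hidden neuron j
    b   : bias values of the hidden neurons, shape (n_H,)
    """
    a_H = W_H @ X            # Equation (1): weighted sum of the inputs
    return np.tanh(a_H + b)  # Equation (2): hyperbolic tangent activation
```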
Similar to the hidden neuron, the input of the $k$-th output neuron is:

$a_k^O = \sum_{j=1}^{n_H} W_{kj}^O X_j^H$  (3)

where $n_H$ is the number of hidden neurons. The output of the $k$-th output neuron, $X_k^O$, is:

$X_k^O = \tanh(a_k^O + c_k)$  (4)

where $c_k$ represents the bias value. After calculating the output $X_k^O$ of the output neuron, we derived the weight-update value by comparing the circuit output $X_k^O$ with the target output $X_t$.
According to the BP algorithm, the general weight-update value $\Delta W$ is derived by:

$\Delta W = -\eta \dfrac{\partial E_p}{\partial W}$  (5)

where $\eta$ is the learning rate, $E_p$ is the error term derived from the comparison between the circuit output $X_k^O$ and the target $X_t$, and the gradient is evaluated through the neuron input $a$. In the following, we use the mean square error and assume that there is a single output neuron. Consequently, $X_k^O$ is simplified to $X^O$.
The order of weight adaptation proceeds from the output layer to the input layer. When deriving $\Delta W$ in each layer, repeatedly differentiating the error with respect to the weight is unnecessary, because certain computations have already been performed in later layers (for the hidden layer, the later layer is the output layer). The later layer propagates the computation to the previous layer; this propagated value is called the "back propagation error." For the output layer, the BP error $\delta^O$ is:

$\delta^O = (X_t - X^O)\, D^O$  (6)

where $D^O$ represents the differentiation of the output neuron's activation function. For the hidden layer, the BP error $\delta_j^H$ of the $j$-th hidden neuron is:

$\delta_j^H = \delta^O W_j^O D_j^H$  (7)

where $D_j^H$ is the differentiation of the $j$-th hidden neuron's activation function. From Equations (1), (3), (5), (6) and (7), $\Delta W_j^O$ and $\Delta W_{ji}^H$ are derived as Equations (8) and (9), respectively:

$\Delta W_j^O = \eta\, \delta^O X_j^H$  (8)

$\Delta W_{ji}^H = \eta\, \delta_j^H X_i^I$  (9)

The MLPNN was trained by adapting the weights according to Equation (10). During the training phase, the new weight $W_{new}$ in a synapse was acquired from the previous weight $W_{old}$ in the same synapse plus the weight-update value $\Delta W$:

$W_{new} = W_{old} + \Delta W$  (10)

The rest of this paper is organized as follows: Section 2 describes the system architecture, Section 3 presents the measurement results, and Section 4 presents the conclusion.
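The complete forward pass and weight update of Equations (1)-(10) can be collected into a small software reference model. This is a minimal numerical sketch assuming tanh activations, the mean square error, and a single output neuron, as stated above; it describes the algorithm, not the chip:

```python
import numpy as np

def bp_update(X, X_t, W_H, b, W_O, c, eta=0.1):
    """One back-propagation step for a single-output MLP,
    following Equations (1)-(10). Returns the updated weights."""
    # Forward pass
    X_H = np.tanh(W_H @ X + b)     # Equations (1) and (2)
    X_O = np.tanh(W_O @ X_H + c)   # Equations (3) and (4)

    # Back-propagation errors
    D_O = 1.0 - X_O ** 2           # derivative of tanh at the output neuron
    delta_O = (X_t - X_O) * D_O    # Equation (6)
    D_H = 1.0 - X_H ** 2           # derivative of tanh at the hidden neurons
    delta_H = delta_O * W_O * D_H  # Equation (7)

    # Weight updates, Equations (8)-(10)
    W_O_new = W_O + eta * delta_O * X_H         # output-layer synapses
    W_H_new = W_H + eta * np.outer(delta_H, X)  # hidden-layer synapses
    return W_H_new, W_O_new
```

Repeatedly applying `bp_update` to a sample drives the network output toward the target, which is the behavior the analog weight-update loop realizes in hardware.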

Architecture and Implementation
This paper proposes a 4-4-1 MLPNN; the 4-4-1 notation represents four input neurons, four hidden neurons, and one output neuron. Before the chip design, Matlab simulations confirmed that this structure could learn the odor data used in this study. The block diagram is shown in Figure 1. The symbols X1-X4 refer to the four signal inputs; Xbi is the bias in the input layer; HSs are the synapses between the input and hidden layers; HNs are the hidden neurons; Xbj is the bias in the hidden layer; OSs are the synapses between the hidden and output layers; and ON is the output neuron.
Based on Equations (1) to (10), the detailed block diagrams of HS, HN, OS, and ON were obtained, as shown in Figure 2. CM, BPM, and GM are multipliers; W is the weight unit; A is the activation function; D is the differentiation of the activation function; delta is the BP error generator; and I/V is the current-to-voltage converter. Because several multiplication results are summed in Equations (1) and (3), the output signals of all multipliers are designed as current signals. By Kirchhoff's current law (KCL), the currents are summed when the outputs of the synapses are connected. Thus, the area and power of the system are reduced because no extra analog adder is necessary. For signals that require transmission to several nodes (e.g., X1 to X4), a voltage signal is preferred. Further descriptions of the sub-blocks shown in Figure 2 and the relations between the equations and the sub-blocks are provided below. When approximating the equations by analog circuits, second-order effects, such as the body effect or the Early effect, are neglected; thus, certain errors may be introduced. These errors may result in nonlinearity. However, by carefully designing the bias, size, and dynamic range of the circuit, the nonlinearity has little effect on the learning performance of the application in this study.

Synapses
The synapses at the hidden layer and the output layer comprise two multipliers (CM and GM) and a weight unit W; the output-layer synapse needs one more multiplier (BPM) to generate the term $\delta^O W_j^O$. BPM and CM are both Chible's multipliers [16,17]. BPM multiplies the weight by the BP error, whereas CM multiplies the input by the weight. Both results are represented as currents. This study used Chible's multiplier because of its wide operation range [16]. A schematic diagram of Chible's multiplier is shown in Figure 3. The weight and input are voltages, denoted by V_W and V_X, respectively. The output of the multiplier is I_XW. M1 and M3 operate in the strong inversion region, whereas M6, M7, M11, and M12 operate in the weak inversion region. According to the equations of the MOSFET in strong and weak inversion, the output current is: (11) where V_ref is a reference voltage; U_T is the thermal voltage; and I_wp and I_wn represent the bias currents for M6 and M7 and for M11 and M12, respectively. Both I_wp and I_wn are related to the weight V_W. Assuming that the parameters for NMOS and PMOS are equal and that the difference between V_X and V_ref is sufficiently small, I_XW equals V_W_offset multiplied by V_X_offset, where V_W_offset and V_X_offset represent V_W and V_X plus an offset, respectively.
The GM term represents the Gilbert multiplier [22,23]. This type of multiplier provides good linearity but a small dynamic range. Thus, this block multiplies the neuron output X by the error term δ in Equations (8) and (9). The schematic is shown in Figure 4. The neuron output X and the BP error δ are voltages, denoted by V_X and V_δ, respectively. The output of the multiplier is I_ΔW, whereas V_ref1 and V_ref2 are reference voltages. All of the MOSFETs operate in the subthreshold region. According to the voltage-current relationship of MOSFETs in the subthreshold region, the output current is represented as: (12) When the differences of V_X and V_δ from the reference voltages are sufficiently small, I_ΔW can be simplified to: (13)

The weight unit W is a temporal signal-storage device with the characteristic of Equation (10). The circuit implementation of the weight unit is shown in Figure 5. The weight is represented by a voltage stored on a capacitor. This study used a metal-oxide-semiconductor capacitor (MOSCap). Compared with other types of capacitors, the MOSCap has a larger capacitance per unit area; thus, using a MOSCap reduces the chip area. In the training phase, weight adaptation is performed by using the weight-updating current I_ΔW from the GM to charge or discharge the storage capacitor. The control voltage Vc determines the period during which the current charges or discharges the capacitor; in other words, this control voltage determines the learning rate of the network. The weight values are collected simultaneously by a data acquisition device (NI 6229) and stored in a computer. In the classifying phase, I_ΔW no longer updates the weight values; rather, the computer sets the weights on the MLPNN chip to the pre-stored values through a data application device (NI PCI 6723).
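Ignoring leakage and second-order effects, the weight voltage stored on the capacitor changes by the charge delivered during the control period divided by the capacitance, which is how the circuit realizes Equation (10). A minimal sketch of this idealized behavior, with all parameter values hypothetical:

```python
def weight_voltage_update(V_old, I_dw, T_c, C):
    """Idealized update of the weight voltage on the storage capacitor:
    the current I_dw delivered over the control period T_c changes the
    stored voltage by I_dw * T_c / C (compare Equation (10)).
    A real MOSCap also leaks charge, which is neglected here."""
    return V_old + I_dw * T_c / C

# Hypothetical values: 1 nA updating current for 20 us on a 1 pF capacitor
V_new = weight_voltage_update(0.90, 1e-9, 20e-6, 1e-12)
```

Note that the charging period T_c plays the role of the learning rate: a longer period transfers more charge per update, giving a larger weight step.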

Neurons
Schematic diagrams of the activation function circuit A and the differential approximation circuit D are shown in Figure 6. The activation function in this work was the hyperbolic tangent, which an analog circuit can easily implement using a differential pair [15]. As shown in Figure 6, the input current I_in is delivered through the synapse circuit and converted to a voltage by M7 and M8. This voltage is compared with the reference voltage V_ref, and an output current is produced. Because M2 and M3 operate in the subthreshold region, the output current is: (14) The output current I_x is then converted to the voltage V_x by M9 and M10. This voltage V_x constitutes the output of the neuron.
According to the definition of differentiation:

$f'(x) = \lim_{\Delta x \to 0} \dfrac{f(x + \Delta x) - f(x)}{\Delta x}$  (15)

This study used a function $f_D(x)$ to approximate the actual derivative $f'(x)$ [12]. This involved duplicating the activation function circuit (M12 and M13), with the reference voltage of the replica differing from V_ref by a small amount. The output current of this replica became: (16) The difference between $I_x$ and $I_{x\_r}$ is the differential approximation of the activation function:

$I_d = I_x - I_{x\_r}$  (17)

Figure 6. Activation function circuit and its differentiation.
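In software terms, the replica-based scheme of Equations (15)-(17) amounts to a finite difference of the activation function. A sketch assuming a tanh activation and an illustrative shift value (the shift corresponds to the small offset between the two reference voltages; its numeric value here is our own assumption):

```python
import numpy as np

def f_D(x, delta=0.1):
    """Finite-difference approximation of the tanh derivative, in the
    spirit of Equations (15)-(17): tanh(x - delta) plays the role of
    the replica activation circuit with a slightly shifted reference.
    The shift value delta is illustrative, not a circuit parameter."""
    return (np.tanh(x) - np.tanh(x - delta)) / delta
```

A smaller shift gives a better approximation of the true derivative 1 - tanh(x)^2, at the cost of a smaller (and therefore noisier, in circuit terms) difference signal.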
The schematic diagram of the Delta block is shown in Figure 7. This block multiplies the BP error by the differentiation of the activation function. The differential approximation of the activation function is represented by the current I_d, and the BP error is represented by a voltage. This circuit uses differential pairs operated in the subthreshold region; thus, the circuit output realizes the BP errors of Equations (6) and (7). For ON, V_1 and V_2 in Figure 7 are replaced by X^O and X_t, respectively. For HN, V_1 and V_2 are replaced by V_δ × W and V_ref, respectively.

Experiment Setup
The entire system is shown in Figure 8 and can be divided into three parts: the equipment for odor data collection (the collected data are stored in a computer), the PCB for bias generation (PCB_bias), and the designed chip. An E-nose developed in a previous study [24] was used to collect fruit odor samples. Figure 9 shows the patterns of the banana, lemon, and lychee odors. During the experiment, the temperature was between 24 and 28 °C, and the humidity was between 59% and 78%. We assessed 24 samples, and each sample was assessed individually. The E-nose had eight sensors [24], but the MLPNN chip had only four input channels; thus, four sensors were selected from the original eight. The data were normalized to voltages between 0.85 and 0.95 V (to ensure that the voltage range was sufficiently small for the approximation in the GM block to be valid) before being fed into the MLPNN chip. The E-nose in the previous study uses metal-oxide sensors; consequently, the resistance of a sensor decreases when odor molecules bind to it. The percentage of resistance change before and after the sensor responds to the odor molecules is used to represent the sensor activity.
Consequently, the sensor response is initially an array of negative numbers. To perform normalization, first, the absolute value of the sensor response is taken. Second, each sample is divided by the maximum sensor response in that sample and then divided by 10, so the sample becomes a vector with a maximum value of 0.1. Third, 0.85 is added to this vector. Subsequently, every dimension of the input vector to the MLPNN chip is larger than 0.85 V but smaller than or equal to 0.95 V.

Noise from the power supply is a critical issue for an analog circuit. Although some research has reported that adding modest noise to the synapses during training can improve the learning performance [25], the power supply noise must still be reduced, because, unlike deliberately injected noise, it cannot be turned on during training and off during testing. Furthermore, the noise may be amplified by the circuit, and with too much noise the ANN may fail to converge [25]. Compared with a power supply instrument, a battery provides power with less noise. In this study, the chip was powered by a battery through a regulator. However, a battery provides a single voltage, whereas the chip requires multiple biases. To provide various bias voltages from the same battery, this study designed a printed circuit board (PCB) containing regulators (Figure 10(a)) for generating power for the MLPNN chip and bias generators (Figure 10(b)) for generating the bias voltages.

The MLPNN chip was fabricated using the TSMC 0.18 μm standard CMOS process with a 1.8 V supply voltage. The chip area was 1.36 × 1.36 mm². The chip photograph is shown in Figure 11.
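The three normalization steps described above can be sketched in software (an illustration of the procedure, not part of the chip; the function name is our own):

```python
import numpy as np

def normalize_sample(response):
    """Normalize one E-nose sample into the (0.85, 0.95] V input range:
    take the absolute value of the (negative) sensor responses, scale
    by the sample maximum and by 10, then add 0.85 V."""
    r = np.abs(np.asarray(response, dtype=float))  # step 1: absolute value
    r = r / r.max() / 10.0                         # step 2: maximum becomes 0.1
    return r + 0.85                                # step 3: shift to 0.85 V
```

For example, a hypothetical raw response of [-5, -2.5, -1, -4] (percent resistance change) maps to [0.95, 0.90, 0.87, 0.93] V, all within the valid input range of the GM block.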

Odor Classification by MLPNN Chip
The learning ability of the MLPNN was tested through the following steps. First, the data were divided into two subsets: one for training and the other for testing. Each subset contained 12 samples, and each sample was a voltage vector with five dimensions (the fruit data and a teacher signal). These voltages were supplied by a data application device (NI PCI 6723). Second, the MLPNN was trained with these samples. A data acquisition device (NI USB 6229) sampled the weight values simultaneously and stored these data on a computer. The block diagram for training is shown in Figure 12(a). Although the training procedure includes a PC collecting the neuron output and weight values, this is not chip-in-the-loop learning, because the PC is not responsible for weight updating. The main role of the PC is to provide long-term weight storage, because the weight value in the storage capacitor of each synapse may decay through leakage current after a period of time. After training, a set of weight values was selected. During testing, the weights on the MLPNN were set to these values by the PCI 6723. The final step entailed applying the test data to the MLPNN, monitoring the neuron output, and verifying the results. The block diagram for testing is shown in Figure 12(b).

Training
During training, each sample was applied for 40 ms; thus, this study sampled the weights and neuron output once every 40 ms. The charging and discharging period (learning rate) was set to 20 μs for each sample. To improve the learning performance of the system, the classes were applied in random order. Because there are 12 samples and each sample lasts 40 ms, the training epoch repeats every 480 ms. Figure 13 shows the neuron output during training. The red curve represents the teacher signal, and the blue curve shows the neuron output. The neuron output clearly differed from the teacher signal initially, but through training, the neuron output became increasingly similar to the teacher signal. By the end of training, the neuron output was almost the same as the teacher signal. In addition to the neuron output, the learning result is also visible in the weight adaptation. Figure 14 shows the weight adaptation during training. The term Hwij represents the weight value of the synapse between the j-th hidden neuron and the i-th input neuron, and Ow1j refers to the weight value of the synapse between the output neuron and the j-th hidden neuron. The parameters in the circuit varied slightly because of the fabrication process; the majority of the weight values converged, but a few did not. Thus, the final weights were not ideal values for testing. The class order during training was random; for example, the training order might be Ba-Ba-Le at one time and Ba-Le-Ly at another (Ba denotes banana, Le lemon, and Ly lychee). Our analysis showed that selecting the weight values after the sequence Ba-Le-Ly (or another combination of Ba, Le, and Ly) could result in optimal classification. Thus, this study applied the weights obtained after Ba-Le-Ly to the MLPNN during testing.

Testing
The test results are shown in Figure 15. The class order during testing was Le-Ly-Ly-Ba-Ba-Le-Ly-Le-Le-Ba-Ly-Ba. The classification boundaries were set at 0.88 V and 0.92 V: when the output voltage was larger than 0.92 V, the input was classified as banana; below 0.88 V, as lychee; and between 0.88 and 0.92 V, as lemon.
The Y-axis of Figure 15 is the output voltage of the output neuron, and the X-axis shows the different samples. For example, the first sample caused the network to produce an output voltage of approximately 0.9 V, between 0.88 and 0.92 V; consequently, the network classified the first sample as lemon. Because the first odor sample applied to the network was a lemon odor, the network classified this sample correctly. The results showed that only the third sample was misclassified, whereas the others were correctly classified. The testing accuracy was 91.7%, and the power consumption was 0.553 mW. The overall circuit specifications are listed in Table 1.
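The decision rule above can be expressed compactly (a direct transcription of the stated thresholds; the function name is our own):

```python
def classify(v_out):
    """Map the output-neuron voltage (in volts) to a fruit class:
    above 0.92 V -> banana; below 0.88 V -> lychee; otherwise lemon."""
    if v_out > 0.92:
        return "banana"
    if v_out < 0.88:
        return "lychee"
    return "lemon"
```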

Conclusions
This study proposed an MLPNN chip using BP learning, fabricated by the TSMC 0.18 μm standard CMOS process. The use of an analog VLSI design with a simple structure meant that the proposed MLPNN had a relatively small size and low power requirements. The supply voltage to the chip was 1.8 V, and the power consumption was 0.553 mW. The measurement results showed that this design was capable of recognizing three fruit odors. Because of its small size, low power requirements, and classification accuracy, the MLPNN chip can be integrated into a portable E-nose in the future. This would reduce the size and power requirements of the E-nose while simultaneously broadening its application field.