3.1. Weight Quantization
To implement a system capable of cognitive computation using the memristors researched to date, memristors capable of representing at least 100 weight levels are needed, and these memristors must be stable enough to drive and reproducible enough to distinguish the resistance between adjacent weight levels. However, current memristors are not directly applicable to cognitive computing systems because of their limited number of weight levels, retention problems, and endurance problems. Various issues have been reported for these devices, such as the migration of oxygen atoms [25], changes in electrical properties through the diffusion of hydrogen [26,27], localized destruction of the device by pulse signals [28], and poor uniformity of the memristor filaments [29]. If the number of weight levels that the model needs to represent can be minimized through quantization, cognitive computing systems based on memristor devices become easier to implement.
The experimental dataset used both 5 × 5 and 7 × 7 numeric inputs with a binary representation of 0 and 1. Training consisted of randomly entering the ten digits from zero to nine into a 5 × 5 or 7 × 7 grid, where each digit was represented by 1s in the corresponding grid cells (Figure 1a,b). For the 5 × 5 input, the digit ‘2’ would be depicted as ([1,1,1,1,1], [0,0,0,0,1], [1,1,1,1,1], [1,0,0,0,0], [1,1,1,1,1]) (Figure 1d). The data were then flattened and fed into a single input array. The 5 × 5 input entered the input layer as a (1 × 25) array and passed through weight1 of size (25 × 10) followed by a sigmoid activation, producing a hidden layer of size (1 × 10). The data then passed through weight2 of size (10 × 10) and a second sigmoid. The output layer provided a probability between zero and one for each digit; the highest probability was selected and converted into a (1 × 10) output in which one element was set to 1. For the 7 × 7 numeric input, the input data took the form (1 × 49), with weight1 of size (49 × 10); the subsequent layers followed the same format as in the 5 × 5 case, with sizes (1 × 10) and (10 × 10). The input-layer values remained fixed throughout backpropagation, and only weight1 and weight2 were subsequently quantized. During quantization, weight1 and weight2 were not treated individually but combined; this combined set of weight1 (W1) and weight2 (W2) is denoted by W.
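As a minimal sketch, the forward pass described above can be written as follows (illustrative Python, not the authors' code; the function names and the random, untrained weights are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, W2):
    """x: (1, 25) flattened binary digit; W1: (25, 10); W2: (10, 10)."""
    hidden = sigmoid(x @ W1)            # (1, 10) hidden layer
    probs = sigmoid(hidden @ W2)        # (1, 10) per-digit scores in (0, 1)
    out = np.zeros_like(probs)          # one-hot output: highest score wins
    out[0, np.argmax(probs)] = 1
    return out

# Toy usage with random (untrained) weights
x = np.zeros((1, 25))
x[0, :5] = 1                            # e.g., the top row of the 5 x 5 grid
W1 = np.random.randn(25, 10)
W2 = np.random.randn(10, 10)
print(forward(x, W1, W2))
```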
Figure 2a illustrates the flowchart detailing the optimization and quantization process for each weight in the 5 × 5 input dataset. In the unconstrained case, achieving a 100% recognition rate for randomly input values required approximately 500,000 backpropagation iterations over W1 and W2. We first trained the model to 100% accuracy; because the 5 × 5 and 7 × 7 data are relatively simple datasets consisting of 0s and 1s, the accuracy easily reaches 100% given sufficient training. After sufficient training, we moved on to quantization. In this step, only W, the combination of W1 and W2, was used among the model parameters. Quantization represents a continuum of values as a set of discrete steps; its key components are the “weight range” (Equation (1)) and the quantization level q. Here, q is an integer greater than 0 bounded above by a maximum value p. Figure 2b–d show the result of quantization for q = 1, 2, and 4. The black line is the original value, and the red line is the value after quantization; as q increases, the number of steps increases, which makes the difference between the values before and after quantization smaller. The level q serves as the target number of discrete values into which the original analog weights are converted. The weight range is the difference between the minimum and maximum values of W, where W is the combined set of W1 and W2 (Equation (2)). The interval is defined as the weight range divided by the desired level k, indicating the range of values represented by each single level (Equation (3)).
The quantized value assigned to each interval, level(i, middle), is then determined by multiplying the interval by (i + 0.5) and adding the minimum value of Wtotal (Equation (4)), where i is the level index. Equations (1), (3), and (4) are used when the list of values being quantized consists only of positive numbers greater than zero. In this case, “Weight rangeA” in Equation (3) and “Weight range” in Equation (1) have the same value because all the numbers are positive. Conversely, if the list of values being quantized contains both positive and negative numbers, “Weight rangeA” is used to set the interval.
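Since the text refers to Equations (1)–(4) only by their definitions, they can be summarized as follows (a reconstruction consistent with the surrounding prose; the symbols and exact form are assumed rather than quoted from the original):

```latex
% Reconstruction of Equations (1)-(4) from the definitions in the text;
% Eq. (2) denotes combining W1 and W2 into the single set W.
\begin{align}
\text{Weight range} &= \max(W) - \min(W) \tag{1} \\
W &= W_1 \cup W_2 \tag{2} \\
\text{Interval} &= \frac{\text{Weight range}_A}{k} \tag{3} \\
\text{level}(i, \text{middle}) &= \text{Interval} \times (i + 0.5) + \min(W_{\text{total}}) \tag{4}
\end{align}
```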
Sixty seed values were used to randomly generate the initial weights for each experiment, and the process of generating and training a new set of initial weights was repeated 100 times for each seed. Thus, 6000 models were created for each dataset, and each model was subsequently quantized to assess the results. Quantization started at q = 1 and incremented q sequentially until an optimal value of q was found that satisfied α, the target recognition rate accuracy. At each step, if the recognition rate accuracy is greater than or equal to α, the procedure moves to the next step; otherwise, q is incremented by 1 and quantization is performed again. In this experiment, α was set to 100%; in general, the smaller α is, the lower the level value that can be reached. Quantization runs under the condition q < p, where p is the maximum allowed level, and the process terminates when q exceeds p, as shown in the flowchart in Figure 2. Although all the points of level(i, middle) are established up to p, there are cases where level(i, middle) is not generated for a certain interval because no values of W fall within that interval. In this study, such non-generated levels were not counted when determining the final level.
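The search procedure in Figure 2a can be sketched as follows (illustrative Python; the evaluate callback standing in for the recognition-rate test is an assumption, not the authors' implementation):

```python
import numpy as np

def quantize(W, q):
    """Collapse the weights in W onto q levels at interval midpoints
    (Equations (3) and (4))."""
    w_min, w_max = W.min(), W.max()
    interval = (w_max - w_min) / q                        # Equation (3)
    idx = np.clip(((W - w_min) / interval).astype(int), 0, q - 1)
    return w_min + interval * (idx + 0.5)                 # Equation (4)

def search_min_level(W, evaluate, alpha=1.0, p=100):
    """Increment q from 1 until the recognition rate reaches alpha;
    terminate once q exceeds the maximum value p (Figure 2a)."""
    for q in range(1, p + 1):
        Wq = quantize(W, q)
        if evaluate(Wq) >= alpha:       # evaluate() returns accuracy in [0, 1]
            return q, Wq
    return None, None                   # no level up to p satisfies alpha
```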
After training the neural network on the 5 × 5 and 7 × 7 numeric input datasets, quantization was performed on the calculated weights. Of the parameters generated from the 5 × 5 dataset, only W1 and W2 were used for quantization. The array W contains 350 values: 250 from W1 and 100 from W2. The array W generated from the 7 × 7 dataset contains 590 values. The weight quantization procedure outlined in Figure 2 was applied to the 5 × 5 dataset. After dividing the 350 weights into four levels, the recognition rate remained unchanged (Figure 3). This corresponds to a substantial 98.9% reduction ((350 − 4)/350) in the number of distinct weight values compared to the original state. Similarly, for the 590 weights of the 7 × 7 dataset, a 99.3% reduction was observed, again resulting in four levels.
Figure 3 illustrates the distribution of the 350 weights derived from a single 5 × 5 dataset. The x-axis is arranged in ascending order of the weights, while the y-axis shows the corresponding weight value at each point. The weights range from 0 to 12.5, with lower values predominating. The dashed black line represents the unconstrained weight distribution with 350 distinct values. Segmenting the 350 weight values into four levels yields the quantized weights shown in red in Figure 3. The first level condenses 296 distinct weight values into a single value, while the second, third, and fourth levels each merge 30 or fewer weight values into a single value. Notably, although many of the values are close to zero, levels are created not only for these values but also for the sparser, larger ones: the level containing 296 items covers a densely populated region, whereas the level with only five items covers a much sparser one. It is important to note that not all 6000 models in this experiment could be quantized to four weight levels; only some models were computationally observed to maintain a 100% recognition rate after reducing the weights to four levels. Based on the experimental results, the weight level to which a model could be stably reduced varied with the magnitude of the maximum and minimum weight values, which aligns with the definition of “weight range” given earlier.
3.2. Weight Range
When designing and implementing actual neural network systems in hardware, the challenge is to maintain the accuracy of the many different weights. Quantization, which is used to compensate for this drawback, reduces the complexity and implementation difficulty of a hardware realization (e.g., using memristor devices). When designing a neuromorphic model, the neural network must be finite, and every parameter must be represented by a physical element; this increases the processing difficulty of the elements, the overall complexity, and the cost of implementation. Quantization is the algorithm commonly used to address this, as it reduces model complexity while maintaining performance.
Existing quantization techniques typically convert single-precision floating point (FP32) to half-precision floating point (FP16) or normalize the layers [14]. These techniques reduce numerical precision by decreasing the number of calculation bits while maintaining the value range, aiming to cut memory use and computational complexity. FP32 employs 32 bits per value, while FP16 uses 16; FP16 thus offers reduced memory and increased throughput in software, but these benefits are minimal when weights must be mapped directly onto hardware devices, where FP32 and FP16 alike require representing a near-continuum of values. To address this, our proposed quantization technique targets actual hardware implementation by simplifying the model’s weights into a short list of discrete values. Notably, our study focuses on minimizing the number of unique weights by setting a low level and conducting quantization accordingly. Therefore, to reduce the level, the correlation with the weight range concept introduced above was investigated.
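For contrast, a minimal sketch of the conventional FP32-to-FP16 conversion (illustrative Python; the array size is arbitrary):

```python
import numpy as np

# FP32 weights occupy 4 bytes each; casting to FP16 halves the memory while
# keeping (approximately) the same values - unlike the level-based scheme
# proposed here, which collapses the weights to a handful of discrete values.
w32 = np.random.randn(350).astype(np.float32)
w16 = w32.astype(np.float16)
print(w32.nbytes, w16.nbytes)           # 1400 bytes vs. 700 bytes
```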
Figure 4a presents the outcomes of an experiment aimed at determining the optimal levels for various weight ranges. The initial weights were seeded from one to sixty, and 100 iterations were conducted to generate 6000 distinct models each for the 5 × 5 and 7 × 7 inputs. The correlation between the weight range of each model and the number of levels in its final weights was examined. The number of levels in the final weights was defined as the point at which the recognition rate reached 100%, i.e., q when α is 100% in Figure 2a. Notably, the minimum achievable quantization level q appears to be directly proportional to the weight range. For both the 5 × 5 and 7 × 7 input systems, models satisfying α = 100% at q = 4 are found, and these occur at significantly lower weight ranges, around 21. An interesting aspect is that models with the same weight range can have different values of q, which suggests that other factors besides the weight range determine q. For models with a weight range greater than 60, q falls between 10 and 40. The points A (8, 32) and B (25, 62), shown in Figure 4a, were plotted in Figure 4b for representative models with weight ranges of 32 and 62. As already seen in Figure 3, the weights of the 6000 models used in this experiment are mostly clustered around zero, and this trend becomes more pronounced as the weight range increases.
In this comprehensive analysis of 6000 models, a significant correlation between the weight range and the quantizable level q was established. For the same input signal, the weight range can be an important factor in determining q. In addition, for systems satisfying quantization level q = 4, the change in the recognition rate as a function of the weight range over the number of epochs was calculated, as shown in Figure 5. The relationship between the quantization level q and the weight range was studied by comparing the post-quantization recognition rate accuracy for 10 randomly selected models in different training states using five seed values.
Figure 5a shows the distribution of the training weight ranges, where 100 models were generated for each seed. The figure shows a sequential increase in both the mean and the spread of the weight ranges from Seed A to Seed E. These seeds set the initial weights and thus shape the subsequent training. The recognition rate accuracies computed from the initial weights, categorized by weight range, for Seeds A, C, and E are presented in Figure 5b–d, respectively. The recognition rate accuracy was evaluated as the number of computations increased for each of the three seeds in a system satisfying quantization level q = 4. Notably, for Seed A, which has the smallest weight range, the recognition rate accuracy showed a consistent upward trend with each epoch, exceeding 90% after 400 epochs (Figure 5b). Seed C showed the same trend of gradual improvement as Seed A as the number of epochs increased, reaching about 70% accuracy after 600 epochs, beyond which no further improvement was observed (Figure 5c). In contrast to Seeds A and C, Seed E showed no improvement in the recognition rate as training progressed: starting at an initial accuracy of 20%, it remained at that level even at epoch 1000 (Figure 5d). This evaluation, performed by artificially constructing seeds with very large weight ranges, shows that the weight range can directly affect the achievable quantization level q and, in turn, the recognition rate.
3.3. Circuit Implementation
So far, the neural network calculations have shown that a 100% digit recognition rate can be obtained by exploiting the relationship between the weight range and the quantization level q. A device simulation for an actual device chipset was then completed using a process design kit (PDK). The PDK simulation used the weights that converged to the lowest level, q = 4, in the computational calculation. The circuit is shown in Figure 6a, where the input is represented by voltage and the output by current. Compared with Figure 1, I is mapped to i, K to k, and N to n. Between layers 1 and 2, a reference resistor is used to convert the current to voltage and to compensate for the current; in Figure 6, the reference resistors used for each line are identical. To check the influence of the threshold voltage of the diode added to prevent sneak currents, the recognition rate and the output current were compared as a function of the input voltage magnitude through circuit simulation.
Figure 6b shows the recognition rate accuracy for the final output of the PDK simulation and the difference between the current computed by the neural network and that from the PDK computation. The accuracy reflects the digit recognition rate for the 10 different inputs (0–9), with a 100% recognition rate for all input values. This experiment shows that even a system whose weights have been reduced to only four values by quantization can still provide reliable digit recognition when building a chip that runs on a real device. The rate of the current gap in Figure 6b represents the difference between the output current from the PDK simulator for device fabrication and that from the computational neural network calculation. The graph shows that the input digit “9” has a current gap of 27% (red line in Figure 6b, using a 1 V input voltage). Input digits 0, 5, and 6, as well as 9, show gaps of more than 10% in the output value compared to the computational neural network calculation. Because these gaps are simply the difference between the calculated output current and the current in the PDK simulation of the actual device, they do not contribute much to a decrease in the recognition rate. The trend was similar when the amount of data representing the input digits was increased (increasing the input information in the 5 × 5 inputs). This shows that the output current maintains a certain margin, ensuring accuracy even in different environments (gaps generated by random sources, in this case by device fabrication).
The difference in the current values between the neural network calculation and the PDK simulation is due to the diodes used as selectors. The diode used in this study has a threshold voltage of 0.65 V, so, if the input voltage is close to this value, the diode threshold dominates. As the signal passes through the circuits of W1 and W2, the output current of each matrix deviates slightly from the calculated value, and the difference becomes clearly visible in the final output current. The blue and green lines in Figure 6b show the current gap when the input voltage is increased to 5 V and 10 V, respectively; the ratio of the current gap gradually decreases as the input voltage increases. In the computational calculation, the output current is (V − Vth)/R, but, in the PDK simulation, the voltage across the resistor changes only after the diode’s threshold voltage is overcome and feedback occurs in the circuit.
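A toy numerical model of this effect (illustrative Python; the resistance value is an assumption, and the simple diode-drop formula ignores the circuit feedback mentioned above) reproduces the trend of a shrinking current gap with increasing input voltage:

```python
# Ideal column current V/R versus a diode-gated branch, max(V - Vth, 0)/R.
Vth, R = 0.65, 1.0e4                     # diode threshold 0.65 V; 10 kOhm (assumed)

def current_gap(Vin):
    i_ideal = Vin / R                    # neural-network (software) calculation
    i_diode = max(Vin - Vth, 0.0) / R    # simple diode-drop model, no feedback
    return 100.0 * (i_ideal - i_diode) / i_ideal

for v in (1.0, 5.0, 10.0):
    print(f"Vin = {v:>4.1f} V -> current gap ~ {current_gap(v):.1f}%")
# The gap shrinks as Vin grows (65% at 1 V, 13% at 5 V, 6.5% at 10 V in this
# toy model), matching the downward trend of the blue and green lines in
# Figure 6b, though not the fabricated device's exact numbers.
```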
Quantization of weights in neural network research refers to reducing the precision of the weights in a neural network. Weights are typically represented as floating-point numbers, which require a certain amount of memory and computational resources to store and process; quantization aims to reduce these requirements by representing the weights with fewer bits. The application of quantization to simple inputs (5 × 5 or 7 × 7 matrices) in this study aims both to reduce the memory and computational requirements of the neural network and to minimize the number of weights that must be implemented by memristors in hardware-based systems, thus facilitating the implementation of device-based neural network systems (Figure 7). Quantizing weights is essential for deploying neural networks on resource-constrained devices such as mobile phones, IoT devices, and embedded systems. However, quantization can lead to model inaccuracy, which requires careful optimization and tuning to mitigate. To quantize the weights, we introduced the concept of the weight range, which makes it possible to adjust the number of levels that can be quantized. We evaluated the cognitive operation for relatively simple digit recognition using 5 × 5 (25 inputs) and 7 × 7 (49 inputs) inputs and found a 100% recognition rate, but further evaluation of the applicability of this system to larger input data is needed.