1. Introduction
With rapidly advancing technology, energy efficiency has become one of the major design challenges in digital circuits and systems. Studies demonstrate that energy efficiency can be improved by reducing both the computational time and the power consumption [1]. However, reducing these factors, in particular the power consumption, generally degrades the overall performance of the system. This challenge intensifies the current demand for low-power, high-performance systems, and a new methodology to handle it is therefore required.
Stochastic computing, a technique that exploits probability theory, is one promising approach to address these limitations [1]. Stochastic computing (SC), invented by Gaines in the 1960s [2,3], has recently regained significant attention, mainly because of its approximate computation method. This method offers progressive accuracy scalability [4], which can be exploited in applications where approximate results are acceptable, including media processing, neural networks, factor graphs, LDPC codes, fault-tree analysis, image processing, and filters [5,6,7,8,9,10].
However, mainstream adoption of SC is limited by its long run time and inaccuracy [1]. As explained in [11], the random number generator (RNG), also known as the stochastic number generator (SNG), plays a significant role in determining the area and energy consumption. The most commonly used SNG is based on the linear feedback shift register (LFSR), and several techniques to improve the output accuracy of LFSR-based SNGs have been presented in the literature [12,13,14,15,16,17,18]. As shown in [19], increasing the length of the stochastic sequences (SS) increases the operating time and power consumption.
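For illustration, a conventional LFSR-based SNG can be sketched as follows. This is a minimal sketch, assuming an 8-bit Fibonacci LFSR with the maximal-length feedback polynomial x^8 + x^6 + x^5 + x^4 + 1 and a comparator that emits a 1 whenever the LFSR state is below the binary input; the function names, the seed, and the polynomial are illustrative choices, not details taken from [11] or [19].

```python
def lfsr8_states(seed=1):
    """Yield successive states of an 8-bit Fibonacci LFSR.

    Feedback taps follow the maximal-length polynomial x^8 + x^6 + x^5 + x^4 + 1,
    so any non-zero seed cycles through all 255 non-zero states.
    """
    state = seed & 0xFF
    if state == 0:
        raise ValueError("an XOR-based LFSR must not be seeded with 0")
    while True:
        yield state
        # New bit = XOR of the tapped bits (shift amounts 0, 2, 3, 4 = 8 - tap)
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)


def lfsr_sng(x, n_cycles, seed=1):
    """Comparator-based SNG: emit 1 whenever the LFSR state is below input x."""
    states = lfsr8_states(seed)
    return [1 if next(states) < x else 0 for _ in range(n_cycles)]
```

Over one full period the states 1 to 255 each appear once, so a 255-cycle stream for x = 128 contains exactly 127 ones; a short prefix of the stream can deviate much further from x/256, which is one reason LFSR-based SC needs long streams and therefore many clock cycles.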
To address this issue, [11] introduced a quasi-stochastic bit sequence generation scheme, the quasi-stochastic number generator (QSNG), which uses the distributed memory elements of a field-programmable gate array (FPGA) to implement the SNGs. However, [11] does not report any analysis of energy reduction. Therefore, this work presents a detailed analysis and a methodology for energy reduction that improve the overall performance.
In this paper, a novel energy-performance scalable methodology based on quasi-stochastic number generators is proposed and validated. Unlike conventional approaches, the proposed methodology uses a novel algorithm to estimate the required computation time from the target accuracy. A comprehensive simulation-based study is presented to demonstrate the resulting reductions in operating time and energy consumption. Overall, reductions of 12–60% in operating time and 12–78% in energy consumption relative to the conventional LFSR-based counterpart are observed.
This paper is organized as follows. Section 2 discusses the background of stochastic computing and quasi-stochastic bit sequence generation. Section 3 presents a novel energy-efficient quasi-stochastic computing algorithm that calculates the number of clock cycles from the target peak signal-to-noise ratio (PSNR). The simulation results that validate the proposed approach are presented in Section 4. Finally, Section 5 concludes the paper.
3. Energy Performance Scalability of Novel Quasi-Stochastic Computing Approach
We begin this section by discussing the major factors that affect the accuracy of a processed image. Next, the effect of computation time on accuracy and energy consumption is demonstrated. Finally, the proposed energy-efficient algorithm, which introduces energy-performance scalability into SC, is discussed in detail.
In most image processing techniques, the quality of the processed image is determined by its accuracy. Accuracy can be quantified using several error metrics, such as the maximum error and the mean square error (MSE), among others [23]. In this work, the PSNR is used to quantify the acceptability of a noisy image. It is measured in decibels (dB) and quantifies the similarity between two images (e.g., the input image and the processed output image). The PSNR can be calculated using Equation (1) [23]:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right) \qquad (1)$$
where
$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^{2}$$
is the mean square error between the error-free and the erroneous image, $\mathrm{MAX}_I$ is the maximum image pixel value (e.g., 255 in an 8-bit grayscale image), $m$ and $n$ represent the width and height of the target image in pixels, and $I(i,j)$ and $K(i,j)$ represent the pixel values of the error-free image and the erroneous/noisy image, respectively. For grayscale images, the MSE is computed from the pixel brightness values.
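Equation (1) can be evaluated directly on pixel arrays. The following is a minimal sketch, assuming images are given as equally sized lists of rows of 8-bit grayscale values; the function names are illustrative, and identical images are reported as infinite PSNR because the MSE is zero.

```python
import math


def mse(ref, test):
    """Mean square error between two equally sized grayscale images (lists of rows)."""
    rows, cols = len(ref), len(ref[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            diff = ref[i][j] - test[i][j]
            total += diff * diff
    return total / (rows * cols)


def psnr(ref, test, max_i=255):
    """PSNR in dB following Equation (1); identical images give infinity."""
    error = mse(ref, test)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_i ** 2 / error)
```

For example, two 2 × 2 images that differ by 10 in a single pixel give MSE = 100/4 = 25 and PSNR = 10 log10(255²/25) ≈ 34.15 dB.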
As seen from Equation (1), the PSNR, and thus the accuracy of the image, is governed by the MSE, which in turn depends on the length of the SS. According to [11,19], high-precision output can be achieved when an SC circuit operates on long stochastic bit streams. Since each bit of an SS takes one clock cycle to process, the computation time increases linearly with the length of the stochastic bit stream. Therefore, computation time tends to increase with accuracy. Note that computation time here refers to the total number of clock cycles required to generate the output SS.
Physically, power is the rate at which energy is used or transmitted, i.e., the energy consumed divided by the time taken to consume it; its unit is the watt, one joule per second. Likewise, in a clocked digital circuit, power is the amount of energy used per unit time (e.g., per clock cycle), so the energy can be calculated by multiplying the power by the total number of clock cycles used. Therefore, the number of clock cycles and the energy consumption are proportional.
In a conventional digital circuit that processes data in binary radix encoding, energy-performance scalability is quite limited, because the total number of clock cycles needed to produce the output is determined solely by how the circuit is designed and optimized, and the power consumed per clock cycle depends only on the complexity of the circuit. In contrast, stochastic computing has a much greater inherent potential for energy-performance scalability. In this paper, the term energy-performance scalability refers to the ability to trade accuracy against energy: high accuracy requires high energy consumption, and vice versa. For many image processing applications, a moderate, acceptable accuracy is sufficient, so energy can be saved by stopping the computation once that accuracy is reached. Using more clock cycles requires more energy but yields higher-quality output, and vice versa.
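The proportionality between clock cycles and energy can be written compactly. As a sketch, assuming a constant average power $P_{\mathrm{avg}}$ and clock period $T_{\mathrm{clk}}$ (symbols introduced here for illustration only), an SS of length $N$ processed one bit per clock cycle gives
$$E = P_{\mathrm{avg}}\, t_{\mathrm{comp}} = P_{\mathrm{avg}}\, N\, T_{\mathrm{clk}} \quad\Longrightarrow\quad \frac{E_2}{E_1} = \frac{N_2}{N_1}$$
so, under these assumptions, halving the number of clock cycles needed to reach the target accuracy halves the energy.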
Such an inherent tradeoff can be beneficial in application domains such as image processing and artificial neural networks, where fast, low-power approximation is desired. The proposed quasi-stochastic computing approach is intended to address the slow convergence of conventional stochastic computing while offering excellent energy-performance scalability.
To demonstrate that the proposed approach is viable, edge detection is performed on the grayscale image “clock.” The impact of computation time on accuracy and energy is depicted in Figure 3. As the graph shows, both the PSNR of the image and the energy consumption tend to increase linearly with the number of clock cycles. Hence, it is practical to choose the minimum number of clock cycles that satisfies the minimum required accuracy, which gives the best possible energy-performance balance. To address this energy-accuracy tradeoff, we propose an energy-accuracy scalable, energy-efficient QSNG (EQSNG) design that determines the number of iterations from the acceptable PSNR threshold for an image.
The acceptability of the target image can be determined simply by comparing the equivalent error rate with the corresponding acceptable error rate threshold, which is assumed to be user-defined in this work. The general design of the energy-efficient QSNG model (EQSNG) is depicted in Figure 4. The optimal number of iterations is calculated from the user-defined target PSNR.
The process for estimating the optimal number of iterations is shown in Algorithm 1. The first step is to store the precomputed direction vectors in random-access memory (e.g., the lookup tables of the FPGA). Then, each bit of the i-th n-bit direction vector is ANDed with the i-th bit of the n-bit binary counter output, and the resulting binary numbers are XORed together to obtain the final low-discrepancy (LD) sequence. This LD sequence is compared with the binary input value to generate an SS on which the stochastic operations are performed. The resulting stochastic output is converted back into a binary number in the stochastic-to-binary conversion block. This post-processed binary output is then processed in MATLAB to determine the image quality (i.e., accuracy).
Algorithm 1: EQSNG Algorithm.
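The SS generation and stochastic-to-binary conversion steps described for Algorithm 1 can be sketched in Python. This is a minimal sketch, assuming 8-bit operands and using the direction vectors of the first Sobol dimension (powers of two, which turn the LD sequence into a bit-reversed counter); the direction vectors actually stored in the FPGA lookup tables are not given here, and all helper names are hypothetical.

```python
N_BITS = 8


def direction_vectors(n_bits=N_BITS):
    # Hypothetical stand-in for the precomputed vectors stored in the LUTs:
    # the first Sobol dimension, v_i = 2^(n_bits - 1 - i).
    return [1 << (n_bits - 1 - i) for i in range(n_bits)]


def ld_value(counter, vectors):
    """AND each direction vector with one counter bit, then XOR the results."""
    value = 0
    for i, v in enumerate(vectors):
        counter_bit = (counter >> i) & 1
        value ^= v * counter_bit  # multiplying by 0/1 plays the role of the AND gate
    return value


def quasi_stochastic_stream(x, n_cycles, n_bits=N_BITS):
    """Comparator: emit 1 whenever the LD value is below the binary input x."""
    vectors = direction_vectors(n_bits)
    period = 1 << n_bits
    return [1 if ld_value(k % period, vectors) < x else 0 for k in range(n_cycles)]


def stochastic_to_binary(stream, n_bits=N_BITS):
    """Count the ones and rescale to the n-bit binary range."""
    return round(sum(stream) * (1 << n_bits) / len(stream))
```

With these vectors, every power-of-two prefix of the LD sequence is spread evenly over [0, 256): a 16-cycle stream for x = 100 already converts back to 112, and a full 256-cycle stream reproduces x exactly. This even coverage is the fast convergence that the EQSNG approach relies on.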
To determine the image accuracy, the mean square error (MSE) with respect to the reference image is calculated first. The resulting MSE value is used to calculate the PSNR. If the calculated PSNR is less than the user-defined target PSNR, the counter is incremented and the whole process is repeated until the target PSNR is achieved. Since each counter increment corresponds to an additional clock cycle, the total energy consumption is calculated by multiplying the power by the number of clock cycles. Because the proposed approach converges much faster, it requires fewer clock cycles to reach the target PSNR, which further reduces the energy consumption.
Hence, the proposed approach provides acceptable image quality with fewer clock cycles and lower energy consumption. Compared to the conventional LFSR-based SC approach, the EQSNG methodology can generate an edge detection image of acceptable quality with excellent energy efficiency. To demonstrate and verify the energy-performance scalability of the EQSNG approach, the proposed methodology is implemented in a stochastic edge detection circuit for 8-bit grayscale image processing. In the next section, it is applied to several test images, and comparative results are presented and analyzed.
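The iterative part of Algorithm 1 (compute the PSNR, compare it with the target, and increment the counter until the target is met) can be sketched as follows. This is a minimal sketch under stated assumptions: `run_circuit` is a hypothetical callback standing in for the post-synthesis simulation plus MATLAB post-processing, and the energy uses a constant average power and clock period (illustrative parameters), so that it is proportional to the number of clock cycles.

```python
import math


def psnr(ref, out, max_i=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    diffs = [(r - o) ** 2 for rr, oo in zip(ref, out) for r, o in zip(rr, oo)]
    error = sum(diffs) / len(diffs)
    return math.inf if error == 0 else 10 * math.log10(max_i ** 2 / error)


def eqsng_min_cycles(ref, run_circuit, target_psnr, avg_power_w, clk_period_s,
                     max_cycles=256):
    """Increase the cycle count until the output meets the target PSNR.

    Returns (cycles, achieved PSNR in dB, energy in joules), or None if
    max_cycles is reached without meeting the target.
    """
    for n_cycles in range(1, max_cycles + 1):
        out = run_circuit(n_cycles)           # hypothetical: output image after n_cycles
        achieved = psnr(ref, out)
        if achieved >= target_psnr:
            # Energy = average power x computation time (cycles x clock period)
            energy_j = avg_power_w * n_cycles * clk_period_s
            return n_cycles, achieved, energy_j
    return None
```

For clarity the sketch re-evaluates the whole output at each count; in hardware, the counter simply keeps running and the converted output is sampled, so the first count that satisfies the target is also the minimum number of clock cycles.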
4. Simulation-Based Energy-Performance Scalability Analysis
This section compares the results for various test images processed using the conventional LFSR and the EQSNG approaches. The test images on which edge detection is performed, called clock, crowd, and aerial, are shown in Figure 5. An edge detection circuit based on the Roberts cross algorithm [5] was used for the proposed energy-performance scalability analysis. To study the impact of the proposed approach on energy consumption, target PSNR values are arbitrarily selected. Next, the computation time (i.e., number of clock cycles) required to achieve the specified accuracy is determined, and the corresponding energy consumption is calculated.
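A stochastic Roberts cross kernel can be sketched as follows, assuming the XOR/MUX structure commonly used for stochastic edge detection: for a 2 × 2 window with pixels a, b (top row) and c, d (bottom row), the output approximates (|a - d| + |b - c|)/2. XOR gates on streams generated from the same LD value (and hence maximally correlated) give the absolute differences, and a 2-to-1 multiplexer with a 0.5-probability select stream performs the scaled addition. The bit-reversed counter used as the LD source and the popcount-parity select stream are illustrative assumptions; the circuit actually used in this work follows [5].

```python
def bit_reverse(k, n_bits=8):
    """LD value for counter k: the first Sobol dimension is a bit-reversed counter."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def roberts_cross_sc(a, b, c, d, n_cycles=256):
    """Stochastic Roberts cross for the window [[a, b], [c, d]] of 8-bit pixels."""
    ones = 0
    for k in range(n_cycles):
        r = bit_reverse(k % 256)                  # one shared LD value -> correlated streams
        s_a, s_b, s_c, s_d = (int(r < v) for v in (a, b, c, d))
        diag1 = s_a ^ s_d                         # XOR of correlated streams ~ |a - d|
        diag2 = s_b ^ s_c                         # XOR of correlated streams ~ |b - c|
        select = bin(k).count("1") & 1            # assumed 0.5-probability select stream
        ones += diag1 if select else diag2        # 2-to-1 MUX -> (diag1 + diag2) / 2
    return min(255, round(ones * 256 / n_cycles))


def roberts_cross_ref(a, b, c, d):
    """Error-free binary reference, usable as I(i, j) in Equation (1)."""
    return (abs(a - d) + abs(b - c)) / 2
```

Running the kernel with fewer than 256 cycles gives a coarser approximation of the reference, which is exactly the knob that the energy-performance analysis turns.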
The circuits have been realized on a Xilinx Virtex 4 SF FPGA (XC4VLX15) device and synthesized using the Xilinx ISE 12.1 design suite. The QSNG uses the LD sequence and the distributed memory elements (LUTs) of the FPGA to implement the SNGs, which is why an FPGA is used. In this section, the performance of the proposed technique has been extensively evaluated using 8-bit grayscale images as examples (i.e., each pixel value is represented using a stochastic bit stream of 256 bits). A cycle-accurate simulator has been implemented in MATLAB to generate simulation results for the proposed technique. The pixel values of the images were extracted using MATLAB and given as 8-bit binary inputs to the stochastic edge detection circuit. The output extracted from the post-synthesis simulation results was then processed in MATLAB to determine the accuracy.
To quantitatively demonstrate and verify the performance of the proposed approach, energy consumption is determined using the following simulation parameters: 8-bit grayscale images and their desired PSNR values. Table 1 shows the number of clock cycles and the amount of energy consumed to achieve the desired image quality. As the table shows, for the same target PSNR, the energy consumption of the proposed EQSNG methodology is significantly lower than that of the traditional LFSR approach, and EQSNG needs considerably fewer clock cycles than the LFSR to achieve the desired image quality. The values in Table 1 were obtained by designing both the LFSR and EQSNG models and verifying them via simulation.
The proposed EQSNG implementation of the edge detection circuit reduces the computation time by a factor of 3.5 on average compared to the LFSR-based approach. For instance, to achieve a PSNR of 31.53 dB for the aerial test image, the EQSNG and LFSR approaches consume 0.14 J and 0.63 J, respectively, a reduction in energy consumption of 77.7%. Similarly, the energy consumed by the LFSR and EQSNG methodologies to achieve a PSNR of 28 dB for the clock test image is 0.054 J and 0.05 J, respectively, so the proposed approach reduces energy consumption by 12.2%. For the crowd test image with a PSNR of 40.30 dB, the EQSNG approach saves about 18.6% of energy compared to the LFSR approach. The reduction in energy consumption for various PSNR values obtained with the proposed approach is depicted in Figure 6.
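The percentage savings quoted above are relative differences of the two energies. As a worked example using the rounded values reported for the aerial image (the notation $E_{\mathrm{LFSR}}$ and $E_{\mathrm{EQSNG}}$ is introduced here for illustration):
$$\text{Saving} = \frac{E_{\mathrm{LFSR}} - E_{\mathrm{EQSNG}}}{E_{\mathrm{LFSR}}} \times 100\% = \frac{0.63 - 0.14}{0.63} \times 100\% \approx 77.8\%$$
which agrees with the reported 77.7% to within the rounding of the energy values.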
As Table 1 shows, the number (#) of clock cycles used increases with the PSNR (i.e., accuracy). Therefore, the longer the computation time, the better the image quality, as illustrated in Figures 7–12. These figures show that, owing to faster convergence of the stochastic values, the proposed approach uses fewer clock cycles than the LFSR approach to achieve the same accuracy. Therefore, with the proposed EQSNG methodology, execution time and energy consumption can be reduced while an acceptable level of accuracy is maintained.
In summary, a 12–78% reduction in energy consumption is observed. Moreover, compared to the LFSR-based approach, the proposed EQSNG implementation reduces the computation time by a factor of 2.5 on average. This excellent energy-quality scalability of the proposed approach may also benefit other application domains (e.g., signal processing, machine vision, and deep learning) where efficient reduced-precision computation is desired.