Noise-Aware and Light-Weight VLSI Design of Bilateral Filter for Robust and Fast Image Denoising in Mobile Systems

The range kernel of the bilateral filter unintentionally degrades image quality in real environments because the pixel intensity varies randomly due to the noise generated in image sensors. Furthermore, the range kernel increases the complexity due to the comparisons with neighboring pixels and the multiplications with the corresponding weights. In this paper, we propose a noise-aware range kernel, which estimates noise using an intensity difference-based image noise model and dynamically adjusts weights according to the estimated noise, in order to alleviate the quality degradation of bilateral filters by noise. In addition, to significantly reduce the complexity, an approximation scheme is introduced, which converts the proposed noise-aware range kernel into a binary kernel using the statistical hypothesis test method. Finally, a fully parallelized and pipelined very-large-scale integration (VLSI) architecture of a noise-aware bilateral filter (NABF) based on the proposed binary range kernel is presented, which was successfully implemented in a field-programmable gate array (FPGA). The experimental results show that the proposed NABF is more robust to noise than the conventional bilateral filter under various noise conditions. Furthermore, the proposed VLSI design of the NABF achieves 10.5 and 95.7 times higher throughput and uses 63.6–97.5% less internal memory than state-of-the-art bilateral filter designs.


Introduction
Image denoising methods have rapidly evolved as a critical component of the image processing pipeline [1] and of high-level vision tasks [2,3]. Recently introduced methods, such as improved block-matching and three-dimensional (3D) filtering (BM3D) [4] and deep learning-based algorithms [5,6], reduce noise significantly in various environments. However, these methods are infeasible for battery-powered mobile systems, in which low-power operation is essential, because their high complexity requires high-end central processing units (CPUs) or graphics processing units (GPUs). Alternatively, the bilateral filter [7] has started to be adopted in mobile systems [8,9], owing to its relatively low complexity and edge-preserving characteristic. Furthermore, very-large-scale integration (VLSI) designs for real-time and low-power bilateral filtering in mobile systems have recently been presented [10][11][12][13].
However, the bilateral filter is difficult to use in practice because noise degrades its output quality. The pixel intensity changes randomly due to the noise generated in image sensors [14], but the range kernel cannot distinguish whether an intensity variation is caused by noise or not. Furthermore, a non-optimal user-defined parameter can degrade the quality of the filtered image.
In addition, VLSI designs of the bilateral filter suffer from increased complexity due to the range kernel. The range kernel is typically implemented using a look-up table (LUT) memory, which stores pre-defined weights according to the difference in pixel intensity. Because differences in pixel intensity are random, separate LUT memories are required for the parallel processing of pixels within a pixel window, and this increases the resource use in proportion to the pixel window size.
In this paper, we propose a noise-aware bilateral filter (NABF) to resolve the aforementioned image quality degradation problem. The NABF replaces the conventional range kernel with an intensity difference-based image noise model [15], which estimates the probability that an intensity difference is caused by noise, and adjusts the weight dynamically according to the estimated noise probability. This replacement also eliminates a user-defined parameter that affects the quality of the bilateral filter. In addition, the proposed range kernel is approximated by a binary kernel using the statistical hypothesis test method to considerably reduce the usage of the LUT memory and the logic in a VLSI design [16]. Finally, a fully parallelized and pipelined VLSI architecture of the NABF based on the proposed binary range kernel is designed and successfully implemented in a field-programmable gate array (FPGA). The experimental results demonstrate that the proposed NABF and its approximated version are more robust to noise than the conventional bilateral filter under various noise conditions. Furthermore, the proposed VLSI design achieves 10.5 and 95.7 times higher throughput and uses 63.6–97.5% less internal memory when compared with recent VLSI designs of the bilateral filter.
The rest of this paper is organized as follows. Related studies are presented in Section 2. Section 3 presents the NABF and its approximation method. Section 4 presents the VLSI architecture of the NABF. Experimental results are shown in Section 5. Lastly, Section 6 concludes the paper.

Optimal Parameter Selection of Bilateral Filter
Because ground truth images do not exist in real environments, statistical approaches, such as Stein's unbiased risk estimator (SURE), the Poisson unbiased risk estimator (PURE), and the Chi-Square unbiased risk estimator (CURE), are adopted for optimal parameter selection of the bilateral filter [17][18][19]. SURE, PURE, and CURE assume Gaussian, Poisson, and Chi-Square noise distributions, respectively. The output of the bilateral filter is adopted as an estimate of the noiseless ground truth image, and an unbiased estimate of the risk is modeled. Finally, the parameter values are selected by minimizing the estimated risk. Despite their usefulness, these approaches cannot be adopted for real-time mobile systems owing to their high complexity. For example, CURE requires dozens of minutes of execution time on a PC [20]. Recently, a training method based on a neural network was presented [21]. Image texture features from training images and their optimal parameters are used as inputs of the training model, and the peak signal-to-noise ratio (PSNR) is adopted as a loss function. The two-dimensional (2D) wavelet transform and the gray-level co-occurrence matrix are adopted to extract the image texture features. Finally, the forward path of the trained neural network is used as a predictive model of the optimal parameter for bilateral filtering. However, this method also requires a high-end CPU or GPU, owing to the high complexity of the neural network and the feature extraction.

VLSI Design of Bilateral Filter
A VLSI design based on the integral histogram-based bilateral filter [22] is presented in [10] in order to process a large window in real time. However, this design requires a large amount of internal memory to store the histogram. Moreover, it needs external memory access, which causes large power consumption and delay in mobile systems. In [11,12], the equation of the bilateral filter is implemented as parallelized VLSI designs without modification. In particular, Gabiger-Rose et al. introduced a resource-efficient approach that divides the pixels within the window into multiple groups and assigns each group to a separate pipeline [12]. Although this approach reduces the usage of resources, the processing speed is degraded and a large lookup memory is required to store the weights.
In [13], a VLSI design based on a fast bilateral filtering method using an approximation [23] is presented to support an arbitrary window size without increasing the resources. However, its speed is low owing to its iterative nature.

Proposed Approach
In order to preserve the details of images, the range kernel (f_r) of a bilateral filter controls the degree of smoothing by the spatial kernel (f_s) using the intensity difference between two pixels:
$$I_{BF}(x) = \frac{1}{W_p} \sum_{x_i \in \Omega} I(x_i)\, f_s(\lVert x_i - x \rVert)\, f_r(\lvert I(x_i) - I(x) \rvert), \qquad W_p = \sum_{x_i \in \Omega} f_s(\lVert x_i - x \rVert)\, f_r(\lvert I(x_i) - I(x) \rvert), \tag{1}$$
$$f_s(d) = \exp\!\left(-\frac{d^2}{2\sigma_s^2}\right), \qquad f_r(k) = \exp\!\left(-\frac{k^2}{2\sigma_r^2}\right).$$
Here, I(x) and I_BF(x) are the pixel intensity at position x and its filtered intensity, respectively, and Ω is a pixel window. In addition, x_i ∈ Ω are the neighboring pixels of x within the pixel window; f_s and f_r are the spatial and range kernels, respectively; σ_s and σ_r are the standard deviations of f_s and f_r, respectively; and W_p is a normalization term. However, I(x) and I(x_i) can be changed randomly by the noise generated in image sensors [14], e.g., shot noise, and the range kernel (f_r) cannot recognize whether this change is caused by noise or not. A noise-aware bilateral filter is proposed to resolve this issue.
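As a point of reference, the conventional bilateral filter of (1) can be sketched in plain Python. This is only a minimal illustration, assuming Gaussian spatial and range kernels, a 5×5 window, and out-of-image neighbors that are simply skipped; the function name `bilateral_filter` and the border policy are hypothetical and are not taken from the proposed design.

```python
import math

def bilateral_filter(img, sigma_s, sigma_r, radius=2):
    """Conventional bilateral filter on a 2D list of intensities (sketch)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0  # weighted intensity sum
            w_p = 0.0  # normalization term W_p
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue  # assumption: neighbors outside the image are skipped
                    # Spatial kernel: Gaussian of the pixel distance (assumed form).
                    f_s = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                    # Range kernel: Gaussian of the intensity difference (assumed form).
                    diff = img[yy][xx] - img[y][x]
                    f_r = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                    acc += img[yy][xx] * f_s * f_r
                    w_p += f_s * f_r
            out[y][x] = acc / w_p  # w_p >= 1 because the center pixel is included
    return out
```

With a small σ_r, pixels across a strong edge receive a negligible range weight, which is the edge-preserving behavior; the same mechanism also reacts to intensity differences that are caused purely by noise.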

Noise-Aware Bilateral Filter (NABF)
An image noise model is adopted for estimating the noise at each pixel. In particular, among various image noise models, the model of [15], which is based on the intensity difference between two pixels, is adopted to determine whether the intensity difference, which is the input of the range kernel, is caused by noise or not. In this model, k is the intensity difference between two pixels; p(k) and µ are the noise probability function and its parameter, respectively; and B_k(·) is the modified Bessel function of the first kind. The probability that an intensity difference is caused by noise is modeled as shown in Figure 1a. This noise probability function depends on µ, as shown in Figure 1b. Furthermore, µ varies linearly with the pixel intensity I, with constants c_0 and c_1 that are determined by a camera setting such as the camera gain. When a color checker board is captured, a linear relationship between the intensity and the noise parameter is observed for each homogeneous color patch [15]. The values of c_0 and c_1 can be determined by fitting a line to these measurements. For details of the equations and their derivation, refer to [15]. The range kernel of (1) is replaced with the above difference-based image noise model, as expressed in (5), so that the degree of smoothing by the spatial kernel is controlled according to the noise probability of the intensity difference. Here, f_r^NABF is the noise-aware range kernel, and I(x) and I(x_i) are the intensities of a target pixel and a neighboring pixel, respectively. When the noise probability of an intensity difference is small, the proposed range kernel judges that the difference is caused by a real scene change, and the neighboring pixel is excluded from the smoothing by assigning a smaller weight.
All of the noise probabilities for each |I(x_i) − I(x)| and µ are obtained by calibration [15] and are stored in LUTs in advance. Because the proposed range kernel simply reads the noise probability from the LUTs using |I(x_i) − I(x)| during operation, without extra computations, the speed is not degraded in comparison with the conventional bilateral filter.
Figure 1. Intensity difference-based image noise model [15]. Here, I and k denote a pixel intensity and the difference in intensity between two random pixels, respectively; and p(k) and µ are the noise probability function and its parameter, respectively.
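To make the LUT construction more concrete, the following sketch tabulates a noise probability for every intensity difference at one pixel intensity. It rests on a loudly stated assumption: the difference-based model is taken to have the Skellam form p(k; µ) = exp(−2µ)·B_|k|(2µ), which matches the symbols described above but is not spelled out in this text. The names `noise_probability`, `build_noise_lut`, `slope`, and `offset` are hypothetical; `slope` and `offset` stand in for the calibrated constants of the linear intensity-to-µ relation.

```python
import math

def noise_probability(k, mu, tol=1e-16):
    """Assumed model: p(k; mu) = exp(-2*mu) * B_|k|(2*mu), B = modified Bessel, first kind."""
    k = abs(k)
    # First series term (z/2)^k / k! with z = 2*mu, built iteratively to avoid overflow.
    term = 1.0
    for j in range(1, k + 1):
        term *= mu / j
    total = 0.0
    n = 0
    while True:
        total += term
        n += 1
        # Ratio of consecutive series terms: (z/2)^2 / (n * (n + k)).
        term *= (mu * mu) / (n * (n + k))
        if term <= tol * total:
            break
    # Adequate for moderate mu; very large mu would need a log-domain evaluation.
    return math.exp(-2.0 * mu) * total

def build_noise_lut(intensity, slope, offset, max_diff=255):
    """Tabulate p(k) for k = 0..max_diff at one intensity (hypothetical calibration values)."""
    mu = slope * intensity + offset  # assumed linear relation between intensity and mu
    return [noise_probability(k, mu) for k in range(max_diff + 1)]
```

For example, `build_noise_lut(100, 0.02, 0.5)` returns 256 probabilities that decrease with the intensity difference; the calibration values here are made up for illustration only.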

Binary Noise-Aware Bilateral Filter (B-NABF)
The quality degradation due to noise can be mitigated substantially by the proposed NABF. However, the NABF increases the LUT usage considerably because it stores the noise probability values corresponding to all intensity differences, µ values, and camera gains. To significantly reduce the computational complexity of the bilateral filter as well as the LUT usage, a binary NABF (B-NABF) is introduced, which approximates (5) using the statistical hypothesis test [16], as expressed in (6). Here, f_r^B-NABF is the binary noise-aware range kernel; I(x) and I(x_i) are the intensities of a target pixel and a neighboring pixel, respectively; K_C(x) is the critical value corresponding to I(x); F is the cumulative distribution function of f_r^NABF; and α is the significance level, which is commonly set from 0.01 to 0.1 [16].
K_C(x) is determined as the k for which A[k], the cumulative value of f_r^NABF between the two symmetric differences −k and k, is closest to 1−α. A weight of 1 or 0 is then assigned by comparing the intensity difference with K_C(x), as shown in Figure 2. Consequently, the LUT usage is reduced because the B-NABF requires only the K_C values instead of the noise probability values at all intensity differences. Furthermore, the computational complexity is reduced owing to the binarization of the weights.
To further reduce the LUT usage, only the K_C values of a subset of camera gains are stored in the LUTs instead of the K_C values corresponding to all camera gains, and an interpolated K_C for a new camera gain is generated as expressed in (7), where R(·) is a round operation; K_C^i and G_i are the critical value and camera gain of an index i, respectively; and G_i↓ and G_i↑ are the closest lower and upper stored gains to G_i, respectively.
Figure 2. A[k] is the cumulative value of probability between two symmetric ks; K_C denotes the critical value for a binary range kernel; and W_NABF and W_B-NABF are the weights given by a noise-aware range kernel and a binary noise-aware range kernel, respectively.
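The selection of the critical value can be illustrated with a short sketch. It takes a one-sided table of noise probabilities p[0], p[1], ... for non-negative differences, accumulates the symmetric cumulative value A[k], and returns the k whose A[k] is closest to 1−α. The function names and the toy probability table are hypothetical, and the inclusive comparison in `binary_weight` is an assumption, since the text does not state whether the boundary case receives weight 1 or 0.

```python
def critical_value(p, alpha):
    """Return the k whose symmetric cumulative probability A[k] is closest to 1 - alpha.

    p[k] is the noise probability of difference k (k >= 0); the model is symmetric,
    so A[k] = p[0] + 2 * (p[1] + ... + p[k]).
    """
    target = 1.0 - alpha
    best_k, best_err = 0, float("inf")
    cumulative = 0.0
    for k, pk in enumerate(p):
        cumulative += pk if k == 0 else 2.0 * pk
        err = abs(cumulative - target)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def binary_weight(i_neighbor, i_center, k_c):
    """Binary range kernel: 1 if the difference is within the critical value (assumed inclusive)."""
    return 1 if abs(i_neighbor - i_center) <= k_c else 0

# Toy table (made-up values): A = [0.4, 0.8, 0.94, 0.98, 1.0]
toy_p = [0.4, 0.2, 0.07, 0.02, 0.01]
k_c = critical_value(toy_p, alpha=0.05)
```

With this toy table and α = 0.05, the closest cumulative value to 0.95 is A[2] = 0.94, so only neighbors whose intensity differs from the center by at most 2 would keep a weight of 1.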
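The gain interpolation can be sketched as follows, under the assumption that (7) is a linear interpolation between the critical values of the two nearest stored gains followed by rounding. The function name `interpolate_kc`, the clamping outside the stored gain range, and the numeric values in the usage line are all hypothetical.

```python
import bisect
import math

def interpolate_kc(gain, stored_gains, stored_kcs):
    """Interpolate a critical value for `gain` (assumed linear form of the interpolation).

    stored_gains must be sorted in ascending order; stored_kcs[i] is the critical
    value stored for stored_gains[i] at one pixel intensity.
    """
    # Assumption: gains outside the stored range reuse the nearest stored value.
    if gain <= stored_gains[0]:
        return stored_kcs[0]
    if gain >= stored_gains[-1]:
        return stored_kcs[-1]
    upper = bisect.bisect_right(stored_gains, gain)  # index of the closest upper gain
    lower = upper - 1                                # index of the closest lower gain
    g_lo, g_hi = stored_gains[lower], stored_gains[upper]
    w_hi = (gain - g_lo) / (g_hi - g_lo)
    value = (1.0 - w_hi) * stored_kcs[lower] + w_hi * stored_kcs[upper]
    return math.floor(value + 0.5)  # round-half-up stands in for R(.)

# Made-up example: gain 15 lies between the stored gains 10 and 18.
kc_15 = interpolate_kc(15, [0, 10, 18, 24], [3, 6, 11, 15])
```

In this made-up example, the weights are 0.375 and 0.625, giving 0.375·6 + 0.625·11 = 9.125, which rounds to 9.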

VLSI Design
A VLSI architecture is designed to accelerate the proposed binary noise-aware bilateral filter. To process a 5×5 pixel window per clock, the hardware design is fully parallelized and pipelined. It mainly consists of a main controller, a binary range kernel unit to compute (6), a spatial kernel unit to perform f_s of (1), a K_C memory to store critical values, and a K_C interpolation unit to calculate (7), as shown in Figure 3. The operation of each unit is described in detail as follows.

Main Controller
The main controller comprises a state controller, which handles the operational sequence of all units that are pipelined pixel by pixel, as shown in Figure 4, and a host interface module, which communicates with the host central processing unit (CPU). First, the state controller activates the line memory in the binary range kernel unit for the line buffering of an image. The pixel intensity (I) of each line is sent sequentially to the two-dimensional (2D) binary range kernel element (BE) array module to produce the 2D pixel window. When two lines of the line memory are full and three intensities of the third line are stored, the state controller starts the operation of the 2D BE array module in order to compute (6). The spatial kernel unit is enabled one clock later. The K_C interpolation unit is activated three clock cycles before the start of the 2D BE array module to provide K_C^i on time. As a result, the final output of the B-NABF is generated every clock.
The data, such as the K_C values, α, and the spatial kernel weights (f_s in (1)), are sent by the host CPU. The host interface module stores them in registers and distributes them to each unit.

Binary Range Kernel Unit
The binary range kernel unit computes (6) in parallel for all pixels within the window. To scale the size of the pixel window easily, a unit module, called the binary range kernel element (BE), is designed to compute (6) for one pixel, and the BE is replicated and connected to process the pixel window in parallel, as shown in Figure 3. A subtractor calculates I(x_i) − I(x) in the BE, and the absolute value is computed using the sign given by its most significant bit (MSB) and a two's complement operation. Furthermore, a comparator is employed to compare |I(x_i) − I(x)| with K_C^i, and a multiplexer selects the input intensity or 0 as the output (O_xi) according to the comparison result. The comparison result, which is 1 or 0, is sent to the spatial kernel unit as a valid signal (V_xi) together with O_xi.
The line memory is designed using serially connected first-in first-out (FIFO) memories and a shift register in each BE in order to provide the intensities of the neighboring pixels to each BE in parallel. The output port of each FIFO memory is connected to the input port of the BE in the first column of the 2D BE array module, and the intensities are shifted to the BE on the right through the horizontally connected shift registers, as shown in Figure 3. The output of the shift register in BE21 is used as the center pixel of the window (I(x)), and it is connected to all of the BEs.
In addition, the output of the third FIFO memory (I_20) is sent to the K_C interpolation unit in advance to generate the K_C^i corresponding to I(x) and to synchronize it with |I(x_i) − I(x)|.
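The datapath of a single BE can be mimicked by a small behavioral model. The sketch below assumes 8-bit intensities, a 9-bit two's complement subtractor, and an inclusive comparison with the critical value; the function name `binary_range_element` and the bit widths are illustrative assumptions rather than details of the actual RTL.

```python
def binary_range_element(i_neighbor, i_center, k_c):
    """Behavioral model of one BE: returns (O_xi, V_xi) for 8-bit intensities."""
    # 9-bit two's complement subtraction I(x_i) - I(x).
    diff = (i_neighbor - i_center) & 0x1FF
    sign = (diff >> 8) & 1  # MSB holds the sign of the difference
    # Absolute value: negate (invert and add one) when the sign bit is set.
    abs_diff = ((~diff + 1) & 0x1FF) if sign else diff
    # Comparator: assumption that a difference equal to K_C is still accepted.
    valid = 1 if abs_diff <= k_c else 0
    # Multiplexer: pass the neighbor intensity or 0.
    out = i_neighbor if valid else 0
    return out, valid
```

For instance, `binary_range_element(10, 250, 240)` yields `(10, 1)`: the raw difference is −240, the sign bit triggers the negation, and the magnitude 240 passes the comparison.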

K_C Memory & K_C Interpolation Unit
A memory with a data width of eight bits and a depth of 256 is used for the K_C memory, because the intensity and its difference are between 0 and 255. Four memories are used to store the K_C values of four camera gains. Furthermore, logic is added to the host interface module so that the host CPU can update the K_C memory at initialization.
The K_C interpolation unit accesses the K_C memory using I_20, which is sent from the binary range kernel unit, as an address, and the four K_C values corresponding to the four camera gains are obtained. Here, K_C^i↓ and K_C^i↑ are selected among the four K_C values by a multiplexer and its selection signal (K_C_Sel), which is computed in advance in the host CPU using the current camera gain. Subsequently, (7) is computed using two parallel multipliers and an adder. The precomputed coefficients of (7) are sent from the host CPU. Finally, to generate K_C^i, the MSB of the fractional part is added to the output of the adder as a round operation.
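The rounding step of the interpolation unit can be illustrated with a fixed-point model. The sketch assumes that the host CPU supplies two interpolation coefficients as unsigned fractions with 8 fractional bits; the bit width `FRAC_BITS`, the helper `to_fixed`, and the coefficient values are hypothetical.

```python
FRAC_BITS = 8  # assumption: number of fractional bits of the coefficients

def to_fixed(x):
    """Convert a coefficient in [0, 1] to fixed point (host-side precomputation)."""
    return int(round(x * (1 << FRAC_BITS)))

def kc_interpolate_fixed(kc_lo, kc_hi, coef_lo, coef_hi):
    """Two multipliers and one adder, then rounding with the fractional MSB."""
    acc = kc_lo * coef_lo + kc_hi * coef_hi
    integer_part = acc >> FRAC_BITS
    round_bit = (acc >> (FRAC_BITS - 1)) & 1  # MSB of the fractional part
    return integer_part + round_bit

# Made-up coefficients 0.375 and 0.625 for a gain between two stored gains.
c_lo, c_hi = to_fixed(0.375), to_fixed(0.625)
```

Adding the fractional MSB implements round-half-up without a separate comparator: with these coefficients, critical values 6 and 10 give 8.5 and produce 9, while 6 and 11 give 9.125 and also produce 9.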

Spatial Kernel Unit
The 24 neighboring pixels in the window are classified into six groups according to the distance from the center pixel. The spatial kernel element (SKE) module adds up the four O_xi values of each group, and the f_s term of the group is computed by multiplying the sum by the spatial kernel weight (W_s) corresponding to the group. Furthermore, to obtain the normalization term (W_p) of each group, the four V_xi values, which mark the pixels that remain valid after the binary range kernel, are summed up. According to the sum of the V_xi values, W_p is selected among the multiples of W_s, which are generated using the shift logic and the adder, as shown in Figure 3. Then, the f_s values and W_p values computed by the six parallel SKE modules are each added up by an adder tree. Finally, a divider divides the sum of the f_s values by the sum of the W_p values in order to generate the final output (O_B-NABF).
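A behavioral sketch of the spatial kernel unit is given below. Several points are assumptions made only for illustration: the eight neighbors at squared distance 5 are split into two groups of four so that six groups of four pixels result, the divider truncates, and the center intensity is passed through when no neighbor is valid. The names `group_key` and `spatial_kernel_unit` and the integer weights are hypothetical.

```python
def group_key(dx, dy):
    """Group a neighbor offset by squared distance; distance-5 offsets are split in two."""
    d2 = dx * dx + dy * dy
    return (d2, d2 == 5 and abs(dx) > abs(dy))

def spatial_kernel_unit(neighbors, weights, center):
    """neighbors: {(dx, dy): (O_xi, V_xi)}; weights: {squared distance: integer W_s}."""
    sums = {}
    for (dx, dy), (o, v) in neighbors.items():
        s = sums.setdefault(group_key(dx, dy), [0, 0])
        s[0] += o  # sum of the four O_xi values of the group
        s[1] += v  # number of valid pixels in the group (0..4)
    numerator = 0
    denominator = 0
    for (d2, _), (sum_o, sum_v) in sums.items():
        numerator += sum_o * weights[d2]    # f_s term of the group
        denominator += sum_v * weights[d2]  # W_p of the group: a multiple of W_s
    if denominator == 0:
        return center  # assumption: no valid neighbor, keep the center intensity
    return numerator // denominator  # assumption: truncating divider
```

Because the range weights are binary, the per-group normalization reduces to choosing one of the multiples 0, W_s, 2W_s, 3W_s, or 4W_s, which is why shift-and-add logic suffices in place of a multiplier for W_p.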

Image Quality by Denoising
The quality of the images filtered by the proposed NABF and B-NABF was evaluated via PC simulation before designing a dedicated VLSI. Five highly textured test datasets were created from images captured by a PointGrey Flea3 camera, as shown in Figure 5. Each scene was captured 1,000 times, and the captured images were averaged in order to produce the ground truth image. Six camera gain settings were used to verify the effect of noise variation. In addition, σ_s, σ_r, and α were varied to evaluate the quality variation according to the parameter setting.
Table 1 shows the average peak signal-to-noise ratios (PSNRs) of the conventional bilateral filter (BF), the proposed NABF, and the B-NABF. The camera gain setting 15(I) used K_C values that were interpolated using (7) from the K_C values measured at gains 10 and 18. The NABF improves the PSNR when compared with the input image under all noise conditions and parameter changes, whereas the conventional BF decreases the PSNR when the noise level is low. Furthermore, despite the approximation, the PSNR degradation caused by the B-NABF relative to the NABF, and by interpolated K_C values relative to measured ones, is negligible. In addition, the conventional BF is quite sensitive to variations of σ_s and σ_r, whereas the PSNR of the NABF is stable despite variations of σ_s and α. For instance, the PSNR differences of the NABF and the conventional BF according to the parameter setting are 0.2 dB and 6.4 dB, respectively, when the camera gain is 0 dB.
Table 2 shows the PSNR values averaged over all camera gains for each test set. Under scene variation, the NABF and B-NABF are also more robust to parameter variation than the conventional BF, and the PSNR is improved. Figure 6 presents the qualitative results of the BF, the proposed NABF, and the B-NABF together with the ground truth images.
The NABF shows clearer edges and textures, similar to those of the ground truth images, while the conventional BF produces less distinct results. Furthermore, the results of the B-NABF are almost the same as those of the NABF.
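The evaluation procedure described in this section, averaging repeated captures into a ground truth image and scoring a filtered image by PSNR, can be sketched as follows. The standard PSNR definition with an 8-bit peak value of 255 is assumed, and the function names and the tiny example frames are hypothetical.

```python
import math

def average_frames(frames):
    """Pixel-wise mean of repeated captures of the same scene (ground truth estimate)."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]

def psnr(image, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB between an image and a reference."""
    h, w = len(reference), len(reference[0])
    mse = sum(
        (image[y][x] - reference[y][x]) ** 2 for y in range(h) for x in range(w)
    ) / (h * w)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)

# Made-up 1x2 frames: the mean serves as the reference for a noisy capture.
ground_truth = average_frames([[[10, 20]], [[12, 22]]])
score = psnr([[12, 22]], ground_truth)
```

In this toy case every pixel of the noisy frame deviates by 1 from the averaged reference, so the mean squared error is 1 and the PSNR is 10·log10(255²), about 48.13 dB.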

Implementation Result and Comparison
Based on the architecture proposed in Section 4, a VLSI design of the B-NABF was written in the Verilog hardware description language. The register-transfer level (RTL) simulation results of the design were verified to coincide with the simulation results from the C model of the B-NABF used in Section 5.1. Finally, the verified VLSI design was implemented in a Xilinx Virtex 7 FPGA (XC7VX330T).

System Configuration for Measurement
A PC is connected to an FPGA board through peripheral component interconnect express (PCIe) to verify the proposed VLSI design and measure its performance, as shown in Figure 7. A test image from the PC is stored in an external memory of the FPGA board, and the VLSI design is enabled by a register setting from the PC. After the bilateral filtering is finished, an interrupt signal is generated by the main controller of the VLSI design and sent to the PC. The interrupt service routine reads the filtered image from the external memory and displays it. The PSNR is calculated by comparing the filtered image with the ground truth image. To execute these processes automatically, a software program for visualization and analysis is implemented, as shown in Figure 7b. Moreover, the throughput is measured using a clock counter placed in the proposed design. The total number of clocks is probed using the Xilinx integrated logic analyzer (ILA) and displayed in the Vivado logic analyzer via the joint test action group (JTAG) interface. The logic and memory usage are taken from the post-implementation reports of the Vivado design suite.

Comparison with Recent VLSI Designs of Bilateral Filter
As shown in Table 3, only the VLSI designs that do not use an external memory [12,13] are considered in this comparison, because designs that use an external memory [10] are inappropriate for mobile systems due to their large power consumption. One earlier design [12] handles a large image in real time owing to a parallelized and pipelined implementation of (1). The pixels in a pixel window are grouped according to the distance from the center, and the pixels in each group are processed sequentially in a separate computing pipeline in order to reduce the usage of the logic and internal memory. However, the throughput is reduced due to the sharing of the computing pipeline. Another design [13] implements an iterative algorithm of the bilateral filter, which was presented in [23], in order to handle an arbitrary pixel window size. However, the throughput is low even for a small image size due to the iterative nature of this algorithm. Moreover, a large amount of internal memory is used to store intermediate data between iterations.
In contrast, the throughput of the proposed VLSI design is 10.5 and 95.7 times higher than those of [12,13], respectively, owing to the proposed binary noise-aware bilateral filtering scheme. Furthermore, 63.6% and 97.5% less internal memory is used in comparison to [12,13], respectively. The logic usage is also significantly reduced. In terms of the image quality after filtering, as described in Section 5.1, the proposed noise-aware bilateral filter provides more stable and better quality under various noise conditions in comparison to the conventional bilateral filter adopted in [12,13].

Integration of Implemented VLSI Design and Image Sensor
A rapid prototyping system (http://huins.com/en/m11.php?m=rd&no=86) based on a Xilinx Virtex-6 XC6VLX760 FPGA is used to integrate the implemented VLSI design with a CMOS image sensor (CIS). An Omnivision OV5642 CIS is connected to the VLSI design in the XC6VLX760 FPGA, and the filtered output image is displayed on a monitor via a high-definition multimedia interface (HDMI), as shown in Figure 8.
In addition, a VLSI design of the conventional bilateral filter is implemented for a qualitative comparison with the proposed VLSI design. As a result, an image filtered by the proposed VLSI design shows clearer edges and textures than an image from the VLSI design of the conventional BF, as shown in Figure 9.
Figure 9. Test results filtered by the conventional bilateral filter and the proposed B-NABF, together with the original image. The noise of the original image is reduced by the VLSI design of the proposed B-NABF, and its output shows clearer edges and textures than that of the VLSI design of the conventional bilateral filter.

Conclusions
We have proposed a noise-aware bilateral filter (NABF) to overcome the disadvantage of the conventional bilateral filter, in which image quality is degraded by randomly generated noise in image sensors. The NABF estimates the noise using an intensity difference-based image noise model and dynamically adjusts the weight of the range kernel according to the estimated noise. In addition, a lightweight scheme is introduced for resource-limited mobile systems, which approximates the range kernel of the NABF by a binary kernel using the statistical hypothesis test method. Finally, a fully parallelized and pipelined VLSI architecture of the NABF based on the proposed binary range kernel is presented, which was successfully implemented in an FPGA. Our experimental results demonstrated that the proposed NABF is more robust to noise than the conventional bilateral filter under various noise conditions. Furthermore, the proposed VLSI design of the NABF achieves 10.5 and 95.7 times higher throughput and uses 63.6–97.5% less internal memory than recent designs of the bilateral filter.

Conflicts of Interest:
The authors declare no conflict of interest.