Deep Concatenated Residual Networks for Improving Quality of Halftoning-Based BTC Decoded Image

This paper presents a simple technique for improving the quality of the halftoning-based block truncation coding (H-BTC) decoded image. H-BTC is an image compression technique inspired by the classical block truncation coding (BTC). Under human visual observation, H-BTC yields a better decoded image than the classical BTC scheme. However, impulsive noise commonly appears in the H-BTC decoded image, which is unpleasant to the observer. The proposed method therefore aims to suppress this impulsive noise by exploiting a deep learning approach. This task can be regarded as an ill-posed inverse imaging problem, in which the set of candidate solutions is extremely large and underdetermined. The proposed method utilizes convolutional neural networks (CNN) and residual learning frameworks to solve this problem. These frameworks effectively reduce the impulsive noise occurrence and, at the same time, improve the quality of H-BTC decoded images. The experimental results show the effectiveness of the proposed method in terms of subjective and objective measurements.


Introduction
The block truncation coding (BTC) is a lossy image compression technique operating in a block-wise manner [1]. In the encoding process, an input image is first divided into a set of non-overlapping image blocks. Each image block is processed individually to yield two extreme quantizers, namely the high and low mean values, and a binary image. The high and low mean values are computed from the average value (mean) and standard deviation of each processed image block, with the high mean value greater than the low mean value. These two values keep the image block statistics unchanged. The number of bits required to represent each image block can be significantly reduced with this strategy, while the underlying statistical properties of the block (its mean value and standard deviation) are still maintained. In the decoding process, each pixel of the binary image is simply replaced with the high or low mean value. From this point of view, BTC compression is very easy to implement. However, false contour and blocking artifacts often degrade the quality of the BTC decoded image, and these artifacts become more noticeable as the block size increases.
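The moment-preserving quantizer computation described above can be sketched as follows. This is a minimal illustrative implementation of classical BTC on one block, using the standard high/low formulas derived from the block mean and standard deviation; the function names are our own.

```python
import numpy as np

def btc_encode(block):
    """Encode one image block with classical (moment-preserving) BTC."""
    m, s = block.mean(), block.std()
    bitmap = block >= m                    # one-bit plane: 1 if pixel >= mean
    n, q = block.size, int(bitmap.sum())   # total pixels / pixels above mean
    if q == 0 or q == n:                   # flat block: both quantizers = mean
        return bitmap, m, m
    high = m + s * np.sqrt((n - q) / q)    # chosen so mean and variance
    low  = m - s * np.sqrt(q / (n - q))    # of the block are preserved
    return bitmap, high, low

def btc_decode(bitmap, high, low):
    """Replace each bitmap entry with the high or low quantizer."""
    return np.where(bitmap, high, low)

block = np.array([[10., 200., 30., 220.],
                  [15., 210., 25., 215.],
                  [12., 205., 28., 212.],
                  [14., 208., 26., 218.]])
bm, hi, lo = btc_encode(block)
rec = btc_decode(bm, hi, lo)
```

Note that the decoded block has exactly the same mean and standard deviation as the input block, which is the "statistics unchanged" property mentioned above.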
On the other hand, the halftoning-based block truncation coding (H-BTC) method beats the performance of the classical BTC by introducing a digital halftoning technique to generate a visual illusion in its binary image [2][3][4]. The H-BTC replaces the binary image used in the BTC technique with a digital halftone image obtained from ordered dithering, error diffusion, or dot diffusion. In this work, we employ the CNN and residual learning with end-to-end mapping ability. The proposed networks receive the H-BTC decoded image and produce an improved-quality version of this decoded image.
The rest of this paper is organized as follows. Section 2 briefly discusses the H-BTC image compression and related works for improving the quality of the decoded image. Section 3 presents the proposed deep learning framework for H-BTC image reconstruction. Section 4 reports the experimental results and findings. The conclusions are finally drawn in Section 5.

Related Works
This section briefly reviews the three H-BTC compression methods, namely ordered dither block truncation coding (ODBTC), error diffusion block truncation coding (EDBTC), and dot diffused block truncation coding (DDBTC). The two methods for improving the quality of the H-BTC decoded image, i.e., the wavelet-based approach and the VQ-based technique, are also presented in this section.

H-BTC Image Compression
The variants of H-BTC image compression, i.e., ODBTC, EDBTC, and DDBTC, transform a continuous-tone image into another representation to reduce the required number of bits. The ODBTC, EDBTC, and DDBTC compressions employ the same extreme quantization values; the main difference among these methods lies only in the bitmap image generation process. This bitmap image is generated from the grayscale version of an image block. The ODBTC utilizes a dither array, the EDBTC uses an error kernel, and the DDBTC involves class rank and diffused weighting matrices.
Let D(m, n) be a dither array of the same size as the image block, i.e., M × N. The dither array is generated from a set of training images through a learning process: the optimum dither array is obtained iteratively by minimizing the similarity error between the input image and its thresholded image after the dithering process. Prospective readers may refer to [2] for a detailed explanation of dither array generation. The scaled versions of this dither array are pre-calculated and stored as a look-up table denoted by D^(0), D^(1), ..., D^(255). This look-up table significantly reduces the computational time for generating the ODBTC bitmap image [2]. Subsequently, the ODBTC encodes the grayscale version of the image block Î(m, n) using simple thresholding as follows:

o(m, n) = x̄_max, if Î(m, n) ≥ x̄_min + D^(d)(m, n); o(m, n) = x̄_min, otherwise,

where x̄_min denotes the minimum pixel value in the image block Î(m, n), and d = q_max − q_min. In contrast, the EDBTC method generates the bitmap image by means of an error kernel [3]. The Floyd–Steinberg kernel is the most popular among the error diffusion kernels; the generation of this error kernel can be traced back to [3]. The thresholding process for computing the EDBTC bitmap image is formally defined as:

o_{m,n} = x̄_max, if v_{m,n} ≥ x̄; o_{m,n} = x̄_min, otherwise,

where x̄ denotes the mean value over all pixels in the image block Î(m, n). The EDBTC further diffuses the error value obtained from this thresholding operation into the neighboring pixels based on the error kernel values. This error diffusion process can be simply formulated as v_{m,n} = x_{m,n} + x̃_{m,n}, where x̃_{m,n} = e_{m,n} * k. Herein, the symbols e_{m,n}, o_{m,n}, k, and * denote the residual quantization error, the intermediate output value, the error kernel, and the convolution operator, respectively.
The symbols x̄_min and x̄_max are the minimum and maximum pixel values, respectively, found in Î(m, n). The DDBTC combines the effectiveness of the ordered dithering and error diffusion halftoning techniques to achieve outstanding performance [4]. The DDBTC utilizes class and diffusion matrices. The class matrix is of the same size as the image block and determines the pixel processing order, while the diffusion matrix contains the information for distributing the residual quantization error into the neighboring pixels. The DDBTC thresholding can be executed using the strategy formerly used in [4]: v_{m,n} = x_{m,n} + x̃_{m,n}, where x̃_{m,n} = e_{m,n} * (d / sum), and d, c_{m,n}, and sum indicate the diffusion matrix, the coefficient value in the class matrix, and the summation of the diffused weights corresponding to the unprocessed pixels, respectively. The DDBTC performs an identical thresholding operation to that of the EDBTC for generating the bitmap image. All H-BTC methods transmit the two extreme quantization values and the bitmap image to the decoder side. The decoder simply replaces each bitmap value of 1 with the maximum quantizer value and each value of 0 with the minimum quantizer value:

r(m, n) = x̄_max, if o(m, n) = 1; r(m, n) = x̄_min, otherwise,

where r(m, n) is the decoded pixel at position (m, n) for m = 1, 2, ..., M and n = 1, 2, ..., N. This decoding process is very efficient, making it very suitable for real-time applications.
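The EDBTC thresholding and error diffusion steps above can be sketched as follows. This minimal single-block example uses the standard Floyd–Steinberg weights (7/16, 3/16, 5/16, 1/16); the function names are illustrative, not from the papers' code.

```python
import numpy as np

# Floyd-Steinberg error kernel: (row offset, col offset, weight)
FS = [(0, 1, 7/16), (1, -1, 3/16), (1, 0, 5/16), (1, 1, 1/16)]

def edbtc_block(block):
    """EDBTC on one grayscale block: error-diffused bitmap + two quantizers."""
    x_min, x_max = block.min(), block.max()
    mean = block.mean()
    v = block.astype(float).copy()          # working copy accumulates diffused error
    bitmap = np.zeros_like(v, dtype=bool)
    M, N = v.shape
    for m in range(M):
        for n in range(N):
            bitmap[m, n] = v[m, n] >= mean  # threshold against the block mean
            out = x_max if bitmap[m, n] else x_min
            err = v[m, n] - out             # residual quantization error
            for dm, dn, w in FS:            # diffuse to unprocessed neighbours
                mm, nn = m + dm, n + dn
                if 0 <= mm < M and 0 <= nn < N:
                    v[mm, nn] += err * w
    return bitmap, x_max, x_min

def hbtc_decode(bitmap, q_max, q_min):
    """Decoder: bitmap value 1 -> max quantizer, 0 -> min quantizer."""
    return np.where(bitmap, q_max, q_min)
```

The decoder is identical for ODBTC, EDBTC, and DDBTC: only the bitmap generation differs.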
In this paper, a single bitmap image is utilized in the ODBTC, EDBTC, and DDBTC. As illustrated in Figure 1, we first compute the grayscale version of a given color image. Subsequently, image thresholding is performed to convert this grayscale image into a bitmap image. This process effectively reduces the computational burden on the encoding side. In addition, the number of bits required to store a single bitmap image is lower than that for keeping three bitmap images. However, the quality of the H-BTC decoded image using a single bitmap image is slightly inferior to that obtained with three bitmap images. Figure 2 shows visual comparisons between using a single bitmap image and three bitmap images for ODBTC, EDBTC, and DDBTC compression. The first and second rows use single and three bitmap images, respectively. This figure demonstrates that the quality of the decoded image using a single bitmap image is degraded compared to employing three bitmap images. It is therefore more challenging to improve the quality of the H-BTC decoded image when a single bitmap image is used.

Wavelet-Based H-BTC Image Reconstruction
The quality of the H-BTC decoded image is less satisfactory to human vision due to the occurrence of impulsive noise. Thus, the wavelet-based method [5] tries to improve the quality of this decoded image based on where the noise is placed. Commonly, image noise remains in the high-frequency subbands of the wavelet-transformed image, while image information lies in the low-frequency subbands. More precisely, the H-BTC reconstructed image is obtained from the lowpass-filtered and downsampled version of an input image [5]. Figure 3 draws a schematic diagram of the wavelet-based image reconstruction [5]. Suppose that the downsampled version of the H-BTC decoded image is denoted as I^{↓2}. This image is of size M × N, where M = H/2 and N = W/2. Herein, the discrete wavelet transform (DWT) [22,23] is utilized to decompose the image I^{↓2} as follows:

I^{↓2}_θ = J{I^{↓2}},

where θ and J{·} denote the DWT image subbands and the DWT operator, respectively. This decomposition yields transformed image subbands I^{↓2}_θ of size M/2 × N/2. Subsequently, an image interpolation with upscaling factor 2, denoted U{·}, is applied to each I^{↓2}_θ. At the same time, the stationary wavelet transform (SWT) decomposes the image I^{↓2} into the following subbands:

I^{↓2}_{θ*} = J*{I^{↓2}},

where θ* and J*{·} denote the SWT image subbands and the SWT operator, respectively. Herein, each image subband I^{↓2}_{θ*} is of size M × N. From this stage, we obtain paired image subbands of the same size, i.e., the upscaled DWT subbands and the SWT subbands. Performing an addition between these paired subbands effectively eliminates the impulsive noise. This addition can be denoted as follows:

Ĩ_θ = α_θ · U{I^{↓2}_θ} + (1 − α_θ) · I^{↓2}_{θ*},

where α_θ denotes a constant controlling the percentage contribution of the upscaled DWT and SWT image subbands, and Ĩ_θ indicates the modified image subbands. This addition is applied to all subbands θ. Finally, the inverse transform is executed over all Ĩ_θ:

Î = J^{−1}{Ĩ_θ},

where Î is the H-BTC reconstructed image and J^{−1}{·} indicates the inverse DWT operator. From this process, one obtains an H-BTC reconstructed image of the same size as the original H-BTC decoded image, but with improved quality compared to the original decoded image.
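The premise above, that impulsive noise concentrates in high-frequency subbands, can be illustrated with a self-contained one-level Haar transform. This is not the method of [5] (there is no SWT blending here; the detail subbands are simply zeroed, which is the crudest possible version of the idea), and the synthetic image is our own assumption.

```python
import numpy as np

def haar2(x):
    """One-level 2-D Haar decomposition (unnormalized, perfectly invertible)."""
    a = (x[:, 0::2] + x[:, 1::2]) / 2      # row-wise average
    d = (x[:, 0::2] - x[:, 1::2]) / 2      # row-wise detail
    LL = (a[0::2] + a[1::2]) / 2
    LH = (a[0::2] - a[1::2]) / 2
    HL = (d[0::2] + d[1::2]) / 2
    HH = (d[0::2] - d[1::2]) / 2
    return LL, LH, HL, HH

def ihaar2(LL, LH, HL, HH):
    """Exact inverse of haar2."""
    H, W = LL.shape
    a = np.empty((2 * H, W)); d = np.empty((2 * H, W))
    a[0::2], a[1::2] = LL + LH, LL - LH
    d[0::2], d[1::2] = HL + HH, HL - HH
    x = np.empty((2 * H, 2 * W))
    x[:, 0::2], x[:, 1::2] = a + d, a - d
    return x

rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0, 255, 64), np.ones(64))   # smooth synthetic image
noisy = clean.copy()
idx = rng.random(clean.shape) < 0.05                     # 5% impulsive noise
noisy[idx] = rng.choice([0., 255.], size=int(idx.sum()))

LL, LH, HL, HH = haar2(noisy)
denoised = ihaar2(LL, np.zeros_like(LH), np.zeros_like(HL), np.zeros_like(HH))
```

Keeping only the LL subband reduces the error of the impulsive-noise image, while a clean signal round-trips through `haar2`/`ihaar2` exactly.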

Fast Vector Quantization Based H-BTC Image Reconstruction
The fast vector quantization (FVQ) [24] is an improved version of classical VQ that further speeds up the computation. This approach avoids computing the full similarity distance for all codewords in a given codebook [6]. Figure 4 shows a schematic diagram of the FVQ-based approach [6]. In this scheme, the H-BTC decoded image is first divided into several overlapping image patches. These image patches are then matched against, and replaced with, selected codewords from a trained codebook. The VQ or K-means algorithm generates the codebook from a set of clean images. Let B be a trained codebook of size N containing the codewords {C_1, C_2, ..., C_N}, where each codeword is of size n × n. Let p be an image patch from the H-BTC decoded image, of the same size as a codeword. The matching process between an image patch p and the codewords in codebook B is formulated as:

C_{k*} = Q{p} = arg min_k ‖p − C_k‖²,

where Q{·} and C_{k*} represent the matching operation and the closest codeword, respectively. This matching is performed by checking the mean value, variance, and norm of the image patch against the offline-precomputed values of these quantities over all codewords. Let μ_p, v_p, and n_p be the mean value, variance, and norm value of the patch, respectively. The first matching step checks whether the mean value μ_k of an uninspected codeword falls into the interval (16) determined by μ_p and d_min, where d_min is the "so far" smallest distortion. If codeword C_k satisfies (16), its variance value v_k is subsequently checked against the criterion (17). If v_k fits in the interval (17), the norm value n_k is then inspected using (18). If constraint (18) is satisfied, the codeword C_k can be regarded as the best candidate for the closest codeword, and the "so far" smallest distortion is updated. The checking process then continues for all uninspected codewords in the codebook.
Subsequently, the image patch p is simply replaced with the codeword C_{k*}, i.e., p̂ = C_{k*}, where p̂ denotes the replaced image patch. After processing all image patches, the unaligned H-BTC reconstructed image o(m, n) can be obtained by arranging all replaced image patches; this can be viewed as an addition operation placing each image patch in the correct position. An alignment operation should then be conducted on o(m, n) to obtain the corrected H-BTC image reconstruction. The image patch alignment is performed with the image patch operator R(m, n), yielding the H-BTC reconstructed image Î. From this point, the quality of the reconstructed H-BTC image Î is increased compared to that of the original H-BTC decoded image.
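The pruned search described above can be sketched with cheap scalar bounds. Note this is a generic illustration: the exact rejection intervals (16)–(18) of [6] are not reproduced here; instead we use two lower bounds that follow from the decomposition ‖p − c‖² = n(μ_p − μ_c)² + ‖p̃ − c̃‖², where p̃ and c̃ are the mean-removed vectors. The bounds never discard the true nearest codeword, so the result matches a brute-force search.

```python
import numpy as np

def fvq_match(p, codebook):
    """Closest codeword to patch p, pruning candidates with cheap statistics.

    codebook: array of shape (N, n), one flattened codeword per row.
    Returns (best index, squared distance, number of full distances computed).
    """
    n = p.size
    mu_p = p.mean()
    vp = p - mu_p
    norm_p = np.linalg.norm(vp)                  # norm of the mean-removed patch

    # In a real coder these statistics are precomputed offline.
    mus = codebook.mean(axis=1)
    norms = np.linalg.norm(codebook - mus[:, None], axis=1)

    best, d_min, checked = 0, np.inf, 0
    for k in range(len(codebook)):
        lb = n * (mu_p - mus[k]) ** 2            # mean-based lower bound
        if lb >= d_min:
            continue                             # reject without full distance
        lb += (norm_p - norms[k]) ** 2           # tighter bound via norms
        if lb >= d_min:
            continue
        d = np.sum((p.ravel() - codebook[k]) ** 2)   # full distance
        checked += 1
        if d < d_min:
            d_min, best = d, k
    return best, d_min, checked
```

The `checked` counter shows how many full distance computations were actually needed, which is the quantity FVQ tries to minimize.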

Deep Learning Based H-BTC Image Reconstruction
This section presents the proposed method for improving the quality of the H-BTC decoded image using a deep learning framework. The proposed method performs image quality enhancement by applying a series of convolution, downsampling, and upsampling operations, each of which plays an important role. The convolution operation learns the feature mappings for noise removal, while the downsampling operator effectively reduces the occurring noise. These convolution series effectively extract the important information of the H-BTC decoded image in order to suppress the impulsive noise. Herein, the proposed method inherits the effectiveness of CNN together with residual learning approaches. The proposed method is an extended version of the former scheme published in [14]. The former scheme in [14] mainly focuses on improving the quality of the ODBTC decoded image, while the proposed method aims to increase the quality of H-BTC decoded images in general, including ODBTC, EDBTC, and DDBTC. Thus, the proposed method can be regarded as a generalized version of the former scheme in [14].
The proposed method is motivated by the effect of downsampling and upsampling operations on an image. In some cases, the downsampling operation effectively reduces the occurrence of noise; the quality of the downsampled image is often better than that of the full-size noisy image. The H-BTC decoded image can be regarded as an image corrupted with impulsive noise. The first column of Figure 5 shows the decoded images obtained from ODBTC, EDBTC, and DDBTC compression. The noise is reduced by performing the downsampling operation with factor 0.5 and upsampling back to the original size of the H-BTC decoded image, as shown in the second column of Figure 5. The third column of Figure 5 shows the images after downsampling with factor 0.25 and upsampling to the original size. The downsampling and upsampling operations effectively suppress the impulsive noise in the H-BTC decoded image.
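The factor-0.5 and factor-0.25 round trips described above can be reproduced numerically. This is a small self-contained demonstration on a synthetic image of our own choosing (average-pool downsampling, nearest-neighbour upsampling), not the exact resampling used in Figure 5.

```python
import numpy as np

def down2(img):
    """2x downsampling by 2x2 average pooling."""
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4

def up2(img):
    """2x nearest-neighbour upsampling back to the original size."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(1)
clean = np.outer(np.ones(64), np.linspace(0, 255, 64))   # smooth synthetic image
noisy = clean.copy()
mask = rng.random(clean.shape) < 0.05                    # 5% impulsive noise
noisy[mask] = rng.choice([0., 255.], size=int(mask.sum()))

half = up2(down2(noisy))                      # factor-0.5 round trip
quarter = up2(up2(down2(down2(noisy))))       # factor-0.25 round trip

mse = lambda a, b: np.mean((a - b) ** 2)
```

On this example both round trips reduce the error against the clean image, since averaging dilutes isolated impulses while the smooth content survives.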

Residual Concatenated Networks
The proposed method contains a residual learning part, namely the residual concatenated network (RCN). The RCN consists of multiple convolution layers, and it concatenates the output features from all previous layers as the input features of the currently processed layer. Figure 6 gives an illustration of the RCN. Let F be an RCN module with input features denoted as x, and let θ be the network parameters. The feed-forward process of the RCN can be simply formulated as:

y = F(x; θ),

where y denotes the output features of the RCN. Specifically, when the network consists of K convolution layers, the output of each layer can be written in the concatenated form:

x_{k+1} = ϕ(W_k * [x_k, x_{k−1}, ..., x_1] + b_k),

where x_1 = x, and W_k and b_k represent the weights and biases of the k-th convolution layer, respectively. The symbols [x_k, x_{k−1}, ..., x_1], *, and ϕ denote the channel-wise concatenation of feature maps, the convolution operation of the CNN, and the Leaky ReLU [8] activation function, respectively. The proposed method exploits the effectiveness of the Leaky ReLU due to its ability to avoid the dying ReLU problem when the neuron response is negative; thus, the proposed method can learn better in the training step. If the first convolutional layer produces n feature maps, then the k-th convolutional layer receives n × k feature maps accordingly, except for the K-th layer. The number of feature maps in the final layer should be identical to that of the first convolutional layer, so that the final layer can receive the residual information transmitted by the shortcut connection as an identity mapping of the input features:

y = x_{K+1} + x.

The RCN can be regarded as a layer with multiple receptive field configurations, obtained by expanding the number of feature maps and integrating the concatenation process.
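The concatenation arithmetic above (the k-th layer receiving n × k feature maps, with the final layer mapping back to n channels so the identity shortcut fits) can be sketched in PyTorch. This is our own minimal reading of the RCN, not the authors' released code; the channel width and layer count are illustrative.

```python
import torch
import torch.nn as nn

class RCN(nn.Module):
    """Residual concatenated network: each layer sees the concatenation of
    all previous features, and a shortcut adds the input to the last output."""
    def __init__(self, n=32, K=3):
        super().__init__()
        # the k-th layer (0-indexed) receives n * (k + 1) feature maps
        self.convs = nn.ModuleList(
            nn.Conv2d(n * (k + 1), n, 3, padding=1) for k in range(K - 1))
        # final layer maps n * K channels back to n so the shortcut fits
        self.last = nn.Conv2d(n * K, n, 3, padding=1)
        self.act = nn.LeakyReLU(1e-4)   # negative slope as reported later

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(self.act(conv(torch.cat(feats, dim=1))))
        return x + self.act(self.last(torch.cat(feats, dim=1)))
```

By construction the module preserves its input shape, which is what allows it to be used as a drop-in weight layer inside the RRCN below.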

Residual Networks of Residual Concatenated Networks
The proposed method also utilizes a residual network of residual concatenated networks (RRCN). Figure 7 depicts a simple illustration of the RRCN. This part is a simple residual network with RCN modules as the weight layers. Using this network configuration, the information flow can easily reach the deeper layers of the network [7,9]. Let R denote the RRCN operator. The feed-forward process of the RRCN can be formally defined as:

ŷ = R(x; θ̂),

where ŷ and θ̂ are the output features and the parameters of the RRCN, respectively. Let L be the number of RCN modules in the RRCN part. The output features of the RRCN can then be formulated as:

y_{l+1} = F_l(y_l; θ_l) + y_l, for l = 1, 2, ..., L,

with y_1 = x and ŷ = y_{L+1} denoting the output features produced by the RRCN module. If the network is composed of D RRCN layers, then the feed-forward process of the network can be written as:

z = R_D(R_{D−1}(· · · R_1(x))),

where z is the network output, with ŷ_1 = x. The RRCN mimics the iterative regularization similarly performed in the image denoising task. Herein, the RCN module performs the aggregation process, and some important information from the preceding layers is retained via the shortcut connections as residual information.
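The residual wiring y_{l+1} = F_l(y_l) + y_l can be sketched as follows. For brevity the inner RCN modules are stood in for by single 3 × 3 convolutions; the point of this sketch is the L residual stages plus an outer shortcut, and the stand-in choice is our assumption, not the paper's exact module.

```python
import torch
import torch.nn as nn

class RRCN(nn.Module):
    """Residual network of residual modules: y_{l+1} = F_l(y_l) + y_l,
    followed by an outer shortcut from the RRCN input."""
    def __init__(self, n=32, L=3):
        super().__init__()
        # each conv stands in for one RCN weight layer
        self.blocks = nn.ModuleList(
            nn.Conv2d(n, n, 3, padding=1) for _ in range(L))
        self.act = nn.LeakyReLU(1e-4)

    def forward(self, x):
        y = x
        for block in self.blocks:
            y = y + self.act(block(y))    # per-module residual connection
        return y + x                      # outer shortcut of the RRCN
```

Stacking D such modules gives the composition z = R_D(· · · R_1(x)) described above.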

Reconstruction Networks
This subsection presents the proposed networks for performing the H-BTC image reconstruction. The proposed method is inspired by the effectiveness of the wavelet-based approach for suppressing the impulsive noise of the H-BTC decoded image [5] and the U-Nets framework [21]. Figure 8 illustrates the proposed networks for H-BTC image reconstruction. The proposed method works on the downscaled version of the H-BTC decoded image, based on the fact that the downscaled H-BTC image retains the important structural information of the original-size H-BTC image. The use of downscaled versions of the H-BTC decoded images also significantly reduces the computational overhead required for reconstruction. Compared to the U-Nets framework [21], the proposed method simply utilizes element-wise addition for feature aggregation after the downsampling and upsampling operators. The proposed networks first extract important information from the H-BTC decoded image at various resolutions by performing convolution operations over several layers. Subsequently, the proposed networks reconstruct the feature maps produced by the convolution layers. The network receives the H-BTC decoded image I_HBTC as input and produces the reconstructed image I_rec as output. This process can be formally defined as:

I_rec = RecNet(I_HBTC; Θ),

where RecNet(·) and Θ are the reconstruction network operator and its corresponding parameters, respectively. Herein, the reconstruction networks consist of five main parts, namely the patch extraction layer, feature decomposition layers, feature integration layers, feature reconstruction layers, and output layer. Each part is explained below.
(1) Patch Extractor: This network part owns a single weight layer performing image patch extraction and representation. Herein, an input image is divided into several overlapping patches. Let C denote the number of image channels. This network part acts as a convolution layer with 32 filters, each of size C × 3 × 3, generating F = 32 feature maps. These feature maps are further processed with a nonlinear activation function, i.e., the Leaky ReLU. At the end of the process, this network part yields feature maps of size F × H × W, denoted as F_full.
(2) Feature Decomposer: This network part consists of multiple RRCN modules and downsampling operators. It decomposes and aggregates the extracted features F_full into feature maps at multiple resolutions. The downsampling operations are performed by a convolution layer with two-unit strides. The RRCN in this part employs RCN modules with dilated convolution; using this configuration, the receptive field of the network can be expanded without increasing the number of parameters. This scenario is also motivated by the well-known algorithme à trous, i.e., an algorithm that performs the SWT with dilated convolution [10]. This network part produces a feature map F'_full of the same size as F_full. It also generates the maps F_half and F'_half, each of half the size of F_full, and the maps F_quart and F'_quart, each of a quarter the size of F_full.
(3) Feature Integrator: This network part has an almost identical structure to the feature decomposer part; the main difference is only the replacement of the downsampling operator with an upsampling operator. A typical interpolation algorithm, such as bilinear or bicubic, can be selected as the upsampling operator in this network part. The long skip connections in this network part connect the aggregated feature maps from the feature decomposer part with the aggregated feature maps from this network part. These long skip connections aim to ease the network training, since they trivially mitigate the modeling of identity mappings in the networks [11]. The integration process is performed by pixel-wise addition between corresponding channels of the feature maps.
(4) Feature Reconstruction and Output Layers: Both of these layers contain a single weight layer. They perform feature aggregation between the integrated feature maps F'_full and F_full. After the aggregation operation, we obtain the resulting feature maps denoted as F_rec. These feature maps are further processed by the output layer to yield the H-BTC reconstructed image I_rec, which is regarded as the improved-quality H-BTC decoded image with reduced impulsive noise.
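The overall wiring of the five parts can be sketched as a compact encoder-decoder. This is a structural sketch only: the RRCN stacks are replaced with single (dilated) convolutions, and all widths and depths here are illustrative assumptions, not the authors' exact configuration. What the sketch does preserve is the shape of the pipeline: stride-2 downsampling, bilinear upsampling, and element-wise additive skips at each resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecNet(nn.Module):
    """U-Net-like sketch: patch extractor, two stride-2 downsampling stages,
    bilinear upsampling with additive long skips, and a one-layer output head."""
    def __init__(self, C=3, n=32):
        super().__init__()
        self.extract = nn.Conv2d(C, n, 3, padding=1)          # patch extractor
        self.down1 = nn.Conv2d(n, n, 3, stride=2, padding=1)  # full -> half
        self.down2 = nn.Conv2d(n, n, 3, stride=2, padding=1)  # half -> quarter
        # dilated conv stands in for the RRCN, echoing the a-trous motivation
        self.body = nn.Conv2d(n, n, 3, padding=2, dilation=2)
        self.up1 = nn.Conv2d(n, n, 3, padding=1)
        self.up2 = nn.Conv2d(n, n, 3, padding=1)
        self.out = nn.Conv2d(n, C, 3, padding=1)              # output layer
        self.act = nn.LeakyReLU(1e-4)

    def forward(self, x):                     # H and W must be divisible by 4
        f_full = self.act(self.extract(x))
        f_half = self.act(self.down1(f_full))
        f_quart = self.act(self.down2(f_half))
        f_quart = self.act(self.body(f_quart))
        u = F.interpolate(f_quart, scale_factor=2, mode='bilinear',
                          align_corners=False)
        f_half = f_half + self.act(self.up1(u))   # additive long skip (half)
        u = F.interpolate(f_half, scale_factor=2, mode='bilinear',
                          align_corners=False)
        f_full = f_full + self.act(self.up2(u))   # additive long skip (full)
        return self.out(f_full)                   # reconstructed image
```

The divisibility-by-4 requirement discussed in the loss-function section follows directly from the two stride-2 stages here.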

Loss Function
This subsection presents the loss function of the proposed method for obtaining the optimal H-BTC image reconstruction. Herein, the proposed network architecture performs end-to-end optimization to achieve the optimal parameters as follows:

Θ̂ = arg min_Θ ‖RecNet(I_HBTC; Θ) − I_GT‖²,

where Θ̂ denotes the optimized network parameters and I_GT is the clean (ground truth) image. In simple words, the optimal image reconstruction is achieved by minimizing the pixel difference between the original image and the reconstructed image produced by the proposed networks.
In this work, we utilize the mean squared error (MSE) as the loss function to measure the pixel difference between the original and reconstructed images. This loss function is evaluated during the training process and is denoted as:

L(Θ) = (1/N) Σ_{i=1}^{N} ‖RecNet(I^i_HBTC; Θ) − I^i_GT‖²,

where N denotes the batch size for a single iteration. A lower value of this loss function indicates better performance during the training process; thus, the training procedure is stopped when the magnitude of the loss function is small enough or a predetermined maximum number of iterations is reached. It should be noted that the proposed method is only directly workable for images whose size is divisible by 4. However, the proposed method can be generalized to images of various sizes: if the image size is indivisible by 4, zero padding can be added to the input image such that it becomes divisible by 4, and the image with the original size is simply extracted at the end of the proposed network. The proposed method is mainly designed for reconstructing color images, i.e., three-dimensional data. However, it can be further extended to grayscale images, i.e., two-dimensional data, without any modification of the networks. In simple terms, the grayscale image is converted into a color image by setting each color channel (Red, Green, and Blue) to the grayscale image. This image is then fed into the proposed networks, which produce an output image in color space. An additional operation is applied on the output image, i.e., converting the color image back into a grayscale image, which can be done using a formal color space formulation or by simply averaging over all color channels.
An alternative can also be utilized for the grayscale image: the proposed networks can receive two-dimensional data, i.e., a grayscale image, directly as input. However, this requires modifying the patch extractor and output layers.
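The two pre/post-processing tricks above (padding to a multiple of 4 and the grayscale-to-RGB round trip) can be sketched as small helpers; the function names are our own.

```python
import numpy as np

def pad_to_multiple_of_4(img):
    """Zero-pad H and W up to the next multiple of 4; return the padded
    image together with the original size so the output can be cropped back."""
    H, W = img.shape[:2]
    ph, pw = (-H) % 4, (-W) % 4
    pad = [(0, ph), (0, pw)] + [(0, 0)] * (img.ndim - 2)
    return np.pad(img, pad), (H, W)

def crop_back(img, size):
    """Extract the image with the original size after reconstruction."""
    H, W = size
    return img[:H, :W]

def gray_to_rgb(gray):
    """Replicate a grayscale image into three identical channels."""
    return np.repeat(gray[:, :, None], 3, axis=2)

def rgb_to_gray(rgb):
    """Collapse back by averaging the channels, as suggested in the text."""
    return rgb.mean(axis=2)
```

Because the three channels are identical copies, the average on the way back recovers the original grayscale values exactly (up to floating-point rounding).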

Experimental Results
This section reports extensive experimental results of the proposed method. We first describe the image sources, including the process of creating the H-BTC image dataset. Subsequently, we present the model initialization and experimental setup. The proposed method is visually investigated and compared with the former schemes. At the end of this section, the performance of the proposed method is further compared with some previous H-BTC image reconstruction schemes. We consider the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) scores [25] to objectively assess the quality of the H-BTC reconstructed image. The PSNR metric evaluates the accuracy of the pixel value reconstruction between two images, and is formulated as:

PSNR = 20 log_10(255) − 10 log_10(MSE), (33)

where MSE denotes the mean squared error between the reconstructed image and the reference image.
On the other hand, the SSIM measures the degree of structural reconstruction between two images. Let I_rec and I_ref be the reconstructed H-BTC image and the reference image, respectively. The SSIM metric is formally defined as:

SSIM(I_rec, I_ref) = [(2 μ_rec μ_ref + c_1)(2 σ_{rec,ref} + c_2)] / [(μ²_rec + μ²_ref + c_1)(σ²_rec + σ²_ref + c_2)],

where μ_rec and σ²_rec denote the mean value and variance of the reconstructed H-BTC image, and μ_ref and σ²_ref are the mean value and variance of the original image. The value σ_{rec,ref} is the covariance between the reconstructed H-BTC image and the original image. The values c_1 and c_2 are two scalars to stabilize the division in the case of weak denominators. The source code of the proposed method will be made publicly available on the personal website of the second author.
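Both metrics can be computed directly from their formulas. The sketch below evaluates SSIM with global image statistics for simplicity; standard implementations instead average the same quantity over local sliding windows, so the numbers will differ from library SSIM values.

```python
import numpy as np

def psnr(rec, ref):
    """PSNR per Eq. (33): 20*log10(255) - 10*log10(MSE)."""
    mse = np.mean((rec.astype(float) - ref.astype(float)) ** 2)
    return 20 * np.log10(255.0) - 10 * np.log10(mse)

def ssim_global(rec, ref, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window (global-statistics) SSIM between two 8-bit-range images."""
    rec, ref = rec.astype(float), ref.astype(float)
    mu_r, mu_f = rec.mean(), ref.mean()
    var_r, var_f = rec.var(), ref.var()
    cov = ((rec - mu_r) * (ref - mu_f)).mean()     # covariance term
    return ((2 * mu_r * mu_f + c1) * (2 * cov + c2)) / \
           ((mu_r ** 2 + mu_f ** 2 + c1) * (var_r + var_f + c2))
```

The constants c_1 and c_2 above use the conventional (0.01 · 255)² and (0.03 · 255)² choices for 8-bit images.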

Experimental Setup
We first describe the experimental setup for the proposed method, which needs three image datasets: the training, validation, and testing sets. For the training set, we use a set of images from the DIV2K image dataset [26]. Each image is divided into non-overlapping image patches of size 128 × 128 pixels, yielding 167,235 image patches for training the proposed H-BTC reconstruction network. In the training process, the proposed method requires paired images, i.e., the H-BTC compressed image and its corresponding original image. For each training image patch, we simply perform the ODBTC, EDBTC, and DDBTC image compression to create the H-BTC compressed image from the original image patch. The H-BTC block size is set to 8 × 8 and 16 × 16. Nevertheless, the proposed method is workable with any arbitrary H-BTC block size, not only 8 × 8 and 16 × 16; we can simply feed a set of H-BTC compressed and original image patches with the desired block size for training.
Besides the training set, we also need another image set, i.e., the validation set, in the training process. This validation set is used to monitor the reconstruction performance during training. Herein, we use the downsampled version of the validation set (low resolution with bicubic interpolation of scale ×4) from DIV2K [26], containing six images. We perform a similar process as used for the training set on this validation set. The BSD100 [27] dataset, 24 images from the Kodak [28] image dataset, and 16 test images from [5,6] are involved in the experiment as the testing sets. They contain some broadly used images such as Lenna, Baboon, Airplane, Peppers, etc. All those image datasets consist of natural scenes. In addition, we also observe the performance of the proposed method on the URBAN100 [27] image dataset. This dataset is very challenging since it contains urban scenes with rich details in various frequency bands; it should be noted that the dithering process in H-BTC image compression usually destroys image details and lines. In our experiments, the images for validation and testing are excluded from the training set.

Networks Training and Model Initialization
The network weights of the proposed method are initialized using He's method [12], while all biases are initially set to zero over all layers. The proposed networks employ a stride of size 1 for all kernels, except the kernels for the downsampling operation, whose stride is set to 2. In addition, the proposed scheme utilizes kernels of size 3 × 3 over all layers. Herein, we utilize the Adam algorithm [13] for the optimization task, with β_1 = 0.9, β_2 = 0.999, and ε = 1 × 10^−8. The training process is performed over 20 epochs (roughly 200,000 iterations), with a batch size of 16 images. The learning rate is set to 1 × 10^−4 for the first 10 epochs and to 1 × 10^−5 for the remaining epochs. We use two RRCN modules, i.e., D = 2, in each network part. Each module contains three RCN modules, i.e., L = 3, and each RCN module consists of three convolutional layers, i.e., K = 3. The negative slope of all Leaky ReLU activations is set to 0.0001. All experiments were conducted under the PyTorch framework [29] and Python 3. The experimental environment was a computer with an AMD Ryzen™ Threadripper 1950X 3.4 GHz CPU and an Nvidia GTX 1080 Ti GPU. The training process of the proposed method requires approximately 20 h. Figure 9 shows the performance of the proposed method during the training process for the H-BTC image block sizes 8 × 8 and 16 × 16. Herein, we use the average PSNR over all images in the validation set to measure the performance over several epochs. From Figure 9, 20 epochs are enough to obtain near-optimum network parameters for the proposed method. During the training process, the ODBTC image compression is relatively hard to bring to the optimum solution compared to the other H-BTC methods.
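The reported optimizer settings (Adam with β_1 = 0.9, β_2 = 0.999, ε = 1 × 10^−8, learning rate 1 × 10^−4 for the first 10 epochs and 1 × 10^−5 afterwards) can be wired up in PyTorch as follows; the `net` placeholder stands in for the actual reconstruction network.

```python
import torch

net = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the real model
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
# Multiply the learning rate by 0.1 once, when epoch 10 is reached.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[10], gamma=0.1)
```

Calling `scheduler.step()` once per epoch (after `optimizer.step()`) reproduces the two-stage schedule described above.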
Since the impulsive noise produced by ODBTC is more randomized and perceptible, the proposed networks have difficulty obtaining an optimum solution within a limited number of epochs. Nevertheless, the training curve indicates that the proposed method nearly reaches an optimum solution. Similar to the previous finding, the impulsive noise of ODBTC is also the hardest to suppress. The effect of different epochs on the visual quality of the H-BTC reconstructed image is shown in Figure 11.
In this experiment, we show the quality improvement produced by the proposed method on an image from the validation set when the network parameters are taken from epochs 1, 5, and 20. From this figure, we can clearly see that the quality of the H-BTC reconstructed image is progressively improved as the number of epochs grows. Increasing the number of epochs further may produce even better quality of the H-BTC reconstructed image. In addition, Figure 12 gives an example of the image block alignment performed by the proposed method. As shown in this figure, the proposed method effectively increases the quality of the H-BTC decoded image by combining all image features.

Visual Investigation
This section reports the performance of the proposed method based on visual investigation. We first investigate the performance of the proposed method over various H-BTC image compression methods, i.e., ODBTC, EDBTC, and DDBTC. The image block for H-BTC is set as 8 × 8 and 16 × 16. In this experiment, we train the proposed networks with the training set and validate the performance using the validation set. Subsequently, we examine the performance by applying the optimum network parameters to the testing set. Figures 13 and 14 show the visual comparisons of the proposed networks over various H-BTC methods. The quality of the H-BTC reconstructed image yielded by our proposed method is better than that of the original H-BTC decoded image. This clearly reveals that the proposed method can remove the impulsive noise occurring in the H-BTC decoded image. In addition, the proposed method is capable of removing the blocking artifacts that appear in the H-BTC decoded image. Moreover, the proposed method effectively restores the image details and lines damaged by digital halftoning.

Performance Comparisons
This subsection reports the performance comparison between the proposed method and former schemes in terms of objective measurements. In this experiment, the optimum parameters of the proposed networks are obtained from the training process as described before. These parameters are then applied to the testing set. Herein, we consider all images from SIPI, Kodak, BSD100, and Urban100 as testing images. For each image, we perform H-BTC image compression with the image block sizes 8 × 8 and 16 × 16. Subsequently, each decoded image is processed by the proposed method with the optimum parameters. The quality of the reconstructed image is then measured using two metrics, i.e., SSIM and PSNR. We first compare the performance of the proposed method and former schemes in terms of the average PSNR value. Herein, we make comparisons against traditional approaches for H-BTC image reconstruction, such as the wavelet-based method [5] and the FVQ scheme [6]. In addition, the comparison is also conducted with former deep learning-based approaches, such as the residual networks [15,16] and the symmetric skip CNN [17]. The methods in [15,16] inherit the superiority of residual networks for image denoising; they are also very suitable for H-BTC image reconstruction owing to their similar noise-suppression ability. Meanwhile, the method in [17] performs H-BTC image reconstruction with a symmetric skip CNN framework. To make a fair comparison, we set the number of layers and feature maps of the former schemes [15–17] identical to those of the proposed method. Table 1 shows the performance comparisons between the proposed method and former schemes in terms of the average PSNR value, while Table 2 summarizes the comparison in terms of the average SSIM score.
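For reference, the PSNR metric used in these comparisons follows the standard definition below (with peak value 255 for 8-bit images); SSIM is more involved and is typically taken from a library such as scikit-image:

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (in dB) between a reference image and
    its reconstruction: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

The average PSNR reported in Table 1 is simply this value averaged over all images in a testing set.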
These two tables explicitly show that the proposed method outperforms the other competing schemes in the H-BTC reconstruction task. It is worth noting that the proposed method yields better results by a significant margin compared to the various traditional or handcrafted H-BTC reconstruction methods. In addition, the proposed method also yields better performance in comparison to the former deep learning-based approaches. Thus, the proposed method is a good candidate for the H-BTC image reconstruction task. The proposed method can be further extended and applied to image compression, vehicle verification [30], and secret sharing [31–33].

Conclusions and Future Works
A deep learning approach for H-BTC image reconstruction has been presented in this paper. The H-BTC aims to perform image compression with a simple technique. The proposed method inherits the effectiveness of the CNN and residual learning frameworks to perform the reconstruction process. It is constructed from RCN and RRCN modules to suppress the impulsive noise and reduce the blocking artifacts of the H-BTC decoded image. The proposed method can be regarded as a post-processing step in the H-BTC image compression technique. As documented in the Experimental Results Section, the proposed method offers better quality of the H-BTC reconstructed image compared to the former schemes. In real applications, the proposed method can be applied as a post-processing step of image compression whenever the decoded image requires noise suppression or quality enhancement. The proposed method works well in improving the quality of decoded images obtained from various image compression techniques: the decoded image can simply be fed into the proposed framework to produce the enhanced image. In addition, the proposed method can also be used to improve the quality of the H-BTC decoded image for image retrieval, watermarking, secret sharing, and other image processing and computer vision applications.
To further improve the performance of the proposed method, the number of layers in the proposed networks can be increased for the H-BTC image reconstruction task. Different activation functions can also be investigated to increase the learning ability of the proposed networks, and different optimizers can be employed to increase their performance. The convolutional operation in the proposed method can also be modified by incorporating non-local self-similarity, a graph convolutional approach, nearest neighbor information, a non-local recurrent framework, and so forth. These approaches may improve the ability of the convolution to capture non-local information, thereby increasing the learning ability and performance. In addition, adversarial learning can be injected into the proposed networks. The adversarial networks can effectively learn the occurrence of impulsive noise and suppress the detected noise in encoder-decoder based learning. Residual learning with adversarial networks may thus yield better performance in H-BTC image reconstruction. These alternatives are left as our future works and suggestions.