Layer Decomposition Learning Based on Gaussian Convolution Model and Residual Deblurring for Inverse Halftoning

Layer decomposition to separate an input image into base and detail layers has been steadily used for image restoration. Existing residual networks based on an additive model require residual layers with a small output range for fast convergence and visual quality improvement. However, in inverse halftoning, homogeneous dot patterns prevent the residual layers from having a small output range. Therefore, a new layer decomposition network based on the Gaussian convolution model (GCM) and a structure-aware deblurring strategy is presented to achieve residual learning for both the base and detail layers. For the base layer, a new GCM-based residual subnetwork is presented. The GCM exploits a statistical property whereby the image difference between a continuous-tone image and a halftoned image, both blurred with a Gaussian filter, has a narrow output range. Accordingly, the GCM-based residual subnetwork takes a Gaussian-filtered halftoned image as the input and outputs the image difference as a residual, thereby generating the base layer, i.e., the Gaussian-blurred continuous-tone image. For the detail layer, a new structure-aware residual deblurring subnetwork (SARDS) is presented. To remove the Gaussian blurring of the base layer, the SARDS takes the predicted base layer as the input and outputs the deblurred version. To restore image structures such as lines and text more effectively, a new image structure map predictor is incorporated into the deblurring network to induce structure-adaptive learning. This paper provides a method to realize the residual learning of both the base and detail layers based on the GCM and SARDS. In addition, it is verified that the proposed method surpasses state-of-the-art methods based on U-Net, direct deblurring networks, and progressive residual networks.


Introduction
Printers and copiers are bilevel output devices that reproduce images on paper by generating homogeneous dot patterns using inks or toners. The printed images are in fact bilevel; however, the human visual system, which behaves as a low-pass filter, allows a printed image to be perceived as a continuous-tone image. Digital halftoning is needed to create a halftoned image with uniform dot patterns from a continuous-tone image with discrete gray levels (e.g., 256 gray levels) [1]. The halftoned image determines the spatial positions of the inks to be deposited on the paper or controls a laser beam to form a latent image on a photoconductor drum. Digital halftoning has been used in many applications, including animated GIF generation from videos [2], removal of contour artifacts in displays [3], video processing in electronic papers [4], and data hiding [5]. The most commonly used digital halftoning techniques are dithering, error diffusion, and direct binary search [6].
In inverse halftoning, a continuous-tone image with 256 gray levels or more is reconstructed from its halftoned version [7]; in other words, inverse halftoning is the reverse of digital halftoning. Inverse halftoning is required in several practical applications, such as bilevel data compression [8], watermarking [9,10], digital reconstruction of color comics [11], and high dynamic range imaging [12]. Inverse halftoning is an ill-posed problem with many possible solutions because digital halftoning is a many-to-one mapping. Many studies have been conducted over the last several decades, and various approaches have been introduced based on look-up tables [13], adaptive low-pass filtering [14], maximum a posteriori estimation [15], local polynomial approximation and intersection of confidence intervals [16], and deconvolution [17]. Recently, machine learning approaches have been actively considered based on dictionary learning [18-20] and deep convolutional neural networks (DCNNs) [21-25].

Image Decomposition in Deep Learning Frameworks
Image decomposition, which is also known as layer separation in other fields, has been steadily used for image restoration [26], image enhancement [27], and image fusion [28]. Image decomposition is an approach for separating an input image into two or more layers with different gradient and illumination characteristics. Traditional image decomposition has been realized based on image transformations (e.g., wavelets) [29] and image pyramids [30] to achieve multiple resolutions. In addition, sparse representation [31], the Gaussian mixture model [32], and adaptive filtering methods such as bilateral [33] and guided-image filtering [34] have been used for two-layer separation, i.e., base and detail layers. In this study, the base layer corresponds to a layer whose brightness changes smoothly, resembling a low-pass-filtered image, whereas the detail layer refers to a high-pass filtered image whose brightness changes rapidly. The definition of the base and detail layers may vary based on the application field.
Recently, image decomposition approaches have been incorporated into deep learning frameworks. U-net [35], Laplacian-net [36], residual networks (RNs) [37,38], and progressive residual networks (PRNs) [23,25] are representative deep learning models that apply the concept of image decomposition. U-net and Laplacian-net primarily aim to realize multiple resolutions, whereas RNs and PRNs focus on predicting residual layers. In particular, the key factor for improving image quality and accelerating convergence in an RN is that the brightness range of the residual layer should be narrow. In other words, by narrowing the output range in which the solution exists, RNs can obtain the optimal solution more easily. Therefore, it is critical to design a residual layer with a narrow brightness range.

Residual Layer Design for Residual Learning
In an end-to-end manner, RNs and PRNs are learned to map an input image into a residual layer with a narrow output range. For image restoration, the difference image between the original and input images is considered as the residual layer. Residual learning is formulated as follows:

x_r = f^R_θ(x_i) = x_o − x_i, (1)

where f^R_θ indicates the DCNN with parameter θ for estimating the residual layer. As shown in Equation (1), the output of the network is the residual, which differs from conventional DCNNs that directly transform the input image x_i into the original image x_o with a relatively wide output range. The residual layer is designed as the difference image between x_o and x_i because the original image can be modeled physically as the addition of the measured input image and the residual layer:

x_o = x_i + x_r, (2)

where x_i indicates the measured input image (for example, a captured noisy image or a rain image), and x_r is the residual layer that contains artifacts such as noise and rain streaks. The residual layer x_o − x_i in Equation (1) is derived from the additive model of Equation (2). Previous studies [37,38] showed that using the difference image as the residual layer can effectively improve visual quality and increase convergence speed. For example, in image denoising, the noise layer is used as the residual layer, which corresponds to the difference image between the original and noisy images. Noise is generally assumed to follow a Gaussian distribution, which implies that most pixel values in the noise layer are concentrated near zero; therefore, the output range of the noise layer is narrow. In rain removal, the rain layer containing only rain streaks is used as the residual layer; it is obtained by subtracting the original image from the input rain image. Because the rain layer includes only rain streaks, a narrow output range is guaranteed in the residual layer.
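As a minimal illustration of why this residual design helps, the following NumPy sketch (with a synthetic image, an arbitrary noise level, and a fixed seed, all illustrative assumptions) compares the brightness range of a measured image with that of its residual layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "original" image and a noisy measurement (additive Gaussian noise).
x_o = rng.uniform(0.0, 1.0, size=(64, 64))      # original, wide range [0, 1]
noise = rng.normal(0.0, 0.02, size=x_o.shape)   # small Gaussian noise
x_i = x_o + noise                               # measured input image

# Residual layer as in Equation (1): x_r = x_o - x_i (here, the negated noise).
x_r = x_o - x_i

# The residual layer occupies a far narrower range than the image itself,
# which is the property residual networks exploit for fast convergence.
print(np.ptp(x_o))   # close to 1.0
print(np.ptp(x_r))   # a few standard deviations of the noise, far below 1
```

The same comparison fails for halftoned inputs, as the next subsections explain: there the difference image inherits the bilevel dot patterns and its range is no longer narrow.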

Residual Learning Problems for Inverse Halftoning
Digital halftoning is a nonlinear system that includes binary quantization. Therefore, the additive model of Equation (2) is no longer valid for digital halftoning; that is, the original image cannot be expressed as the sum of the input halftoned image and a residual layer with a narrow output range. This means that residual learning, as shown in Equation (1), cannot be directly applied to inverse halftoning. More specifically, the halftoned image is a bilevel image composed of black and white dot patterns. If the residual layer is defined as the difference image between the original image and the input halftoned image, similar black and white dot patterns appear in the residual layer, which is therefore inevitably accompanied by sudden changes in brightness. Hence, merely creating a residual layer from an image difference is not suitable for inverse halftoning.

Progressive Residual Learning Problems for Inverse Halftoning
Progressive residual learning (PRL) [23,25] can be an alternative for handling the sudden changes in brightness mentioned in the previous subsection. In PRL, the base layer, whose brightness changes smoothly, is recovered first; subsequently, the remaining detail layer is predicted:

x^(b) = f^PRL_b_θ(x_i), x^(d) = f^PRL_r_θ(x^(b), x_i), (3)

where x^(b) and x^(d) indicate the predicted base and detail layers, respectively. For inverse halftoning, the input halftoned image x_i cannot be used directly as the base layer. Instead, in PRL, x_i is first converted into the base layer x^(b) through the pretrained DCNN f^PRL_b_θ. The generated base layer resembles a low-pass-filtered image and can be considered an approximation of the original image. If the detail layer, defined as x_o − x^(b), is used as the residual layer, then a narrow brightness range can be guaranteed, which implies that residual learning through f^PRL_r_θ is possible. Thus, the additive model of Equation (2) can reasonably be combined with PRL for inverse halftoning. For reference, the input halftoned image x_i can be used together with the base layer x^(b), as shown in Equation (3), to estimate the detail layer, thereby compensating for information loss in the predicted base layer.
However, the PRL [23,25] applied to inverse halftoning does not present a new deep learning model from the viewpoint of creating the base and detail layers. In PRL, f^PRL_b_θ is trained to generate the base layer; however, its output images cannot truly be regarded as a base layer. Instead, the output images correspond to final reconstructed images because they appear similar to the original images. Indeed, the predicted base layers appear visually better than images reconstructed using traditional inverse halftoning methods based on dictionary learning [19] and look-up tables [13]. If the image quality of the base layers were reduced to the level of a Gaussian blurring of the original images, conventional PRL could not yield satisfactory results. In summary, the PRL hitherto developed for inverse halftoning merely applies inverse halftoning twice in succession.

Contributions
This paper presents three major contributions. In particular, a new method for creating base and detail layers based on the proposed structure-aware layer decomposition learning (SALDL) is introduced.

• First, to design the base layer, a new statistical observation is presented: the image difference between a continuous-tone image and a halftoned image, both blurred with a Gaussian filter, has a narrow output range. Based on this observation, the base layer is reconstructed using a new GCM-based residual subnetwork that predicts the difference between the blurred continuous-tone image and the blurred halftoned image; this differs completely from the existing PRL [23,25], which uses an initial restored image from a DCNN for base layer generation.

• Second, the detail layer is generated based on structure-aware residual learning that predicts the difference image between the predicted base layer and the original image. To enhance image structures such as edges and textures more effectively, an image structure map predictor, introduced in a previous study [24], is incorporated into the residual detail layer learning, resulting in structure-enhancing learning. In addition, because the predicted base layer is the low-pass-filtered version of the original image, the proposed residual detail learning also deblurs the base layer, i.e., removes its blurring. This implies that a deblurring strategy is adopted in the proposed residual detail learning, unlike in the existing PRL.

• Third, it is demonstrated that SALDL can recover high-quality images from predicted base layers whose quality is poor in terms of edge and texture representation, whereas the existing PRL [23,25] cannot yield satisfactory results from the same base layers. This reveals that the existing PRL is not suitable for low-quality base layers; by contrast, the proposed structure-aware residual learning method is more effective in describing image structures. To the best of our knowledge, this is the first study to perform this comparison, and the experimental results confirm the feasibility of the proposed SALDL as a new form of PRL for inverse halftoning that surpasses state-of-the-art methods such as PRL, U-net, and DCNN.

Motivations
Image decomposition is an approach for analyzing and reconstructing images. Image transformation (e.g., wavelet transformation), structure-adaptive filtering, and sparse coding have been considered as effective tools for realizing image decomposition. However, DCNNs have recently demonstrated excellent performance in image enhancement and restoration. Therefore, this study focuses on incorporating image decomposition into a deep learning framework for inverse halftoning. In particular, a new deep learning model to enable the residual learning of both the base and detail layers is introduced. As discussed in the Introduction, residual learning that directly maps an input image into the residual layer is not applicable to inverse halftoning because the additive model is no longer valid. Moreover, the output range of the residual layer cannot be narrowed, owing to the black-and-white dot patterns. PRL can be considered as an alternative for realizing image decomposition. However, the PRL that has hitherto been developed for inverse halftoning merely applies inverse halftoning twice in succession, since the quality level of the restored base layer is similar to that of the original image. In addition, the PRL merely uses initially reconstructed images through a DCNN for base layer generation; hence, the design of the base layer lacks novelty. Furthermore, existing PRL cannot recover textures and fine details from low-quality base layers. Hence, a new SALDL based on GCM is proposed herein. Figure 1 shows the concept of image decomposition based on the proposed SALDL for inverse halftoning. Unlike traditional approaches such as wavelet transform and image pyramids, residual-learning-based image decomposition is proposed. In particular, novel GCM-based residual learning and structure-aware residual deblurring are introduced for base and detail layer generation, respectively. 
By adding the predicted base and detail layers, a continuous-tone image can be reconstructed from the input halftoned image. Details regarding the generation of the base and detail layers are provided below.

Residual Layer Design for Base Layer Generation
Unlike the residual layer design based on the additive model of Equation (2), a new GCM is proposed herein to generate the residual of the base layer:

x_r_b = x_o ⊗ k_g − x_i ⊗ k_g, (4)

where x_r_b denotes the residual layer corresponding to the base layer. Herein, the base layer is defined as the Gaussian blurring of the input halftoned image, x_i ⊗ k_g, where ⊗ denotes the convolution operation and k_g indicates the Gaussian smoothing filter. Equation (4) thus defines the residual layer corresponding to the base layer as the image difference between the blurred original image and the blurred halftoned image obtained through Gaussian filtering. Compared with Equation (1), the proposed residual layer is the Gaussian-filtered version of x_o − x_i. Hereinafter, the model expressed in Equation (4) is referred to as the GCM to differentiate it from the additive model expressed in Equation (2). The main objective of residual learning is to narrow the output range; therefore, it must be verified whether the residual layer generated based on the GCM yields a narrow output range. The histogram distribution of one sample image was analyzed for this purpose. Figure 2 shows four images used to generate the two types of residual layers: the original, halftoned, blurred original, and blurred halftoned images, from left to right. Figure 3 compares the histogram distributions of the two residual layers. One is the residual layer generated using the additive model, which subtracts the original image from the halftoned image. The other is the residual layer generated using the proposed GCM, which subtracts the blurred original image from the blurred halftoned image. As the histogram distributions show, the residual layer generated using the proposed GCM yields a narrow output range, whereas the conventional additive model yields a wide one. This is because the residual layer generated by the additive model tends to exhibit textures that resemble dot patterns. Meanwhile, the proposed GCM utilizes Gaussian filtering to smooth out the sudden brightness changes that appear in halftoned images, thereby narrowing the output range of the residual layer.
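The effect can be reproduced with a small NumPy sketch. Note that the halftone below is a simple stochastic stand-in rather than true error diffusion, and the Gaussian blur (5 × 5, σ = 1, matching the paper's filter) is implemented directly; both are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def gaussian_blur(img, sigma=1.0, radius=2):
    # Separable Gaussian blur with a (2*radius+1)-tap kernel (5 taps here).
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

rng = np.random.default_rng(0)

# Smooth synthetic continuous-tone image with values in roughly [0.1, 0.9].
y, x = np.mgrid[0:128, 0:128]
x_o = 0.5 + 0.4 * np.sin(x / 20.0) * np.cos(y / 25.0)

# Stochastic halftone: each pixel fires with probability equal to its gray
# level, so the local mean is preserved (a crude stand-in for error diffusion).
x_i = (rng.uniform(size=x_o.shape) < x_o).astype(float)

# Additive-model residual: dominated by bilevel dot patterns, wide range.
r_add = x_o - x_i

# GCM residual (Equation (4)): difference of the Gaussian-blurred images.
r_gcm = gaussian_blur(x_o) - gaussian_blur(x_i)

print(np.ptp(r_add))   # close to 2 (values near -1 and +1)
print(np.std(r_gcm))   # much smaller spread than the additive residual
```

The standard deviation of the GCM residual is several times smaller than that of the additive residual, mirroring the narrow histogram reported in Figure 3.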

GCM-Based Residual Subnetwork for Base Layer Generation
To realize the proposed GCM for base layer generation, a GCM-based residual subnetwork was designed, as shown in Figure 4. To implement the GCM of Equation (4), Gaussian filtering was first applied to the input halftoned image. In existing deep learning tools, this can easily be implemented through a convolution layer whose filter is fixed as a Gaussian filter. The Gaussian-filtered halftoned image was then passed through the GCM-based residual subnetwork to output the residual layer:

x^(r_b) = f^GCM_θ(x_i ⊗ k_g), (5)

where x^(r_b) is the predicted residual layer for base layer generation, and f^GCM_θ denotes the GCM-based residual subnetwork to be trained. Herein, parentheses in superscripts indicate predicted values. The standard deviation of the Gaussian filter k_g was set to 1, and the filter size was 5 × 5.

To train f^GCM_θ, a loss function is defined as follows:

L = (1/M) Σ_{m=1}^{M} ‖ x_r_b − f^GCM_θ(x_i ⊗ k_g) ‖₂², (6)

where the sum runs over the M training samples in a batch (sample indices are omitted for brevity), M is the batch size, and ‖·‖₂ is the l2-norm. Compared with the additive model, the proposed GCM-based residual subnetwork can narrow the output range of the residual layer.

For the pretrained GCM-based residual subnetwork, the base layer is generated as follows:

x^(b) = x_i ⊗ k_g + x^(r_b), (7)

where x^(r_b) is the output of the pretrained GCM-based residual subnetwork f^GCM_θ, and x^(b) is the predicted base layer. This equation indicates that the base layer is the sum of the Gaussian-filtered halftoned image and the residual layer predicted by the GCM-based residual subnetwork. For reference, the entire architecture shown in Figure 4 was not trained end to end; heuristic experiments showed that training the entire architecture did not yield good results.
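The composition of the base layer from a frozen Gaussian convolution plus a learned residual can be sketched as follows. The residual subnetwork is replaced by a zero stub (a loud assumption; the real subnetwork is a trained DCNN), so the sketch shows only how the fixed 5 × 5, σ = 1 Gaussian layer and the residual addition fit together:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """5 x 5 Gaussian filter k_g used to blur the halftoned input."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def conv2d_same(img, kernel):
    """Plain 'same'-size filtering (kernel is symmetric, so correlation equals
    convolution). In a DCNN, this is a conv layer with weights frozen to k_g."""
    pad = kernel.shape[0] // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0],
                                           dx:dx + img.shape[1]]
    return out

# Stand-in for the trained residual subnetwork: a zero predictor, used only
# to show how the pieces compose (the real network outputs the residual).
f_gcm = lambda z: np.zeros_like(z)

rng = np.random.default_rng(0)
x_i = (rng.uniform(size=(32, 32)) < 0.5).astype(float)   # toy halftone
k_g = gaussian_kernel(5, 1.0)

blurred = conv2d_same(x_i, k_g)   # fixed Gaussian layer: x_i convolved with k_g
x_rb = f_gcm(blurred)             # predicted residual (zero stub here)
x_b = blurred + x_rb              # predicted base layer: blurred input + residual

print(round(k_g.sum(), 6))        # kernel is normalized to 1
print(x_b.shape)                  # spatial size is preserved
```

With a real subnetwork, only `f_gcm` changes; the frozen Gaussian layer and the addition remain exactly as above.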

Detail Layer Design
The predicted base layer is an approximation of the Gaussian-filtered original image:

x^(b) ≈ x_o ⊗ k_g. (8)

As shown in Figure 4, details such as textures and edges are absent in the predicted base layer; however, it contains the low-frequency components of the original image. Therefore, the detail layer to be predicted was designed as the difference between the original image and the predicted base layer:

x_d = x_o − x^(b). (9)

Because the predicted base layer x^(b) is regarded as an approximation of the Gaussian-filtered original image, the detail layer x_d contains textures and edges with small pixel values; hence, the brightness range of the detail layer is narrow. With the detail layer designed in this way on the basis of the proposed GCM, residual learning can also be performed for the detail layer.

Direct Deblurring Approach
The predicted base layer is an approximation of the Gaussian-filtered original image, as shown in Equation (8). Therefore, conventional image deblurring methods can be considered to reconstruct the original image directly from the predicted base layer by removing its Gaussian blurring. The image deblurring problem [28] can be formulated as follows:

x^(o) = argmin_x ‖ x ⊗ k_g − x^(b) ‖₂² + λ Σ_j ‖ k_h,j ⊗ x ‖_α, (10)

where k_h,j indicates high-pass filters such as horizontal and vertical derivative filters, α controls the sparsity of the prior, and λ is a constant that weights the regularization term [28]. In general, the blur kernel in Equation (10) is unknown; however, based on the proposed GCM, the Gaussian filter k_g can be used as the blur kernel. Alternatively, the kernel can be estimated directly from the base layer, which corresponds to blind image deblurring. It appears that conventional image deblurring can yield good results; however, some issues exist. A comparison between Figures 2 and 4 shows that the predicted base layer differs from the blurred original image: textures and edges are missing, and noise is generated. In addition, this noise differs from the Gaussian random noise typically assumed in image deblurring. Therefore, conventional image deblurring methods are not suitable for restoring the original image from the predicted base layer. In another image deblurring approach, deep learning tools are used. More specifically, a DCNN can be trained to transform the predicted base layer into the original continuous-tone image [39]:

x^(o) = f^DDN_θ(x^(b), x_i), (11)

where f^DDN_θ denotes the direct deblurring network (DDN), and x^(o) is the reconstructed continuous-tone image. Because the predicted base layer x^(b) is the Gaussian-blurred version of the original image, f^DDN_θ is regarded as a deblurring network. Because the predicted base layer has already lost some texture and sharpness, the input halftoned image x_i is used as additional information.

Proposed Layer Decomposition Learning
In addition to the DDN of Equation (11), a residual deblurring strategy can be adopted. It is noteworthy that both the DDN and the residual deblurring network (RDN) are derived from the proposed GCM; in other words, both are deep learning architectures proposed herein. The RDN estimates the detail layer from two types of images, i.e., the input halftoned image and the predicted base layer, via residual learning. The RDN appears similar to the conventional PRL [23,25]; however, the significant difference is that the RDN adopts a deblurring strategy: the predicted base layer is the Gaussian-filtered version of the original image, and the base layer is designed based on the GCM proposed for residual learning. The RDN can provide better performance than the DDN owing to the effect of residual learning; however, it remains limited in recovering image structures clearly. Hence, a new subnetwork known as the image structure map predictor is incorporated into the proposed SALDL. Figure 5 shows the entire architecture of the proposed SALDL, which comprises two subnetworks: the image structure map predictor (ISMP) and the SARDS. The ISMP transforms the input halftoned image into a Laplacian map, i.e., an image obtained by convolving the original image with the Laplacian filter. An example of a predicted Laplacian map is shown on the right side of Figure 5. Although the predicted base layer could be input to the ISMP, in that case the detail representation is not restored satisfactorily because the predicted base layer has already lost some texture information. As shown in Figure 5, the input halftoned image contains more texture information than the predicted base layer.
The ISMP includes a pretrained subnetwork known as the initial reconstruction subnetwork (IRS), which generates an initial reconstructed image from the input halftoned image. Because the input halftoned image is quantized, it is preferable to predict the image structures from the initial reconstructed image rather than from the halftoned image. In fact, the Laplacian map is a filtered version of the original image, which implies that it can be predicted by convolving the Laplacian filter with the initial reconstructed image. However, the initial reconstructed image differs from the original image; hence, more convolution and ReLU layers are required after the IRS. Experiments confirmed that the accuracy of the Laplacian map decreased when the IRS was not adopted, rendering the predicted detail layer less accurate. Therefore, the IRS is key to increasing the accuracy of the ISMP. As shown in Figure 5, the initial reconstructed image is refined to increase the performance of the ISMP while the entire network is trained.
The SARDS requires three input images: the predicted base layer, the Laplacian map, and the input halftoned image. The predicted Laplacian map was stacked on top of the input halftoned image and the predicted base layer via a concatenation layer; the stacked input was then fed into the SARDS to estimate the detail layer.
The detail layer is estimated as follows:

x^(l) = f^ismp_θ(x_i), x^(d) = f^sards_θ(x^(b), x^(l), x_i), (12)

where f^sards_θ and f^ismp_θ denote the proposed SARDS and ISMP, respectively; x^(l) denotes the predicted Laplacian map, and x^(d) denotes the predicted detail layer. In Equation (12), the Laplacian map is predicted from the input halftoned image, not from the base layer, because experiments showed that the Laplacian map could not be estimated accurately from the base layer owing to its missing information. The Laplacian map provides the subnetwork f^sards_θ with spatial information regarding which areas are flat, lined, or textured. This information enables the entire network to be trained by adapting to local image structures. Consequently, the texture representation of the detail layer can be improved, and noisy dot patterns on flat areas can be removed effectively. The ISMP can be regarded as a type of attention network, and the predicted Laplacian map is in fact a spatial attention feature map.
The multiloss function used to learn f^sards_θ and f^ismp_θ is expressed as

L = (1/M) Σ_{m=1}^{M} ( ω_1 ‖ x_d − f^sards_θ(x^(b), x^(l), x_i) ‖₂² + ω_2 ‖ x_l − f^ismp_θ(x_i) ‖₂² ), (13)

where the sum runs over the M training samples in a batch (sample indices are omitted for brevity), M is the batch size, x_l is the ground-truth Laplacian map, and ω_1 and ω_2 weight the two subnetwork losses. As shown in Equation (12), the accuracy of f^ismp_θ affects the accuracy of f^sards_θ. Therefore, in this study, ω_1 and ω_2 were both set to 1.
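A minimal NumPy sketch of such a two-term weighted loss is shown below; the function name, batch layout, and stub data are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def multiloss(d_pred, d_true, l_pred, l_true, w1=1.0, w2=1.0):
    """Two-term multiloss sketch: a weighted l2 penalty on the predicted
    detail layer plus a weighted l2 penalty on the predicted Laplacian map,
    each averaged over the batch (axis 0)."""
    detail_term = np.mean(np.sum((d_true - d_pred) ** 2, axis=(1, 2)))
    laplacian_term = np.mean(np.sum((l_true - l_pred) ** 2, axis=(1, 2)))
    return w1 * detail_term + w2 * laplacian_term

rng = np.random.default_rng(0)
B, H, W = 4, 32, 32                       # batch of 4 toy 32x32 patches
d_true = rng.normal(0, 0.1, (B, H, W))    # narrow-range detail layers
l_true = rng.normal(0, 0.1, (B, H, W))    # ground-truth Laplacian maps

# Perfect predictions give zero loss; degenerate ones give a positive value.
print(multiloss(d_true, d_true, l_true, l_true))              # 0.0
print(multiloss(d_true * 0, d_true, l_true * 0, l_true) > 0)  # True
```

Setting both weights to 1, as in the paper, simply sums the two penalties without favoring either subnetwork.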
For the trained f^sards_θ and f^ismp_θ, the final continuous-tone image is generated based on the additive model, i.e., x^(o) = x^(d) + x^(b). As mentioned in the Introduction, the additive model is not directly suitable for inverse halftoning. However, by first generating a Gaussian-blurred version of the original image as the base layer, layer decomposition learning based on the GCM and SARDS can be applied to inverse halftoning.

Experimental Results
The proposed SALDL for inverse halftoning was implemented using MatConvNet [40] and trained with two 2080Ti GPUs on a Windows operating system. To evaluate the proposed method, it was compared with state-of-the-art deep learning methods based on DCNN [37], DDN [39], U-Net [35], and PRL [23,25]. In this study, a Gaussian-blurred halftoned image was used as the base layer in both the DDN and PRL methods to implement Equations (11) and (3), respectively; in other words, the same base layer was used for the paired comparison. This reveals the effectiveness of the proposed method in recovering image structures, as compared to DDN and PRL. For performance evaluation, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [41] were used to measure the inverse of the MSE in a log space and the structural similarity between two images, respectively. For both the PSNR and SSIM, a higher value indicates higher quality. The source code of the proposed SALDL can be downloaded at https://github.com/cvmllab (accessed on 29 July 2021).
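For reference, the PSNR half of the evaluation can be sketched as follows (SSIM is more involved and is omitted here); a minimal numpy sketch assuming 8-bit images with a peak value of 255.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: the inverse of the MSE in log space.
    Higher is better; identical images give infinity."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```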

Training Data Collection
For training, public datasets [36] including General 100, Urban 100, BSDS100, and BSDS200 were used to prepare continuous-tone color images; in total, 500 continuous-tone color images were collected. General 100, Urban 100, and BSDS200 were used for training, whereas BSDS100 was used for validation. The same training and validation sets were used to train all the deep-learning-based methods: the proposed SALDL, PRL, U-net, DDN, and DCNN. The three subnetworks, i.e., the GCM-based residual subnetwork, IRS, and SARDS, used the same training and validation datasets. For digital halftoning, the continuous-tone color images were converted into grayscale images; subsequently, error diffusion [42] with the Floyd-Steinberg filter [1,42] was used to transform the grayscale images into halftoned images. The Laplacian operator was applied to the grayscale images to obtain Laplacian maps. To obtain the training patches, three types of patches were extracted randomly from the grayscale original images, Laplacian maps, and halftoned images; each extracted patch was of size 32 × 32. In this study, grayscale patches were used for training because error diffusion can be easily applied to them. To apply the trained network to color images in the test phase, each color image was first separated into R, G, and B planes, and the proposed network was applied to each plane independently.
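The two data-preparation steps above, Floyd-Steinberg error diffusion and the Laplacian operator, can be sketched as follows. This is a minimal numpy sketch using the standard Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16) and the 4-neighbor Laplacian kernel; the paper's exact preprocessing may differ in details such as value range and boundary handling.

```python
import numpy as np

def floyd_steinberg(img):
    """Error diffusion with the standard Floyd-Steinberg filter.
    Input: grayscale image in [0, 1]; output: binary halftone in {0, 1}."""
    out = img.astype(np.float64).copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            old = out[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            # Push the quantization error onto unprocessed neighbors.
            if x + 1 < w:
                out[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    out[y + 1, x - 1] += err * 3 / 16
                out[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    out[y + 1, x + 1] += err * 1 / 16
    return out

def laplacian_map(img):
    """4-neighbor Laplacian with replicated (edge) boundary handling."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] +
            p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img)
```

A mid-gray patch, for instance, should halftone to roughly half-on dots (error diffusion preserves local mean), while its Laplacian map stays near zero on the flat area.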

Network Training
All the subnetworks, including the GCM-based residual subnetwork, ISMP, IRS, and SARDS, consist of convolution and ReLU layers. Hereinafter, a pair comprising a convolution layer and a ReLU layer is referred to as a convolution block. In the subnetworks, m filters of size 5 × 5 × c were used in the convolutional layers, where c represents the number of input channels. Table 1 shows the number of filters and channels used in the convolutional layers. In the input layer of the SARDS, c was set to 3 because three input channels (the base layer, the Laplacian map, and the halftoned image) were fed to it. The filters were initialized using a random number generator. The number of convolution blocks in each of the GCM-based residual subnetwork, IRS, and SARDS was set to 16, and the ISMP used six more convolution blocks than the IRS. To update the convolution filters, the mini-batch gradient descent algorithm [43] was used. The number of epochs was 200, the batch size was 64, and each epoch involved 1000 backpropagation iterations. The learning rate began at 10^−5 and decreased linearly every 50 epochs to 10^−6. All loss functions were modeled using the ℓ2 norm.
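The schedule "began at 10^−5 and decreased linearly every 50 epochs to 10^−6" admits the following reading: a piecewise-constant rate that drops by a fixed amount at each 50-epoch boundary. A hedged sketch (the function name and the exact interpolation are assumptions):

```python
def learning_rate(epoch, epochs=200, step=50, lr_start=1e-5, lr_end=1e-6):
    """Piecewise-constant schedule: the rate drops linearly at every `step`
    epochs from lr_start down to lr_end over the full training run.
    One plausible reading of the schedule described in the text."""
    n_drops = epochs // step - 1        # number of drops (3 for 200/50)
    k = min(epoch // step, n_drops)     # current step index
    return lr_start - (lr_start - lr_end) * k / n_drops
```

Under this reading, epochs 0-49 train at 10^−5, 50-99 at 7 × 10^−6, 100-149 at 4 × 10^−6, and 150-199 at 10^−6.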

Visual Quality Evaluation
In this study, two datasets were tested: a small texture dataset, as shown in Figure 6, and BSDS100. Because the proposed method is effective at expressing image structures owing to the use of the SARDS, the small texture dataset was prepared to contain various types of image structures, including lines, curves, and regular patterns. The BSDS100 dataset was also tested to verify whether the proposed SALDL could improve detail representation and dot elimination. None of the test images were included in the training dataset. Figure 7 shows the experimental results for the small texture dataset. As shown in the red boxes, the proposed method describes the image structures more accurately, and the overall sharpness of the images is better. In particular, the lines of the pants were restored in more detail and were sharper (first row) when using the proposed method. The second row shows more clearly expressed cactus thorns, and the third row shows the textures on the palm and the hair accessory in more detail. As shown in the fourth, fifth, sixth, and seventh rows, details including the license plate text, lip outline, straw, and Gogh's eyes, respectively, were restored more clearly. Moreover, as shown in the blue box in the fifth row, the proposed method suppressed noisy dots on flat areas, unlike the conventional DCNN [37] and U-Net [35] methods. The blue box in the last row shows that the proposed method can reproduce smooth skin tones in the face areas, whereas the face areas reconstructed using the other methods appear rougher and noisier. Figure 8 shows further experimental results for the BSDS100 dataset, in which similar effects were observed; in the red box, sharper curves were restored using the proposed method. By comparing the proposed method with the DDN/PRL methods, it was verified that the additional use of the ISMP can improve performance in detail representation and dot elimination.

Figure 7.
Experimental results for a small texture dataset: halftoned images, images reconstructed using DCNN [37], images reconstructed using U-net [35], images reconstructed using DDN [39], images reconstructed using PRL [23,25], images reconstructed using the proposed SALDL method, and original images (left to right).

The DDN directly predicts the continuous-tone images from the base layers, as shown in Equation (11). Because the base layers are predicted, some information may be lost; hence, the flat areas of the reconstructed images appeared slightly noisy, and their sharpness can be further improved. The PRL method uses the input halftoned images to increase the amount of information for residual learning, as shown in Equation (3); therefore, it can provide results with better image quality than the DDN method. However, the PRL method lacks image structure representation, and the existing PRL cannot produce satisfactory results from the same base layers. This reveals that the architecture of the existing PRL is not suitable for low-quality base layers. Hence, the proposed SALDL uses the ISMP to predict Laplacian maps from the input halftoned images. Figure 9 shows the Laplacian maps predicted by the ISMP. In this figure, texture lines are detected well, which means that the predicted Laplacian map provides the SARDS with spatial information regarding areas that are flat, lined, or textured. This information enables the proposed SALDL to be adaptive to local image structures.
Consequently, the texture representation of the detail layer can be improved, and noisy dot patterns on flat areas can be effectively suppressed. The ISMP can be regarded as a type of attention network, and the predicted Laplacian map is a spatial attention feature map.

Based on Equations (3) and (11), the DDN and PRL methods use the base layers generated using the proposed GCM-based residual subnetwork. The DDN is a deep learning architecture proposed for inverse halftoning that was derived from the GCM to predict Gaussian-blurred images. In the existing PRL methods, no specific models exist for the residual learning of the base layer, and the existing PRL cannot produce satisfactory results from low-quality base layers. To the best of our knowledge, this study is the first to perform the abovementioned comparison, and the experimental results confirmed that the proposed SALDL can serve as a new deep learning model for inverse halftoning that enables residual learning for both the base and detail layers by incorporating image decomposition into the deep learning framework.

Tables 2 and 3 show the results of the PSNR and SSIM evaluations for the small texture and BSDS100 datasets, respectively. As expected, the proposed SALDL method demonstrated the best performance among all the methods, surpassing the state-of-the-art inverse halftoning methods based on deep learning. This indicates that the proposed image decomposition model is effective in obtaining high-quality continuous-tone images from halftone images. The proposed base layer design, based on the GCM, enables residual learning by narrowing the output brightness range.
The structure-aware residual deblurring strategy can remove the blurring of the predicted base layer and restore the image structures effectively; the proposed SALDL is thus a new PRL for inverse halftoning. By contrast, the PSNR and SSIM of the DDN and PRL were lower than those of the proposed method, confirming that the DDN and PRL were restricted in terms of restoring the original images from low-quality base layers. Tables 2 and 3 show that the average PSNR of the U-net was slightly better than that of the PRL. This implies that the U-net is a highly effective model for inverse halftoning; in other words, decomposing input halftoned images into multiple resolutions is a highly effective approach. If the SARDS and GCM-based residual subnetworks were built similarly to the U-net, the performance of the proposed method might be further improved.

Conclusions
A new SALDL method for inverse halftoning was proposed. First, a new residual learning method based on the Gaussian convolution model was introduced for base layer generation. Compared to the additive model, which has been used for image denoising and rain removal, this Gaussian convolution model utilizes a statistical distribution in which the image difference between the blurred original image and blurred halftone image with a Gaussian filter can possess a narrow brightness range. Second, a structure-aware residual deblurring strategy was presented. To remove the Gaussian blurring of the base layer and recover the image structures effectively, an image structure map predictor was designed to estimate the image structures from halftone patterns. This image structure map predictor enabled the entire network to be trained adaptively to local image structures; hence, noisy dot patterns on flat areas were suppressed and local image structures such as lines and text were described precisely. The experimental results confirmed that the proposed method surpassed state-of-the-art inverse halftoning methods based on deep learning, such as U-net, DCNN, DDN, and PRL. In addition, it was verified that the proposed image decomposition model was extremely effective in obtaining high-quality continuous-tone images from input halftone images.