An Efficient Convolutional Neural Network Model Combined with Attention Mechanism for Inverse Halftoning

Abstract: Inverse halftoning, a special image restoration task, is an ill-posed problem. Although it has been studied for several decades, existing solutions cannot accurately restore fine details and textures from halftone images. Recently, the attention mechanism has shown its power in many fields, such as image processing, pattern recognition and computer vision; however, it has not yet been applied to inverse halftoning. To better restore details in inverse halftoning, this paper proposes a simple yet effective deep learning model combined with the attention mechanism, which better guides the network to remove noisy dot-patterns and restore image details, and improves the adaptability of the network. The whole model is designed in an end-to-end manner and consists of a feature extraction stage and a reconstruction stage. In the feature extraction stage, halftone image features are extracted and halftone noise is removed. The reconstruction stage restores continuous-tone images by fusing the feature information extracted in the first stage with the output of the residual channel attention blocks. In this stage, the attention block is introduced to the field of inverse halftoning for the first time; it makes the network focus on informative features and further enhances the discriminative ability of the network. In addition, a multi-stage loss function is proposed to accelerate network optimization, which is conducive to better reconstruction of the global image. Experiments confirm that the network restores six different types of halftone images well, demonstrating its generalization ability. Furthermore, experimental results show that our method outperforms state-of-the-art methods, especially in the restoration of details and textures.


Introduction
Digital halftoning is a technique that converts a continuous-tone image into a binary image known as a halftone image. Due to the low-pass characteristic of the human eye, the generated halftone image is perceived as a continuous-tone image when viewed from a certain distance. Thus, digital halftoning is widely used in bi-level output devices, such as printing presses, printers and fax machines, to reproduce the tones of a continuous-tone image [1,2]. Besides, digital halftoning can also serve as an image compression mode to save storage space or electric power in special scenarios, for example in telemedicine [3] and IoT [4]. Major halftone methods used in practice include ordered dithering (OD), dot diffusion (DD), error diffusion (ED) and direct binary search (DBS) [5].
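As a concrete illustration of error diffusion, the sketch below implements the classic Floyd-Steinberg algorithm in NumPy for a single-channel image. It is a minimal didactic version, not the halftoning code used in the experiments of this paper:

```python
import numpy as np

def floyd_steinberg(img):
    """Convert a grayscale image in [0, 1] to a binary halftone by
    quantizing each pixel and diffusing the quantization error to
    the not-yet-processed neighbors (standard 7/16, 3/16, 5/16, 1/16 weights)."""
    img = img.astype(np.float64).copy()
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out

# A flat mid-gray patch halftones to roughly 50% black/white dots,
# which is why the eye averages it back to gray at viewing distance.
gray = np.full((64, 64), 0.5)
halftone = floyd_steinberg(gray)
```

This also makes the inverse problem's ill-posedness visible: the continuous tone survives only as a local dot density, so many continuous-tone images map to the same halftone.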
Inverse halftoning is the reverse process of digital halftoning, which restores a continuous-tone image from its halftone version. Different layers of a CNN contain different feature information: low-level features contain more sharp and detailed information, while high-level features contain more abstract semantic information. Therefore, in this article, the low-level detail information and high-level semantic information are concatenated with skip connections and then fused by attention blocks, which helps to restore image details. To further improve the fine details of the restored images, multi-stage loss functions are proposed in the presented network. The final loss function integrates multiple losses computed on the restored images at different stages, which helps to obtain more informative features at each restoration stage and thus improves the fine details of the restored image. We conduct extensive experiments on the VOC2012 dataset [27] and six types of halftone images. Results show that our method outperforms state-of-the-art methods and has better generalization performance for different types of halftone images.
The contributions of this paper are as follows: (1) We introduce the attention mechanism into the proposed network, which better guides the network to remove noisy dot-patterns and restore image details, and improves the adaptability of the network. To the best of our knowledge, this is the first work to use the attention mechanism for inverse halftoning. (2) Multi-stage loss functions are employed in the network, which further enhance the restored image details. (3) Experimental results demonstrate that the proposed method achieves impressive performance compared with state-of-the-art methods and can be applied to many different types of halftone images.
The rest of the paper is organized as follows: Section 2 briefly introduces the related works about inverse halftoning including traditional inverse halftoning methods and deep learning-based methods. Section 3 details the proposed method. Section 4 discusses the experimental results. Finally, Section 5 concludes the paper.

Related Works
As a classic image restoration problem, inverse halftoning has been widely studied in the past decades, and a large number of inverse halftoning methods have been proposed, including filtering methods [8], wavelets [11], look-up tables (LUT) [12] and neural networks [16-18,20]. Especially in recent years, with the rapid development of deep learning, supervised and unsupervised learning have been widely used in industrial production [28-30], which has promoted the rapid development of inverse halftoning. As our method belongs to the deep learning category, we review the progress in inverse halftoning in two groups: traditional inverse halftoning methods and deep learning based inverse halftoning methods.

Traditional Inverse Halftoning
In view of the low-pass filtering characteristic of the human eye and the high-frequency characteristics of halftone dot-patterns, the earliest methods were based on low-pass filters, such as the Gaussian filter, median filter or bilateral filter [31]. Low-pass filtering simply removes halftone dot-patterns, but image details are also removed in the process. To preserve more image details in the restored images, adaptive filtering [8], non-linear filtering [32] and transform-domain filtering [11,33] were investigated. Kite et al. [8] proposed a multi-scale gradient estimation filter, which is then used to choose the best smoothing filter from a family of parameterized customized smoothing filters for each pixel. This method can obtain a sharp image with a low perceived noise level for error-diffused halftone images. Kim et al. [32] proposed a non-linear binary permutation filter for reconstructing continuous-tone images from ordered-dither or error-diffused halftone images. The presented filter is based on the space and rank orderings of the halftone samples in a halftone observation window. Luo et al. [33] proposed a novel wavelet-based inverse halftoning method, which removes halftone noise by noise attenuation and intraband filtering in the wavelet space. The method presented by Xiong et al. [11] used highpass wavelet images and cross-scale correlations in the multiscale wavelet decomposition to remove halftone noise while preserving image edges and details. Unlike the aforementioned filtering methods, Mese et al. [12] creatively proposed a fast look-up table (LUT) inverse halftoning method, in which the LUT is constructed from halftone templates and their corresponding continuous-tone values.
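The low-pass baseline described above can be sketched as a separable Gaussian blur. This NumPy version is only illustrative (the value of sigma is chosen arbitrarily, not taken from [31]); it shows how the blur suppresses high-frequency dot-patterns while also softening edges:

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_lowpass(img, sigma=1.5):
    """Separable Gaussian blur: filter rows, then columns.
    Edge padding keeps the output the same size as the input."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, radius, mode='edge')
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)

# A checkerboard is a crude halftone of mid-gray: blurring recovers
# the average tone but all pixel-level detail is gone.
halftone = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)
smooth = gaussian_lowpass(halftone, sigma=1.5)
```

The mean tone is preserved while the pixel-to-pixel variance collapses, which is exactly the trade-off that motivated the adaptive and transform-domain filters cited above.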
Inspired by the successful applications of sparse representation in the field of signal processing, a novel inverse halftoning method based on sparse representation was first presented by Son [34], where two joint dictionaries are learned for the concatenated feature spaces of continuous-tone images and halftone images. The method assumes that the sparse representation coefficients of continuous-tone images and halftone images are the same. To relax this assumption, Zhang et al. [2] proposed a semi-coupled multi-dictionary learning method for inverse halftoning. Son [14] proposed an edge-oriented local learned dictionaries (LLD) method, which enhances the edge details of the restored image. Considering that the quality of inverse halftoning depends on the underlying halftone method, Huang et al. [15] proposed a neural network based inverse halftoning method that integrates the processes of digital halftoning and inverse halftoning, where a single-layer perceptron is adopted for halftoning and a radial-basis function network for inverse halftoning. Although these methods produced relatively satisfactory results at the time, the quality of images restored by these conventional methods is still not as good as that achieved by deep learning methods. The reason is that deep learning methods can obtain deep and hierarchical feature representations in an end-to-end manner, which is more efficient for extracting abstract features for halftone image restoration. Moreover, features extracted at different levels of a deep network exhibit diverse characteristics of the input halftone image. Thus, deep learning methods can better restore image details by fusing low-level detail features and high-level semantic features.
In references [16-18,20], the authors compared their methods with classical traditional methods, such as the filtering method [8], wavelet method [11], LUT method [12], MLP method [15] and dictionary learning method [14]. The experimental results also demonstrate that the image quality achieved by deep learning based inverse halftoning methods is superior to that of traditional inverse halftoning methods in both qualitative and quantitative evaluations.

Deep Learning Based Inverse Halftoning
Deep convolutional neural networks have shown outstanding performance on many tasks. Hou and Qiu [16] first applied a DCNN to inverse halftoning, using a U-net as the transformation network. In addition, a perceptual loss based on a pre-trained network was introduced into the training objective, which overcomes the tendency of per-pixel losses to produce blurry outputs. To obtain more image details, Xiao et al. [19] proposed a two-stage gradient-guided DCNN for inverse halftoning. In the first stage, two subnetworks predict gradient maps from the input halftone image. In the second stage, the gradient maps, along with the input halftone image, are fed to a third subnetwork to reconstruct the continuous-tone image. All three subnetworks use the U-net architecture, and halftone images generated by the Floyd-Steinberg error diffusion algorithm are used in the experiments. On the basis of [19], Yuan et al. [20] put forward a gradient-guided residual learning (GRL) method for inverse halftoning, in which the second stage is a residual network that helps to restore better local details. Xia et al. [18] proposed deep inverse halftoning via progressively residual learning (PRL), another foundational work for inverse halftoning. PRL includes two modules: a content aggregation module that removes halftone noise and reconstructs an initial continuous-tone image, and a detail enhancement module that extracts fine structures by learning a residual image. Recently, Son [17] presented a structure-aware DCNN (SADCNN) for inverse halftoning, which not only removes noisy dot-patterns well in flat areas but also restores details clearly in textured areas. Guo et al. 
[35] first proposed a novel inverse halftoning method using a GAN, which can effectively perform both halftoning and inverse halftoning for dispersed-dot halftone images. Due to the lack of paired halftone images and corresponding continuous-tone images, restoring scanned halftone images is more challenging than restoring digital halftone images. Kim et al. [22] proposed a context-aware descreening method for scanned halftone images. The method consists of two main stages: in the first stage, the intrinsic features of the scene are extracted to reconstruct the low-frequency content of the image and remove halftone noise; in the second stage, fine details are synthesized on top of the low-frequency output. Gao et al. [21] proposed a novel inverse halftoning method for scanned halftone images, in which the first stage is trained in an unsupervised manner to remove printing artifacts, making the method adapt to real halftone prints, and the second stage is trained in a supervised manner for inverse halftoning using synthetic training data. Table 1 summarizes the traditional and deep learning based inverse halftoning methods. From the above methods, we can conclude: (1) Most of the proposed methods focus on error diffusion halftone images. However, there are more than twenty types of halftone images used in practice, with different critical halftone characteristics such as dot directivity, dot distribution, pattern periodicity and directional artifacts [1,5]. Therefore, a general-purpose inverse halftoning method that generalizes across different types of halftone images is urgently needed. (2) Due to the coexistence of detail loss and halftone noise, the restored images still suffer from fine detail loss and visual artifacts.
To solve these problems, we propose a novel inverse halftoning method in which the attention mechanism and multi-stage loss functions are introduced to work together with the proposed network. The proposed method is simple yet effective for different types of halftone images.

Methodology
In this study, we propose a novel inverse halftoning method based on deep learning, which integrates deep CNN, attention module and multi-stage loss functions. Firstly, we will introduce the network architecture. Then, the multi-stage loss functions will be discussed.

Network Architecture
As shown in Figure 1, the architecture of the proposed approach consists of three major components: (1) feature extraction and halftone noise removal; (2) image restoration with the residual channel attention block (RCAB) module and contextual semantic information aggregation; (3) multi-stage loss function learning. The first part begins with a normal convolution. As evidenced in reference [36], a large convolutional kernel can be replaced by multiple convolutional layers with small kernels, which reduces the parameter count and improves the non-linear capacity of the network. To keep the size of the output feature maps the same as the inputs, the stride and the padding are both set to 1. Since the initial convolution extracts low-level features, such as edges, corners, lines and colors, these features can be used to restore image details in the second part through skip connections. Therefore, the number of filters in the initial convolution layer is set to 32 in order to extract more fine-grained features. Then, three convolution blocks (Conv_Block1, Conv_Block2 and Conv_Block3) are cascaded to further refine features and remove halftone noise, where each Conv_Block includes three sequential basic units: convolution, LeakyReLU and convolution. For each Conv_Block, the filter number of the first convolution equals the input channel number of the block, and the filter number of the last convolution is twice that of the first. The above feature extraction process can be expressed as X = f_CB3(f_CB2(f_CB1(f_0(X_I)))), where X_I is the input image, f_0 denotes the initial convolution, f_CBi denotes the i-th Conv_Block, and X is the extracted feature map. In the second part, three attention modules and three concatenation modules with identical layouts are used to restore continuous-tone images. In every attention module, we sequentially stack 16 RCABs to extract the statistics among channels and further enhance the discriminative ability of the network [23].
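A minimal PyTorch sketch of this feature extraction stage follows. The 32 initial filters and the channel doubling per Conv_Block follow the description above; the 3-channel input, 3x3 kernels and the LeakyReLU slope are our assumptions, since the text does not fix them:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """conv -> LeakyReLU -> conv; the second conv doubles the channels.
    Stride 1 and padding 1 keep the spatial size unchanged."""
    def __init__(self, in_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=1, padding=1),
            nn.LeakyReLU(0.2),                          # slope is an assumption
            nn.Conv2d(in_ch, in_ch * 2, 3, stride=1, padding=1),
        )
    def forward(self, x):
        return self.body(x)

class FeatureExtractor(nn.Module):
    """Initial 32-filter convolution followed by three cascaded
    Conv_Blocks: 32 -> 64 -> 128 -> 256 channels."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.head = nn.Conv2d(in_ch, 32, 3, stride=1, padding=1)
        self.blocks = nn.Sequential(ConvBlock(32), ConvBlock(64), ConvBlock(128))
    def forward(self, x):
        return self.blocks(self.head(x))

x = torch.randn(1, 3, 64, 64)      # a color halftone patch
feats = FeatureExtractor()(x)      # shape (1, 256, 64, 64): no downsampling
```

Note that the spatial resolution never shrinks, matching the design decision (discussed below) to avoid downsampling.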
The details of the residual channel attention block are shown in Figure 2. X_{b-1} is the input of the RCAB block, and its residual branch first computes F_b = ω_b^2 * δ(ω_b^1 * X_{b-1}), where ω_b^1 and ω_b^2 are the weight sets of the two convolutional layers in the RCAB and δ(.) denotes the ReLU function. Let F_b = [f_1, ..., f_c, ..., f_C] ∈ R^{C×H×W} be the input feature maps, which have C channels of size H × W. The channel-wise statistic z ∈ R^C is obtained by global average pooling; its c-th element is z_c = (1/(H×W)) Σ_{i=1}^{H} Σ_{j=1}^{W} f_c(i, j). This global pooling over each channel summarizes the whole feature map, so the channel-wise global spatial information can be viewed as a channel descriptor. Having aggregated the global information by average pooling, we then introduce a gating mechanism to capture channel-wise dependencies; it can learn non-linear interactions between channels and a non-mutually-exclusive relationship between channel-wise features. We opt for a simple gating mechanism with a sigmoid function: s = f(ω_U δ(ω_D z)), where δ(.) and f(.) denote the ReLU function and the sigmoid gating, respectively, ω_D is the weight set of a channel-downscaling layer with reduction ratio r, and ω_U is the weight set of a channel-upscaling layer with the same ratio r. Finally, the channel statistic s_c is used to rescale the input feature map f_c: f̂_c = s_c · f_c, where s_c is the scaling factor of the c-th feature map. The channel attention enhances the discriminative ability by rescaling the residual component in the RCAB. For the b-th RCAB block, we have X_b = X_{b-1} + R_b(X_{b-1}), where X_{b-1} and X_b are the input and output of the residual channel attention block and R_b denotes the residual branch with channel attention.
Following each attention module, there is a concatenation module that stacks the output of the attention module and the feature maps from the previous Conv_Block via a skip connection. Before concatenating, the output of each attention module passes through a convolution so that it has the same channel number as the feature maps from the corresponding Conv_Block. The concatenation provides more contextual semantic information, which helps to restore fine details. The detailed parameters of the whole network are listed in Table 2.
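The RCAB computation described above can be sketched in PyTorch as below. The kernel sizes and the reduction ratio r = 16 follow common RCAN-style practice [23] and are assumptions here, not values stated in this section:

```python
import torch
import torch.nn as nn

class RCAB(nn.Module):
    """Residual channel attention block: two convs produce F_b, global
    average pooling yields per-channel statistics z, a bottleneck
    (downscale by r, ReLU, upscale, sigmoid) yields scaling factors s,
    and the rescaled residual is added back to the input."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.conv = nn.Sequential(            # F_b = w2 * ReLU(w1 * x)
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # z_c: global average pooling
            nn.Conv2d(ch, ch // r, 1),        # omega_D: channel downscaling
            nn.ReLU(),
            nn.Conv2d(ch // r, ch, 1),        # omega_U: channel upscaling
            nn.Sigmoid(),                     # gating f(.)
        )
    def forward(self, x):
        f = self.conv(x)
        s = self.attention(f)                 # shape (N, C, 1, 1)
        return x + s * f                      # X_b = X_{b-1} + s . F_b

x = torch.randn(1, 64, 32, 32)
y = RCAB(64)(x)                               # same shape as the input
```

The sigmoid output s broadcasts over the spatial dimensions, so each channel of the residual branch is scaled by a single learned factor, which is what lets the network attend to informative channels.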
The whole network architecture imitates the U-net design, but our network adopts neither downsampling nor upsampling. Based on an analysis of U-net, we make the following design choices: (1) downsampling leads to the loss of low-level features, discarding information that is important for inverse halftoning restoration; thus, downsampling is not adopted in the feature extraction step of our solution; (2) we use skip connections to retain detailed features from shallow layers, which better restores image details; (3) a distinctive design of our network is the introduction of the attention module, which focuses on informative features and further enhances the discriminative ability of the network; the ablation results detailed in Section 4 demonstrate the effectiveness of the RCAB attention module; (4) deep supervision with multi-stage losses accelerates the optimization of the network. Unlike previous methods, which compute the loss function using only the final restored image, we propose a multi-stage loss calculation strategy. As shown in Figure 1, the losses Loss1, Loss2, Loss3 and Loss4 are computed from images reconstructed at different stages, where Conv_1, Conv_2, Conv_3 and Conv_4 reconstruct three-channel continuous-tone images at each stage.
From the analysis of the network architecture, we can conclude: (1) the proposed network is a fully convolutional network, which adapts to input halftone images of different sizes and can be learned in an end-to-end manner; (2) the proposed network is lightweight, with a parameter size of 10.87 M; (3) unlike existing deep learning based inverse halftoning methods, which treat halftone image channel-wise features equally, the proposed method generates different attention for different channel-wise features; thus, it is flexible for different types of halftone images and pays more attention to informative features in different channels, such as shapes, colors, edges and textures; (4) the proposed network is simple yet effective, achieving notable performance improvements over previous inverse halftoning methods.

Loss Function
We define multi-stage loss functions that measure the difference between the reconstructed image at each feature reconstruction stage and the original continuous-tone image. To encourage the pixels of the reconstructed image ŷ to match the continuous-tone image y exactly, we want them to have similar feature representations at different feature reconstruction layers.
As shown in Figure 3, when we reconstruct from early layers, the image content and global spatial structure are preserved; when we reconstruct the continuous-tone image from higher layers, the color and texture are also reconstructed well. We therefore define multi-stage loss functions to encourage the reconstructed image ŷ to be perceptually similar to the continuous-tone image at different feature reconstruction stages. Moreover, if the outputs of early layers are reconstructed well under the loss constraints, the reconstructions of higher layers will also be good.

Based on the above analysis, as shown in Figure 1, we define the multi-stage loss for training our network as L = Σ_{j=1}^{4} λ_j L_j(ŷ_j, y), where j denotes the j-th layer of the reconstruction stage, and ŷ_j and y are the output reconstruction image of the j-th layer and the ground-truth continuous-tone image, respectively. For the values of λ, we consider the following factors: in the deep layers, the network reconstructs good semantic information, so a higher loss weight better reconstructs the global information of the image. Decreasing the loss weight from deep layers to shallow layers is conducive to better reconstruction of the global image. Based on the experiments in Section 4, we set λ_1 = 0.1, λ_2 = 0.2, λ_3 = 0.3, λ_4 = 0.4.
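The weighting scheme can be sketched as follows. We use per-stage MSE as the stage loss L_j purely for illustration, since the exact per-stage distance is not restated in this section:

```python
import numpy as np

def multi_stage_loss(stage_outputs, target, lambdas=(0.1, 0.2, 0.3, 0.4)):
    """Weighted sum of per-stage losses. Later (deeper) stages carry
    larger weights, so the final reconstruction dominates the objective."""
    assert len(stage_outputs) == len(lambdas)
    return sum(lam * np.mean((out - target) ** 2)
               for lam, out in zip(lambdas, stage_outputs))

target = np.zeros((4, 4))
outputs = [np.ones((4, 4))] * 4          # every stage is off by 1 everywhere
loss = multi_stage_loss(outputs, target)
print(round(loss, 6))                    # 1.0, i.e. 0.1 + 0.2 + 0.3 + 0.4
```

Because every stage contributes a gradient signal, even the shallow reconstruction heads (Conv_1 to Conv_3) are supervised directly, which is the deep-supervision effect that accelerates optimization.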

Experiment
In this section, we introduce our experimental settings in Section 4.1, including detailed experimental parameters and the dataset. Then we show the generalization performance of the proposed method for different types of halftone images in Section 4.2. To demonstrate its superior performance, we compare our method with state-of-the-art methods in Section 4.3. The ablation study is presented in Section 4.4.

Experiment Settings
For training and testing, we use the PASCAL VOC2012 dataset [27] as the source of continuous-tone color images. There are 13,600 color images, of which we select 10,000 for training and 3600 for validation. In addition, some other classical images, such as Lena, Peppers and Baboon, are also selected as test images.
Our network is implemented using the PyTorch framework and trained on a 1080 Ti GPU in an end-to-end manner. We set the initial learning rate to 0.0001, choose the Adam algorithm as the optimizer, and reduce the learning rate by cosine annealing [37]. Based on experiments, the number of training epochs is set to 100 and the batch size to 2.
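The optimizer setup above can be sketched in PyTorch as follows; the one-layer stand-in model is only for illustration, and T_max = 100 assumes the schedule spans the 100 training epochs:

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... one pass over the 10,000 training images with batch size 2 ...
    optimizer.step()        # placeholder for the actual gradient update
    scheduler.step()        # cosine-anneal the learning rate per epoch

final_lr = optimizer.param_groups[0]['lr']    # annealed to ~0 after T_max epochs
```

Cosine annealing decays the rate smoothly from 1e-4 toward eta_min (0 by default) over T_max epochs, avoiding the abrupt drops of step schedules.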

Evaluation for Different Types of Halftone Images
To illustrate the generalization performance of the network for restoring different types of halftone images, we choose different halftoning methods to produce different types of halftone images. The continuous-tone images are converted into halftone images by Bayer's CD Ordered Dithering (BCD), Bayer's DD Ordered Dithering (BDD), Knuth DD Dot Diffusion (KDD), Direct Binary Search (DBS), Floyd-Steinberg DD Error Diffusion (FSDD) and Ulichney CD Ordered Dithering (UCD), respectively; the code is provided by Guo et al. [5]. Each type of halftone image is divided into training and testing images.
For performance evaluation, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) are used to measure the difference between the output image and the original image. We train the network with different types of halftone images, and then twenty classical images are used to test the network. Figure 4 shows the classical images [38], numbered 1-20.
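For reference, PSNR in dB is 10·log10(peak²/MSE) between the restored image and the ground truth; a minimal implementation:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference
    continuous-tone image and a restored image (8-bit peak by default)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')           # identical images
    return 10 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100.0)
noisy = ref + 10.0                    # constant error of 10 gray levels -> MSE = 100
print(round(psnr(ref, noisy), 2))     # 28.13
```

Higher is better; a 1 dB gain corresponds to roughly a 21% reduction in MSE, which is why the PSNR differences reported in Tables 4 and 5 are meaningful.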
The restored continuous-tone images and their corresponding halftone images for the Lena, Man and Peppers images are illustrated in Figure 5. From Figure 5, we can see that the quality of the restored images is satisfactory compared with the original images, which means the proposed method generalizes to different types of halftone images. In addition, the images restored from DBS and FSDD halftones have a clear advantage over the other halftone types. Table 3 reports the corresponding quantitative results.


Comparison with State-of-the-Art Methods
To verify the advantages of the proposed method, we compare it with state-of-the-art inverse halftoning methods, where the halftone images are generated using the Floyd-Steinberg error diffusion method. Recently, three representative methods, SADCNN [17], GRL [20] and PRL [18], achieved state-of-the-art results for inverse halftoning, so they are selected for comparison. Since the GRL [20] code is not publicly available, we re-implemented it according to the paper; possibly because our experimental environment differs, the average test result is not as high as in the original paper. Besides the deep learning based methods, a typical traditional method based on dictionary learning, LLD [14], is also employed for comparison. The PSNR and SSIM values are used to evaluate the performance of the different inverse halftoning methods. We first select all test images in Figure 4 to compare the different methods. Figure 6 shows the comparison with the state-of-the-art methods, including LLD [14], SADCNN [17], GRL [20], PRL [18] and ours. As observed in Figure 6, our proposed method has both a qualitative and a quantitative advantage over the state-of-the-art methods. Moreover, a recently published paper [39] proposed an inverse halftoning method via the stationary wavelet domain. In that paper, six classic images are selected as test images, namely Koala, Cactus, Bear, Barbra, Shop and Pepper, corresponding to the 5th, 4th, 14th, 12th, 10th and 9th images of Figure 4. The average PSNR of these test images is 29.95 in reference [39] and 30.68 with our method. The results once again show that our method performs better than recently published solutions. To further show the advantages of our proposed method compared with previous methods, we show some restored image details, lines and textures in Figures 7-9.
In Figure 7, the results of SADCNN [17] lack texture details, while our approach reproduces the image details more accurately. In Figure 8, GRL [20] cannot restore the lines of the image well, whereas our approach restores more lines and sharper edges. For the texture of the restored Butterfly image shown in Figure 9, the image restored by PRL [18] contains some noise, while the image restored by our method is smoother and more natural in texture.
Besides, our approach can be applied to other datasets successfully. Table 4 gives the quantitative evaluation results, which demonstrate that methods based on deep learning are superior to traditional inverse halftoning methods. Moreover, our proposed method obtains the highest average PSNR among the compared methods, and its SSIM is similar to that of PRL [18]. Table 5 gives the PSNR/SSIM comparison of the different methods on the 3600-image test dataset. From Table 5, we can see that our proposed method achieves the best performance in both PSNR and SSIM. During model training, the model loss and PSNR values were recorded in Figure 10, where PSNR is the average over the 3600 test images.
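PSNR, the main quantitative metric used throughout these comparisons, is the standard peak signal-to-noise ratio; a minimal reference computation (a textbook definition, not code from the paper) is:

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the
    same shape, assuming 8-bit intensities by default."""
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better; an everywhere-off-by-one 8-bit reconstruction scores about 48.13 dB, so the roughly 30-32 dB values reported here correspond to much larger average errors.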

Ablative Study
To evaluate the effect of the attention module and the parameter selection of the loss function, we conducted an ablation study. All evaluations are based on the classical testing images in Figure 4. Table 6 gives the effect of the attention module, from which we can conclude that the attention module improves the quality of the restored images. In addition, the restored images achieve the highest PSNR and SSIM when the number of RCABs is 16; thus, 16 RCABs are used in our network. To obtain suitable weight parameters for the loss function, we conducted an experimental study as shown in Table 7. From Table 7, we can see that the optimum values are λ1 = 0.1, λ2 = 0.2, λ3 = 0.3, λ4 = 0.4.
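The weights above suggest a multi-stage loss of the form L = Σ λi · Li, with the deepest stage weighted most heavily. A minimal sketch of such a weighted combination is given below; the number of stages matches the four λ values, but the per-stage term (plain MSE here) and the ordering of stages are our assumptions, not details confirmed by the paper:

```python
import numpy as np

# Stage weights from Table 7 (shallowest to deepest stage, assumed order).
LAMBDAS = (0.1, 0.2, 0.3, 0.4)

def multi_stage_loss(stage_outputs, target, lambdas=LAMBDAS):
    """Weighted sum of per-stage reconstruction errors.

    `stage_outputs` holds one intermediate restoration per supervised
    stage; each is compared against the same continuous-tone target
    with MSE (an assumed choice of per-stage term).
    """
    return sum(lam * np.mean((out - target) ** 2)
               for lam, out in zip(lambdas, stage_outputs))
```

Because the weights sum to 1, the combined loss stays on the same scale as a single-stage MSE while still supervising every stage.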

Discussion
In this article, we report a simple and effective network structure. First, we use stacked convolution layers to extract features effectively. Then we use a residual channel attention module to extract more informative features and guide the network to remove noise dot-patterns and restore image details. We fuse low-level and high-level features through skip connections. Finally, the network is optimized with the multi-stage supervision loss, and state-of-the-art experimental results are achieved.
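The residual channel attention idea can be sketched as follows: globally pool each channel, pass the descriptor through a small bottleneck, and use a sigmoid gate to rescale channels before adding the skip connection. This is a conceptual NumPy illustration of that pattern; the weight shapes and the identity placeholder for the convolution are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def channel_attention(features, w1, b1, w2, b2):
    """Squeeze-and-excitation style channel attention on a (C, H, W)
    feature map: global average pool, bottleneck with ReLU, sigmoid
    gate, then channel-wise rescaling."""
    squeeze = features.mean(axis=(1, 2))               # (C,) descriptor
    hidden = np.maximum(w1 @ squeeze + b1, 0.0)        # reduction + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # sigmoid weights
    return features * gate[:, None, None]              # rescale channels

def rcab(features, attn_params, conv=lambda x: x):
    """Residual channel attention block: transform features (identity
    stands in for the conv layers here), gate them, add the skip."""
    return features + channel_attention(conv(features), *attn_params)
```

The residual add lets the block default to a near-identity mapping, so stacking 16 such blocks (as selected in the ablation) remains easy to optimize.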
The following points summarize our experimental results. (1) In terms of network design, the network is lightweight, simple and effective, and easy to implement. (2) In terms of the quantitative indicators PSNR and SSIM, our method achieves state-of-the-art results compared with previous methods. (3) In terms of qualitative analysis, the network restores details, lines and textures better than previous methods. (4) In terms of generality, previous methods can only restore a single type of halftone image, whereas our method can restore many types of halftone images.

Conclusions
We have proposed a novel deep convolutional neural network that fuses the attention mechanism for inverse halftoning. The proposed network first extracts the main features and removes noisy dot-patterns with convolution layers, and then reconstructs the image by fusing the extracted feature information with the output of the residual channel attention block. Such an attention block helps the network focus on informative features, so the restored image retains more fine details. Finally, we test on the classic datasets: the average PSNR is 31.70 and the average SSIM is 0.9. The experimental results show that our approach outperforms the state-of-the-art methods both in visual quality and in quantitative evaluation.
In future research, the restoration of scanned halftone images is an urgent problem to be solved, where the most challenging issue is that no paired data exist. How to construct an inverse halftoning method under the condition of unpaired data is an important research direction. In addition, semi-supervised or unsupervised learning combined with our method will be studied in future work.