Dual Image Deblurring Using Deep Image Prior

Abstract: Blind image deblurring, one of the main problems in image restoration, is a challenging, ill-posed problem. Hence, it is important to design a good prior to solve it. Recently, deep image prior (DIP) has shown that convolutional neural networks (CNNs) can serve as a powerful prior for a single natural image. Previous DIP-based deblurring methods exploited CNNs as a prior when solving the blind deblurring problem and performed remarkably well. However, these methods do not fully utilize given multiple blurry images, and their performance is limited for severely blurred images, because their architectures are strictly designed to utilize a single image. In this paper, we propose a method called DualDeblur, which uses dual blurry images to generate a single sharp image. DualDeblur jointly utilizes the complementary information of multiple blurry images to capture image statistics for a single sharp image. Additionally, we propose an adaptive L2_SSIM loss that enhances both pixel accuracy and structural properties. Extensive experiments show the superior performance of our method over previous methods in both qualitative and quantitative evaluations.


Introduction
Motion blur is a common artifact caused by the relative motion between the camera and the scene during exposure. In practice, when we obtain images from cameras equipped in the mobile embedded systems, the images are often blurred because they are usually captured with hand-held cameras. The unwanted blur artifacts not only degrade the image quality but also result in the loss of important information in the image. Consequently, blurry images deteriorate the performance of various computer vision tasks, such as image classification [1][2][3], object detection [4][5][6], and segmentation [7][8][9]. Accordingly, numerous image deblurring studies have been actively proposed to remove blur artifacts and restore sharp images.
Given a blurry image y, the blur process is typically modeled as a convolution of a latent sharp image x and a blur kernel k as follows:

y = k ⊗ x + n,

where ⊗ denotes the convolution operator and n is the noise. The goal of blind image deblurring is to estimate the sharp image and the blur kernel simultaneously when the blur kernel is unknown. This is a classical ill-posed problem because x and k can have multiple solutions. Owing to the ill-posed nature of the problem, conventional deblurring studies constrain the solution space by leveraging various priors and regularizers.
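As an illustration, the blur model above can be simulated numerically. The sketch below uses FFT-based circular convolution, which is an assumption made here for simplicity (the boundary handling is not specified in the text); the function names are illustrative.

```python
import numpy as np

def pad_kernel(k, shape):
    """Zero-pad kernel k to the image shape and roll it so that the
    kernel center sits at the origin, as required for FFT convolution."""
    out = np.zeros(shape)
    kh, kw = k.shape
    out[:kh, :kw] = k
    return np.roll(out, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def blur(x, k, noise_sigma=0.0, seed=0):
    """Simulate y = k ⊗ x + n with circular (FFT) convolution."""
    kf = np.fft.fft2(pad_kernel(k, x.shape))
    y = np.real(np.fft.ifft2(np.fft.fft2(x) * kf))
    if noise_sigma > 0:
        y = y + np.random.default_rng(seed).normal(0.0, noise_sigma, x.shape)
    return y
```

With a delta kernel, `blur` returns the image unchanged; with a normalized box kernel, constant regions are preserved, matching the intuition that blur redistributes but does not create intensity.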
Recently, extensive studies [10][11][12][13][14] based on deep learning (DL) have been performed on image deblurring. Most of them employ deep convolutional neural networks (CNNs) trained on a large-scale dataset of blurry/sharp image pairs [15]. CNNs implicitly learn general priors by capturing natural image statistics from a large number of blurry/sharp image pairs, and DL-based methods have provided superior results. However, collecting such a large dataset is difficult and expensive [16]. In contrast to DL-based data-driven approaches, Ulyanov et al. [17] proposed the deep image prior (DIP), which is based on self-supervised learning, and showed that a CNN can capture the low-level statistics of a single natural image. Their method performed remarkably well in low-level vision tasks, such as denoising, super-resolution, and inpainting. Inspired by this, Ren et al. [18] suggested the SelfDeblur framework to solve the single-image blind deblurring problem. Given a single blurry image, SelfDeblur estimates the latent sharp image and the blur kernel simultaneously by jointly optimizing an image generator network and a kernel estimator network. However, SelfDeblur cannot perform deblurring in the case of multiple blurry images, because its architecture is strictly designed to leverage only the internal statistics of a single blurry image. Although using multiple observations for image deblurring is beneficial [19,20], most self-supervised learning approaches do not fully leverage the internal information of multiple given images.
We propose a method called DualDeblur that aims to restore a single sharp image from two given blurry observations. In many practical scenarios, we can capture multiple images of the same physical scene, obtaining multiple blurry images under various conditions through multiple captures. For example, consider the two blurry images shown in Figure 1b,c. They share the same latent sharp image, shown in Figure 1a; thus, the sharp images restored from Figure 1b,c should be the same, and we can further constrain the solution space. Specifically, our DualDeblur comprises a single image generator and two blur kernel estimators. The image generator aims to estimate the sharp image latent in the two blurry images, and each blur kernel estimator estimates the blur kernel of its blurry image. Thereafter, we jointly optimize the image generator and blur kernel estimators by comparing the reblurred images with the given blurry images, where the reblurred images are generated by the blur process of the predicted image and the estimated blur kernels. Through this joint optimization process, our image generator learns a strong prior for a single sharp image by using the complementary information of multiple images.
Figure 1. (a) Ground truth; (b) blurry image 1; (c) blurry image 2; (d) Xu&Jia {1} [21]; (e) Xu&Jia {2} [21]; (f) SelfDeblur {1} [18]; (g) SelfDeblur {2} [18]; (h) ours {1,2}.

In addition, we propose an adaptive L2_SSIM loss to enhance both pixel-wise accuracy and structural details. Most DIP-based methods use the L2 loss to minimize the difference in pixel values between the target image and the restored image. In our task, simply using the L2 loss may deteriorate the restoration performance because the target image is blurry; the L2 loss alone is insufficient to restore detailed textures. Hence, many restoration methods replace the L2 loss with a structure-aware loss, such as the SSIM loss [9], MS-SSIM loss [22], or FSIM loss [23]. However, using only the SSIM loss also has limitations: SSIM does not consider pixel-wise accuracy, so comparing corrupted structures may lead to unexpected results. To tackle this, our adaptive L2_SSIM loss adjusts the weight at each training step through a weighted sum that considers the characteristics of L2 and SSIM. At the beginning of training, most of the weight is placed on L2 and is decreased exponentially over the iterations. Hence, pixel-wise accuracy is ensured by focusing on L2 in the early stages of training, which prevents unexpected structures in the resulting images. In the remaining stages, we exponentially increase the weight of the SSIM loss to preserve the structural properties. Through this process, our reconstruction loss ensures both pixel-wise accuracy and structural properties. Figure 1 shows the effectiveness of our method. Large blurs often occur when images are taken with fast camera movement in night environments (see Figure 1b,c). In this case, previous classical methods often fail to restore sharp images, as shown in Figure 1d,e.
This is because the priors utilized in the methods are subjective and cannot accurately capture the intrinsic distribution of natural images and blur kernels [24]. As shown in Figure 1f,g, SelfDeblur [18] also fails to estimate the kernel for severely blurred images and does not appropriately deblur images. However, the proposed DualDeblur successfully estimates two blur kernels using two severely blurred images and generates a superior resulting image with many textures. Our experiments show that DualDeblur performs better than other comparative methods, both quantitatively and qualitatively.
The following are the main contributions of this study:
• We propose a DIP-based deblurring method called DualDeblur that uses two blurry images of the same scene. The multiple images are jointly optimized to exploit their complementary information.
• We propose an adaptive L2_SSIM loss that adjusts the weights of both L2 and SSIM at each optimization step. From this, we ensure both pixel-wise accuracy and structural properties in the deblurred image.
• The experimental results show that our method is quantitatively and qualitatively superior to previous methods.

Related Works
In this section, we briefly introduce the existing image deblurring methods based on optimization and DL [25].

Optimization-Based Image Deblurring
Image deblurring, one of the classical inverse problems, aims to restore a sharp latent image from a given blurry image. Owing to the ill-posed nature of the deblurring problem, most traditional methods have been proposed to constrain the solution space by using various priors or regularizers, such as TV regularizations [26,27], gradient priors [21], sparsity priors [28], gradient sparsity priors [29], Gaussian scale mixture priors [30], hyper-Laplacian priors [31], ℓ1/ℓ2-norms [32], variational Bayes approximations [33,34], ℓ0-norms [35,36], patch-based statistical priors [37,38], adaptive sparse priors [19], and dark channel priors [39]. By taking advantage of those priors, the traditional methods jointly estimated the sharp image and blur kernel from the blurry image. However, most of these methods heavily rely on the accurate selection of regularizers or priors. Furthermore, when the blur kernel is large and complex, these methods often fail to restore the sharp image.

DL-Based Image Deblurring
Recently, DL [25]-based methods have been widely developed to solve the image deblurring problem. Early DL-based deblurring methods [40,41] focused only on estimating blur kernels using DL. Sun et al. [40] proposed to predict the probabilistic distribution of motion blur at the patch level using a CNN. Chakrabarti et al. [41] presented a CNN to predict the complex Fourier coefficients of a deconvolution filter to be applied to the input patch for restoration. Unlike these approaches of using CNNs for kernel estimation, Nah et al. [10] proposed to directly predict the deblurred output without an additional kernel estimation process by using multi-scale CNNs. Motivated by the multi-scale approach, Tao et al. [12] proposed to reduce the memory size using a long short-term memory (LSTM)-based scale-recurrent network. Zhang et al. [14] proposed a multi-level CNN that takes a multi-patch hierarchy as input to exploit a localized-to-coarse multi-patch approach. Ulyanov et al. [17] suggested DIP, showing that CNNs can work satisfactorily as priors for a single image. However, the DIP network is limited in capturing the characteristics of the blur kernel, because it consists of CNNs that encode only image statistics [18]. To tackle this, Ren et al. [18] suggested SelfDeblur to solve the blind deblurring problem. SelfDeblur [18] adopted a CNN to capture image statistics and, to overcome the aforementioned drawback of DIP, employed a fully connected network (FCN) to model the prior of the blur kernel. Although SelfDeblur [18] effectively solves the blind deblurring problem, its structure can only handle a single image and cannot appropriately utilize multiple images. In contrast to SelfDeblur, our DualDeblur is designed with a structure that can utilize multiple images that share a single sharp image.

Proposed Method
In this section, we describe the blur process for two blurry images and the proposed DualDeblur framework using two blurry images. Additionally, we introduce an adaptive L2_SSIM loss that considers both pixel-wise accuracy and perceptual properties. Subsequently, we summarize the optimization process of the proposed method.

DualDeblur
Given two blurry observations y1 and y2, the blur process can be formulated as follows:

y1 = k1 ⊗ x + n1,  y2 = k2 ⊗ x + n2,

where x denotes the latent sharp image, and k1 and k2 represent the two blur kernels corresponding to each blurry observation, respectively. Our DualDeblur predicts a single sharp image x using the two blurry images y1 and y2. As depicted in Figure 2, DualDeblur consists of an image generator f_θx(·) and blur kernel estimators f_θk1(·) and f_θk2(·). Table 1 presents the detailed architecture of the image generator f_θx(·), which is learned as a network x̂ = f_θx(z_x) mapping the uniform-distribution input z_x to an image x̂. Table 2 shows our kernel estimators f_θk1(·) and f_θk2(·). The blur kernel estimator f_θk1(·) is learned as a network k̂1 = f_θk1(z_k1) mapping the uniform-distribution 1-D vector z_k1 to a 2-D reshaped blur kernel k̂1; similarly, f_θk2(·) is learned as a network k̂2 = f_θk2(z_k2) mapping z_k2 to k̂2. Networks f_θk1(·) and f_θk2(·) are dual architectures designed for the two blurry images; k̂1 and k̂2 are the estimated blur kernels corresponding to y1 and y2, respectively.
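For concreteness, the kernel estimator mapping described above can be sketched as a forward pass in NumPy: a 200-dimensional input, one ReLU hidden layer of 1000 nodes, and an output reshaped to the 2-D kernel size. The softmax on the output (making the kernel non-negative with unit sum, as in SelfDeblur [18]) is an assumption not stated explicitly here, and the random initialization is illustrative, since the network is optimized per image.

```python
import numpy as np

class KernelEstimatorFCN:
    """Sketch of f_theta_ki: 200 -> 1000 (ReLU) -> W_ki * H_ki, reshaped to 2-D.
    The output softmax (non-negative kernel summing to 1) is an assumption
    borrowed from SelfDeblur; weights are randomly initialized."""

    def __init__(self, kh, kw, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W1 = rng.normal(0.0, 0.01, (200, 1000))
        self.b1 = np.zeros(1000)
        self.W2 = rng.normal(0.0, 0.01, (1000, kh * kw))
        self.b2 = np.zeros(kh * kw)
        self.kh, self.kw = kh, kw

    def __call__(self, z):
        h = np.maximum(z @ self.W1 + self.b1, 0.0)   # ReLU hidden layer
        logits = h @ self.W2 + self.b2
        e = np.exp(logits - logits.max())            # stable softmax
        k = e / e.sum()
        return k.reshape(self.kh, self.kw)           # 1-D output -> 2-D kernel
```

In the method itself, two such networks (one per blurry observation) are optimized jointly with the image generator.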
DualDeblur jointly optimizes f_θx(·), f_θk1(·), and f_θk2(·) by comparing y1 with x̂ ⊗ k̂1, as well as y2 with x̂ ⊗ k̂2, through the proposed loss function, as explained in the following.

Table 1. Architecture of f_θx. We adopt Unet [7] with skip connections as the architecture of f_θx. "Conv2d" represents a 2D convolution operation, "lReLU" denotes a leaky ReLU, and ⊕ denotes channel-wise concatenation. Kernel (m, n × n, p) represents the number of filters m, filter size n × n, and padding p. We implement downsampling with stride 2 and upsampling with bilinear interpolation. C represents the number of image channels, and W_x × H_x the image size. The output layer is Conv2d (C, 1 × 1, 0) followed by a sigmoid, mapping d_5 to x̂.

Table 2. Architecture of f_θki(·). We adopt an FCN as each blur kernel estimator network f_θki(·). W_ki × H_ki represents the blur kernel size. f_θki(·) takes a 200-dimensional input and has 1000 nodes in the hidden layer and W_ki × H_ki nodes in the last layer. The 1-D output is reshaped to the 2-D blur kernel size.
Input: z_ki (200) sampled from a uniform distribution; blur kernel size W_ki × H_ki.
Output: blur kernel k_i (W_ki × H_ki).

FCN Operation
Layer 1: Linear (200, 1000), ReLU
Layer 2: Linear (1000, W_ki × H_ki)

Adaptive L2_SSIM Loss

In this sub-section, we propose an adaptive L2_SSIM loss to enhance both pixel-wise accuracy and perceptual properties. We adjust the weights at each training step with a weighted sum that considers the properties of L2 and SSIM. First, we introduce the L2 and SSIM losses.
When solving a restoration problem, the L2 loss is usually used and is formulated as follows:

L_2 = Σ_i || k̂_i ⊗ x̂ − y_i ||²_2,

where i denotes the i-th observation. L_2 increases the pixel-wise accuracy by minimizing the difference in pixel values between the target image and the restored image. However, with L_2, the output image tends to be blurry and lacks high-frequency textures [42,43]. In our case, using only L_2 is even worse because both y and k ⊗ x are blurry images. To overcome this limitation, the SSIM loss, which preserves perceptual features, is also used. SSIM captures the luminance, contrast, and structure of an image [9]. Here, L_SSIM is formulated as follows:

L_SSIM = Σ_i (1 − SSIM(k̂_i ⊗ x̂, y_i)).

However, because the SSIM loss does not consider pixel-wise accuracy, collapsed structures in the blurry observations may lead to unexpected structures in the resulting image. Therefore, we propose an adaptive L2_SSIM loss to preserve the strengths of each loss and compensate for the weaknesses of each. The proposed adaptive L2_SSIM loss L_L2_SSIM is formulated as follows:

L_L2_SSIM = ω(t) · α · L_2 + (1 − ω(t)) · L_SSIM,

where ω(t) denotes a weighting function that adjusts the weights of the L_2 and SSIM losses at each step t, and α represents a parameter that adjusts the scale of the L_2 loss.
γ denotes a parameter that adjusts the range of the steps affected by the L_2 loss. At the beginning of training, the weight of the L_2 loss accounts for most of the total weight, focusing on pixel-wise accuracy so that unexpected structures do not arise. Thereafter, we reduce the weight of the L_2 loss and increase that of the L_SSIM loss to preserve the structural content of the image. As a result, our reconstruction loss not only increases the pixel-wise accuracy but also preserves the structural details of the image. The effectiveness of the proposed reconstruction loss is demonstrated in an ablation study in Section 4.5.
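The weighting scheme above can be sketched numerically. The exact functional form of ω(t) is not given in the text, so the exponential decay exp(−t/γ) below is an assumption consistent with the description; the single-window SSIM is a simplification of the sliding-window SSIM of [9], and α = 10, γ = 100 are taken from the implementation details.

```python
import math
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (simplified stand-in for
    the local sliding-window SSIM of [9])."""
    ma, mb = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - ma) * (b - mb)).mean()
    return ((2 * ma * mb + c1) * (2 * cov + c2)) / \
           ((ma ** 2 + mb ** 2 + c1) * (va + vb + c2))

def omega(t, gamma=100.0):
    """Assumed schedule: the L2 weight decays exponentially with step t,
    with gamma controlling the range of steps affected by L2."""
    return math.exp(-t / gamma)

def adaptive_l2_ssim(restored, target, t, alpha=10.0, gamma=100.0):
    """Weighted sum shifting from pixel accuracy (L2) early in training
    to structural similarity (1 - SSIM) later."""
    l2 = float(np.mean((restored - target) ** 2))
    ssim_term = 1.0 - ssim_global(restored, target)
    w = omega(t, gamma)
    return w * alpha * l2 + (1.0 - w) * ssim_term
```

At t = 0 the loss is dominated by the scaled L2 term; as t grows, ω(t) shrinks and the SSIM term takes over.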
The final optimization process of DualDeblur is summarized in Algorithm 1. Here, T denotes the total number of training iterations, and θ_k1, θ_k2, and θ_x represent the network parameters of f_θk1(·), f_θk2(·), and f_θx(·), respectively. DualDeblur estimates a restored image and two blur kernels. Thereafter, it generates two reblurred images using a convolution operation and compares them with y1 and y2, respectively, through the L_L2_SSIM loss in Equation (5). By optimizing all the networks simultaneously, the image generator f_θx(·) jointly utilizes the complementary information of the two blurry images. Finally, we obtain the restored image and blur kernels after T iterations.
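To make the joint use of two observations concrete, here is a toy NumPy sketch in the spirit of Algorithm 1: gradient descent on the summed reblurring error over both observations. Unlike the actual method, it optimizes the image pixels directly (not network parameters), holds the two kernels fixed, and uses a plain L2 objective; all names are illustrative.

```python
import numpy as np

def fconv(x, Kf):
    """Circular convolution with a kernel given in the frequency domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * Kf))

def restore_from_pair(y1, K1f, y2, K2f, steps=200, lr=0.4):
    """Gradient descent on ||k1*x - y1||^2 + ||k2*x - y2||^2 over x.
    Toy stand-in for Algorithm 1: the paper optimizes an image generator
    and two kernel estimators jointly with the adaptive L2_SSIM loss."""
    x = 0.5 * (y1 + y2)                  # crude initialization
    for _ in range(steps):
        r1 = fconv(x, K1f) - y1          # residual of observation 1
        r2 = fconv(x, K2f) - y2          # residual of observation 2
        # gradient = correlation of each kernel with its residual
        g = np.real(np.fft.ifft2(np.conj(K1f) * np.fft.fft2(r1)
                                 + np.conj(K2f) * np.fft.fft2(r2)))
        x -= 2.0 * lr * g
    return x
```

The point of the sketch is complementarity: frequencies suppressed by one kernel can still be constrained by the other observation, so the joint objective recovers more of the image than either observation alone.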

Dataset
To evaluate the performance of our method, we used two image deblurring benchmark datasets: the Levin test set [33] and the Lai test set [45]. The proposed method solves the deblurring problem by using two observations. In this case, there are two possible scenarios. First, two observations are degraded by a similar degree of blur artifacts (soft pairs). Second, the degrees of blur artifacts are very different from each other (hard pairs). To simulate these cases, we divided each test set into soft and hard pairs and used them for evaluation. The two test sets are discussed in the following.

1.
Levin test set [33]: In their seminal work, Levin et al. [33] provided 8 blur kernels of size k × k, where k = 13, 15, 17, 19, 21, 23, 23, and 27, and 4 sharp images, resulting in 32 blurry gray-scale images of size 255 × 255. To evaluate our method, we divided the pairs into soft and hard pairs on the basis of the difference in blur kernel size: if the difference was less than 5 pixels, we classified the image pair as a soft pair, and otherwise as a hard pair. Following this pipeline, we randomly selected 7 soft pairs and 7 hard pairs, totaling 14 blurry pairs per image. In short, we prepared a total of 56 pairs of blurry images for evaluation. The composition of the Levin test set [33] is described in detail in Table 3. Specifically, the soft pairs comprised [13, 15], [15, 17], [17, 19], [19, 21], [21, 23a], [21, 23b], and [23a, 23b]. Here, each number represents the blur kernel size k. For example, [13, 15] means that the blur kernels of sizes 13 × 13 and 15 × 15 are paired. Because the Levin test set contains two blur kernels with a size of 23 × 23, we denote them as 23a and 23b. The hard pairs contained [13, 27], [15, 27], [17, 27], [19, 27], [21, 27], [23a, 27], and [23b, 27].

2.

Lai test set [45]: As shown in Table 3, there are 25 sharp images and 5 blur kernel pairs; a total of 125 pairs of blurry images are used for evaluation.
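The soft/hard split described above amounts to a simple rule on the kernel-size difference; a minimal sketch (the function name is illustrative):

```python
def classify_pair(size1, size2, threshold=5):
    """Label a blurry-image pair by blur-kernel-size difference:
    'soft' if the sizes differ by fewer than `threshold` pixels,
    'hard' otherwise (threshold of 5 pixels per the dataset setup)."""
    return "soft" if abs(size1 - size2) < threshold else "hard"
```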

Implementation Details
We implemented our DualDeblur using Pytorch [46]. The networks were optimized using Adam [44] with a learning rate of 1 × 10⁻², β1 = 0.9, and β2 = 0.999. In our experiments, the total number of iterations was 5000, and the learning rate was halved at iterations 2000, 3000, and 4000. We empirically set α and γ in Equation (5) to α = 10 and γ = 100. Following [17,18], we sampled the initial z_x, z_k1, and z_k2 from a uniform distribution with a fixed random seed of 0. All experiments of our model were conducted using a single NVIDIA TITAN-RTX GPU.
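The learning-rate schedule above corresponds to a MultiStepLR-style step decay; a minimal sketch of the rate as a function of the iteration:

```python
def learning_rate(t, base_lr=1e-2, milestones=(2000, 3000, 4000), gamma=0.5):
    """Step-decay schedule: multiply base_lr by gamma once for each
    milestone iteration already reached (MultiStepLR-style)."""
    return base_lr * gamma ** sum(t >= m for m in milestones)
```

In PyTorch this is typically expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2000, 3000, 4000], gamma=0.5)`.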

Comparison on the Levin Test Set
For the Levin test set [33], we compared our DualDeblur with existing blind deconvolution methods (i.e., Krishnan et al. [32], Levin et al. [33], Cho&Lee [30], Xu&Jia [21], Sun et al. [37], Zuo et al. [29], and Pan-DCP [39]) and a DIP-based deblurring method (i.e., SelfDeblur [18]). The non-blind deconvolution of [34] was used to generate the final results of the previous methods. For quantitative comparison, we calculated the PSNR and SSIM [9] metrics using the codes provided by [18]. Moreover, we reported FSIM [23] and the LPIPS [43] distance to evaluate perceptual similarity. We also compared the error ratio [34], computed from the sum of squared differences between deconvolution with the estimated kernels and deconvolution with the ground-truth kernels.
We computed the average PSNR, SSIM, error ratio, FSIM and LPIPS on the Levin test set for various methods (see Table 4). For a fair comparison, we reported the results for the soft and hard pairs that contained each kernel.
With the advantage of using multiple images, the results of our method were significantly superior to those of the previous methods in terms of all the metrics. Specifically, our PSNR was 8.00 higher than the second-highest method, SelfDeblur [18]; our SSIM was 0.0542 higher than the second-highest, Zuo et al. [29]; and our FSIM was 0.0378 higher than the second-highest, Sun et al. [37]. Our method also showed superior performance in the LPIPS distance compared to the other methods. Note that our method performed remarkably well regardless of the difference in blur kernel size between the two given images. Our experimental results show that the average results of the hard pairs are slightly better than those of the soft pairs. We believe this is because the complementary information between the two images is important for deblurring, and the hard pairs often include more complementary information than the soft pairs. In Figure 3, we compare the previous methods with the soft and hard pairs of our method. The results of the previous methods correspond to input 1 in Figure 3; ours {1,2} is the soft-pair result of inputs 1 and 2, and ours {1,3} is the hard-pair result of inputs 1 and 3. Our method outperforms the other methods in restoring sharp edges and fine details in both soft and hard pairs, and the blur kernels estimated by DualDeblur are considerably closer to the ground truth.
As shown in Table 5, we measured the inference time and the number of model parameters of our method and SelfDeblur [18]. We measured the average inference time for a single image using the Levin test set [33]. The inference times of our model and SelfDeblur [18] were measured on a PC with an NVIDIA TITAN-RTX GPU, while the other methods were measured on a PC with a 3.30 GHz Intel(R) Xeon(R) CPU, as reported in [18]. Our model has a longer inference time and more parameters than SelfDeblur [18], because our model optimizes three networks, whereas SelfDeblur [18] optimizes two.

Table 4. Quantitative comparisons on the Levin test set [33]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The best results are highlighted. "Avg." means the average PSNR, SSIM, error ratio, FSIM, and LPIPS results over all blur kernels.

Table 5. Comparison of the average inference time on the Levin test set [33] and the number of model parameters. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result.

Method | Time (s) | Parameters (M)
Zuo et al. * [29] | 10.998 | -
Pan-DCP * [39] | 295.23 | -
SelfDeblur [18] | 368. |

Figure 3. Qualitative comparisons on the Levin test set [33]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as {} (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).

Comparison on Lai Test Set
For the Lai test set [45], our method was compared with those of Cho and Lee [30], Xu and Jia [21], Xu et al. [35], Michael et al. [38], Perrone et al. [27], Pan-DCP [39], and SelfDeblur [18]. In the previous methods, after blur kernel estimation, ref. [47] was applied as the deconvolution for the Saturated category, and ref. [31] for the other categories. In Table 6, our DualDeblur achieved better quantitative metrics compared with the previous methods. Our average results on the Lai test set [45] were 7.72 higher in PSNR and 0.2136 higher in SSIM compared with the second-highest, SelfDeblur [18]. The LPIPS results showed that our method restores perceptually higher-quality images compared to the other methods. Additionally, our method performed better for all blur kernels, which shows that the proposed DualDeblur performs excellently for large and diverse images. Both our soft and hard pairs outperformed the results of the previous methods.

Table 6. Quantitative comparisons on the Lai test set [45]. The methods marked with * adopt [31,47] as non-blind deconvolution for the final result after kernel estimation: ref. [47] is adopted in the Saturated category, and ref. [31] for the other categories. The best results are highlighted. "Avg." means the average PSNR, SSIM, FSIM, and LPIPS results over all blur kernels.

In Figures 4 and 5, the qualitative comparison shows that our DualDeblur is visually superior to the previous methods. The kernels estimated by our DualDeblur are highly accurate compared with the other methods. Although other methods suffer from blur or ringing artifacts, our results are perceptually superior with rich texture (see the details in Figure 4). Additionally, Figure 5 shows the high-quality details of our result; clearly, only our method accurately reconstructs the stripes of the tie.
In Figure 6, our method shows superior results when using two blurry images that cannot be deblurred by the previous methods; that is, our method performs deblurring by jointly using two blurred images that are each severely damaged and contain little information. In the third row of Figure 6, SelfDeblur [18] fails to estimate the blur kernels for both input 1 and input 2, whereas our method is superior in estimating the blur kernels and the final image.

Ablation Study
To investigate the effectiveness of the proposed dual architecture and adaptive L2_SSIM loss, we conducted ablation studies. To isolate the effect of the dual architecture, we compared the dual architecture trained with the same loss as [18] (called DualDeblur-A) against [18]. Furthermore, we demonstrated the effectiveness of our adaptive L2_SSIM loss by comparing models optimized using L_L2_SSIM with those using only L_2 or L_SSIM. Models DualDeblur-B and DualDeblur-C have the same architecture as DualDeblur-A; however, DualDeblur-B uses only L_2 in Equation (3) and DualDeblur-C uses only L_SSIM in Equation (4) for optimization. Finally, we define DualDeblur as the model using the proposed L_L2_SSIM in Equation (5). The quantitative and qualitative comparisons are shown in Table 7 and Figure 7, respectively.

Figure 4. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as {} (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).

Figure 5. Qualitative comparisons on the Lai test set [45]. * indicates that the method uses the non-blind deconvolution method of [34] to produce the final result. The input image for each method is denoted as {}.

Figure 6. Qualitative comparisons on the Lai test set [45]. The input image for each method is denoted as {} (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).

Figure 7. Ablation study. Qualitative comparisons on the Levin test set [33]. The input image for each method is denoted as {} (i.e., ours {1,2} indicates our resulting image when the input images are input 1 and input 2).

Effects of Dual Architecture
Unlike SelfDeblur [18], which performs deblurring with a single observation, our method leverages multiple observations via a dual architecture. In our experiments, DualDeblur-A, which uses a dual architecture, significantly improved the deblurring performance compared to SelfDeblur (see (a) and (b) in Table 7). The PSNR and SSIM results of DualDeblur-A increased by 2.68 and 0.0098, respectively, compared to those of SelfDeblur. For FSIM and LPIPS, the results of DualDeblur-A are also better than those of SelfDeblur, by 0.0738 and 0.0334, respectively. This indicates that using multiple images is more helpful for deblurring than using a single image, and that the proposed method is effective in handling multiple images during the deblurring procedure. The results of DualDeblur-A and DualDeblur-B (see Table 7) show that the performance of DualDeblur-B, which has no TV regularization, is similar to that of DualDeblur-A; thus, the dual architecture works well without an additional regularizer.

Effects of Adaptive L 2 _SSIM Loss
The proposed adaptive L2_SSIM loss, formulated as the weighted sum of L_2 and L_SSIM, focuses on restoring the intensity values per pixel first and then gradually restores the structure later. By using the proposed adaptive L2_SSIM loss, we aim to exploit the advantages of the L_2 and L_SSIM loss functions and complement their limitations. To demonstrate its effectiveness, we compare the performance of DualDeblur optimized with various loss functions: (1) DualDeblur-B using the L_2 loss, (2) DualDeblur-C using the L_SSIM loss, and (3) DualDeblur using the L_L2_SSIM loss. When optimizing our model using only the L_2 loss, the quantitative results are the worst in PSNR and SSIM (see Table 7). As shown in Figure 7, the results of our method using only the L_2 loss are overly smooth and fail to restore details. To overcome this, we employed the structural loss L_SSIM in our method to enhance the perceptual quality and structural details in local regions [48]. Figure 7 also shows that using L_SSIM helps restore details of the image better than using only the L_2 loss. However, L_SSIM does not restore accurate pixel intensities; additionally, corrupted structures in the blurry observations may lead to unexpected structures in the resulting images.
In Figure 7, however, the results of our adaptive L2_SSIM loss L_L2_SSIM demonstrate effectiveness not only in restoring accurate pixel values but also in restoring the details and sharp edges of the image. As shown in Table 7, DualDeblur achieves the best results in most metrics, including PSNR, SSIM, and LPIPS, with the exception of FSIM. Specifically, the average PSNR of DualDeblur increases by 5.26 and 1.78 compared with those of DualDeblur-B and DualDeblur-C, respectively. In addition, the average SSIM of DualDeblur is 0.0212 higher than the second-highest DualDeblur-C, its average FSIM is 0.0197 lower than the highest DualDeblur-A, and its average LPIPS is 0.0287 better than the second-best DualDeblur-A. Figure 8a demonstrates the effectiveness of our adaptive L2_SSIM loss, which outperforms all other losses at every iteration. Figure 8b shows the change of ω(t) in Equation (5), the weight of the adaptive L2_SSIM loss, over the training iterations. As mentioned earlier, L_2 is weighted more than L_SSIM in the initial iteration steps, and the weight of L_SSIM increases exponentially.
As shown in Table 8, we conducted various experiments on α and γ in Equation (5). The results show that the model with α = 10 and γ = 100 gives the best results for both PSNR and SSIM, whereas the model with α = 50 and γ = 200 is the best for FSIM and LPIPS. We selected the model with α = 10 and γ = 100 because PSNR and SSIM are the most commonly used metrics.

Conclusions
In this paper, we proposed a DualDeblur framework to restore a single sharp image using multiple blurry images. Our framework adopted a dual architecture to utilize the complementary information of two blurry images for obtaining a single sharp image. We proposed an adaptive L 2 _SSIM loss to ensure both pixel accuracy and structural details. For practical and accurate performance evaluation of our results, we divided the blur pairs into soft and hard pairs. Extensive comparisons demonstrated the superior results of our DualDeblur, compared to those of previous methods in both quantitative and qualitative evaluations.

Conflicts of Interest:
The authors declare no conflicts of interest.