Multi-Scale Cyclic Image Deblurring Based on PVC-Resnet

Aiming at the non-uniform blurring of images caused by optical system defects or external interference such as camera shake, defocus, and fast object motion, this paper proposes a multi-scale cyclic image deblurring model based on a parallel void convolution-Resnet (PVC-Resnet), which restores blurred images using a multi-scale recurrent network architecture and a coarse-to-fine strategy. The backbone network is built on the Unet encoder-decoder architecture, and a PVC-Resnet module, which combines parallel dilated convolution with a residual network, is constructed in its encoder. The parallel dilated convolution expands the convolutional receptive field to extract richer global features. In addition, a multi-scale feature extraction module is designed to extract shallow features of targets of different scales in blurred images, and the extracted features are then sent to the backbone network for refinement. The SSIM loss and the L1 loss are combined into an SSIM-L1 joint loss function that optimizes the overall network, so that image restoration is optimized at every stage. Experimental results show that the proposed model achieves an average peak signal-to-noise ratio (PSNR) of up to 32.84 dB, a structural similarity (SSIM) of 0.9235, and a statistical structural similarity (Stat-SSIM) of 0.9249 across the datasets. The deblurred images generated by this method are superior to those produced by the methods of Nah et al., Kupyn et al., and Cho S J et al., especially on the calibration plate dataset. By applying parallel dilated convolution and the SSIM-L1 joint loss function, the proposed model improves network performance and produces restored images with clearer edges and texture details.


Introduction
In recent years, with increasing computing speed and advances in vision algorithms, images have become an important data source: processing and analyzing them yields useful information and features in materials science [1], biology [2], medicine [3], and other fields. However, optical imaging systems [4] are limited by their physical characteristics, such as the shape and material of the lens and the shooting position, which cause scattering and refraction when light passes through lenses or reflectors. In addition, vision sensors may blur images acquired in real time. Such blur changes image resolution and contrast, hinders the extraction of important information from the images, and thus degrades the performance of vision algorithms [5,6]. Image deblurring is therefore of great significance in optical imaging and plays an important role in improving image quality.
Traditional image deblurring methods usually recover the sharp image from a blurred one by estimating the blur kernel [7]. They model uniform, non-uniform, and depth-aware blur kernels [8], impose various constraints derived from prior information about the image to solve for the kernel, and finally recover the corresponding sharp image from the given blurred image. Xu et al. [9] assumed that the blur function has a generalized sparse representation and then performed deconvolution to estimate the sharp image. Zhou et al. [10] proposed an alternative, global modeling approach to image deblurring: using the Fourier transform as a prior for image degradation, it partially decouples the degradation and content components, and its core design is built on Fourier spatial interaction modeling and Fourier channel evolution. Hayashi et al. [11] proposed a method for estimating the motion blur parameters of the point spread function (PSF); it deconvolves the blurred image with a PSF formed by convolving a motion-blur PSF with an out-of-focus-blur PSF, and sharpens the image effectively. Most traditional methods recover blurred images using an assumed blur kernel and natural image priors, but the blur function is generally unknown and finding the optimal kernel is difficult, which directly limits overall deblurring performance.
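For reference, the degradation model that kernel-estimation methods of this kind typically start from can be written as below. This is the standard formulation for uniform (spatially invariant) blur, not a formula taken from the cited works; B, K, S, N, Φ and λ are notation introduced here for the blurred image, blur kernel, sharp image, noise, image prior and prior weight. In blind deblurring K is also unknown, and the minimization is carried out jointly over K and S.

```latex
% Uniform blur: the blurred image is the sharp image convolved with a kernel, plus noise
B = K \otimes S + N
% Non-blind recovery: data-fidelity term plus a weighted image prior
\hat{S} = \arg\min_{S} \; \lVert K \otimes S - B \rVert_2^2 + \lambda \, \Phi(S)
```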
The development of emerging technologies such as optical computing [12] and deep learning provides new solutions for image deblurring. Optical computing can accelerate image processing and computation by exploiting the nonlinear properties of optical devices, operating directly in the optical domain [13]. Unlike traditional methods, optical computing methods can process blurred images directly without estimating blur kernels. For example, with devices such as optical phase modulators [14] or spatial light modulators [15], operations including nonlinear filtering, inverse filtering, regularization, and least squares can be implemented in the optical domain, directly recovering the sharp version of the blurred image. Adabi et al. [16] proposed a scalable, learnable deblurring framework that organizes digital filters intelligently and can find the most suitable speckle reduction algorithm for a given image. Convolutional neural network (CNN)-based image deblurring methods have also achieved remarkable success. Early CNN-based methods used a CNN as a blur kernel estimator, forming a two-stage framework of CNN-based kernel estimation followed by deconvolution [17]; however, downsampling reduced the resolution of small-scale targets in the blurred image, so some details of small targets were lost in the deblurred result. To restore small targets in blurred images, Nah et al. [18] proposed a coarse-to-fine method that uses a multi-scale CNN to deblur the image directly. Tao et al. [19] proposed a scale-recurrent network (SRN) with shared parameters, which reduces the number of model parameters and effectively improves training efficiency. Building on SRN, Gao et al. [20] proposed a selective parameter-sharing strategy and a nested skip connection structure for blind restoration of blurred images in dynamic scenes. Compared with traditional CNNs, multi-scale models improve deblurring performance, but their coarse-to-fine strategies mainly rely on increasing network depth or sharing network weights, which increases computational complexity.
Recently, deblurring models based on GANs (generative adversarial networks) have been studied; they deblur images at a single scale by constructing a generator and a discriminator and setting up a game between them. Kupyn et al. proposed DeblurGAN [21], a kernel-free blind motion deblurring method based on a conditional adversarial network [22] and optimized with a multi-component loss function, which also benefits object detection on blurred images. Zhao et al. [23] proposed a lightweight, real-time unsupervised blind image deblurring (BID) baseline called Frequency Domain Contrast Loss Constrained Lightweight CycleGAN (FCL-GAN), in which two new cooperative units, the Lightweight Domain Conversion Unit (LDCU) and the Parameterless Frequency Domain Contrast Unit (PFCU), were designed, and demonstrated its performance on multiple datasets. GAN-based methods have proven effective for image deblurring but suffer from training instability. Moreover, as the number of network layers increases, networks suffer from information loss and overfitting, which seriously degrades model performance. To address these problems, studies have found that introducing residual blocks into CNNs makes it possible to build deeper networks, and that residual learning lets the network learn more features from each convolutional layer [24][25][26]. Taking advantage of residual learning, Sharif et al. [27] proposed an end-to-end scale-recurrent deep network for multi-modal image deblurring with a new residual dense block using spatially asymmetric attention, and reported clear advantages in both qualitative and quantitative evaluations. It has also been shown that different levels of blur can be handled better using multi-scale images [28]. Lin et al. [29] used a transformer-based deep learning model for image deblurring and generated 16 different blur kernels from an imaging model so that the model adapts to different degrees of blur; their results show that the method removes blur of varying degrees and handles images containing both sharp and blurred targets. Following this idea, various CNN-based deblurring methods feed blurred images of different scales into their sub-networks [30], and the coarse-to-fine design principle has proven effective for image deblurring.
To restore targets of different scales in blurred images and capture their global features, this paper proposes a multi-scale cyclic image deblurring model based on PVC-Resnet. Resnet is introduced into the network to avoid vanishing gradients and information loss in the deep neural network. Given the characteristics of blurred images, parallel dilated convolution is applied to enlarge the convolutional receptive field and improve the deblurring ability of the model. A multi-scale feature extraction module is proposed to extract shallow information about targets of different sizes in blurred images. In addition, to compensate for the resolution information of small targets lost during downsampling, the blurred image is downscaled to several scales and processed cyclically, with features fused across scales through upsampling. Because the SSIM loss function tends to shift brightness and color, it is combined with the L1 norm loss to form the SSIM-L1 joint loss, so that the deblurred image is closer to the ground-truth image in edge detail, brightness, and color.

Data Description
Two datasets are used in this paper: the GoPro dataset and the calibration plate dataset. The GoPro dataset was created by Nah et al. [18] and simulates complex camera shake and object motion; its motion-blurred scenes include pedestrian and vehicle motion, and it is widely used in image deblurring. It consists of blurred images paired one-to-one with sharp ground-truth images captured with a high-speed camera: 3214 blurred images of size 1280 × 720, of which 2103 are used for training and 1111 for testing. Some examples from the GoPro dataset are shown in Figure 1.
The calibration plate dataset is generated with MATLAB; calibration plate images are commonly used in camera calibration and image distortion correction. Calibration plates allow the intrinsic and extrinsic parameters of a camera to be obtained accurately, which helps recover image details and textures and improves deblurring. In addition, their known geometric structure makes it possible to evaluate the reconstruction accuracy and fidelity of different deblurring methods, which allows a better comparison between them. The generated images are augmented with geometric transformations such as rotation, scaling, and tilting, and 2400 sharp calibration plate images remain after screening. To make the blurred images closer to real shooting conditions, a blurring algorithm is applied to the generated calibration plate images, yielding 800 defocus-blurred images and 1600 motion-blurred images. Some examples from the calibration plate dataset are shown in Figure 2.
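The text does not specify which blurring algorithm or parameters were used for the calibration plate images. The following is a hypothetical plain-Python sketch of one common approach: a uniform disk kernel for defocus blur and a normalized line-segment kernel for linear motion blur, each convolved with a synthetic checkerboard. The kernel radius, length, angle, and the helper names (`disk_kernel`, `motion_kernel`, `checkerboard`, `convolve2d`) are illustrative assumptions, not the authors' settings.

```python
import math

def disk_kernel(radius):
    """Defocus blur: a uniform disk of the given radius, normalized to sum to 1."""
    size = 2 * radius + 1
    k = [[1.0 if (x - radius) ** 2 + (y - radius) ** 2 <= radius ** 2 else 0.0
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def motion_kernel(length, angle_deg):
    """Linear motion blur: a line of `length` pixels at `angle_deg`, normalized to sum to 1."""
    size = length if length % 2 == 1 else length + 1
    c = size // 2
    k = [[0.0] * size for _ in range(size)]
    theta = math.radians(angle_deg)
    steps = 4 * length
    for i in range(steps + 1):
        # Sample points along a segment centred on the kernel; mark the nearest pixel.
        t = -(length - 1) / 2 + (length - 1) * i / steps
        x = int(round(c + t * math.cos(theta)))
        y = int(round(c - t * math.sin(theta)))
        k[y][x] = 1.0
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def checkerboard(h, w, square):
    """Synthetic calibration-plate-like pattern with values 0 and 255."""
    return [[255.0 if ((y // square) + (x // square)) % 2 == 0 else 0.0
             for x in range(w)] for y in range(h)]

def convolve2d(img, kernel):
    """'Same'-size 2D convolution with edge-replicated borders."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # Flip the kernel (true convolution) and clamp indices at the borders.
                    yy = min(max(y - (i - cy), 0), h - 1)
                    xx = min(max(x - (j - cx), 0), w - 1)
                    acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out

sharp = checkerboard(32, 32, 8)
defocused = convolve2d(sharp, disk_kernel(2))       # hypothetical defocus radius
motion = convolve2d(sharp, motion_kernel(7, 30.0))   # hypothetical length and angle
```

Because both kernels are non-negative and sum to 1, each blurred pixel is a weighted average of sharp pixels, so intensities stay within the original 0 to 255 range while the square edges are smeared.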

Multi-Scale Cyclic Deblurring Model Based on PVC-Resnet
The structure of the proposed multi-scale cyclic deblurring model based on PVC-Resnet is shown in Figure 3. The model uses the Unet encoder-decoder architecture as its backbone network. To improve its deblurring ability, a multi-scale feature extraction module and a PVC-Resnet module are built into the encoder. The multi-scale feature extraction module extracts shallow features of targets of different sizes in blurred images through several convolutional branches arranged in parallel. The PVC-Resnet module combines parallel dilated convolution with a residual network. The residual structure passes the input directly to the output through a cross-layer (shortcut) connection, which avoids vanishing gradients and information loss in the deep network and thereby improves both the performance and the training efficiency of the model. The dilated convolution expands the receptive field without increasing the amount of computation, so that both global and local information can be extracted from blurred images. Moreover, to compensate for the resolution information lost during downsampling, the blurred image is restored cyclically over multiple scales. As shown in Figure 3, the inputs in the encoding stage are the blurred image at different scales. Features of the smallest-scale blurred image are first extracted by the multi-scale feature extraction module and then fed into stage 3 of the backbone network for refinement, producing C3. After upsampling, C3 is fused with the blurred image at the next larger scale, passed through the multi-scale feature extraction module, and fed into stage 2 of the backbone network for refinement, producing C2. After upsampling, C2 is fused with the largest-scale blurred image B1, and the deblurred image C1 is output by the complete backbone network. The output of each cycle is supervised by the sharp image at the corresponding scale, and the loss between the two is computed with the SSIM-L1 joint loss function, so that the restoration at every stage is optimized.
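The coarse-to-fine data flow described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' implementation: it assumes a factor of 2 between scales (the text does not state the factor), 2 × 2 averaging for downscaling, nearest-neighbour upsampling, and fusion by stacking the blurred input with the upsampled estimate as two channels. The function `placeholder_backbone` is a made-up stand-in that merely averages channels where the real model would run the Unet backbone.

```python
def downsample2x(img):
    """Halve the resolution by averaging 2x2 blocks (height and width must be even)."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(len(img[0]) // 2)] for y in range(len(img) // 2)]

def upsample2x(img):
    """Double the resolution by nearest-neighbour repetition."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def placeholder_backbone(channels):
    """Hypothetical stand-in for the Unet backbone: averages the input channels pixel-wise."""
    n = len(channels)
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(ch[y][x] for ch in channels) / n for x in range(w)] for y in range(h)]

def multiscale_deblur(blurred, backbone, num_scales=3):
    """Coarse-to-fine loop; for num_scales=3 it returns [C3, C2, C1] (coarsest first)."""
    pyramid = [blurred]                                   # B1 (full size)
    for _ in range(num_scales - 1):
        pyramid.append(downsample2x(pyramid[-1]))         # B2, B3, ...
    outputs, previous = [], None
    for b in reversed(pyramid):                           # B3 -> B2 -> B1
        # Fuse the blurred input with the upsampled estimate from the coarser scale;
        # at the coarsest scale the blurred input itself serves as the initial estimate.
        estimate = b if previous is None else upsample2x(previous)
        previous = backbone([b, estimate])
        outputs.append(previous)
    return outputs

blurred = [[float((x + y) % 4) for x in range(8)] for y in range(8)]
outs = multiscale_deblur(blurred, placeholder_backbone)
print([(len(o), len(o[0])) for o in outs])
```

During training, each entry of `outs` would be compared with the sharp image downscaled to the same size, which is what lets the joint loss supervise every scale. The input height and width must be divisible by 2^(num_scales − 1) in this sketch.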

Backbone Network Design
The backbone network in this paper follows the Unet architecture and is built from encoder (coding) blocks and decoder blocks. Unet concatenates feature maps along the channel dimension through skip connections, which retains more location and feature information and makes it well suited to image deblurring. Each encoder block consists of a convolution layer, a PVC-Resnet module, and a max-pooling layer; Stages 0-5 in Table 1 give the structure of the encoder. The convolution layer extracts features from the input feature map, and the max-pooling layer downsamples it by a factor of two to reduce its spatial resolution. The PVC-Resnet module combines parallel dilated convolution with a residual network: the residual structure improves the deblurring performance of the model, while the parallel dilated convolution, added to suit the characteristics of blurred images, enlarges the receptive field and captures global image features. Each decoder block consists of an upsampling layer and a convolution layer, as shown in Stages 6-9 of Table 1. Long skip connections link the encoder and decoder across layers, aggregating context from the shallow and deep parts of the network and compensating for the information lost to downsampling in the encoding stage.
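The shape bookkeeping behind the encoder, decoder, and long skip connections can be sketched as follows. This is a hypothetical illustration: the channel widths `[32, 64, 128, 256]` and the 256 × 256 input are assumptions, not the values in Table 1. Each encoder level except the deepest is followed by 2 × max pooling; each decoder level upsamples by 2 and concatenates the encoder map of the same size, which only works if the spatial sizes match.

```python
def max_pool2x2(img):
    """2x2 max pooling with stride 2 on a single-channel map: halves height and width."""
    return [[max(img[2 * y][2 * x], img[2 * y][2 * x + 1],
                 img[2 * y + 1][2 * x], img[2 * y + 1][2 * x + 1])
             for x in range(len(img[0]) // 2)] for y in range(len(img) // 2)]

def unet_shapes(height, width, widths):
    """Track (channels, H, W) through a Unet-style encoder and decoder.

    widths: hypothetical channel counts per encoder level, shallowest first.
    Returns the encoder (skip) shapes and the shapes right after each decoder
    concatenation (upsampled features + skip features).
    """
    skips = []
    h, w = height, width
    for i, c in enumerate(widths):
        skips.append((c, h, w))
        if i < len(widths) - 1:          # no pooling after the deepest level
            h, w = h // 2, w // 2
    decoder = []
    c = widths[-1]
    for skip_c, skip_h, skip_w in reversed(skips[:-1]):
        h, w = h * 2, w * 2              # 2x upsampling in the decoder
        if (h, w) != (skip_h, skip_w):
            raise ValueError("skip connection needs matching spatial size")
        decoder.append((c + skip_c, h, w))   # channel concatenation
        c = skip_c                       # a convolution then maps back to skip_c channels
    return skips, decoder

skips, decoder = unet_shapes(256, 256, [32, 64, 128, 256])
print(skips)
print(decoder)
```

The input height and width are assumed divisible by 2^(levels − 1); otherwise the floor division in pooling makes the upsampled decoder map one pixel smaller than its skip partner, and the `ValueError` is raised.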

Multi-Scale Feature Extraction Module
Because of camera shake and target motion during imaging, the captured images are blurred to different degrees at different positions. The network model therefore needs strong feature extraction capability and must adapt to targets of different sizes in blurred images. In general, shallow layers have small receptive fields and retain the resolution information needed for small targets in blurred images, whereas deep layers have large receptive fields and are better suited to large targets. This paper therefore proposes a multi-scale feature extraction module that places branches with different numbers of convolutional layers in parallel, so that targets of different scales in the blurred image can be captured. The structure of the module is shown in Figure 4. Four parallel branches extract features of targets of different sizes. The first three branches are built from CBR modules, each consisting of a 3 × 3 convolutional layer, a batch normalization layer, and a rectified linear unit (ReLU). As shown in Figure 4, from left to right these branches contain two, four, and one CBR module, respectively, and each branch is followed by a 1 × 1 convolution that adjusts the number of channels so that the feature maps can be concatenated. To reduce computation, the parameters of the first branch are shared with the first convolution layer of the second branch; these two branches extract the blur of larger targets in the blurred image. The third branch is shallow and extracts the blur of small targets. The fourth branch contains only a 1 × 1 convolution, which captures global and local features at each pixel of the blurred image. Finally, the features extracted by the four branches are fused, strengthening useful features and suppressing useless ones to complete the feature extraction of multi-scale targets in blurred images.
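Why the branches see targets of different sizes can be made concrete by computing each branch's receptive field. This sketch assumes stride 1 and no dilation inside the branches (the text does not state strides), and notes that the trailing 1 × 1 channel-adjusting convolution does not enlarge the receptive field; `stacked_receptive_field` is a helper name introduced here.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions of size `kernel_size`."""
    return 1 + num_layers * (kernel_size - 1)

# Branch layouts as described for Figure 4; the trailing 1x1 convolution of each
# branch only mixes channels, so it leaves the receptive field unchanged.
branches = {
    "branch 1: 2 x CBR (3x3)": stacked_receptive_field(2),
    "branch 2: 4 x CBR (3x3)": stacked_receptive_field(4),
    "branch 3: 1 x CBR (3x3)": stacked_receptive_field(1),
    "branch 4: 1x1 conv only": stacked_receptive_field(1, kernel_size=1),
}
for name, rf in branches.items():
    print(f"{name}: {rf} x {rf} pixels")
```

Under these assumptions the deeper first and second branches cover 5 × 5 and 9 × 9 neighbourhoods (larger targets), the third branch covers 3 × 3 (small targets), and the fourth branch works on single pixels.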
Photonics 2023, 10, x FOR PEER REVIEW 7 of 15

PVC-Resnet Module
When a blurred image passes through the neural network, downsampling operations such as strided convolution and pooling are applied to enlarge the receptive field and reduce computation. These operations enlarge the receptive field but reduce spatial resolution, so the internal data structure and spatial hierarchy of the image are partly lost. Therefore, building on Resnet, this paper proposes a parallel void convolution-Resnet (PVC-Resnet) module in the encoder of the backbone network. The PVC-Resnet module introduces dilated convolution into the Resnet network, which enlarges the receptive field, extracts richer global features, and improves the deblurring ability of the network model. Dilated convolution adds a 'dilation rate' to a standard convolution layer that sets the spacing between kernel taps when the kernel processes the data; this enlarges the receptive field without increasing the amount of computation and makes full use of the information contained in the feature map. The relationship between the actual receptive field size and the dilation rate is:
$$N = k + (k - 1)(d - 1) \tag{1}$$
where d is the dilation rate, i.e., the interval between adjacent kernel elements; (d − 1) is the number of holes inserted between them; k is the kernel size; and N is the actual receptive field size. The receptive field changes obtained with dilated convolution are shown in Table 2. For deblurring, the receptive field must be large enough to capture severe large-scale blur, but too large a dilation rate breaks the continuity of image information. For this reason, this paper constructs a parallel dilated convolution module based on the ResNet structure, as shown in Figure 5. Combining dilated convolution with standard convolution supplements the spatial information that dilated convolution skips, giving the whole network better continuity. As shown in Figure 5, S2, S3, and S4 are composed of multiple residual units, BTNK1 and BTNK2, connected end to end; the structures of BTNK1 and BTNK2 are also given in Figure 5.
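The receptive-field relation and the gap-filling effect of combining standard and dilated convolution can be checked on a 1D signal. This is a sketch under stated assumptions: it works in 1D rather than 2D, uses zero padding, and fuses the parallel branches by summation with rates (1, 2); Figure 5's actual fusion and rates are not reproduced here, and all function names are hypothetical.

```python
def effective_kernel_size(k, d):
    """Receptive field N = k + (k - 1)(d - 1) of one k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def dilated_conv1d_same(signal, weights, d):
    """'Same'-size 1D dilated convolution (cross-correlation form, zero padding).

    Taps are spaced d samples apart, so the cost is len(weights) multiplications
    per output regardless of d. Assumes an odd number of weights.
    """
    half = (len(weights) // 2) * d
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + j * d - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def parallel_dilated(signal, weights, rates=(1, 2)):
    """Hypothetical parallel module: run one branch per rate and sum the outputs."""
    branches = [dilated_conv1d_same(signal, weights, d) for d in rates]
    return [sum(vals) for vals in zip(*branches)]

impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
print([effective_kernel_size(3, d) for d in (1, 2, 3, 4)])
print(dilated_conv1d_same(impulse, [1, 1, 1], 2))
print(parallel_dilated(impulse, [1, 1, 1]))
```

The impulse response of the rate-2 branch alone has holes (non-zero only every second sample), while adding the standard rate-1 branch fills them, which is the continuity argument made in the text.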
BTNK2 passes the input x through three convolutional layers with associated batch normalization (BN) layers and ReLU activations to obtain the mapping F(x), and then adds F(x) to x through a skip connection, giving the output F(x) + x. Compared with BTNK2, BTNK1 has an additional 1 × 1 convolution branch G(x), which adjusts the number of channels of the input x to match the output F(x); its output is therefore F(x) + G(x). With BTNK1 and BTNK2, shallow features are mapped directly to deeper layers, which mitigates the loss of target texture detail that stacked convolutional and fully connected layers cause during image deblurring.
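The two residual units differ only in their shortcut, which the following plain-Python sketch makes explicit. Several simplifications are assumptions made here: feature maps are 1D (stored as `[channel][position]`) instead of 2D, the middle convolution has 3 taps instead of 3 × 3, BN layers are omitted, and a ReLU is applied after the addition as in the standard ResNet bottleneck. The helper `const` and all weights are hypothetical.

```python
def relu(x):
    """Element-wise ReLU on a feature map stored as [channel][position]."""
    return [[max(0.0, v) for v in ch] for ch in x]

def add(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[u + v for u, v in zip(ca, cb)] for ca, cb in zip(a, b)]

def conv1x1(x, weight):
    """1x1 convolution: weight[o][c] mixes channels independently at every position."""
    length = len(x[0])
    return [[sum(weight[o][c] * x[c][t] for c in range(len(x))) for t in range(length)]
            for o in range(len(weight))]

def conv3(x, weight):
    """3-tap convolution with zero padding; weight[o][c][j] for tap j in {0, 1, 2}."""
    length = len(x[0])
    out = []
    for o in range(len(weight)):
        row = []
        for t in range(length):
            acc = 0.0
            for c in range(len(x)):
                for j in range(3):
                    idx = t + j - 1
                    if 0 <= idx < length:
                        acc += weight[o][c][j] * x[c][idx]
            row.append(acc)
        out.append(row)
    return out

def residual_branch(x, w_reduce, w_mid, w_expand):
    """F(x): 1x1 reduce -> 3-tap conv -> 1x1 expand, with ReLU in between (BN omitted)."""
    h = relu(conv1x1(x, w_reduce))
    h = relu(conv3(h, w_mid))
    return conv1x1(h, w_expand)

def btnk2(x, w_reduce, w_mid, w_expand):
    """Identity shortcut: ReLU(F(x) + x); F must preserve the channel count."""
    return relu(add(residual_branch(x, w_reduce, w_mid, w_expand), x))

def btnk1(x, w_reduce, w_mid, w_expand, w_proj):
    """Projection shortcut: ReLU(F(x) + G(x)), where G is a 1x1 convolution."""
    return relu(add(residual_branch(x, w_reduce, w_mid, w_expand), conv1x1(x, w_proj)))

def const(shape, value):
    """Hypothetical helper: nested list of the given shape filled with `value`."""
    if len(shape) == 1:
        return [value] * shape[0]
    return [const(shape[1:], value) for _ in range(shape[0])]

x = [[1.0, -2.0, 3.0, 0.5], [0.0, 1.0, -1.0, 2.0]]   # 2 channels, length 4
# With all-zero weights F(x) = 0, so BTNK2 reduces to ReLU(x): the shortcut carries x through.
y2 = btnk2(x, const((1, 2), 0.0), const((1, 1, 3), 0.0), const((2, 1), 0.0))
# BTNK1 changes 2 input channels to 4 output channels; G(x) matches the shortcut to F(x).
y1 = btnk1(x, const((1, 2), 0.1), const((1, 1, 3), 0.1), const((4, 1), 0.1), const((4, 2), 0.5))
print(y2)
```

The zero-weight case shows the property the text relies on: even when the convolutional path contributes nothing, the input still reaches the output through the shortcut.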

Loss Function
The design of the loss function affects network performance. In this paper, a joint loss function is used to optimize the network parameters; it combines the structural similarity (SSIM) loss and the L1 norm loss as follows:
$$L_{loss} = \alpha \cdot L_{SSIM} + (1 - \alpha) \cdot G_{\alpha} \cdot L_{l1} \tag{2}$$
where L_loss is the value of the joint loss, L_SSIM is the SSIM loss, L_l1 is the L1 loss, α is a weighting constant usually set to 0.84, and G_α is the Gaussian distribution parameter. In practice, the mean, variance, and covariance of the image are computed with a Gaussian function rather than by traversing the pixels, which is more efficient.
The L1 norm loss is also called the least absolute deviations (LAD) loss. It minimizes the sum S of the absolute differences between the target values Y_i and the estimated values f(x_i):
$$S = \sum_{i=1}^{n} \left| Y_i - f(x_i) \right| \tag{3}$$
SSIM stands for structural similarity index. It compares luminance, contrast, and structure, and is designed to reflect human visual perception; results optimized with SSIM therefore tend to preserve finer detail.
Previous research shows that the SSIM loss tends to cause brightness changes and color shifts but preserves high-frequency information, restoring the edges and details of the image well, whereas the L1 loss keeps brightness and color stable. This paper therefore adopts a joint loss function that combines the SSIM loss and the L1 loss.
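A minimal sketch of the SSIM-L1 joint loss on flattened images with values in [0, 1] is shown below. Simplifying assumptions, made here and not taken from the paper: SSIM is computed from whole-image statistics (a single window) instead of a sliding Gaussian window, the SSIM loss is taken as 1 − SSIM, the L1 term is the mean rather than the sum of absolute differences, and the Gaussian weighting G_α of the L1 term is omitted. C1 = (0.01L)² and C2 = (0.03L)² are the usual SSIM stabilizing constants.

```python
def _mean(values):
    return sum(values) / len(values)

def global_ssim(x, y, data_range=1.0):
    """SSIM from whole-image statistics (one window), with C1 = (0.01 L)^2 and
    C2 = (0.03 L)^2, where L is the dynamic range of the pixel values."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = _mean(x), _mean(y)
    var_x = _mean([(a - mx) ** 2 for a in x])
    var_y = _mean([(b - my) ** 2 for b in y])
    cov = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2))

def l1_loss(x, y):
    """Mean absolute difference (the LAD sum divided by the number of pixels)."""
    return _mean([abs(a - b) for a, b in zip(x, y)])

def ssim_l1_loss(x, y, alpha=0.84):
    """Joint loss alpha * (1 - SSIM) + (1 - alpha) * L1; Gaussian weighting omitted."""
    return alpha * (1.0 - global_ssim(x, y)) + (1.0 - alpha) * l1_loss(x, y)

restored = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20]
target = [0.12, 0.38, 0.30, 0.85, 0.88, 0.25]
print(ssim_l1_loss(restored, target))
```

The SSIM term rewards matching local structure and contrast, and the L1 term anchors absolute intensities, which is the brightness and color stability argument given above. A training implementation would typically compute SSIM with a Gaussian window over image tensors.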

Experimental Environment and Parameter Settings
The experiments were run on an Intel(R) Xeon(R) CPU E5-2699 v3 (2.30 GHz) and an NVIDIA Quadro P2000 GPU with 5 GB of video memory. The software environment was a 64-bit Windows operating system, with Python as the programming language and PyTorch as the deep learning framework.
During training, the training and validation sets are divided into batches, and one pass of all training images through the network constitutes one epoch. The network is initialized by loading pretrained weights. The initial learning rate is 1 × 10^−3, and the Adam algorithm is used for optimization, computing an adaptive learning rate for each weight parameter.
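The "adaptive learning rate for each weight parameter" comes from Adam's per-parameter moment estimates. The sketch below shows a single Adam update for one scalar weight, using the stated initial learning rate of 1 × 10^−3; β1 = 0.9, β2 = 0.999 and ε = 1e-8 are assumptions, since the text does not report them (they are the defaults of PyTorch's `torch.optim.Adam`).

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar weight at step t (t starts at 1). Returns (w, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# On the first step the update is about lr * sign(grad), whatever the gradient's scale.
w_small, _, _ = adam_step(0.5, 4.0, 0.0, 0.0, t=1)
w_large, _, _ = adam_step(0.5, 400.0, 0.0, 0.0, t=1)
print(w_small, w_large)  # both close to 0.499
```

Dividing by the square root of the second-moment estimate normalizes each parameter's step size, so weights with very different gradient magnitudes move at comparable rates.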

Evaluation Indicators
The evaluation criteria used in this paper are the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), statistical structural similarity (Stat-SSIM), and the deblurring time of a single image. PSNR is the most common and widely used objective image quality index. It evaluates image quality from the error between corresponding pixels and is expressed in dB; a larger value indicates less distortion, that is, the two images are closer.
SSIM measures the similarity of two images in terms of luminance, contrast and structure, and is designed to reflect human perception. Generally speaking, results obtained with SSIM preserve detail better.
Stat-SSIM is an improved algorithm based on SSIM. It takes the statistical variations of the image into account and uses the histogram information of the image to adjust the normalization coefficients of the luminance and contrast terms, so that image quality can be assessed more accurately. Its calculation is given by Equation (4), where µ_i is shorthand for the expected value E[i] of image i. This expectation is computed by convolution with an 11 × 11 Gaussian kernel with a standard deviation of 3. All other expectation values in the equation are computed by convolving the corresponding quantity with the same Gaussian filter. ε_µ and ε_σ are two regulators that limit the maximum resolution of the fractions in the equation and impose cutoffs on the mean and variance expectation values, respectively. In particular, when both the numerator and the denominator of a fraction are much smaller than the corresponding ε, the fraction is close to 1. The results are finally averaged over the whole image, which contains n_p pixels and n_c channels. The image expectation value µ is calculated as shown in Equation (5):
$$\mu = \frac{1}{H \times W} \sum_{m=1}^{H} \sum_{n=1}^{W} I(m, n) \qquad (5)$$
σ represents the standard deviation of the image and is calculated as shown in Equation (6):
$$\sigma = \left( \frac{1}{H \times W - 1} \sum_{m=1}^{H} \sum_{n=1}^{W} \big( I(m, n) - \mu \big)^{2} \right)^{\frac{1}{2}} \qquad (6)$$
where I(m, n) is the pixel value at image position (m, n), and H and W represent the height and width of the image, respectively.
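PSNR and the image mean and standard deviation of Equations (5) and (6) can be checked with a small NumPy sketch. It assumes single-channel images scaled to [0, 1], so MAX = 1. It uses the standard PSNR formula 10·log10(MAX²/MSE), which this section describes but does not write out. The H × W − 1 normalization in the standard deviation is also an assumption, and the function names are illustrative.

```python
import numpy as np


def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE); larger means less distortion."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def image_mean(img: np.ndarray) -> float:
    """Equation (5): average of I(m, n) over the H x W image."""
    h, w = img.shape
    return float(img.sum() / (h * w))


def image_std(img: np.ndarray) -> float:
    """Equation (6), assuming the unbiased (H * W - 1) normalization."""
    h, w = img.shape
    mu = image_mean(img)
    return float(np.sqrt(((img - mu) ** 2).sum() / (h * w - 1)))


rng = np.random.default_rng(1)
gt = rng.uniform(0.0, 1.0, size=(32, 32))
noisy = np.clip(gt + rng.normal(0.0, 0.05, size=gt.shape), 0.0, 1.0)
print(psnr(noisy, gt), image_mean(gt), image_std(gt))
```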

Ablation Experiment
To verify the effectiveness of the PVC-Resnet module and the SSIM-L1 joint loss function proposed in this paper, ablation experiments were designed for the proposed deblurring model. The SSIM-L1 + Resnet, L1 + PVC-Resnet, SSIM + PVC-Resnet and SSIM-L1 + PVC-Resnet models were tested on two sample datasets, and the results are shown in Figure 6. The detailed comparisons show that the SSIM-L1 + Resnet model gives the worst deblurring results, because it lacks the receptive-field expansion provided by parallel dilated convolution; its restored images show edge-detail defects of varying degrees. Compared with the SSIM-L1 + Resnet model, the L1 + PVC-Resnet model improves slightly, but some texture details in its restored images are still not clear enough. The SSIM + PVC-Resnet model recovers the edges and details of blurred images better, but its restored images deviate slightly from the ground-truth images in brightness and color. In contrast, the proposed SSIM-L1 + PVC-Resnet model first expands the receptive field through parallel dilated convolution. It then uses the SSIM-L1 joint loss function to compute the error between the prediction and the label data. Its restored images are clearer, with sharper edges and details, and are closer to the ground-truth images in brightness and color, giving a better overall deblurring effect.
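A minimal NumPy sketch of how parallel dilated convolution enlarges the receptive field before a residual connection. Several details are illustrative assumptions, not the paper's exact PVC-Resnet configuration: fusing the branches by summation, the dilation rates (1, 2, 3), the 3 × 3 all-ones kernels and the function names. An impulse input makes each branch's footprint visible.

```python
import numpy as np


def dilated_conv2d(x: np.ndarray, k: np.ndarray, d: int) -> np.ndarray:
    """'Same'-size 2-D correlation of x with an odd-sized kernel k at dilation d (zero padding)."""
    kh, kw = k.shape
    ph, pw = d * (kh - 1) // 2, d * (kw - 1) // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # Each tap samples the input d pixels apart, enlarging the footprint.
            out += k[i, j] * xp[i * d : i * d + x.shape[0], j * d : j * d + x.shape[1]]
    return out


def parallel_dilated_block(x, kernels, rates):
    """Hypothetical PVC-style block: sum parallel dilated branches, add the input, ReLU."""
    branches = sum(dilated_conv2d(x, k, d) for k, d in zip(kernels, rates))
    return np.maximum(branches + x, 0.0)  # residual (skip) connection, then ReLU


impulse = np.zeros((15, 15))
impulse[7, 7] = 1.0
ones = np.ones((3, 3))
for d in (1, 2, 3):
    support = np.argwhere(dilated_conv2d(impulse, ones, d) != 0)
    print(d, support[:, 0].max() - support[:, 0].min() + 1)  # footprint height per rate
```

All three branches keep the same number of weights, yet their footprints grow with the rate. Summing them lets one block see both local and wider context, which is the motivation given for the PVC-Resnet design.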
Table 3 lists the evaluation scores of the different structural models. As Table 3 shows, the single-image deblurring times of the structural models differ little on either the GoPro dataset or the calibration board dataset. However, the proposed SSIM-L1 + PVC-Resnet model outperforms the SSIM-L1 + Resnet, L1 + PVC-Resnet and SSIM + PVC-Resnet models on the PSNR, SSIM and Stat-SSIM metrics. In PSNR, the SSIM-L1 + PVC-Resnet model is 2.95 dB and 4.68 dB higher than the SSIM-L1 + Resnet model on the two datasets, 3.10 dB and 1.47 dB higher than the L1 + PVC-Resnet model, and 0.53 dB and 0.74 dB higher than the SSIM + PVC-Resnet model, respectively. In SSIM, the SSIM-L1 + PVC-Resnet model is 0.0174 and 0.0253 higher than the SSIM-L1 + Resnet model on the two datasets, 0.0067 and 0.0031 higher than the L1 + PVC-Resnet model, and 0.0024 and 0.

Comparative Analysis of Performance of Different Deblurring Models
To verify the effectiveness of the proposed network, the methods of Nah et al. [18], Kupyn et al. [21] and Cho S J et al. [26] were tested on the GoPro dataset and the calibration board dataset. Their strengths and weaknesses were evaluated with the PSNR and SSIM indicators. The test results are shown in Figure 7. From left to right, the columns are the blurred image, the ground-truth image, and the results of Nah et al.'s method, Kupyn et al.'s method, Cho S J et al.'s method and our method. The detailed comparison shows that the images deblurred by the methods of Nah et al. and Kupyn et al. contain more obvious ringing artifacts. Ringing artifacts are pseudo-edges, resembling fluctuations and oscillations, that appear when high-frequency information is added to enhance edges and details during image sampling and reconstruction. Compared with these two methods, our method recovers more accurate and clearer feature contours with fewer artifacts, and the digits in the restored images are more recognizable. Compared with the method of Cho S J et al., our method gives slightly better lighting and brightness, closer to the real image, and the overall texture of the image is smoother and perceptually better.
Table 4 shows the image deblurring test results of the different methods. As Table 4 shows, on both the GoPro dataset and the calibration board dataset, the proposed model outperforms the methods of Nah et al., Kupyn et al. and Cho S J et al. on the PSNR, SSIM and Stat-SSIM metrics, and its single-image deblurring time is slightly shorter than that of Nah et al. In PSNR, our method is 3.75 dB and 1.62 dB higher than the method of Nah et al. on the two datasets, 1.88 dB and 1.35 dB higher than that of Kupyn et al., and 0.23 dB and 0.85 dB higher than that of Cho S J et al. In SSIM, it is 0.0227 and 0.0053 higher than the method of Nah et al., 0.0064 and 0.0131 higher than that of Kupyn et al., and 0.0049 and 0.0045 higher than that of Cho S J et al. In Stat-SSIM, it is 0.036 and 0.0176 higher than the method of Nah et al., 0.0213 and 0.0104 higher than that of Kupyn et al., and 0.0047 and 0.0016 higher than that of Cho S J et al., respectively. This analysis shows that the proposed multi-scale cyclic deblurring model based on PVC-Resnet effectively compensates for the feature information lost through an insufficient receptive field during deblurring, and thus improves the deblurring result.

Figure 5. PVC-ResNet module.
As shown in Figure 5, S2, S3 and S4 are composed of multiple end-to-end residual units, BTNK1 and BTNK2, whose structures are also given in Figure 5. BTNK2 passes the input x through three convolutional layers, with the associated BN layers and ReLU activation functions, to obtain the mapping F(x). It then adds F(x) to x through a skip connection to obtain the final mapping F(x) + x. Compared with

Figure 6. Subjective comparison of the deblurring results of different structural models.

Figure 7. Comparison of the subjective deblurring results of each model.

Table 1. Detailed information of the model.

Table 2. Receptive field size of the convolution kernel under different D values.
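The receptive field of a single dilated kernel is usually computed from the standard relation below. A k × k kernel with dilation rate D covers k + (k − 1)(D − 1) pixels per side, because D − 1 zeros are inserted between adjacent taps. A short sketch, assuming square kernels. The sizes k = 3 and k = 5 are illustrative assumptions and are not taken from Table 2.

```python
def dilated_kernel_size(k: int, d: int) -> int:
    """Effective side length of a k x k kernel with dilation rate d."""
    # d - 1 zeros are inserted between each pair of adjacent taps.
    return k + (k - 1) * (d - 1)


for k in (3, 5):
    sizes = ", ".join(f"D={d}: {dilated_kernel_size(k, d)}" for d in (1, 2, 3, 4))
    print(f"{k}x{k} kernel -> {sizes}")
```

The number of weights stays k × k for every D, so dilation widens the view without adding parameters.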

Table 3. Evaluation index scores of different structural models.

Table 4. Image deblurring test results of each model.