Article

EFE-CNA Net: An Approach for Effective Image Deblurring Using an Edge-Sensitive Focusing Encoder

1 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
2 Xinjiang Port Economic Development and Management Research Center, Yili Normal University, Yining 835000, China
3 Department of Computing and Cyber Security, Texas A&M University, San Antonio, TX 78224, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(13), 2493; https://doi.org/10.3390/electronics13132493
Submission received: 29 May 2024 / Revised: 19 June 2024 / Accepted: 25 June 2024 / Published: 26 June 2024
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)

Abstract

Deep learning-based image deblurring techniques have made great advancements, improving both processing speed and deblurring efficacy. However, existing methods still face challenges when dealing with complex blur types and the semantic understanding of images. The segment anything model (SAM), a versatile deep learning model that accurately and efficiently segments objects in images, facilitates various tasks in computer vision. This article leverages SAM’s proficiency in capturing object edges and enhancing image content comprehension to improve image deblurring. We introduce the edge-sensitive focusing encoder (EFE) module, which utilizes masks generated by the SAM framework and re-weights the masked portions after SAM segmentation by detecting their features and high-frequency information. The EFE module uses the masks to locate the position of the blur in an image while identifying the intensity of the blur, allowing the model to focus more accurately on specific features. Masks with greater high-frequency information are assigned higher weights, prompting the network to prioritize them during processing. Based on the EFE module, we develop a deblurring network called the edge-sensitive focusing encoder-based convolution–normalization and attention network (EFE-CNA Net), which utilizes the EFE module to enhance the deblurring process, employs an image-mask decoder to merge features from both the image and the mask produced by the EFE module, and incorporates the CNA Net as its base network. This design enables the model to focus on distinct features at various locations, enhancing its learning process through the guidance provided by the EFE module and the blurred images. Testing results on the RealBlur and REDS datasets demonstrate the effectiveness of the EFE-CNA Net, achieving peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics of 28.77/0.902 (RealBlur-J), 36.40/0.956 (RealBlur-R), and 31.45/0.919 (REDS).

1. Introduction

In the field of image processing, image deblurring is essential for enhancing visual quality and extracting meaningful information from degraded images. The advent of deep learning has significantly advanced image deblurring techniques, particularly through the use of CNNs. These end-to-end models can directly learn complex representations from data.
Despite these advancements, deep learning models still face challenges in understanding image content and semantic information, particularly in scenarios with severe blur or significant variations. While some models can learn high-level features, they often struggle with accurately comprehending objects and structures in complex scenes.
Several studies have highlighted the effectiveness of segmentation-assist deblurring methods [1,2,3,4]. Krishnan et al. [1] proposed a blind image deblurring method using normalized sparsity measures, segmenting the image into distinct regions to infer the blur kernel and clear image based on regional sparsity. Luo et al. [2] employed superpixel segmentation priors for blind deblurring, improving deblurring quality by utilizing consistent information within superpixels. Zhang et al. [3] introduced a method using a three-stage intensity prior, enhancing deblurring by modeling intensity distributions in three distinct regions. Li et al. [4] introduced a dynamic scene deblurring framework that employs hybrid activation functions and edge-assisted dual-branch residuals. This method enhances deblurring performance by leveraging a combination of activation functions to capture diverse image features and incorporating edge information to guide the deblurring process through a dual-branch architecture, which separately processes edge details and broader image structures.
The SAM is an advanced computer vision model that automatically recognizes and segments arbitrary objects in an image and has gained attention for its potential in image deblurring [5,6,7]. Li et al. [5] improved the generalizability of the deblurring model by enabling a robust interaction between image data and segmentation masks generated by SAM. Zhang et al. [6] distilled semantic priors from the SAM to enhance the efficiency and accuracy of image restoration models. Jin et al. [7] demonstrated the practicality of SAM in low-level visual tasks through grayscale encoding and channel expansion. SAM utilizes advanced image segmentation techniques to partition images into coherent regions based on priors such as sparsity, superpixel segmentation, and intensity distribution. This segmentation allows for a more nuanced understanding and processing of image content and structure, enabling tailored deblurring strategies for different regions.
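For concreteness, the snippet below sketches how segmentation masks can be obtained with the publicly released segment-anything package and merged into a single grayscale-coded mask map. The checkpoint path, the ViT-H variant, and the input file name are placeholders, and this is an illustrative sketch rather than the authors' exact pipeline.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM backbone and automatically segment every object in a blurred image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("blurred.jpg"), cv2.COLOR_BGR2RGB)
masks = generator.generate(image)                 # list of dicts, one per segment

# Combine the binary segments into one grayscale-coded mask map,
# assigning each segment its own integer label.
mask_map = np.zeros(image.shape[:2], dtype=np.uint16)
for label, m in enumerate(masks, start=1):
    mask_map[m["segmentation"]] = label
```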
In this paper, we leverage the advantages of SAM and propose the EFE module, which re-weights masks generated by SAM following segmentation by detecting their features and high-frequency information. Masks with greater high-frequency information are assigned higher weights, prompting the network to prioritize them during processing. The EFE module uses the masks to locate blur in an image and identify the intensity of the blur, allowing the model to focus more accurately on specific features. Based on the EFE module, we develop a deblurring network called EFE-CNA Net, which enhances the deblurring process by employing an image-mask decoder to merge features from both the image and the mask from the EFE module, and incorporates the CNA Net as its base network. This design enables the model to focus on distinct features at various locations, enhancing its learning process through the guidance provided by the EFE module and the blurred images. Feature maps extracted by different models show that our model outperforms others in extracting features from key parts of the image. Testing results on the RealBlur [8] and REDS [9] datasets showed PSNR and SSIM metrics of 28.77 and 0.902 on RealBlur-J, 36.40 and 0.956 on RealBlur-R, and 31.45 and 0.919 on REDS, surpassing other existing methods.
The remaining part of this article is organized as follows: Section 2 provides a brief review of related work. Section 3 describes our method, EFE-CNA Net, in detail. Section 4 shows the experimental results. The usefulness of edge perception in image deblurring is discussed in Section 5. Section 6 discusses limitations and future work. Section 7 concludes the paper.

2. Related Work

The rapid development of deep learning, particularly CNNs, has significantly improved the effectiveness of image deblurring techniques. Recent methods that have advanced the field considerably include the deblur generative adversarial network (Deblur-GAN) [10], which uses a generative adversarial network to produce high-quality deblurred images; the deep multi-patch hierarchical network (DMPHN) [11], which processes overlapping patches through a hierarchical structure to handle non-uniform blurs; and the scale-recurrent network (SRN) [12], which refines images at multiple scales for various blur levels. Despite the substantial progress these technologies have made in advancing image deblurring, they still face limitations in understanding image content and semantic information. While some advanced models can learn high-level features from images, they struggle to accurately comprehend objects and structures in complex or heavily blurred scenes.
By segmenting the image into meaningful regions, a segmentation mask is obtained, which the network can use to gain a more detailed understanding of the image content and structure and to tailor the deblurring strategy to different regions. Krishnan et al. [1] proposed a blind image deblurring method based on normalized sparsity measures. They partition the image into different regions and infer the blur kernel and clear image based on the sparsity of each region, thus achieving blind deblurring; however, the method relies on relatively simple region partitioning. Luo et al. [2] employed a superpixel segmentation prior for blind image deblurring. Their method first segments the input image into superpixels and then uses the consistency information within each superpixel to guide the blind deblurring process, thereby improving the quality of the deblurred results. Zhang et al. [3] introduced a method for image deblurring using a three-stage intensity prior. In image deblurring, the intensity prior refers to the distribution of intensities (brightness) in different regions of the image. Traditional intensity priors often assume the existence of only two intensity distributions in the image, namely background and foreground; Zhang et al. instead divide the image into three different regions and model the intensity distribution of each region separately, thereby enhancing the deblurring effect. While this partitioning provides an innovative perspective, it appears somewhat limited in handling more diverse scenes and edge variations. Jin et al. [7] introduced a method using grayscale encoding and channel expansion, which proved practical in low-level visual tasks but showed limitations in handling complex image content and fine edges. Although these methods use segmentation and related cues as additional guidance for deblurring training, they still have limitations.
To address the limitations of existing methods, we introduce EFE-CNA Net, which excels in capturing edges, enhancing image comprehension, and detecting high-frequency information for superior deblurring performance. The EFE-CNA Net adopts the SAM framework to generate segmentation masks and employs the EFE module to process these masks, which allows the network to prioritize areas with significant detail and accurately identify the location and intensity of the blur. The features from the blurred image are fused with the mask features using an image-mask decoder, and the combined features are then fed into the CNA Net for image recovery. This design enhances the model’s focus on distinct features, thereby addressing issues of insufficient edge handling and semantic perception in previous methods.

3. Approach

To address the challenge of blurred edges and improve the comprehension and analysis of image content and structure, we propose the EFE-CNA Net. The architecture of the EFE-CNA Net is shown in Figure 1. The blurred image is first pre-processed and then segmented by SAM to obtain the mask. For the mask acquisition process, we employ an improved method called grouped masking: the mask is sliced into multiple segments, with each slice representing a specific grayscale value. This differs from conventional approaches that group pixels based solely on grayscale values; by creating a unique mask for each grayscale value, we enhance precision and capture finer details. These mask slices incorporate multiple grayscale levels, providing a more nuanced and detailed representation of the image, and allow us to accurately segment and analyze different regions of the image according to their grayscale levels. In the EFE module, the mask is fused with the blurred image information and re-weighted to obtain calibrated features. These features are then fused with the original image features via the image-mask decoder, which combines the blurred image and the corresponding mask information. The image encoder, mask encoder, and image-mask decoder all use 3 × 3 convolutions, and the image-mask decoder is obtained by concatenating the image decoder and the mask decoder. These inputs are fed into the deblurring model CNA Net to restore a clear image. In this work, we adopt a two-stage deblurring strategy in which the first stage uses an encoder–decoder structure for coarse processing and the second stage uses CNA Net for fine processing. Such two-stage/multi-stage strategies can jointly promote the quality of image restoration and have proven effective in many previous works [13,14].
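As a minimal illustration of this grouped-masking step, the helper below splits a grayscale-coded mask map into per-value binary slices; the function name and tensor layout are assumptions for illustration.

```python
import torch

def slice_mask(mask_map: torch.Tensor) -> torch.Tensor:
    """Split a grayscale-coded mask map (H, W), in which each integer value marks
    one segment, into a stack of binary mask slices (N, H, W)."""
    values = torch.unique(mask_map)
    slices = [(mask_map == v).float() for v in values]
    return torch.stack(slices, dim=0)

# Example: a 2x3 map containing two segments (values 1 and 2) yields two slices.
demo = torch.tensor([[1, 1, 2], [1, 2, 2]])
print(slice_mask(demo).shape)   # torch.Size([2, 2, 3])
```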

3.1. EFE Module

To address the challenge of blurred edges, we introduce the EFE module, which enhances the understanding and processing of the image content and structure while tailoring deblurring strategies to different regions. The module employs mask slicing, a technique where the mask obtained from SAM is divided into smaller slices. Each slice is then multiplied with the features extracted by the image encoder and a max pooling operation is performed to aggregate these results, resulting in a calibrated mask. This refined mask directs the network’s attention toward critical regions, ensuring that even if certain areas are masked, important edge information remains clear.
The significance of mask slicing in our work lies in its ability to fine-tune the interaction between image information and segmentation masks. By breaking down the mask into slices, the EFE module can more accurately identify and emphasize regions with pronounced edge changes. This nuanced interaction enhances the network’s capacity to discern and preserve crucial edge details. Consequently, the network focuses more on these regions, leading to improved edge definition and overall image clarity.
Specifically, during training, the position of each mask slice is determined first. Each mask slice is then combined with the features from the image encoder, and the mask max module extracts the dominant response within each masked region. These re-weighted features are aggregated to form the new features output by the module. The pseudo-code of the EFE module is provided in Algorithm 1.
The re-weight process starts by determining the importance of each mask region through the sum of pixel values within that region, representing its weighted significance. Using this sum, we calculate both the maximum pixel value and the weighted sum of the pixel values within each region. The maximum pixel value for each mask region is then multiplied by the corresponding mask region to obtain the weighted image segment for that region. Next, we derive mask_uncovered by identifying and merging regions in the segmentation mask that are not occluded by other instances. The mask for the uncovered regions is calculated, ensuring values are restricted between 0 and 1. The maximum pixel value within the uncovered area is determined and multiplied by the corresponding mask area. This result is then added to the previously computed weighted image segments, enhancing the network’s ability to focus on significant regions and improve overall image quality.
Algorithm 1 EFE module
Require: Segmentation masks
Ensure: Calculated features within the masks
 0: Input: M_mask^i, where i ∈ {1, …, N}, and mask_uncovered
 1: Initialize F_EFE to 0
 2: F_enc ← Image Encoder(blur image)
 3: if training then
 4:   Mask Dropout(M_mask)
 5:   M_mask ← Concat({mask_uncovered, M_mask})
 6:   for each mask_n ∈ M_mask do
 7:     F_enc^n ← F_enc × mask_n
 8:     F_EFE^n ← Mask Max(F_enc^n)
 9:     F_EFE ← F_EFE + F_EFE^n
10:   end for
11: end if
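For concreteness, the following is a minimal PyTorch sketch of Algorithm 1. The encoder layout, the mask-dropout probability, and the tensor shapes are assumptions; only the masked multiplication, the mask-max aggregation, and the accumulation follow the pseudocode above.

```python
import torch
import torch.nn as nn

class EFEModule(nn.Module):
    """A sketch of Algorithm 1. Layer configurations are illustrative assumptions."""
    def __init__(self, channels: int = 32, dropout_p: float = 0.1):
        super().__init__()
        self.image_encoder = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.dropout_p = dropout_p

    def mask_max(self, feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Represent each masked region by its maximum response, then re-project
        # that value onto the region (the re-weighting step of Section 3.1).
        masked = feat * mask                             # (B, C, H, W)
        peak = masked.amax(dim=(-2, -1), keepdim=True)   # (B, C, 1, 1)
        return peak * mask

    def forward(self, blur_img, mask_slices, mask_uncovered):
        # blur_img: (B, 3, H, W); mask_slices: (N, B, 1, H, W); mask_uncovered: (B, 1, H, W)
        f_enc = self.image_encoder(blur_img)
        if self.training:
            keep = torch.rand(mask_slices.shape[0]) > self.dropout_p   # mask dropout
            mask_slices = mask_slices[keep]
        all_masks = torch.cat([mask_uncovered.unsqueeze(0), mask_slices], dim=0)
        f_efe = torch.zeros_like(f_enc)
        for mask_n in all_masks:                         # accumulate re-weighted features
            f_efe = f_efe + self.mask_max(f_enc, mask_n)
        return f_efe
```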

3.2. CNA Net

CNA Net, based on a U-Net [15] backbone, incorporates convolution, normalization, gating, and attention mechanisms in CNA blocks. The architecture, depicted in Figure 2, includes two downsampling and two upsampling layers with skip connections.
Each CNA block, shown at the bottom of Figure 2, starts with layer normalization, followed by 1 × 1 convolution and 3 × 3 depth-wise convolution. Simple gating performs element-wise multiplication, and channel attention uses a single Conv and 1 × 1 convolution. The output is summed with the initial normalized features.
In the second stage, data undergo layer normalization and 1 × 1 convolution, without depth-wise convolution. Simple gating and another 1 × 1 convolution follow, with the final output summed with the initial normalized features.
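As a concrete illustration of this block structure, the following PyTorch sketch wires the stages in the order described above. The channel widths, the doubling of channels before the simple gate, and the use of GroupNorm as a channel-wise layer norm are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    # Split the channels in half and multiply element-wise (simple gating).
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class CNABlock(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        # Stage 1: layer norm -> 1x1 conv -> 3x3 depth-wise conv -> gate -> channel attention -> 1x1 conv
        self.norm1 = nn.GroupNorm(1, c)
        self.conv1 = nn.Conv2d(c, 2 * c, 1)
        self.dconv = nn.Conv2d(2 * c, 2 * c, 3, padding=1, groups=2 * c)
        self.gate = SimpleGate()
        self.sca = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1))
        self.conv2 = nn.Conv2d(c, c, 1)
        # Stage 2: layer norm -> 1x1 conv -> gate -> 1x1 conv (no depth-wise conv)
        self.norm2 = nn.GroupNorm(1, c)
        self.conv3 = nn.Conv2d(c, 2 * c, 1)
        self.conv4 = nn.Conv2d(c, c, 1)

    def forward(self, x):
        y = self.gate(self.dconv(self.conv1(self.norm1(x))))
        y = y * self.sca(y)                 # simplified channel attention
        x = x + self.conv2(y)               # sum with the features entering the stage
        z = self.gate(self.conv3(self.norm2(x)))
        return x + self.conv4(z)
```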

3.3. Loss Function and Training

In this article, we utilize PSNR and SSIM to jointly assess image quality. The PSNR loss function primarily focuses on the peak signal-to-noise ratio of the image, while SSIM considers the structural similarity of the image, including brightness, contrast, and structural information. By simultaneously considering PSNR and SSIM, we can comprehensively evaluate the quality of image reconstruction, ensuring that the output image maintains both the accuracy of the structure and the consistency of visual perception while reducing distortion. The loss function is given by Equation (1).
$$\mathrm{Loss} = \lambda_{\mathrm{PSNR}} \cdot \mathrm{Loss}_{\mathrm{PSNR}} + \lambda_{\mathrm{SSIM}} \cdot \mathrm{Loss}_{\mathrm{SSIM}} \tag{1}$$
where $\lambda_{\mathrm{PSNR}}$ and $\lambda_{\mathrm{SSIM}}$ are the weights for the PSNR and SSIM losses, respectively. In our model, both weights are set to 0.5.
$\mathrm{Loss}_{\mathrm{PSNR}}$ is defined as follows:
$$\mathrm{Loss}_{\mathrm{PSNR}} = -10 \cdot \log_{10}\!\left(\frac{MAX^{2}}{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I(i,j)-K(i,j)\right]^{2}}\right) \tag{2}$$
where $MAX$ is the maximum possible pixel value of the image (e.g., $MAX = 255$ for an 8-bit image), and $I(i,j)$ and $K(i,j)$ represent the pixel values at position $(i,j)$ of the original and reconstructed images, respectively. A lower PSNR loss indicates a smaller mean squared error between the original and reconstructed images, thus indicating higher image quality.
$\mathrm{Loss}_{\mathrm{SSIM}}$ is defined as follows:
$$\mathrm{Loss}_{\mathrm{SSIM}} = 1 - \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})} \tag{3}$$
where $\mu_{x}$ and $\mu_{y}$ are the means of the original and reconstructed images, $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are their variances, $\sigma_{xy}$ is the covariance between the original and reconstructed images, and $C_{1}$ and $C_{2}$ are constants for numerical stability. A lower SSIM loss indicates a higher structural similarity between the original and reconstructed images.
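The snippet below is a minimal sketch of the combined loss in Equations (1)–(3). It uses global image statistics for SSIM rather than the usual windowed formulation, assumes images normalized to [0, 1], and sets $C_1$ and $C_2$ to the commonly used $(0.01)^2$ and $(0.03)^2$, which the paper does not specify.

```python
import torch

def psnr_loss(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    # Negated PSNR (Equation (2)): minimizing this loss maximizes PSNR.
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(max_val ** 2 / (mse + 1e-12))

def ssim_loss(pred: torch.Tensor, target: torch.Tensor,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    # 1 - SSIM (Equation (3)), computed from global means, variances, and covariance.
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(unbiased=False), target.var(unbiased=False)
    cov_xy = ((pred - mu_x) * (target - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim

def total_loss(pred: torch.Tensor, target: torch.Tensor,
               w_psnr: float = 0.5, w_ssim: float = 0.5) -> torch.Tensor:
    # Equation (1): weighted sum of the PSNR and SSIM losses, both weights 0.5.
    return w_psnr * psnr_loss(pred, target) + w_ssim * ssim_loss(pred, target)
```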
To optimize the EFE-CNA network for enhanced image deblurring performance, we utilize the AdamW optimizer, which combines the backpropagation algorithm with gradient descent and incorporates weight decay to address potential overfitting issues. Our primary goal is to minimize both the PSNR and SSIM losses. We employ cosine annealing to adjust the learning rate, starting at $2 \times 10^{-3}$ and reducing it to $1 \times 10^{-7}$. The model operates on 256 × 256 patches with a batch size of 32, and the training process involves a total of $3 \times 10^{5}$ iterations.
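The following sketch shows the corresponding optimization setup. The stand-in model, the data, and the weight-decay coefficient are placeholders; only the optimizer, the cosine-annealed learning-rate range, the patch size, the batch size, and the iteration count follow the description above.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)        # stand-in for EFE-CNA Net
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=300_000, eta_min=1e-7)  # anneal from 2e-3 down to 1e-7

for step in range(300_000):                  # 3e5 iterations
    blur = torch.rand(32, 3, 256, 256)       # placeholder batch of 256x256 blurred patches
    sharp = torch.rand(32, 3, 256, 256)      # placeholder ground-truth patches
    loss = torch.mean((model(blur) - sharp) ** 2)   # stand-in for the loss in Equation (1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```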
Through iterative parameter tuning using this approach, the EFE-CNA Net can significantly improve image deblurring, resulting in clearer and more accurate image restoration.

4. Experiments

In this article, we utilize PSNR and SSIM as the primary metrics for evaluating the quality of the restored images. We conducted experiments on the RealBlur dataset and the REDS dataset.

4.1. Dataset

Our model is trained on the RealBlur dataset and the REDS dataset, both of which are publicly accessible resources specifically designed for deblurring tasks.
The RealBlur [8] dataset consists of a collection of blurry images captured from real-world scenes and is used for learning and evaluating deblurring algorithms. It comprises 4738 pairs of images from 232 different scenes and is divided into two subsets based on the image acquisition format: the RAW-format RealBlur-R subset and the JPEG-format RealBlur-J subset. For each subset, the training set consists of 3758 image pairs and the test set of 980 image pairs.
The REDS (realistic and dynamic scenes) [9] dataset provides realistic and dynamic scene data, including both video and image files at original and low resolutions. It can be used not only for super-resolution research but also for image and video deblurring studies. It consists of 300 video sequences with a resolution of 720 × 1280 pixels, each containing 100 frames. The training, validation, and test sets consist of 240, 30, and 30 videos, respectively. The image files in this dataset are extracted from the videos, with each image corresponding to a frame in the video; thus, there are 24,000 images for training and 3000 images each for validation and testing.
Details of the datasets are in Table 1.

4.2. Results

Table 2 presents the deblurring comparison results for the RealBlur-R dataset, Table 3 for the RealBlur-J dataset, and Table 4 for the REDS dataset. The tables are sorted in ascending order of PSNR and, in the case of PSNR ties, further sorted in ascending order of SSIM. Our method is listed last.
As seen in Table 2, Table 3 and Table 4, our method achieves effective results on different datasets. Although MAXIM-3S [23] achieves the highest PSNR on the RealBlur-J dataset, which is 0.2% higher than ours, its SSIM is 3% lower than ours. Zhang et al. [22] achieved the highest SSIM on the REDS dataset, surpassing ours by 0.4%; however, we outperformed their PSNR by 5%. These experiments show that our model is effective in deblurring.

4.3. Quality Experiments

In Table 2, Table 3 and Table 4, we demonstrate the effectiveness and superiority of EFE-CNA Net on the RealBlur dataset and REDS dataset. Additionally, we conducted quality experiments to validate the superiority of our proposed method. We select a subset of models from the aforementioned models for comparison with our model, as shown in Figure 3 and Figure 4.
In the quality experiments, superior delineation of edges, greater structural integrity, and closer proximity to the GT image indicate better image deblurring quality.
As seen in Figure 3 and Figure 4, although our method does not exhibit large improvements in PSNR and SSIM compared to other methods, our images show considerable enhancement in detail. For example, in the image of the medical box in Figure 3, the deblurred result from our method exhibits clear edges and visible contours, while the results from other methods do not. Images with clear and continuous contours have a higher quality of recovery.

5. Discussion

The experimental results show the effectiveness of our method. In this section, we conduct additional experiments to further demonstrate the usefulness of edge perception in deblurring tasks.

5.1. Effectiveness of Introducing Segmentation as Additional Guidance

The main improvement of the EFE-CNA network compared to the CNA network is the introduction of SAM as additional guidance. To make better use of SAM, we specifically designed the EFE module for mask processing. In this section, we conduct experiments to evaluate the effectiveness of these modules from different perspectives. We compare the experimental results of EFE-CNA Net (with SAM) and CNA Net (without SAM) on different datasets. The specific data are shown in Table 5.
From Table 5, it can be seen that the use of SAM further improves the quality of image restoration, especially on the SSIM metric, confirming that the introduction of segmentation enhances the comprehension of image content. We also conducted quality experiments to validate the effectiveness of the SAM framework; the image quality comparisons are shown in Figure 5. In these experiments, superior delineation of edges, greater structural integrity, and closer proximity to the GT image indicate better image deblurring quality.
In Figure 5, the outcome with SAM evidently resembles the GT image’s structure more closely compared to the outcome without SAM.

5.2. Evaluation of the EFE Module

In blurred regions, pixels with higher intensity typically represent brighter features or edges. Therefore, using the maximum value to represent the entire area highlights these brighter features and focuses the network on that portion, enhancing contrast and highlighting details within the blurred area.
To evaluate the combined SAM framework and EFE module, we conducted both quality experiments and experiments generating network-perception feature heatmaps. We compare the experimental results of the mask concatenate method (without the mask max module) and EFE-CNA Net (with the mask max module) on different datasets. The image quality comparison is presented in Figure 6, and the feature extraction heat map in Figure 7. We also compare and evaluate the different configurations of the EFE module; the results are shown in Table 6.
From Table 6, we find that EFE-CNA Net outperforms the mask concatenate method.
In the quality experiments, superior delineation of edges, greater structural integrity, and closer proximity to the GT image indicate better image deblurring quality. For example, in the license plate section of Figure 6, the characters can be clearly identified, indicating good image deblurring results.
In Figure 7, the heat map shows that SAM alone (the mask concatenate method) brings very little improvement. After adding the EFE module, however, the EFE-CNA Net is markedly more effective than the mask concatenate method.

5.3. Considerations for the Mask Max Module in the EFE Module

We use the maximum value in the mask max module because training the model with the maximum value highlights key features, enhances the model’s perception of edges and details, reduces blurring, and makes the image clearer and easier to recognize. In contrast, using the average to represent the entire region causes the network to perceive fewer details and edges, as the average is influenced by all pixel values within the region, including less prominent features. Therefore, using the maximum value better highlights sharp features and edges in the blurred area.
Although using the maximum value may cause some information loss, especially when multiple salient features exist within a region, SAM segmentation effectively divides the image so that each segment rarely contains multiple distinct salient features. Combined with this segmentation, the mask max module in the EFE module largely avoids such information loss, making it highly effective.
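As a toy illustration of this comparison, the snippet below aggregates a masked region once by its maximum and once by its mean; the values are made up purely for illustration.

```python
import torch

# A small feature patch in which the right column carries a strong edge response.
feat = torch.tensor([[0.1, 0.1, 0.9],
                     [0.1, 0.1, 0.8],
                     [0.1, 0.1, 0.1]])
mask = torch.tensor([[1., 1., 1.],
                     [1., 1., 1.],
                     [0., 0., 0.]])   # the masked region covers the top two rows

region = feat[mask.bool()]
print(region.max())    # tensor(0.9000) -> the edge response is preserved
print(region.mean())   # tensor(0.3500) -> the edge response is diluted
```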
We compare the experimental results of the EFE module with those of average pooling and max pooling methods on different datasets. The results are shown in Table 7.
Figure 8 illustrates the various methods employed within the mask max module of the EFE module. These include the method without using the EFE module (CNA Net), the method that employs average pooling in the mask max module (Average), and the method that utilizes max pooling in the mask max module (MAX).
For example, in the license plate section of an image, the model needs to focus on learning the characters on the license plate. The average pooling method learns from the entire license plate, while the maximum pooling method not only learns from the whole license plate but also further focuses on the characters of the license plate.
Figure 8 demonstrates that the EFE module captures image details better with max pooling, whereas average pooling is less effective in preserving these details.

6. Limitations and Future Work

Although the combination of the SAM framework and the EFE module provides significant assistance and guidance in image deblurring, there are still some limitations in practical applications. For instance, when handling extreme lighting conditions or highly complex dynamic scenes, there is room for improvement in deblurring effects due to inaccurate segmentation, particularly in processing edges and details. In addition, SAM itself sometimes does not handle severely blurred images well.
In the future, to address the segmentation accuracy issue, we will explore the integration of multidimensional data (such as spatiotemporal information) to enhance the precision of image segmentation. To tackle the blurring problem in dynamic scenes, future research will explore deblurring models that combine temporal information, temporal prediction, and the SAM framework to better handle blurring in complex dynamic scenarios. In addition, we will propose an evaluation mechanism to assess the confidence of SAM segmentation results and dynamically adjust the deblurring strategy according to the evaluated confidence level.

7. Conclusions

In this article, we propose EFE-CNA Net, which leverages SAM and utilizes masks to guide image deblurring, addressing the issue of poor object edge perception in traditional deep learning deblurring networks. SAM segmentation and the EFE module enable the network to effectively perceive the content information of the image and the edges of objects, and to highlight key information by re-computing the weights. The well-designed image-mask decoder effectively combines the information of the blurred image and the segmentation mask, so that the network pays more attention to key features and is guided toward better deblurring. Through experiments, we find that using max pooling in the mask max module of the EFE module is more effective than using average pooling: thanks to the introduction of SAM, the perception of blurred object edges can be improved to a certain extent while objects remain distinguishable from one another, which is why max pooling enables the network to perceive regions better than average pooling.

Author Contributions

L.J. and F.Z. conceptualized this study; F.Z. and X.Z. designed the model; X.Z. implemented the model; L.J. and F.Z. contributed to the improvement of the model; F.Z. and X.Z. designed the experiment; X.Z., L.J., F.Z., and G.L. reviewed and evaluated the results; X.Z. and F.Z. wrote the manuscript. All authors reviewed the manuscript and contributed to revisions. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Tianjin Research Innovation Project for Postgraduate Students (grant nos. 2022SKY264 and 2022SKY283). This work was funded by Tianjin Normal University Collaborative Research Project No. 53H24034.

Data Availability Statement

The trained model and related experiment results are available at https://github.com/JemmaZX/EFE-CNA-Net (accessed on 25 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Krishnan, D.; Tay, T.; Fergus, R. Blind deconvolution using a normalized sparsity measure. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; IEEE: Piscataway, NJ, USA, 2011. [Google Scholar]
  2. Luo, B.; Cheng, Z.; Xu, L.; Zhang, G.; Li, H. Blind image deblurring via superpixel segmentation prior. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1467–1482. [Google Scholar] [CrossRef]
  3. Zhang, H.; Wu, Y.; Zhang, L.; Zhang, Z.; Li, Y. Image deblurring using tri-segment intensity prior. Neurocomputing 2020, 398, 265–279. [Google Scholar] [CrossRef]
  4. Li, Z.; Cui, G.; Liu, H.; Chen, Z.; Zhao, J. A novel dynamic scene deblurring framework based on hybrid activation and edge-assisted dual-branch residuals. Vis. Comput. 2024, 40, 3849–3869. [Google Scholar] [CrossRef]
  5. Li, S.; Liu, M.; Zhang, Y.; Chen, S.; Li, H.; Dou, Z.; Chen, H. SAM-Deblur: Let segment anything boost image Deblurring. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; IEEE: Piscataway, NJ, USA, 2024. [Google Scholar]
  6. Zhang, Q.; Liu, X.; Li, W.; Chen, H.; Liu, J.; Hu, J.; Xiong, Z.; Yuan, C.; Wang, Y. Distilling Semantic Priors from SAM to Efficient Image Restoration Models. arXiv 2024, arXiv:2403.16368. [Google Scholar]
  7. Jin, Z.; Chen, S.; Chen, Y.; Xu, Z.; Feng, H. Let segment anything help image dehaze. arXiv 2023, arXiv:2306.15870. [Google Scholar]
  8. Rim, J.; Lee, H.; Won, J.; Cho, S. Real-world blur dataset for learning and benchmarking deblurring algorithms. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXV 16. Springer International Publishing: Cham, Switzerland, 2020. [Google Scholar]
  9. Nah, S.; Baik, S.; Hong, S.; Moon, G.; Son, S.; Timofte, R.; Mu Lee, K. Ntire 2019 challenge on video deblurring and super-resolution: Dataset and study. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
  10. Kupyn, O.; Budzan, V.; Mykhailych, M.; Mishkin, D.; Matas, J. Deblurgan: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  11. Zhang, H.; Dai, Y.; Li, H.; Koniusz, P. Deep stacked hierarchical multi-patch network for image deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  12. Tao, X.; Gao, H.; Shen, X.; Wang, J.; Jia, J. Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  13. Zhang, X.; Zheng, F.; Jiang, L.; Guo, H. CNB Net: A Two-Stage Approach for Effective Image Deblurring. Electronics 2024, 13, 404. [Google Scholar] [CrossRef]
  14. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021. [Google Scholar]
  15. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. [Google Scholar]
  16. Nah, S.; Hyun Kim, T.; Mu Lee, K. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  17. Chen, L.; Lu, X.; Zhang, J.; Chu, X.; Chen, C. Hinet: Half instance normalization network for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021. [Google Scholar]
  18. Hu, Z.; Cho, S.; Wang, J.; Yang, M.-H. Deblurring low-light images with light streaks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
  19. Pan, J.; Sun, D.; Pfister, H.; Yang, M.-H. Blind image deblurring using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  20. Xu, L.; Zheng, S.; Jia, J. Unnatural l0 sparse representation for natural image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013. [Google Scholar]
  21. Kupyn, O.; Martyniuk, T.; Wu, J.; Wang, Z. Deblurgan-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  22. Zhang, J.; Pan, J.; Ren, J.; Song, Y.; Bao, L.; Lau, R.-W.; Yang, M.-H. Dynamic scene deblurring using spatially variant recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  23. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxim: Multi-axis mlp for image processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
Figure 1. The overall architecture of EFE-CNA Net. The blur image is preprocessed to obtain the mask. The re-weighting operation is handled by the EFE module. The image-mask decoder module concatenates the features from both the image encoder and mask encoder, combining information from the blurred image and the re-weighted mask. This combined input is then used for the CNA Net to perform image deblurring.
Figure 2. The architecture of CNA Net. The picture shows the main structure of CNA Net, which consists of several CNA blocks with U-Net [15] architecture, and the number of channels is labeled on the left side. Below is the internal structure of the CNA block, from left to right the modules are layer norm, Conv, DConv, simple gate, simplified channel attention, Conv, add operation, layer norm, Conv, simple gate, Conv, and add operation.
Figure 3. Visual comparison of image deblurring methods on the RealBlur-J dataset. The first column is the blurry image, the second column is the blurred patch, the third column is the result of MPRNet [14], and the fourth column is MAXIM-3S [23]. The fifth column presents the results of our model. The sixth column is the GT (ground truth) patch. The image on the right is the detailed portion of the original image on the left circled in red.
Figure 4. Visual comparison of image deblurring methods on the REDS dataset. The first column is the blurry image, the second column is the blurred patch, the third column is the result of HINet [17], and the fourth column is MAXIM-3S [23]. The fifth column presents the results of our model. The sixth column is the GT patch. The image on the right is the detailed portion of the original image on the left circled in red.
Figure 5. Qualitative comparison with different methods. Column 1: blurry image, Column 2: blurry patch, Column 3: CNA Net (without SAM), Column 4: EFE-CNA Net (with SAM), Column 5: GT image. The image on the right is the detailed portion of the original image on the left circled in red.
Figure 6. Qualitative comparison with different methods. Column 1: blurry image, Column 2: blurry patch, Column 3: mask concatenate, Column 4: EFE-CNA Net, Column 5: GT image. The image on the right is the detailed portion of the original image on the left circled in red.
Figure 7. Comparison with different methods. In the first row, the blur image, GT image, and SAM-segmented mask image are displayed. In the second row, the CNA Net, mask concatenate, and EFE-CNA Net results are shown. Detailed images are presented in the same order. The feature-sensitive heatmaps are extracted after the EFE module. The heatmaps illustrate the mean values averaged across all channels. The bottom image is the detailed portion circled in red in the original image above.
Figure 8. Comparison with different methods within the mask max module of the EFE module. In the first row, the blur image, GT image, and SAM-segmented mask image are displayed. In the second row, the CNA Net result, the EFE module with average pooling result, and the EFE module with max pooling result are shown. Detailed images are presented in the same order. The feature-sensitive heatmaps are extracted after the EFE module. The heatmaps illustrate the mean values averaged across all channels. The bottom image is the detailed portion circled in red in the original image above.
Table 1. Dataset details.
Dataset      Train/Test Pairs   Data Format   Resolution   Scenes
RealBlur-J   3758/980           JPEG          669 × 760    232
RealBlur-R   3758/980           RAW           669 × 760    232
REDS         24,000/3000        PNG           1280 × 720   300
Table 2. Deblurring results on the RealBlur-R dataset. The best results are in bold.
Method                  PSNR    SSIM
Nah et al. [16]         32.51   0.841
HINet [17]              32.76   0.950
Hu et al. [18]          33.67   0.916
Deblur GAN [10]         33.79   0.903
Pan et al. [19]         34.01   0.916
Xu et al. [20]          34.46   0.937
Deblur GAN-v2 [21]      35.26   0.944
Zhang et al. [22]       35.48   0.947
SRN [12]                35.66   0.947
DMPHN [11]              35.70   0.948
MAXIM-3S [23]           35.78   0.947
MPRNet [14]             35.99   0.952
EFE-CNA Net (Ours)      36.40   0.959
Table 3. Deblurring results on the RealBlur-J dataset. The best results are in bold.
Method                  PSNR    SSIM
Hu et al. [18]          26.41   0.803
Xu et al. [20]          27.14   0.830
Pan et al. [19]         27.22   0.790
Zhang et al. [22]       27.80   0.847
Nah et al. [16]         27.87   0.827
Deblur GAN [10]         27.97   0.834
DMPHN [11]              28.42   0.860
SRN [12]                28.56   0.867
Deblur GAN-v2 [21]      28.70   0.866
MPRNet [14]             28.70   0.873
MAXIM-3S [23]           28.83   0.875
EFE-CNA Net (Ours)      28.77   0.902
Table 4. Deblurring results on the REDS dataset. The best results are in bold.
Method                  PSNR    SSIM
Deblur GAN-v2 [21]      26.82   0.722
Pan et al. [19]         26.85   0.830
Hu et al. [18]          27.14   0.855
SRN [12]                28.60   0.759
HINet [17]              28.79   0.911
MAXIM-3S [23]           28.83   0.862
Nah et al. [16]         29.08   0.914
Xu et al. [20]          29.25   0.882
MPRNet [14]             29.92   0.897
Zhang et al. [22]       29.97   0.923
DMPHN [11]              31.20   0.910
EFE-CNA Net (Ours)      31.45   0.919
Table 5. Deblurring comparison results with different methods on the RealBlur dataset and REDS dataset.
Method          RealBlur-J (PSNR/SSIM)   RealBlur-R (PSNR/SSIM)   REDS (PSNR/SSIM)
CNA Net         26.57/0.866              34.89/0.917              29.09/0.867
EFE-CNA Net     28.77/0.902              36.40/0.959              31.45/0.919
Table 6. Deblurring results with different methods on the RealBlur dataset and REDS dataset. The mask concatenate means without the mask max/average module in the EFE module.
Method              RealBlur-J (PSNR/SSIM)   RealBlur-R (PSNR/SSIM)   REDS (PSNR/SSIM)
Mask Concatenate    26.62/0.868              34.57/0.950              28.73/0.903
EFE-CNA Net         28.77/0.902              36.40/0.959              31.45/0.919
Table 7. Deblurring results with different methods on the RealBlur dataset and REDS dataset. Average means with average pooling in the EFE module. MAX means with max pooling in the EFE module.
Method          RealBlur-J (PSNR/SSIM)   RealBlur-R (PSNR/SSIM)   REDS (PSNR/SSIM)
EFE (Average)   28.71/0.902              36.21/0.956              31.42/0.919
EFE (MAX)       28.77/0.902              36.40/0.960              31.45/0.919
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite


Zheng, F.; Zhang, X.; Jiang, L.; Liang, G. EFE-CNA Net: An Approach for Effective Image Deblurring Using an Edge-Sensitive Focusing Encoder. Electronics 2024, 13, 2493. https://doi.org/10.3390/electronics13132493

