Search Results (14)

Search Parameters:
Keywords = Restormer

23 pages, 8167 KB  
Article
MRMAFusion: A Multi-Scale Restormer and Multi-Dimensional Attention Network for Infrared and Visible Image Fusion
by Liang Dong, Guiling Sun, Haicheng Zhang and Wenxuan Luo
Appl. Sci. 2026, 16(2), 946; https://doi.org/10.3390/app16020946 - 16 Jan 2026
Viewed by 165
Abstract
Infrared and visible image fusion improves the visual representation of scenes. Current deep learning-based fusion methods typically rely on either convolution operations for local feature extraction or Transformers for global feature extraction, often neglecting the contribution of multi-scale features to fusion performance. To address this limitation, we propose MRMAFusion, a nested connection model built on the multi-scale restoration Transformer (Restormer) and multi-dimensional attention. We construct an encoder–decoder architecture on the UNet++ network with multi-scale local and global feature extraction using convolution blocks and Restormer. Compared with convolution-based feature extraction, Restormer provides global dependencies and more comprehensive attention to the texture details of the target region along the vertical dimension. Along the horizontal dimension, we enhance MRMAFusion’s multi-scale feature extraction and reconstruction capability by incorporating multi-dimensional attention into the encoder’s convolutional blocks. We perform extensive experiments on the public datasets TNO, NIR, and RoadScene and compare MRMAFusion with other state-of-the-art methods in both objective and subjective evaluations.
(This article belongs to the Section Computing and Artificial Intelligence)
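
The Restormer block that MRMAFusion builds on is publicly documented; for readers unfamiliar with it, the sketch below shows its core idea of multi-head "transposed" attention, computed across channels rather than pixels so the cost stays linear in image resolution. A minimal PyTorch sketch; the layer sizes are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransposedAttention(nn.Module):
    """Restormer-style attention: the attention map is (c/heads x c/heads),
    i.e. across channels, so complexity is linear in the number of pixels."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        # 1x1 conv mixes channels; depthwise 3x3 encodes local context
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.qkv_dw = nn.Conv2d(dim * 3, dim * 3, kernel_size=3,
                                padding=1, groups=dim * 3)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv_dw(self.qkv(x)).chunk(3, dim=1)
        def heads(t):  # flatten spatial dims, split channel heads
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)
        q, k, v = heads(q), heads(k), heads(v)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        attn = ((q @ k.transpose(-2, -1)) * self.temperature).softmax(dim=-1)
        return self.project_out((attn @ v).reshape(b, c, h, w))

print(TransposedAttention(48)(torch.randn(1, 48, 64, 64)).shape)  # (1, 48, 64, 64)
```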

20 pages, 6170 KB  
Article
Adaptive Cross-Modal Denoising: Enhancing LiDAR–Camera Fusion Perception in Adverse Circumstances
by Muhammad Arslan Ghaffar, Kangshuai Zhang, Nuo Pan and Lei Peng
Sensors 2026, 26(2), 408; https://doi.org/10.3390/s26020408 - 8 Jan 2026
Viewed by 436
Abstract
Autonomous vehicles (AVs) rely on LiDAR and camera sensors to perceive their environment. However, adverse weather conditions, such as rain, snow, and fog, negatively affect these sensors, reducing their reliability by introducing unwanted noise. Effective denoising of multimodal sensor data is crucial for safe and reliable AV operation in such circumstances. Existing denoising methods primarily focus on unimodal approaches, addressing noise in individual modalities without fully leveraging the complementary nature of LiDAR and camera data. To enhance multimodal perception in adverse weather, we propose a novel Adaptive Cross-Modal Denoising (ACMD) framework, which leverages modality-specific self-denoising encoders, followed by an Adaptive Bridge Controller (ABC) to evaluate residual noise and guide the direction of cross-modal denoising. Following this, the Cross-Modal Denoising (CMD) module is introduced, which selectively refines the noisier modality using semantic guidance from the cleaner modality. Synthetic noise was added to both sensors’ data during training to simulate real-world noisy conditions. Experiments on the WeatherKITTI dataset show that ACMD surpasses traditional unimodal denoising methods (Restormer, PathNet, BM3D, PointCleanNet) by 28.2% in PSNR and 33.3% in CD, and outperforms state-of-the-art fusion models by 16.2% in JDE. The ACMD framework enhances AV reliability in adverse weather conditions, supporting safe autonomous driving.
(This article belongs to the Section Vehicular Sensing)
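
The Adaptive Bridge Controller is described only at a high level, so the following is a hypothetical sketch of the routing idea: score the residual noise in each modality's features and let the cleaner one guide the noisier one. All names and layer choices here are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class BridgeController(nn.Module):
    """Hypothetical sketch: estimate a scalar noise score per modality
    and decide which direction cross-modal guidance should flow."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                   nn.Flatten(), nn.Linear(dim, 1))

    def forward(self, feat_lidar, feat_cam):
        lidar_noisier = (self.score(feat_lidar) >
                         self.score(feat_cam)).view(-1, 1, 1, 1)
        # the noisier modality is refined; the cleaner one provides guidance
        target = torch.where(lidar_noisier, feat_lidar, feat_cam)
        guide = torch.where(lidar_noisier, feat_cam, feat_lidar)
        return target, guide
```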

22 pages, 5463 KB  
Article
SRG-YOLO: Star Operation and Restormer-Based YOLOv11 via Global Context for Vehicle Object Detection
by Wei Song, Junying Min and Jiaqi Zhao
Automation 2026, 7(1), 15; https://doi.org/10.3390/automation7010015 - 7 Jan 2026
Viewed by 363
Abstract
Conventional object detection methods suffer from defects such as insufficient detection accuracy in complex scenes and low computational efficiency. To overcome these limitations, this paper proposes a Star operation and Restormer-based YOLOv11 model that leverages global context for vehicle detection (SRG-YOLO), aiming to enhance both detection accuracy and efficiency in complex environments. Firstly, a Star block is introduced during the optimization of the YOLOv11n architecture. By enhancing non-linear feature representation, this Star block improves the original C3K2 module, thereby strengthening multi-scale feature fusion and boosting detection accuracy in complex scenarios. Secondly, for the detection heads of YOLOv11n, Restormer is incorporated via the improved C3K2 module to explicitly leverage spatial prior information, optimize the self-attention mechanism, and augment long-range pixel dependencies. This integration not only reduces computational complexity but also improves detection precision and overall efficiency through more refined feature modeling. Thirdly, a Context-guided module is integrated to enhance the ability to capture object details using global context. In complex backgrounds, it effectively combines local features with their contextual information, substantially improving the detection robustness of YOLOv11n. Finally, experiments on the VisDrone2019, KITTI, and UA-DETRAC datasets show that SRG-YOLO achieves superior vehicle detection accuracy in complex scenes compared to conventional methods, with particular advantages in small object detection.
(This article belongs to the Collection Automation in Intelligent Transportation Systems)
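
The "Star operation" in the title refers to the element-wise multiplication of two linear branches (as in StarNet), which implicitly lifts features into a high-dimensional nonlinear space. A minimal sketch assuming a StarNet-style block; the kernel sizes and expansion ratio are illustrative.

```python
import torch
import torch.nn as nn

class StarBlock(nn.Module):
    """StarNet-style block: depthwise conv for local context, then the
    'star operation' -- element-wise product of two 1x1 branches."""
    def __init__(self, dim, expand=3):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)
        self.f1 = nn.Conv2d(dim, dim * expand, 1)
        self.f2 = nn.Conv2d(dim, dim * expand, 1)
        self.act = nn.ReLU6()
        self.g = nn.Conv2d(dim * expand, dim, 1)

    def forward(self, x):
        y = self.dw(x)
        y = self.act(self.f1(y)) * self.f2(y)  # the star operation
        return x + self.g(y)                   # residual connection

print(StarBlock(32)(torch.randn(1, 32, 40, 40)).shape)  # (1, 32, 40, 40)
```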

21 pages, 3463 KB  
Article
A Practical CNN–Transformer Hybrid Network for Real-World Image Denoising
by Ahhyun Lee, Eunhyeok Hwang and Dongsun Kim
Mathematics 2026, 14(1), 203; https://doi.org/10.3390/math14010203 - 5 Jan 2026
Viewed by 539
Abstract
Real-world image denoising faces a critical trade-off: Convolutional Neural Network (CNN)-based methods are computationally efficient but limited in capturing long-range dependencies, while Transformer-based approaches achieve superior global modeling at prohibitive computational costs (>100 G Multiply–Accumulate Operations, MACs). This presents significant challenges for deployment in resource-constrained environments. We present a practical CNN–Transformer hybrid network that systematically balances performance and efficiency under practical deployment constraints for real-world image denoising. By integrating key components from NAFNet (Nonlinear Activation Free Network) and Restormer, our method employs three design strategies: (1) strategic combination of CNN and Transformer blocks enabling performance–efficiency trade-offs; (2) elimination of nonlinear operations for hardware compatibility; and (3) architecture search under explicit resource constraints. Experimental results demonstrate competitive performance with significantly reduced computational cost: our models achieve 39.98–40.05 dB Peak Signal-to-Noise Ratio (PSNR) and 0.958–0.961 Structural Similarity Index Measure (SSIM) on the SIDD dataset, and 39.73–39.91 dB PSNR and 0.959–0.961 SSIM on the DND dataset, while requiring 7.18–16.02 M parameters and 20.44–44.49 G MACs. Cross-validation results show robust generalization without significant performance degradation across diverse scenes, demonstrating a favorable trade-off among performance, efficiency, and practicality.
(This article belongs to the Section E1: Mathematics and Computer Science)
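
One NAFNet component the hybrid borrows is the activation-free design; its signature piece is the SimpleGate, which replaces GELU/ReLU with a channel-split multiplication that is friendlier to fixed-function hardware. A minimal sketch of that gate:

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """NAFNet's activation-free gate: split the channels in half and
    multiply the halves, replacing explicit nonlinear activations."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

# typical use: Conv2d(c, 2*c, 1) -> SimpleGate -> features back at c channels
gate = SimpleGate()
print(gate(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 32, 32, 32])
```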

29 pages, 10944 KB  
Article
Marker-Less Lung Tumor Tracking from Real-Time Color X-Ray Fluoroscopic Images Using Cross-Patient Deep Learning Model
by Yongxuan Yan, Fumitake Fujii and Takehiro Shiinoki
Bioengineering 2025, 12(11), 1197; https://doi.org/10.3390/bioengineering12111197 - 2 Nov 2025
Viewed by 1069
Abstract
Fiducial marker implantation for tumor localization in radiotherapy is effective but invasive and carries complication risks. To address this, we propose a marker-less tumor tracking framework to explore the feasibility of a cross-patient deep learning model, aiming to eliminate the need for per-patient retraining. A novel degradation model generates realistic simulated data from digitally reconstructed radiographs (DRRs) to train a Restormer network, which transforms clinical fluoroscopic images into clean, DRR-like images. Subsequently, a DUCK-Net model, trained on DRRs, performs tumor segmentation. We conducted a feasibility study using a clinical dataset from 7 lung cancer patients, comprising 100 distinct treatment fields. The framework achieved an average processing time of 179.8 ms per image and demonstrated high accuracy: the median 3D Euclidean tumor center tracking error was 1.53 mm, with directional errors of 0.98±0.70 mm (LR), 1.09±0.74 mm (SI), and 1.34±0.94 mm (AP). These promising results validate our approach as a proof-of-concept for a cross-patient marker-less tumor tracking solution, though further large-scale validation is required to confirm broad clinical applicability.
(This article belongs to the Special Issue Label-Free Cancer Detection)
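
The paper's degradation model is not reproduced in the abstract; as a rough illustration of the general idea (turning clean DRRs into fluoroscopy-like training inputs), here is a hypothetical sketch using only blur and photon noise. The actual model is presumably more elaborate.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_drr(drr, blur_sigma=1.5, photon_count=1e4, rng=None):
    """Hypothetical degradation sketch: blur a DRR and add photon
    (Poisson) noise to mimic a fluoroscopic image. Parameters are
    illustrative, not the authors' calibrated model."""
    rng = rng or np.random.default_rng()
    img = gaussian_filter(drr.astype(np.float64), blur_sigma)
    img = img / img.max()                          # normalize to [0, 1]
    noisy = rng.poisson(img * photon_count) / photon_count
    return np.clip(noisy, 0.0, 1.0)
```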

33 pages, 9679 KB  
Article
Intelligent Defect Detection of Ancient City Walls Based on Computer Vision
by Gengpei Zhang, Xiaohan Dou and Leqi Li
Sensors 2025, 25(16), 5042; https://doi.org/10.3390/s25165042 - 14 Aug 2025
Cited by 1 | Viewed by 1588
Abstract
As an important tangible carrier of historical and cultural heritage, ancient city walls embody the historical memory of urban development and serve as evidence of engineering evolution. However, due to prolonged exposure to complex natural environments and human activities, they are highly susceptible to various types of defects, such as cracks, missing bricks, salt crystallization, and vegetation erosion. To enhance the capability of cultural heritage conservation, this paper focuses on the ancient city wall of Jingzhou and proposes a multi-stage defect-detection framework based on computer vision technology. The proposed system establishes a processing pipeline that includes image processing, 2D defect detection, depth estimation, and 3D reconstruction. On the processing end, the Restormer and SG-LLIE models are introduced for image deblurring and illumination enhancement, respectively, improving the quality of wall images. The system incorporates the LFS-GAN model to augment defect samples. On the detection end, YOLOv12 is used as the 2D recognition network to detect common defects based on the generated samples. A depth estimation module is employed to assist in the verification of ancient wall defects. Finally, a Gaussian Splatting point-cloud reconstruction method is used to achieve a 3D visual representation of the defects. Experimental results show that the proposed system effectively detects multiple types of defects in ancient city walls, providing both a theoretical foundation and technical support for the intelligent monitoring of cultural heritage.
(This article belongs to the Section Sensing and Imaging)
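
As hypothetical glue code only, the multi-stage pipeline described above might be orchestrated as below, with each callable standing in for one of the named models (Restormer, SG-LLIE, YOLOv12, the depth estimator); the depth-consistency check is a toy stand-in for the paper's verification step.

```python
def depth_consistent(box, depth, min_std=0.05):
    """Toy check: a genuine crack or missing brick should show depth
    variation inside the box; a flat stain should not. Threshold is
    illustrative only."""
    x0, y0, x1, y1 = box
    return depth[y0:y1, x0:x1].std() > min_std

def detect_wall_defects(image, deblur, enhance, detect, estimate_depth):
    """Hypothetical orchestration of the pipeline in the abstract; each
    argument is a callable wrapping one of the models named there."""
    restored = enhance(deblur(image))      # image-quality stage
    boxes = detect(restored)               # 2D defect candidates
    depth = estimate_depth(restored)       # per-pixel depth map
    return [b for b in boxes if depth_consistent(b, depth)]
```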

23 pages, 88853 KB  
Article
RSW-YOLO: A Vehicle Detection Model for Urban UAV Remote Sensing Images
by Hao Wang, Jiapeng Shang, Xinbo Wang, Qingqi Zhang, Xiaoli Wang, Jie Li and Yan Wang
Sensors 2025, 25(14), 4335; https://doi.org/10.3390/s25144335 - 11 Jul 2025
Cited by 4 | Viewed by 1854
Abstract
Vehicle detection in remote sensing images faces significant challenges due to small object sizes, scale variation, and cluttered backgrounds. To address these issues, we propose RSW-YOLO, an enhanced detection model built upon the YOLOv8n framework, designed to improve feature extraction and robustness against environmental noise. A Restormer module is incorporated into the backbone to model long-range dependencies via self-attention, enabling better handling of multi-scale features and complex scenes. A dedicated detection head is introduced for small objects, focusing on critical channels while suppressing irrelevant information. Additionally, the original CIoU loss is replaced with WIoU, which dynamically reweights predicted boxes based on their quality, enhancing localization accuracy and stability. Experimental results on the DJCAR dataset show mAP@0.5 and mAP@0.5:0.95 improvements of 5.4% and 6.2%, respectively, and corresponding gains of 4.3% and 2.6% on the VisDrone dataset. These results demonstrate that RSW-YOLO offers a robust and accurate solution for UAV-based vehicle detection, particularly in urban scenes with dense or small targets.
(This article belongs to the Section Sensors and Robotics)
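
WIoU (Wise-IoU) re-weights the IoU loss with a detached distance term computed on the smallest enclosing box, so high-quality predictions are penalized less. A sketch of the v1 formulation, assuming (x1, y1, x2, y2) box tensors; the paper may use a later WIoU variant.

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """Sketch of Wise-IoU v1: plain IoU loss scaled by a center-distance
    weight over the smallest enclosing box, detached so it focuses the
    loss without opening an extra gradient path."""
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # smallest enclosing box, used only inside the detached weight
    ew, eh = (torch.max(pred[:, 2:], target[:, 2:])
              - torch.min(pred[:, :2], target[:, :2])).unbind(dim=1)
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    r = torch.exp(((cxp - cxt) ** 2 + (cyp - cyt) ** 2)
                  / (ew ** 2 + eh ** 2 + eps)).detach()
    return (r * (1 - iou)).mean()
```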

31 pages, 8699 KB  
Article
Transformer-Based Dual-Branch Spatial–Temporal–Spectral Feature Fusion Network for Paddy Rice Mapping
by Xinxin Zhang, Hongwei Wei, Yuzhou Shao, Haijun Luan and Da-Han Wang
Remote Sens. 2025, 17(12), 1999; https://doi.org/10.3390/rs17121999 - 10 Jun 2025
Viewed by 1423
Abstract
Deep neural network fusion approaches utilizing multimodal remote sensing are essential for crop mapping. However, challenges such as insufficient spatiotemporal feature extraction and ineffective fusion strategies still exist, leading to decreased mapping accuracy and robustness when these approaches are applied across spatial–temporal regions. In this study, we propose a novel rice mapping approach based on dual-branch transformer fusion networks, named RDTFNet. Specifically, we implemented a dual-branch encoder based on two improved transformer architectures: a multiscale transformer block that extracts spatial–spectral features from a single-phase optical image, and a Restormer block that extracts spatial–temporal features from time-series synthetic aperture radar (SAR) images. Both extracted features were then combined in a feature fusion module (FFM) to generate fully fused spatial–temporal–spectral (STS) features, which were finally fed into the U-Net-style decoder for rice mapping. The model’s performance was evaluated through experiments with the Sentinel-1 and Sentinel-2 datasets from the United States. Compared with conventional models, the RDTFNet model achieved the best performance, with overall accuracy (OA), intersection over union (IoU), precision, recall, and F1-score of 96.95%, 88.12%, 95.14%, 92.27%, and 93.68%, respectively. The comparative results show that the OA, IoU, precision, recall, and F1-score improved by 1.61%, 5.37%, 5.16%, 1.12%, and 2.53%, respectively, over those of the baseline model, demonstrating its superior performance for rice mapping. Furthermore, in subsequent cross-regional and cross-temporal tests, RDTFNet outperformed other classical models, achieving improvements of 7.11% and 12.10% in F1-score, and 11.55% and 18.18% in IoU, respectively. These results further confirm the robustness of the proposed model. The proposed RDTFNet can therefore effectively fuse STS features from multimodal images and exhibits strong generalization capabilities, providing valuable information for agricultural management.
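
The abstract does not detail the FFM, so the following is a hypothetical sketch of one common way to fuse two branch outputs: concatenate, mix with a 1×1 convolution, and re-weight channels with a squeeze-and-excitation-style gate. The paper's module may differ in detail.

```python
import torch
import torch.nn as nn

class FeatureFusionModule(nn.Module):
    """Hypothetical FFM sketch: concat optical (spatial-spectral) and
    SAR (spatial-temporal) features, mix channels, then gate them."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Conv2d(2 * dim, dim, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // 4, dim, 1), nn.Sigmoid())

    def forward(self, f_opt, f_sar):
        f = self.mix(torch.cat([f_opt, f_sar], dim=1))
        return f * self.gate(f)   # channel-wise re-weighting
```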

15 pages, 1774 KB  
Article
FreqSpatNet: Frequency and Spatial Dual-Domain Collaborative Learning for Low-Light Image Enhancement
by Yu Guan, Mingsi Liu, Xi’ai Chen, Xudong Wang and Xin Luan
Electronics 2025, 14(11), 2220; https://doi.org/10.3390/electronics14112220 - 29 May 2025
Cited by 3 | Viewed by 1193
Abstract
Low-light images often contain noise due to the conditions under which they are captured. The Fourier transform can reduce this noise in the frequency domain while preserving the image detail embedded in the low-frequency components. Existing low-light image-enhancement methods based on CNN frameworks often fail to extract global feature information and introduce excessive noise, resulting in detail loss. To solve these problems, we propose a low-light image-enhancement framework that achieves detail restoration and denoising using the Fourier transform. In addition, we design a dual-domain enhancement strategy that cooperatively uses global frequency-domain feature extraction to improve the overall brightness of the image and amplitude-modulated spatial-domain convolution to refine local details, improving image quality by suppressing noise, enhancing contrast, and preserving texture. Extensive experiments on low-light datasets show that our results outperform mainstream methods, especially in maintaining natural color distributions and recovering fine-grained details under extreme lighting conditions. Using PSNR and SSIM as evaluation indicators, our method improves the PSNR by 4.37% compared to the Restormer method and by 1.76% compared to the DRBN method.
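
The frequency-domain half of the dual-domain strategy can be illustrated compactly: transform to the Fourier domain, adjust the amplitude (which carries global brightness and contrast), keep the phase, and invert. The 1×1 convolution here is a stand-in for the paper's frequency-branch layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierBranch(nn.Module):
    """Sketch of frequency-domain processing: modulate the FFT amplitude
    while preserving phase, then transform back to the spatial domain."""
    def __init__(self, channels):
        super().__init__()
        self.amp_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        freq = torch.fft.rfft2(x)
        amp, phase = torch.abs(freq), torch.angle(freq)
        amp = F.relu(self.amp_conv(amp))     # keep the amplitude non-negative
        return torch.fft.irfft2(torch.polar(amp, phase), s=x.shape[-2:])

print(FourierBranch(16)(torch.randn(1, 16, 32, 32)).shape)  # (1, 16, 32, 32)
```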

18 pages, 5611 KB  
Article
A Visible and Synthetic Aperture Radar Image Fusion Algorithm Based on a Transformer and a Convolutional Neural Network
by Liushun Hu, Shaojing Su, Zhen Zuo, Junyu Wei, Siyang Huang, Zongqing Zhao, Xiaozhong Tong and Shudong Yuan
Electronics 2024, 13(12), 2365; https://doi.org/10.3390/electronics13122365 - 17 Jun 2024
Viewed by 2104
Abstract
For visible and Synthetic Aperture Radar (SAR) image fusion, this paper proposes a fusion algorithm based on a Transformer and a Convolutional Neural Network (CNN). Firstly, the Restormer Block is used to extract cross-modal shallow features. Then, we introduce an improved Transformer–CNN Feature Extractor (TCFE) with a two-branch residual structure: a Transformer branch that introduces the Lite Transformer (LT) and DropKey for extracting global features, and a CNN branch that introduces the Convolutional Block Attention Module (CBAM) for extracting local features. Finally, the fused image is output based on the global features extracted by the Transformer branch and the local features extracted by the CNN branch. Experiments show that the proposed algorithm effectively extracts and fuses the global and local features of visible and SAR images, yielding high-quality fused images.
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
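
DropKey, cited in the Transformer branch, differs from ordinary attention dropout in that keys are masked before the softmax, so each attention row remains a valid probability distribution. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def attention_with_dropkey(q, k, v, drop_ratio=0.1, training=True):
    """Sketch of DropKey: randomly mask key logits *before* softmax
    (rather than dropping weights after it, as attention dropout does)."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    if training and drop_ratio > 0:
        mask = torch.rand_like(scores) < drop_ratio
        scores = scores.masked_fill(mask, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 64)
print(attention_with_dropkey(q, k, v).shape)  # torch.Size([2, 8, 64])
```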

20 pages, 9093 KB  
Article
A Masked-Pre-Training-Based Fast Deep Image Prior Denoising Model
by Shuichen Ji, Shaoping Xu, Qiangqiang Cheng, Nan Xiao, Changfei Zhou and Minghai Xiong
Appl. Sci. 2024, 14(12), 5125; https://doi.org/10.3390/app14125125 - 12 Jun 2024
Cited by 3 | Viewed by 3297
Abstract
Compared to supervised denoising models based on deep learning, the unsupervised Deep Image Prior (DIP) denoising approach offers greater flexibility and practicality by operating solely on the given noisy image. However, the random initialization of the network input and network parameters in DIP leads to slow convergence during iterative training, severely affecting execution efficiency. To address this issue, we propose the Masked-Pre-training-Based Fast DIP (MPFDIP) denoising model. We enhance the classical Restormer framework by improving its core Transformer module and incorporating sampling, residual learning, and refinement techniques, resulting in a fast network called FRformer (Fast Restormer). The FRformer model undergoes offline supervised pre-training using a masked processing technique. For a specific noisy image, the pre-trained FRformer network, with its learned parameters, replaces the UNet used in the original DIP model. The online iterative training of the replaced model follows the DIP unsupervised training approach, utilizing multi-target images and an adaptive loss function, which further improves the denoising effectiveness of the pre-trained model. Extensive experiments demonstrate that the MPFDIP model outperforms existing mainstream deep-learning-based denoising models in reducing Gaussian noise, mixed Gaussian–Poisson noise, and low-dose CT noise, and it significantly improves execution efficiency compared to the original DIP model. This improvement is mainly attributed to the FRformer network’s initialization parameters obtained through masked pre-training, which generalize well to various types and intensities of noise and already provide some denoising effect; using them as initialization greatly improves the convergence speed of unsupervised iterative training in DIP. The multi-target images and the adaptive loss function further enhance the denoising process.
(This article belongs to the Section Computing and Artificial Intelligence)
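
The online stage inherits the DIP recipe; the sketch below shows the idea of fitting a network to a single noisy image, with pre-trained weights as the starting point so far fewer iterations are needed. A single-target MSE is used here for brevity, where the paper uses multi-target images and an adaptive loss.

```python
import torch
import torch.nn.functional as F

def dip_denoise(net, noisy, steps=500, lr=1e-4):
    """DIP-style online stage: `net` arrives with pre-trained weights
    (random init in classic DIP) and is fitted to one noisy image.
    Early stopping is the implicit regularizer -- too many steps and
    the network starts reproducing the noise itself."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(net(noisy), noisy)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return net(noisy)
```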

24 pages, 17104 KB  
Article
Neural Network-Based Investigation of Periodic Noise Reduction Methods for High-Resolution Infrared Line Scanning Images
by Bohan Li, Yong Zhang, Weicong Chen, Yizhe Ma and Linhan Li
Remote Sens. 2024, 16(5), 841; https://doi.org/10.3390/rs16050841 - 28 Feb 2024
Cited by 1 | Viewed by 3476
Abstract
In the realm of neural network-based noise reduction, conventional models predominantly address Gaussian and blur artifacts across entire images. However, they encounter notable challenges when directly applied to the periodic noise characteristic of high-resolution infrared sequential imagery. The high resolution also complicates the construction of suitable datasets. Our study introduces an innovative strategy that transforms two-dimensional images into one-dimensional signals, eliminating the need to process the full image. We have developed a simulated dataset that closely mirrors natural infrared line scanning images derived from the FLIR dataset. To address low-frequency periodic noise, we propose two neural-network-based denoising approaches. The first employs a neural network to deduce noise from the one-dimensional signal, while the second utilizes discrete Fourier transforms for noise prediction within the frequency domain. Our experimental results highlight the Restormer model’s exemplary performance in direct noise prediction, where denoised images attain a PSNR of around 41 and an SSIM close to 0.9 on simulated data, leaving minimal residual noise in the actual denoised images. Further, we investigate the influence of Fourier coefficients, as predicted by neural networks, on the denoising process in the second approach. Employing 12 frequency coefficients, the Restormer and NAFNet models both reach a PSNR near 34 and an SSIM of approximately 0.842.
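
The core trick of treating scan lines as a one-dimensional signal can be shown compactly: average each line, keep the lowest Fourier coefficients as the periodic-noise estimate, and subtract. In the paper a network predicts these coefficients; here they are read directly from the data for illustration.

```python
import numpy as np

def remove_periodic_line_noise(img, n_coeffs=12):
    """Sketch of the frequency-domain idea for line-scan imagery:
    estimate low-frequency periodic stripe noise from per-line means
    and subtract it from every column of the 2D image."""
    profile = img.mean(axis=1)                 # one value per scan line
    spec = np.fft.rfft(profile - profile.mean())
    spec[n_coeffs:] = 0                        # keep low-frequency terms only
    noise = np.fft.irfft(spec, n=profile.size)
    return img - noise[:, None]                # broadcast across columns
```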

24 pages, 9563 KB  
Article
Optical and SAR Image Registration Based on Pseudo-SAR Image Generation Strategy
by Canbin Hu, Runze Zhu, Xiaokun Sun, Xinwei Li and Deliang Xiang
Remote Sens. 2023, 15(14), 3528; https://doi.org/10.3390/rs15143528 - 13 Jul 2023
Cited by 13 | Viewed by 4090
Abstract
The registration of optical and SAR images has always been a challenging task due to the different imaging mechanisms of the corresponding sensors. To mitigate this difference, this paper proposes a registration algorithm based on a pseudo-SAR image generation strategy and an improved deep learning-based network. The method consists of two stages: pseudo-SAR image generation and image registration. In the pseudo-SAR image generation stage, an improved Restormer network is used to convert optical images into pseudo-SAR images. An L2 loss function is adopted because it fluctuates less near the optimum, making it easier for the model to converge. In the registration stage, the ROEWA operator is used to construct the Harris scale space for the pseudo-SAR and real SAR images, and each extreme point in the scale space is extracted and added to the keypoint set. The image patches around the keypoints are fed into the network to obtain feature descriptors. The pseudo-SAR and real SAR images are matched according to the descriptors, and outliers are removed by the RANSAC algorithm to obtain the final registration result. The proposed method is tested on a public dataset. The experimental analysis shows that the average NCM surpasses that of similar methods by over 30%, and the average RMSE is lower than that of similar methods by more than 0.04. The results demonstrate that the proposed strategy is more robust than other state-of-the-art methods.
(This article belongs to the Special Issue SAR Images Processing and Analysis)
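
The matching and outlier-rejection stage follows a standard recipe, sketched below with OpenCV; keypoints are assumed to be (x, y) coordinate arrays, and descriptor extraction (the learned network) is outside the snippet.

```python
import cv2
import numpy as np

def match_with_ransac(desc_pseudo, desc_sar, kp_pseudo, kp_sar):
    """Brute-force match float32 descriptors, then estimate a transform
    with RANSAC and keep only the inlier correspondences."""
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc_pseudo, desc_sar)
    src = np.float32([kp_pseudo[m.queryIdx] for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_sar[m.trainIdx] for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, [m for m, ok in zip(matches, inliers.ravel()) if ok]
```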

19 pages, 6956 KB  
Article
A Triple Deep Image Prior Model for Image Denoising Based on Mixed Priors and Noise Learning
by Yong Hu, Shaoping Xu, Xiaohui Cheng, Changfei Zhou and Yufeng Hu
Appl. Sci. 2023, 13(9), 5265; https://doi.org/10.3390/app13095265 - 23 Apr 2023
Cited by 4 | Viewed by 4261
Abstract
Image denoising poses a significant challenge in computer vision due to high-level visual tasks’ dependency on image quality. Several advanced denoising models have been proposed in recent decades. Recently, deep image prior (DIP), which uses a particular network structure and a noisy image to achieve denoising, has provided a novel image denoising method. However, the denoising performance of the DIP model still lags behind that of mainstream denoising models. To improve its performance, we propose a TripleDIP model with mixed internal and external image priors for image denoising. TripleDIP comprises three branches: one for content learning and two for independent noise learning. We first use a Transformer-based supervised model (i.e., Restormer) to obtain a pre-denoised image (used as the external prior) from a given noisy image, and then take the noisy image and the pre-denoised image as the first and second target images, respectively, to perform denoising under the designed loss function. We add constraints between the two-branch noise learning and the content learning, allowing TripleDIP to employ the external prior while stabilizing independent noise learning. Moreover, the proposed automatic stopping criterion prevents the model from overfitting the noisy image and improves execution efficiency. The experimental results demonstrate that TripleDIP outperforms the original DIP by an average of 2.79 dB, classical unsupervised methods such as N2V by an average of 2.68 dB, and the latest supervised models such as SwinIR and Restormer by averages of 0.63 dB and 0.59 dB on the Set12 dataset. This is mainly because two-branch noise learning obtains more stable noise while constraining the optimization of the content learning branch. The proposed TripleDIP significantly enhances DIP denoising performance and has broad application potential in scenarios with insufficient training data.
(This article belongs to the Section Computing and Artificial Intelligence)
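
The three-branch objective is only described in prose; the following is a hypothetical sketch of such a loss, with reconstruction terms tying content plus each learned noise map to the noisy image and an external term pulling content toward the Restormer pre-denoised image. The weighting is illustrative.

```python
import torch.nn.functional as F

def triple_dip_loss(content, noise1, noise2, noisy, pre_denoised, lam=0.5):
    """Hypothetical three-branch objective: internal priors tie content
    plus each independently learned noise map back to the noisy image;
    the external prior pulls content toward the pre-denoised image."""
    rec1 = F.mse_loss(content + noise1, noisy)   # first noise branch
    rec2 = F.mse_loss(content + noise2, noisy)   # second noise branch
    ext = F.mse_loss(content, pre_denoised)      # external prior constraint
    return rec1 + rec2 + lam * ext
```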
