Medical imaging is an important application of digital imaging in biology, used to express medical information visually. Optical coherence tomography (OCT) is a rapidly evolving imaging modality that offers high resolution, high dynamic range, and non-invasive, non-destructive detection, making it suitable for imaging a variety of biological tissues [1]. However, OCT is susceptible to speckle noise during acquisition, which degrades image quality and can prevent the physician from making an accurate diagnosis [3]. OCT image denoising is therefore very important for subsequent medical applications.
Over the past few decades, a variety of methods have been developed to reduce speckle noise in OCT images. These algorithms can be categorized as transform-domain, spatial filtering, and sparse representation methods. Transform-domain methods mainly use the wavelet transform [4], the curvelet transform [6], and the contourlet transform [7]. They assume that the coefficients representing the image can be separated from those of the speckle noise by shrinkage or thresholding. However, when the decomposed sub-bands are recombined, transform-domain algorithms are prone to producing artifacts. Spatial filtering techniques are mainly represented by non-local means (NLM) methods [8], which exploit the self-similarity of natural images by comparing each patch with patches in its non-local neighborhood [8]. With double Gaussian anisotropic kernels [9], the uncorrupted probability of each pixel (PNLM) [10], or weighted maximum likelihood estimation [11], non-local means reduces speckle noise more effectively. Because they exploit image self-similarity in a non-local manner, NLM-based methods are usually more robust than local filtering. However, these techniques tend to over-smooth image details and leave some artifacts in the despeckled results. Sparse representation has been widely applied in signal processing, for example in multi-focus image fusion [13] and OCT image despeckling [14]. Sparse representation techniques for OCT adopt a customized scanning pattern in which a few B-scans are captured with a high signal-to-noise ratio (SNR) and then used to improve the quality of low-SNR B-scans. Multi-scale sparsity-based tomographic denoising (MSBTD) [14] trains sparse representation dictionaries on the high-SNR images to improve the quality of neighboring low-SNR B-scans. Furthermore, in sparsity-based simultaneous denoising and interpolation (SBSDI) [15], the dictionaries are improved by constructing them from previously collected datasets of high-SNR images from the target imaging subject. These sparse representation techniques preserve most image details, but they tend to leave some noise in the despeckled result.
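The patch comparison behind non-local means, described above, can be sketched on a 1-D signal. This is a minimal illustration of the basic NLM weighting only, not any of the OCT-specific variants cited above; the function name and the parameters `patch_radius`, `search_radius`, and `h` are assumptions chosen for the example.

```python
import math


def nlm_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Basic non-local means on a 1-D signal (illustrative sketch).

    patch_radius, search_radius and the filtering parameter h are
    assumed values for this example, not settings from the paper.
    """
    n = len(signal)

    def patch(center):
        # Replicate the border samples when the patch leaves the signal.
        return [signal[min(max(center + k, 0), n - 1)]
                for k in range(-patch_radius, patch_radius + 1)]

    output = []
    for i in range(n):
        ref = patch(i)
        weight_sum = 0.0
        value_sum = 0.0
        # Compare the reference patch with every patch in the search window.
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            cand = patch(j)
            dist2 = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
            weight = math.exp(-dist2 / (h * h))  # similar patches get large weights
            weight_sum += weight
            value_sum += weight * signal[j]
        output.append(value_sum / weight_sum)
    return output


# A flat region corrupted by a small alternating perturbation.
noisy = [1.0 + 0.1 * (-1) ** i for i in range(30)]
denoised = nlm_1d(noisy)
print(max(abs(v - 1.0) for v in denoised))
```

Each output sample is a weighted average over the search window, so a larger `h` smooths more strongly, which is consistent with the over-smoothing tendency noted above.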
In recent years, with the advent of deep convolutional neural networks (CNNs), hierarchical network structures have greatly improved the ability to exploit spatial correlations and extract features at multiple resolutions. Recent advances in neural networks have shown their broad applicability to supervised and unsupervised signal preprocessing and classification. Many stages of biosignal processing have been augmented with deep neural network-based methods, such as the identification of electron transport proteins [16] and OCT despeckling [19]. Deep CNNs have also become a popular tool for image denoising [21]. The denoising CNN (DnCNN) [21] showed that residual learning and batch normalization are particularly useful for image denoising. The denoising performance of CNNs can be further improved with symmetric convolutional-deconvolutional layers [22] and a prior observation model [23]. However, to train their numerous parameters, these networks generally require large amounts of high-quality data, which is an inherent limitation for OCT despeckling. To tackle this issue, frame averaging has been used to train the network [19], but the results were not satisfactory because the ground-truth images were obtained only by frame averaging and still contained speckle noise.
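The limitation of frame-averaged ground truth can be illustrated with a small simulation. This is a sketch under assumed conditions, not the acquisition model of [19]: speckle is modeled as independent, unit-mean exponential multiplicative noise on a constant-intensity region, and all helper names are hypothetical. Under that model, averaging N frames lowers the speckle contrast (standard deviation divided by mean) by a factor of about the square root of N, but never to zero.

```python
import random
import statistics


def simulate_frames(clean, num_frames, rng):
    """Multiply each clean pixel by unit-mean exponential speckle (assumed model)."""
    return [[pixel * rng.expovariate(1.0) for pixel in clean]
            for _ in range(num_frames)]


def average_frames(frames):
    """Pixel-wise mean over a list of equally sized frames."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def speckle_contrast(values):
    """Standard deviation divided by mean; 0 would mean no residual speckle."""
    return statistics.pstdev(values) / statistics.fmean(values)


rng = random.Random(0)
clean = [1.0] * 5000                      # constant-intensity region
single = simulate_frames(clean, 1, rng)[0]
averaged = average_frames(simulate_frames(clean, 16, rng))

print("single frame contrast:", round(speckle_contrast(single), 3))
print("16-frame average contrast:", round(speckle_contrast(averaged), 3))
```

In real acquisitions, speckle in repeated frames is partly correlated, so the reduction is typically smaller than this independent-noise model suggests.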
Deep image prior (DIP) [24] shows that the structure of a generator network is sufficient to capture a large amount of low-level image statistics prior to any learning; that is, a CNN can produce a denoised image without a “clean” dataset or a training dataset. The main reason is that the convolutional layers recover the signal, which has more self-similarity, more easily than the noise. However, because the reconstruction loss of DIP only measures the L1 or L2 distance between the network output and the input image, it is not well suited to suppressing multiplicative speckle noise.
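For reference, the DIP fitting problem described above is commonly written as follows. The symbols are notation assumed here rather than taken from this paper: y is the noisy input image, z is a fixed random input code, f_θ is the generator network with weights θ, and p selects the L1 or L2 loss.

```latex
\theta^{*} = \arg\min_{\theta} \left\| f_{\theta}(z) - y \right\|_{p}^{p},
\qquad
\hat{x} = f_{\theta^{*}}(z),
\qquad
p \in \{1, 2\}
```

Read this way, the fidelity term compares the network output with the noisy input pixel by pixel and places no explicit constraint on the structure of the residual y − f_θ(z).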
In this paper, a non-local deep image prior network is proposed for OCT image restoration by adding a sorted non-local statistic as an autocorrelation loss to the reconstruction function. The sorted non-local statistic measures signal autocorrelation by calculating the correlation between each patch and its non-local neighbors in the difference between the reconstructed image and the input image, and then sorting these correlation values to select the most correlated patches. By minimizing this signal autocorrelation loss during DIP learning, the CNN captures more non-local similarity statistics in the process of OCT image restoration, which allows most of the OCT signal to be recovered.
In summary, the two main contributions of this paper are as follows:
(1) A sorted non-local statistic is proposed that measures signal autocorrelation by calculating and sorting the correlations between each patch and its non-local neighbors in the difference between the reconstructed image and the input image.
(2) The sorted non-local statistic is used as an autocorrelation loss in the deep image prior learning framework to remove speckle noise from OCT images.
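The idea behind the two contributions can be sketched on a 1-D signal. This is only an illustration under stated assumptions, not the formulation used in this paper: the function names, the use of absolute Pearson correlation, the exclusion of overlapping patches, and the values of `patch_size`, `search_radius`, and `top_k` are all hypothetical choices. The score is computed on the residual between the input and the restored signal; it is high when the residual still contains repeating structure and lower when the residual is uncorrelated noise.

```python
import math
import random


def patch_correlation(a, b):
    """Pearson correlation of two equal-length patches (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def sorted_nonlocal_loss(noisy, restored, patch_size=8, search_radius=24, top_k=3):
    """Mean of the top_k absolute patch correlations in the residual (assumed form)."""
    residual = [y - x for y, x in zip(noisy, restored)]
    num_patches = len(residual) - patch_size + 1
    total, count = 0.0, 0
    for i in range(num_patches):
        ref = residual[i:i + patch_size]
        lo = max(0, i - search_radius)
        hi = min(num_patches, i + search_radius + 1)
        corrs = [
            abs(patch_correlation(ref, residual[j:j + patch_size]))
            for j in range(lo, hi)
            if abs(i - j) >= patch_size  # non-local: skip overlapping patches
        ]
        corrs.sort(reverse=True)         # most correlated neighbours first
        best = corrs[:top_k]
        total += sum(best)
        count += len(best)
    return total / count if count else 0.0


rng = random.Random(0)
restored = [0.0] * 64
structured = [[0.0, 1.0, 0.0, -1.0][t % 4] for t in range(64)]  # residual keeps a periodic signal
noise_only = [rng.gauss(0.0, 1.0) for _ in range(64)]           # residual is uncorrelated noise

print("structured residual:", sorted_nonlocal_loss(structured, restored))
print("noise-only residual:", sorted_nonlocal_loss(noise_only, restored))
```

Used inside DIP training, such a term would have to be written with differentiable operations in an automatic differentiation framework and combined with the fidelity term through a weighting factor; both points are assumptions about how the sketch would be integrated.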