Domain-Aware Few-Shot Learning for Optical Coherence Tomography Noise Reduction

Speckle noise has long been an extensively studied problem in medical imaging. In recent years, there have been significant advances in leveraging deep learning methods for noise reduction. Nevertheless, adaptation of supervised learning models to unseen domains remains a challenging problem. Specifically, deep neural networks (DNNs) trained for computational imaging tasks are vulnerable to changes in the acquisition system's physical parameters, such as sampling space, resolution, and contrast. Even within the same acquisition system, performance degrades across datasets of different biological tissues. In this work, we propose a few-shot supervised learning framework for optical coherence tomography (OCT) noise reduction that offers high-speed training (on the order of seconds) and requires only a single image, or part of an image, with a corresponding speckle-suppressed ground truth, for training. Furthermore, we formulate the domain shift problem for diverse OCT imaging systems and prove that the output resolution of a trained despeckling model is determined by the source domain resolution. We also provide possible remedies. We propose different practical implementations of our approach, and verify and compare their applicability, robustness, and computational efficiency. Our results demonstrate the potential to improve sample complexity, generalization, and time efficiency for coherent and non-coherent noise reduction via supervised learning models, which can also be leveraged for other real-time computer vision applications.


I. INTRODUCTION
OCT employs low coherence interferometry to produce cross-sectional tomographic images of the internal structure of biological tissue. It is routinely used for diagnostic imaging, primarily of the retina and coronary arteries [1]. Obtainable resolutions are in the range of 1 to 15 µm, with a depth range of millimeters. Unfortunately, OCT images are often degraded by speckle noise [2], [3], creating apparent grain-like structures in the image, with size as large as the spatial resolution of the OCT system. Speckle noise significantly degrades images and complicates interpretation and medical diagnosis by confounding tissue anatomy and masking changes in tissue scattering properties.
Speckle suppression is often achieved by incoherent averaging of images with different speckle realizations [4], e.g., through angular compounding [5], [6]. Averaging methods attempt to preserve the resolution while suppressing speckle arising from non-resolved tissue structure, yet some methods produce blurred images. Moreover, although effective at suppressing speckle in ex vivo tissues or in preclinical animal research, the additional time and data throughput required to obtain multiple speckle realizations can often make this approach incompatible with clinical in vivo imaging.
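As a toy illustration of the incoherent averaging described above, the sketch below (all names, sizes, and the speckle model are illustrative assumptions, not the paper's code) compounds K simulated speckle realizations of a constant-reflectivity sample; the speckle contrast drops roughly as 1/sqrt(K):

```python
import numpy as np

rng = np.random.default_rng(0)

def speckled_intensity(x, rng):
    # fully developed speckle: intensity of a circular complex-Gaussian
    # field whose mean intensity equals the ground truth x
    field = rng.normal(size=x.shape) + 1j * rng.normal(size=x.shape)
    return x * np.abs(field) ** 2 / 2.0

x = np.full((64, 64), 5.0)   # constant "ground truth" reflectivity (hypothetical)
K = 100                      # number of independent speckle realizations
avg = np.mean([speckled_intensity(x, rng) for _ in range(K)], axis=0)
```

A single realization has speckle contrast (std/mean) near 1, while the compounded image's contrast is roughly 0.1 for K = 100, which is the variance-reduction effect that angular compounding exploits.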
Consequently, many numerical algorithms attempt to computationally suppress speckle. To name a few: non-linear filtering [7], non-local means (NLM) [8], [9], and block matching and 3D filtering (BM3D) [10]. The majority of these algorithms employ an image denoiser treating speckle as independent and identically distributed (i.i.d.) Gaussian noise. The solution can sometimes be sensitive to parameter fine-tuning. Some algorithms also rely on accurately registered volumetric data, which is challenging to obtain in clinical settings.
Recently, the speckle reduction task has been extensively investigated from a supervised learning perspective [11], [12], [13]. As is well known, most supervised learning data-driven methods require a large training dataset. In OCT, Dong et al. (2020) [14] trained a super-resolution generative adversarial network (SRGAN) [15], [16] with hardware-based speckle-suppressed ex vivo samples defining the ground truth. Namely, they used 200,000 speckle-modulating OCT images of size 800 × 600 for training. Chintada et al. (2023) [17] used a conditional GAN (cGAN) [18] trained with hundreds of retinal data B-scans, with NLM [9] as ground truth. Ma et al. (2018) [19] also used a cGAN to perform speckle reduction and contrast enhancement for retinal OCT images by adding an edge loss function to the final objective. The clean images for training are obtained by averaging B-scans from multiple OCT volumes.
That said, there has been a growing amount of evidence demonstrating that supervised learning methods, specifically in the context of computational imaging and inverse problems, may require significantly smaller datasets. For example, it was observed that for image restoration in fluorescence microscopy [20], even a small number of training images led to acceptable image restoration quality (e.g., 200 patches of size 64 × 64 × 16). Pereg et al. (2020) [21] used a single synthetic example for seismic inversion.
In learning theory, domain shift is a change of data distribution between the source domain (training dataset) and the target domain (test dataset). Despite advances in data augmentation and transfer learning, neural networks often fail to adapt to unseen domains. For example, convolutional neural networks (CNNs) trained for segmentation tasks can be highly sensitive to changes in resolution and contrast. Performance often degrades even within the same imaging modality. A general review of domain adaptation (DA) for medical image analysis can be found in [22]. The different approaches are separated into shallow and deep DA models, further divided into supervised, semi-supervised, and unsupervised DA, depending on the availability of labeled data in the target domain. Generally speaking, the appropriate DA approach depends on the background and the properties of the specific problem. Many DA methods suggest ways to map the source and target domains to a shared latent space, whereas generative DA methods attempt to translate the source to the target or vice versa. In our study, we focus on a simple yet efficient physics-aware unsupervised DA approach for the case of a change in the OCT imaging system. Namely, only unlabeled data is available for the target domain. This problem is also referred to in the literature as domain generalization [23], and it has hardly been explored in medical imaging so far [24].
In this work, we investigate few-shot learning as a powerful platform for OCT speckle reduction with limited ground truth training data. We prove that the output resolution of a supervised learning speckle suppression system is determined by the sampling space and the resolution of the source acquisition system. We also mathematically define the effects of the domain shift on the target output image. In light of the theoretical analysis, we promote the use of a patch-based recurrent neural network (RNN) framework to demonstrate the applicability and efficiency of few-shot learning for OCT speckle suppression. We demonstrate the use of a single-image training dataset that generalizes well. The proposed approach introduces a dramatic decrease in training time and required computational resources. Training takes about 2-25 seconds on a GPU workstation and a few minutes (2-4 minutes) on a CPU workstation. We further propose novel upgrades to the original RNN framework and compare their performance. Namely, we introduce a one-shot patch-based RNN-mini-GAN architecture. We further demonstrate increased SNR achieved via averaging of overlapping patches. Furthermore, we recast the speckle suppression network as a deblurring system. We then compare the three RNN models' results with a patch-based one-shot learning U-Net [25]. We illuminate the dependence of speckle reduction on the acquisition system, via known lateral and axial sampling space and resolution, and offer strategies for training and testing under different acquisition systems. Finally, our approach can be applicable to other learning architectures, as well as other applications where the signal can be processed locally, such as speech and audio, video, seismic imaging, MRI, ultrasound, natural language processing, and more. The results in this paper are a substantial extension that replaces our unpublished previous work ([26], Section 6.2).

A. RNN Encoder-Decoder Framework
Assume an observation data sequence y = [y_0, y_1, ..., y_{L_t-1}], y_t ∈ R^{N×1}, t ∈ [0, L_t - 1], and a corresponding output sequence x = [x_0, x_1, ..., x_{L_t-1}], x_t ∈ R^{P×1}. The RNN forms a map f : y → z from the input data to the latent space variables. That is, for input y_t and state z_t at time step t, the RNN output is generally formulated as z_t = f(z_{t-1}, y_t) [27]. Hereafter, we focus on the specific parametrization

z_t = σ(W_zy y_t + W_zz z_{t-1} + b),

where σ is an activation function, W_zy ∈ R^{n_n×N} and W_zz ∈ R^{n_n×n_n} are weight matrices, and b ∈ R^{n_n×1} is the bias vector. At t = 0 previous outputs are zero. Here, we use the ReLU activation function, ReLU(z) = max{0, z}. We wrap each cell with a fully connected layer with the desired final output x_t ∈ R^{P×1}, such that x_t = FC(z_t).
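The cell above can be sketched in a few lines of NumPy; the dimensions and weight scales below are illustrative assumptions, not the paper's trained configuration:

```python
import numpy as np

def rnn_forward(Y, W_zy, W_zz, b, W_fc):
    """Y: (L_t, N) input sequence; returns the (L_t, P) output sequence."""
    z = np.zeros(W_zz.shape[0])                          # state is zero at t = 0
    out = []
    for y_t in Y:
        z = np.maximum(0.0, W_zy @ y_t + W_zz @ z + b)   # z_t = ReLU(W_zy y_t + W_zz z_{t-1} + b)
        out.append(W_fc @ z)                             # per-step fully connected head, x_t = FC(z_t)
    return np.stack(out)

rng = np.random.default_rng(1)
L_t, N, n_n, P = 15, 15, 32, 1                           # toy sizes (the paper uses n_n = 1000)
x_hat = rnn_forward(rng.normal(size=(L_t, N)),
                    rng.normal(size=(n_n, N)) * 0.1,
                    rng.normal(size=(n_n, n_n)) * 0.1,
                    np.zeros(n_n),
                    rng.normal(size=(P, n_n)) * 0.1)
```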
Traditionally, RNNs are used for processing of time-related signals, to predict future outcomes, and for natural language processing tasks such as handwriting recognition [28] and speech recognition [29]. In computer vision, recurrent convolutional networks (RCNNs) were proposed for object recognition [30]. Pixel-RNN [31] sequentially predicts pixels in an image along the two spatial dimensions.

B. Speckle Statistics
OCT tomograms display the intensity of the scattered light, as the log-valued squared norm of the complex-valued tomogram. It is assumed that the contributions from structural features beyond the imaging resolution of OCT add up coherently, and generate a random speckle pattern [2], [3]. Speckle is not an additive statistically independent noise, but rather unresolved spatial information originating in the interference of many sub-resolution spaced scatterers [32]. Speckle also plays an important role in other fields, e.g., synthetic-aperture radar and ultrasound medical imaging. Exact analogs of the speckle phenomenon appear in many other fields and applications. The squared magnitude of the finite-time Fourier transform (FFT) (the periodogram) of a sample function of almost any random process shows fluctuations in the frequency domain that have the same single-point (pixel) statistics as speckle [33]. Generally speaking, speckle appears in a signal when the signal is a linear combination of independently random-phased additive complex elements. The resulting sum is a random walk that may exhibit constructive or destructive interference depending on the relative phases. The intensity of the observed wave is the squared norm of this sum.
As mentioned above, speckle in OCT arises from sub-resolution reflectors. In addition, an optical frequency domain imaging (OFDI) OCT tomogram is the FFT of its measured spectral components, and therefore exhibits noise typical of any periodogram. When a pixel's value is the result of a sum of a large enough number of reflectors, the sum, according to the central limit theorem, has a Gaussian distribution. In this case, assuming a uniform phase, for fully developed speckle, the intensity is distributed according to an exponential probability density,

p(y | x) = (1/x) exp(-y/x), y ≥ 0,

where y is the measured intensity pixel value, and x is the mean intensity, defining the ground truth. In other words, the fluctuations of fully developed speckle are of the same order as the ground truth pixel value, which renders this type of noise particularly destructive and visually disturbing.
Speckle that is not fully developed has a more complicated distribution, depending on the number of phasors and the distributions of their amplitudes and phases.
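A quick numerical sanity check of the fully developed speckle statistics described above (a sketch, not the paper's code): the intensity of a circular complex-Gaussian field is exponentially distributed, so its standard deviation equals its mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x_mean = 3.0                 # hypothetical mean intensity (the "ground truth" x)
n = 200_000

# circular complex Gaussian field with total variance x_mean
u = rng.normal(scale=np.sqrt(x_mean / 2), size=n) \
    + 1j * rng.normal(scale=np.sqrt(x_mean / 2), size=n)
y = np.abs(u) ** 2           # measured intensity samples; exponential with mean x_mean
```

Empirically, both the sample mean and the sample standard deviation of `y` approach `x_mean`, matching the claim that the fluctuations are of the same order as the underlying pixel value.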

III. DOMAIN AWARE SPECKLE SUPPRESSION
Let us denote f(z, x) ∈ C as the ground truth ideal tomogram perfectly describing the depth sample reflectivity. Here (z, x) : z, x ≥ 0, (z, x) ∈ R² are continuous axial and lateral spatial axes. A measured tomogram can be formulated as

g(z, x) = f(z, x) * α(z, x),

where * denotes the convolution operation and α(z, x) is a point spread function (PSF). In the discrete setting, assuming F_z^s, F_x^s axial and lateral sampling rates, respectively, the set of measured values at {z_m, x_n} lies on the grid z_m = m/F_z^s, x_n = n/F_x^s. A speckle-suppressed tomogram can be viewed as the incoherent mean of coherent tomograms with different speckle realizations [3], [34],

x[m, n] = ⟨ |f_k[m, n] * α[m, n]|² ⟩_k.

In OFDI OCT (using a wavelength-swept source) [35], [36], the axial range ∆_z is given by the central wavelength and the wavelength sampling. The axial sampling space is δ_z = ∆_z / n_z, where n_z is the total number of pixels per A-line. In the axial direction, the PSF effective width ω_z is determined by the FFT of a zero-padded Hanning window. In the lateral direction, the PSF has a Gaussian shape proportional to exp(-2x²/ω_x²), where ω_x is referred to as the waist. δ_x is the lateral sampling space. Therefore, α[m, n] is separable and can be expressed as α[m, n] = α_z[m] α_x[n]. Note that the resolution and sampling rate are known parameters of an OCT imaging system.
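A hedged sketch of the separable PSF construction described above (axial: magnitude of the FFT of a zero-padded Hanning window; lateral: Gaussian beam profile). The parameter values are illustrative placeholders, not those of any system in Table I:

```python
import numpy as np

def separable_psf(wx_um, dx_um, n_lat, nH, nFFT):
    # lateral PSF: Gaussian profile exp(-2 x^2 / w_x^2), sampled at spacing dx
    x = (np.arange(n_lat) - n_lat // 2) * dx_um
    lat = np.exp(-2 * x**2 / wx_um**2)
    # axial PSF: magnitude of the FFT of a Hanning window zero-padded to nFFT points
    win = np.zeros(nFFT)
    win[:nH] = np.hanning(nH)
    ax = np.abs(np.fft.fftshift(np.fft.fft(win)))
    ax /= ax.max()
    # separability: alpha[m, n] = alpha_z[m] * alpha_x[n]
    return np.outer(ax, lat)

psf = separable_psf(wx_um=20.0, dx_um=10.0, n_lat=21, nH=512, nFFT=1024)
```

Since the PSF is built as an outer product of the two 1D profiles, it is rank-1 by construction, mirroring the separability assumption in the text.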
In matrix-vector form, we denote an input (log-scaled) image Y ∈ R^{L_r×J} that is a corrupted version of X ∈ R^{L_r×J}, such that Y = X + N, where N ∈ R^{L_r×J} is an additional noise term. Note that for the case of image despeckling we assume neither that the entries of N are i.i.d. nor that N is uncorrelated with X. Our task is to recover X. That is, we attempt to find an estimate X̂ of the unknown ground truth X.
Let us assume a source training set {(y_i, x_i)}, where {y_i, x_i} ∼ P_S are image patches sampled from a source domain S, with x_i serving as ground truth. The learning system is trained to output a prediction rule F_S : Y → X̂. We assume an algorithm that trains the predictor by minimizing the training error (empirical error or empirical risk). The domain shift problem assumes a target domain T with samples from a different distribution, {y_i, x_i} ∼ P_T.
Assumption 1 (Speckle Local Ergodicity). Denote by y_i a patch centered around pixel i of the image Y. P_Y(y_i) is the probability density of a patch y_i. Under the assumption that pixels in close proximity result from shared similar sub-resolution scatterers, we assume ergodicity of the Markov random field (MRF, e.g., [37]) of patches y_i consisting of pixels in close proximity. In other words, the probability distribution of a group of pixels' values in close spatial proximity is defined by the same density across the entire image. This assumption takes into account that some of these patches correspond to fully developed speckle, non-fully developed speckle, and a combination of both. Note that the measured pixels' values are correlated. That said, this assumption could be somewhat controversial, particularly in the surroundings of abrupt changes in the signal intensity. But since our images tend to have a layered structure, and the PSF's visible range is about 7-9 pixels in each direction, we make this assumption.
Definition 2 (Sampling-Resolution Ratio). We define the lateral sampling-resolution ratio p_x ≜ [ω_x / δ_x] in pixels, and the axial sampling-resolution ratio p_z ≜ [ω_z / δ_z], where [•] denotes rounding to the closest integer. That is, in a discrete setting, p_z and p_x are the numbers of pixels capturing the effective area of the PSF in each direction. The superscripts t and s denote the target and source, respectively.
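Definition 2 reduces to a one-line helper; the widths and spacings below are hypothetical examples, not values from Table I:

```python
def sampling_resolution_ratio(omega_um, delta_um):
    # number of pixels spanned by the effective PSF width (Definition 2)
    return round(omega_um / delta_um)

p_x = sampling_resolution_ratio(30.0, 10.0)  # e.g., waist 30 um, lateral spacing 10 um
p_z = sampling_resolution_ratio(8.0, 4.0)    # e.g., axial width 8 um, axial spacing 4 um
```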
Theorem 3 (Domain-Shift Speckle Suppression Theorem). A (learned) speckle suppression mapping F_S : y^s → x^s does not require domain adaptation. However, the resolution of the output x̂^t = F_S(y^t) will be determined by the source domain resolution. Mathematically, denote α^s[m, n], α^t[m, n] as the discrete PSFs in the source and target domain, respectively, such that

α^t[m, n] = α^s[m, n] * α^{s→t}[m, n] or α^s[m, n] = α^t[m, n] * α^{t→s}[m, n],

where α^{s→t}[m, n] and α^{t→s}[m, n] are complementary impulse responses leading from one domain to the other. When applying the trained system to the target input, the output resolution is determined by the source resolution. We refer the reader to Appendix A for the proof and a detailed explanation. For example, if p^s_z = p^t_z and p^s_x < p^t_x, there exists α^{s→t}_x[n] such that α^t_x[n] = α^s_x[n] * α^{s→t}_x[n]. In other words, the output resolution is determined by the source resolution.
1) If p^s_x < p^t_x or p^s_z < p^t_z, then the system's prediction for an input in the target domain may have additional details, or artificially enhanced resolution details, that would not naturally occur with other denoising mechanisms. Examples of this phenomenon are illustrated in Fig. 1(e) and Fig. 3(e)-(f). Possible remedies: train with a larger analysis patch size, train longer, upsample (interpolate) the source images (or decimate the target images).

2) If p^s_x > p^t_x or p^s_z > p^t_z, then the network's output is blurred in the corresponding direction (e.g., Fig. 6(e) and Fig. 3(h)). Possible remedies: train with a smaller analysis patch size, downsample (decimate) the training image (or upsample the target images). In this case the target has details that are smaller (in pixels) than the minimal speckle size of the source, which could be interpreted by the trained predictor as noise; thus the trained predictor may simply smear them out. Any combination of relations p^s_{z,x} ≶ p^t_{z,x} along the different image axes is possible. For our OCT data, the resolution ratio mostly differs in the lateral direction (see Table I). Note that for some OCT systems the sampling space is below the Nyquist rate. A preprocessing domain adaptation stage can be applied either to the source data or to the target data, interchangeably, depending on the desired target resolution.
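The decimation/interpolation remedies above can be sketched as a simple lateral resampling; linear interpolation is an assumed interpolator here, not one prescribed by the paper:

```python
import numpy as np

def match_lateral_ratio(img, p_src, p_tgt):
    """Resample the lateral (column) axis so the target's sampling-resolution
    ratio matches the source's. p_src < p_tgt shrinks the lateral axis
    (decimation of the target); p_src > p_tgt stretches it (interpolation)."""
    M, N = img.shape
    N_new = max(1, round(N * p_src / p_tgt))
    old = np.linspace(0.0, 1.0, N)
    new = np.linspace(0.0, 1.0, N_new)
    return np.stack([np.interp(new, old, row) for row in img])
```

For example, with p_src = 1 and p_tgt = 2 the lateral axis is halved, matching the "decimate the target" remedy for case 1.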

IV. FEW SHOT LEARNING VIA RNN
The initial RNN setting described in this subsection has previously been employed for seismic imaging [38], [39], [21]. Hereafter, the mathematical formulation focuses on the setting of the OCT despeckling task. Nonetheless, the model can be applied to a wide range of applications. We emphasize the potential of this framework, and expand and elaborate on its application, while connecting it to the theoretical intuition in Theorem 3. We also propose possible upgrades to further enhance the results in our case study.
Most OCT images have a layered structure, and exhibit strong relations along the axial and lateral axes. RNNs can efficiently capture those relations and exploit them. That said, as demonstrated below, the proposed framework is not restricted to images that exhibit a layered structure, nor to the specific RNN-based encoder-decoder architecture.
Definition 4 (Analysis Patch) [39]. We define an analysis patch as a 2D patch of size L_t × N_x. The analysis patch associated with a point at location (i, j) is

A^(i,j) = {Y[m, n] : i - (L_t - 1) ≤ m ≤ i, j - (N_x - 1)/2 ≤ n ≤ j + (N_x - 1)/2}.

An analysis patch A^(i,j) ∈ R^{L_t×N_x} is associated with a pixel X[i, j] in the output image. To produce a point in the estimate X̂[i, j], we set the input to the RNN as an analysis patch, i.e., y = A^(i,j). Each time step input is a group of N_x neighboring pixels of the same corresponding time (depth); in other words, in our application each y_t is one row of the analysis patch. We set the size of the output vector to one expected pixel (P = 1), such that x is expected to be the corresponding reflectivity segment, x = [X[i - (L_t - 1), j], ..., X[i, j]]^T.
Lastly, we ignore the first L_t - 1 values of the output x and set the predicted reflectivity pixel X̂[i, j] as the last one. The analysis patch moves across the image and produces all predicted points in the same manner. Each analysis patch and its corresponding output segment (or patch) are an instance for the net. The size and shape of the analysis patch define the geometrical distribution of data samples for inference.
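A possible implementation of analysis-patch extraction consistent with Definition 4; the edge-replicating padding is our assumption, since the paper does not specify boundary handling:

```python
import numpy as np

def analysis_patches(Y, L_t=15, N_x=15):
    """Yield ((i, j), patch) for every output pixel: each patch is L_t x N_x,
    ending at depth i and laterally centered at column j."""
    pz, px = L_t - 1, N_x // 2
    Yp = np.pad(Y, ((pz, 0), (px, px)), mode="edge")  # assumed edge handling
    M, N = Y.shape
    for i in range(M):
        for j in range(N):
            yield (i, j), Yp[i:i + L_t, j:j + N_x]

Y = np.arange(30.0).reshape(5, 6)
patches = list(analysis_patches(Y, L_t=3, N_x=3))
```

Each patch's last row, middle column is exactly the pixel being predicted, so the mapping from patches back to output locations is one-to-one.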
Despeckling Reformulated as Image Deblurring: Despite the low-frequency bias of over-parametrized DNNs [40], previous works [39] demonstrate the ability of the proposed framework to promote high frequencies and super-resolution. To explore this possibility, we recast the framework described above as a deblurring task. This is achieved simply by applying a low-pass filter to the input speckled image and then training the system to deblur the image. Namely, given a noisy image Y, the analysis patches are extracted from the input image Ŷ = HY, where H is a convolution matrix of a 2D low-pass filter. We refer to this denoiser as the deblurring RNN (DRNN).
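The DRNN preprocessing step Ŷ = HY can be sketched with a separable Gaussian low-pass filter, matching the [7, 7], σ = 1 configuration reported in the experiments; the reflect-padding boundary handling is our assumption:

```python
import numpy as np

def gaussian_blur(Y, size=7, sigma=1.0):
    """Separable 2D Gaussian low-pass H applied to image Y."""
    r = size // 2
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()                                   # normalized 1D kernel
    pad = np.pad(Y, r, mode="reflect")             # assumed boundary handling
    tmp = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, tmp)

Y_hat = gaussian_blur(np.ones((10, 12)))
```

Because the kernel is normalized, a constant image passes through unchanged, which is a convenient correctness check for the filter.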
Averaging patches: Given a noisy image Y, an alternative approach is to decompose it into overlapping patches, denoise every patch separately, and finally combine the results by simple averaging. This approach of averaging overlapping patch estimates is common in patch-based algorithms [41], [42], such as Expected Patch Log-Likelihood (EPLL) [43]. It also improves SNR, since for every pixel we average a set of different estimates. Mathematically speaking, the input analysis patch is still y = A^(i,j) ∈ R^{L_t×N_x}. But in this configuration the output is no longer a 1D segment, but a corresponding 2D output patch. In other words, x_t ∈ R^{N_x×1}, such that P = N_x.
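Overlapping-patch averaging can be sketched as accumulate-and-normalize; the shapes below are illustrative:

```python
import numpy as np

def average_patches(patches, coords, shape):
    """Combine overlapping denoised patch estimates by per-pixel averaging."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for (i, j), p in zip(coords, patches):
        h, w = p.shape
        acc[i:i + h, j:j + w] += p     # accumulate each patch estimate
        cnt[i:i + h, j:j + w] += 1     # count how many estimates cover each pixel
    return acc / np.maximum(cnt, 1)    # avoid division by zero at uncovered pixels

coords = [(i, j) for i in range(2) for j in range(3)]
out = average_patches([np.ones((2, 2))] * 6, coords, (3, 4))
```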
Incremental Generative Adversarial Network: Image restoration algorithms are typically evaluated by some distortion measure (e.g., PSNR, SSIM) or by human opinion scores that quantify perceived perceptual quality. It has long been established that distortion and perceptual quality are at odds with each other [44]. As mentioned above, previous works adopt two-stage training [14], [15]: the first stage trains the generator with a content loss, while the second stage, initialized by the generator's pre-trained weights, trains both a generator G and a discriminator D. Therefore, we propose adding a second stage of training with a combined MSE and adversarial loss,

L_G = L_MSE + λ L_ADV,

where λ is a constant balancing the losses. The generator G remains a patch-to-patch RNN-based predictor (with or without averaging patches). To this end, we design and showcase a patch discriminator of extremely low complexity, consisting simply of two fully connected layers. We refer to this approach as RNN-GAN.
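A minimal sketch of the two-layer patch discriminator and the combined generator objective; the weight shapes, hidden width, and the value of λ are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def discriminator(patch, W1, b1, W2, b2):
    """Two fully connected layers: FC -> ReLU -> FC -> sigmoid score in (0, 1)."""
    h = np.maximum(0.0, W1 @ patch.ravel() + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))

def generator_loss(x_pred, x_true, d_score, lam=1e-3):
    """Combined objective L_G = L_MSE + lambda * L_adv (non-saturating form)."""
    mse = np.mean((x_pred - x_true) ** 2)          # content (MSE) loss
    adv = -np.log(d_score + 1e-12)                 # adversarial term -log D(G(y))
    return mse + lam * adv

rng = np.random.default_rng(0)
patch = rng.normal(size=(15, 15))
W1, b1 = rng.normal(size=(64, 225)) * 0.05, np.zeros(64)
W2, b2 = rng.normal(size=(1, 64)) * 0.05, np.zeros(1)
score = discriminator(patch, W1, b1, W2, b2)
loss = generator_loss(patch, patch + 0.1, score)
```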

V. EXPERIMENTAL RESULTS
Here, we show examples of our proposed few-shot domain-aware supervised learning despeckling approach with experimental OCT data, for demonstration. We investigated three challenging one-shot learning cases: (1) matching tissue and matching acquisition systems, where we use one image or part of an image for training, and other images of the same tissue acquired by the same system for testing; (2) tissue type mismatch; (3) tissue type and acquisition system mismatch. Table I presents the acquisition parameters, namely: axial and lateral sampling spaces in tissue; N_H, the effective number of measured spectral points, vs. N_FFT, the total number of FFT points after zero padding; ω_x, the waist in µm; axial and lateral sampling-resolution ratios in pixels; and cropped image sizes.
For all experiments, we set the number of neurons to n_n = 1000. Increasing the number of neurons did not improve the results significantly, but increased training time. The analysis patch size is [15, 15]. Patch size can affect the results' higher frequencies: larger patches create a frequency bias in favor of lower frequencies. For the DRNN we used a Gaussian filter of size [7, 7] and standard deviation σ = 1. For the RNN-GAN we employed overlapping-patch averaging to promote additional noise reduction. Ex vivo OCT samples: As ground truth for training and testing, we used hardware-based speckle mitigation obtained by dense angular compounding, in a method similar to [5]. That is, ground truth images for chicken muscle, blueberry, chicken skin, and cucumber sample tissues, as presented in Figs. 3-6(b), were acquired by an angular compounding (AC) system using sample tilting in combination with a model-based affine transformation to generate speckle-suppressed ground truth data [46]. Note that AC via sample tilting is not possible for in vivo samples.
Retinal Data: We used retinal data acquired by a retinal imaging system similar to [47]. As ground truth for training and testing we used NLM-based speckle-suppressed images [9]. Note that NLM is considered relatively slow (about 23 seconds for a B-scan of size 1024 × 1024). Images were cropped to size 448 × 832.
Cardiovascular OCT: Finally, we tested our trained systems with OCT data of coronary arteries from two imaging systems. For this data we have no ground truth available. The first dataset, referred to as Cardiovascular-I [35], was acquired with in-house-built catheters, for human cadaver imaging. The second dataset, Cardiovascular-II [45], of human heart coronary arteries, was acquired with a second, clinical system, where there is usually a guidewire in place. Since imaging time is critical, only 1024 A-lines per rotation were acquired.
Figs. 1-8 depict the despeckled predictions obtained for ex vivo samples, as well as for in vivo retinal data and intravascular OCT images, employing four methods: RNN, DRNN, RNN-GAN, and U-Net [25]. Please zoom in on screen to see the differences. The U-Net has about 8.2 × 10⁶ parameters (8 times the RNN's number of parameters), and it trains on patches of size 64 × 64. Visually observing the results in the different scenarios, overall, the proposed approach efficiently suppresses speckle, while preserving and enhancing visual detailed structure.
To test the DRNN's performance in different domains, we trained it with 100 columns of the acquired in vivo human retinal cross section presented in Fig. 1(a). Fig. 1(b) presents the ground truth obtained as described in [9]. As can be observed, the DRNN approach generalizes well, both with matching tissues and imaging systems, and in cases of tissue and system mismatch. The DRNN produces good visual quality and efficiently suppresses speckle, even without preprocessing domain adaptation. As we theoretically established, applying the source-trained system to a target with a lower lateral sampling-resolution ratio indeed smooths the result, whereas a target input with a higher lateral sampling-resolution ratio results in detailed structure with minor speckle residuals. Visually observing the other methods' results leads to similar conclusions.
We quantitatively evaluated the proposed approaches by comparing the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) of their results with respect to the images assumed as ground truth. Table II compares the average PSNR and SSIM scores for the above four methods with matching system and tissue. As can be seen, a significant increase in PSNR and SSIM scores is achieved for all methods. RNN-GAN and U-Net have the highest scores in most cases. The U-Net usually yields the highest scores; yet, as can be observed in Fig. 4(e), it can produce unexpected visible artifacts in some cases. The U-Net has more capacity; therefore, it tends to memorize the training image better, but generalize worse. Note that PSNR and SSIM scores do not always reliably represent perceptual quality or desired features of the images [44]. Also, keep in mind that AC despeckled images are a result of averaging numerous images, whereas our system's predictions rely solely on a single observation; therefore the reconstructions are notably more faithful to the single observed speckled image. Furthermore, although AC images are referred to as ground truth, they may suffer from inaccuracies related to the stage tilting and its processing.
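PSNR, one of the two reported metrics, can be computed as in the minimal sketch below (SSIM is typically taken from a library such as scikit-image and is not reproduced here):

```python
import numpy as np

def psnr(x_hat, x_ref, peak=None):
    """Peak signal-to-noise ratio in dB against the assumed ground truth."""
    peak = x_ref.max() if peak is None else peak
    mse = np.mean((x_hat - x_ref) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

x_ref = np.ones((8, 8))
value = psnr(x_ref * 0.9, x_ref)   # mse = 0.01, peak = 1 -> 20 dB
```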
Table III provides quantitative scores for the proposed domain adaptation approach for various pairs of source and target, differing in acquisition system and tissue type, for RNN-GAN and U-Net. Notably, both approaches result in a significant increase in PSNR and SSIM scores. Note that the images differ not only in their sampling-resolution ratio, but also in the nature of the ground truth used for training. Namely, AC images have a different texture and visual appearance than NLM. Regardless of PSNR and SSIM scores, the learning system often tends to adopt the visual characteristics of the source data. This tendency may also be perceived as an advantage in the absence of ground truth, as can be seen in Fig. 1(g). The observed speckled image may originate in many plausible reconstructions with varying textures and fine details, and different semantic information [48]. The above results thus offer a user-dependent degree of freedom. Unfortunately, in our experiments a domain randomization strategy [49] failed to generalize well.

VI. DISCUSSION & CONCLUSIONS
In this work, we analyzed the critical challenge of supervised learning domain adaptation for the task of OCT speckle suppression, with diverse imaging systems, given limited ground-truth data. We focused on an RNN patch-based approach that is both flexible and efficient in terms of patch size and number of parameters, and less prone to low-frequency bias. We further designed a suitable adversarial loss training stage with dramatically reduced complexity. We also demonstrated the applicability of our proposed point of view to a U-Net. Future research can potentially investigate other architectures. For the challenge of domain adaptation, we illuminate a rather simplified point of view that can facilitate efficient deployment for both research and industrial purposes.
Our results challenge the assumption that training datasets must consist of a large representation of the entire input data probability distribution, thus evading the curse of dimensionality. The proposed few-shot learning framework is potentially related to the information-theoretic asymptotic equipartition property ([26], Section 2). Our results can inspire the design of novel few-shot learning systems for medical imaging, and can also be of interest to the wider deep learning and signal processing community. The models are formulated for two-dimensional (2D) signals, but can easily be adapted to other data dimensions. Future work can investigate the applicability of the proposed approach to other tasks and to cross-modality domain adaptation.

APPENDIX A: PROOF OF THEOREM 3
Let us assume source input-output pairs {y^s, x^s} ∈ R^{M×N}, with distribution {y^s, x^s} ∼ P_S(y^s, x^s) in the source domain S, and samples {y^t, x^t} ∈ R^{M×N} in the target domain T such that {y^t, x^t} ∼ P_T(y^t, x^t). The predictor is trained and generalizes well, such that

E_{(x,y)∼P_S} ‖x^s − F_S(y^s)‖₂² ≤ ε,

where E_{(x,y)∼P_{X,Y}}(•) denotes the expectation over P_{X,Y}, ‖•‖₂ denotes the ℓ₂ norm, and ε ≪ min ‖x‖₂². As mentioned above, in the OCT speckle suppression problem, the coherent speckled input is

y^s[m, n] = |f^s[m, n] * α^s[m, n]|²,

while the corresponding incoherent output is formulated as

x^s[m, n] = ⟨ |f^s_k[m, n] * α^s[m, n]|² ⟩_k.

Similarly, the target input is

y^t[m, n] = |f^t[m, n] * α^t[m, n]|²,

and the desired despeckled image in the target domain is

x^t[m, n] = ⟨ |f^t_k[m, n] * α^t[m, n]|² ⟩_k.

We omit the scaled logarithm for simplicity. Applying the trained source system to the source data, we can assume ∀ y^s[m, n] ∈ R^{M×N} ∃ x^s[m, n] : x^s[m, n] ≈ F_S(y^s[m, n]). Therefore, applying the trained system to a target input, the system's output is its prediction for the source-domain image whose observation coincides with the target observation; here α_z[m] and α_x[n] are the separable PSF components. The predictor does not "know" that its input does not belong to the source domain, and it outputs the corresponding prediction in the source domain. For the sake of the proof's outline, let us assume the target and source domains share an equivalent sampling-resolution ratio in the axial direction, that is, α^s_z[m] = α^t_z[m]. If p^s_x < p^t_x, there exists α^{s→t}_x[n] such that α^t_x[n] = α^s_x[n] * α^{s→t}_x[n]. Therefore,

y^t[m, n] = |f^t[m, n] * α^{s→t}[m, n] * α^s[m, n]|²,

and the prediction is determined by the source PSF α^s[m, n]. For example, when p^s_x < p^t_x and p^s_z > p^t_z, the output x̂^t = F_S(y^t) combines the two cases along the separable axes: artificially enhanced detail laterally, and blur axially.
Training of the proposed model is extremely fast. The number of epochs for the first, content-loss training stage is 5-12, depending on the analysis patch size, batch size, and training image size. The adversarial loss training stage takes about

Fig. 1. Retinal data speckle suppression: (a) Cross-sectional human retina in vivo, p^s_x = 2; (b) Despeckled (NLM) image used as ground truth; (c) DRNN trained with 100 columns of the retinal image, p^s_x = p^t_x = 2; System mismatch: (d) DRNN following lateral decimation of the target input by a factor of 2, p^t_x = 1; (e) DRNN following lateral interpolation of the input, p^t_x = 3; System and tissue mismatch: (f) RNN-GAN trained with the 100 first columns of chicken muscle and blueberry, p^s_x = 3; (g) RNN-GAN trained with the 200 last columns of blueberry; (h) U-Net trained with a blueberry image of size 256 × 256, p^s_x = 3. Scale bars, 200 µm.

Fig. 3. Chicken muscle speckle suppression results: (a) Speckled acquired tomogram, p_x = 3; (b) Ground truth averaged over 901 tomograms; (c) OCT-RNN trained with the 100 first columns of chicken muscle; (d) RNN-GAN trained with the 100 first columns of chicken muscle and blueberry, p^s_x = p^t_x = 3; (e) RNN-GAN trained with 200 columns of chicken decimated by a factor of 8/3 in the lateral direction, p^s_x = 1. System and tissue mismatch: (f) DRNN trained with 100 columns of a human retinal image, p^s_x = 2; (g) DRNN following lateral decimation of the target input by a factor of 4/3, p^s_x = p^t_x = 2; (h) DRNN following lateral decimation of the target input by 8/3, p^t_x = 1. Scale bars are 200 µm.

TABLE II: AVERAGE PSNR / SSIM OBTAINED FOR DIFFERENT METHODS AND DATASETS WITH TRAINING AND TESTING MATCHING ACQUISITION SYSTEMS AND TISSUE-TYPES. AVERAGE SCORES ARE OVER 100 TOMOGRAMS OF SIZE 256 × 256.
10-30 epochs. The total time of training is 5-25 seconds on a laptop GPU. Training without adversarial stages normally takes about 12 seconds. As a rule of thumb, training for too long can cause over-fitting and blurry images.

(Proof of Theorem 3, continued.) Since

f^t[m, n] * α^t[m, n] = f^s[m, n] * α^s[m, n] = f^s[m, n] * α^{t→s}[m, n] * α^t[m, n],

we have f^t[m, n] = f^s[m, n] * α^{t→s}[m, n], and

F_S(y^t[m, n]) = x^s[m, n] |_{y^s[m,n]=y^t[m,n]} = ⟨ |f^s[m, n] * α^s[m, n]|² ⟩ |_{f^t[m,n]=f^s[m,n]*α^{t→s}[m,n]}. (17)

In other words, the resolution of the output will be determined by the source resolution α^s[m, n]. Furthermore, since the solution obeys f^t[m, n] = f^s[m, n] * α^{t→s}[m, n], the tomogram component |f^s[m, n]| is, in a sense, a speckled super-resolved version of the target tomogram |f^t[m, n]|. Note that (17) is true under the assumption that ∀ f^t[m, n] ∃ f^s[m, n] : f^t[m, n] = f^s[m, n] * α^{t→s}[m, n]. In other words, f^s[m, n] is a super-resolved version of f^t[m, n], which is safe to assume since the resolution of f[m, n] is well above the size of point-scatterer particles. Specifically, for a Gaussian PSF, we know that the convolution of two Gaussians with means μ_1, μ_2 and variances σ_1², σ_2² is a Gaussian with mean μ = μ_1 + μ_2 and variance σ² = σ_1² + σ_2². The same proof applies to the cases where p^s_z ≶ p^t_z, interchangeably, or to a combination of p_z, p_x, due to separability. For the general case, we can write (15) as

x̂^t[m, n] = x^s[m, n] |_{y^t[m,n]=y^s[m,n]} = ⟨ |f^t[m, n] * α^{s→t}[m, n] * α^s[m, n]|² ⟩. (16)

In other words, the output resolution is determined by the source resolution α^s[m, n]. The tomogram component |f^t[m, n] * α^{s→t}[m, n]| is a lower-resolution version of the desired target tomogram |f^t[m, n]|.
2) If p^s_x > p^t_x, there exists α^{t→s}_x[n] such that α^s_x[n] = α^t_x[n] * α^{t→s}_x[n]. Then (15) is satisfied, as in (17).