Denoising of Optical Coherence Tomography Images in Ophthalmology Using Deep Learning: A Systematic Review

Imaging from optical coherence tomography (OCT) is widely used for detecting retinal diseases and localizing intra-retinal boundaries, but it is degraded by speckle noise. Deep learning models, which can be regarded as end-to-end frameworks, can aid with denoising, allowing clinicians to diagnose retinal diseases more clearly. We selected denoising studies that applied deep learning models to retinal OCT imagery. Each study was quality-assessed through image quality metrics (including the peak signal-to-noise ratio, PSNR; contrast-to-noise ratio, CNR; and structural similarity index metric, SSIM). Meta-analysis could not be performed due to heterogeneity in the methods of the studies and in the measurement of their performance. Multiple databases (including Medline via PubMed, Google Scholar, Scopus, and Embase) and a repository (ArXiv) were screened for publications published after 2010, without any limitation on language. Of the 95 potentially eligible studies identified, 54 were excluded after full-text assessment, either because deep learning (DL) was not utilized or because the dataset and results were not effectively explained, leaving a total of 41 studies that were evaluated thoroughly. Numerous types of OCT image are covered in this review, comprising public retinal image datasets purposely utilized for denoising OCT images (n = 37) and Optic Nerve Head (ONH) images (n = 4). A wide range of image quality metrics was used; PSNR and SNR values ranged from 8.1 to 158.6 dB. A minority of studies (n = 8) showed a low risk of bias in all domains. Studies utilizing ONH images produced PSNR or SNR values varying from 8.1 to 25.7 dB, and those using public retinal datasets from 26.4 to 158.6 dB. Further analysis of the denoising models was not possible because discrepancies in reporting did not allow useful pooling.
An increasing number of studies have investigated denoising retinal OCT images using deep learning, with a range of architectures being implemented. The reported increases in image quality metrics seem promising, although study and reporting quality are currently low.


Introduction
Optical coherence tomography (OCT) stands at the forefront of modern medical imaging techniques, harnessing the power of low-coherence infrared light to delve deep into biological structures with unprecedented clarity and precision [1]. This revolutionary technology affords longer exposure times due to its inherent biological safety, presenting a stark departure from the ionizing radiation associated with conventional X-rays. Moreover, when juxtaposed with Magnetic Resonance Imaging (MRI) and Computerized Tomography (CT), OCT emerges as a cost-effective alternative, democratizing access to high-quality diagnostic imaging. However, amidst the brilliance of OCT lies a challenge inherent to all imaging modalities: noise. Inevitably introduced during the imaging process, noise mingles with the signal emanating from the object under scrutiny, influencing the resultant intensity observed by the detecting pixel [2]. Of particular concern is speckle noise, a byproduct of low coherence in irradiance, which casts a shadow over the signal-to-noise ratio, obscuring critical details within the imagery. Within the realm of ophthalmology, OCT serves as a versatile tool, facilitating the acquisition of cross-sectional and volumetric images that illuminate the intricate landscape of biological tissues and retinal structures. These images serve as invaluable aids in diagnosing a myriad of ocular diseases, ranging from diabetic retinopathy (DR) [3] to age-related macular degeneration (AMD) [4], guiding clinicians toward tailored treatment strategies.
Meanwhile, in the realm of computer science and Artificial Intelligence (AI), Machine Learning (ML) emerges as a beacon of innovation. By mining insights from past data, ML algorithms prognosticate future trends, obviating the need for explicit programming or human intervention [5]. At its core, ML epitomizes the essence of pattern recognition, endowing computers with the capacity to glean insights from vast datasets with unparalleled efficiency.
Within the ML landscape, deep learning (DL) commands center stage, propelled by leaps and bounds in computational prowess and the proliferation of "big data". Convolutional Neural Networks (CNNs) epitomize this evolution, revolutionizing DL with their ability to extract features, classify images, and recognize patterns at breakneck speeds. By emulating the intricate workings of the human brain through the deployment of filters and intricate layers, CNNs herald a new era of computational efficiency.
The intersection of DL and OCT heralds a realm of boundless possibilities, marked by advancements in volumetric data handling, heightened sensitivity and specificity in detecting structural alterations, and the tantalizing prospect of denoising retinal images to unprecedented levels of clarity. This synthesis prompts a critical examination, as we delve into recent studies illuminating the applications of DL to OCT imagery, evaluating their impact on image quality assessment.
Moreover, as we stand on the precipice of a new era in medical imaging and computational innovation, it behooves us to explore the clinical ramifications of integrating these cutting-edge computational techniques into the fabric of healthcare delivery. Through a judicious examination of recent computational innovations and their potential clinical applications, we chart a course toward enhanced diagnostic accuracy, streamlined treatment pathways, and, ultimately, improved patient outcomes. Thus, this review not only serves as a testament to the symbiotic relationship between technology and healthcare but also as a compass guiding future research endeavors and clinical initiatives.

Overview of Optical Coherence Tomography
Optical coherence tomography (OCT) stands as a maturing imaging technology, offering resolution ranging from millimeters to sub-millimeters and boasting a penetration depth comparable to that achieved in human skin [6]. This innovative technique predominantly employs low-coherence infrared light to safely delve into biological tissues, affording longer exposure times in contrast to X-rays. Figure 1 illustrates the conceptual system configuration of Michelson interferometry, the foundation upon which OCT operates. Within this setup, the interferometric probe beam, formed by recombined reflected beams at the beam splitter, is directed toward the surface under examination, with a detector poised to capture the backscatter emanating from this surface [7].
Central to the functioning of OCT is the notion of coherence, a defining characteristic of light wherein all rays maintain a consistent and calculable phase over a defined period. However, the utilization of low coherence, while advantageous in probing biological tissues, introduces an unintended consequence: noise that alters pixel intensity and distorts the resulting image. This phenomenon manifests as artifacts, leading to a compromised signal-to-noise ratio within OCT images [7]; while various forms of noise may afflict the imaging process, speckle noise emerges as the predominant type encountered in OCT imagery [8].
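To illustrate why speckle is so damaging to OCT image quality, the multiplicative noise model commonly used for coherent imaging can be simulated in a few lines. This is a toy sketch, not drawn from any of the reviewed studies; the unit-mean, gamma-distributed multiplier (with the shape parameter acting as the number of looks) is one standard speckle model:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_speckle(img, looks=1):
    """Multiplicative speckle: each pixel is scaled by a unit-mean,
    gamma-distributed multiplier whose shape equals the number of looks."""
    multiplier = rng.gamma(shape=looks, scale=1.0 / looks, size=img.shape)
    return img * multiplier

clean = np.full((64, 64), 100.0)     # idealised homogeneous tissue patch
noisy = add_speckle(clean, looks=1)  # fully developed speckle

# Fully developed (single-look) speckle has a standard deviation equal to
# the mean, i.e. an SNR (mean/std) of roughly 1, or about 0 dB.
print(round(noisy.mean()), round(noisy.std()))
```

This is why single B-scans look grainy even over homogeneous tissue, and why denoising (or multi-look averaging) is needed before boundaries can be localized reliably.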
At the heart of OCT lies the Michelson interferometer, serving as its primary setup. Here, an optical probe directs low-coherence light toward the sample, penetrating its surface and awaiting the rebound of reflected light. Subsequently, this reflected light is channeled to the interferometer via an optical fiber for meticulous analysis, as depicted in Figure 1 [9].

Material and Methods
This paper presents a systematic review focusing on studies concerning diagnostic accuracy. The reporting of this study adheres to the PRISMA-DTA guidelines [10].

Eligibility Criteria
This systematic review is designed to address specific queries utilizing the PICO framework, which encompasses Population, Intervention, Control, and Outcomes. Our investigation centers on several key questions. Firstly, we aim to explore the various implementations and accuracy outcomes associated with employing deep learning (DL) techniques for denoising retinal optical coherence tomography (OCT) imagery. Secondly, we seek to evaluate the effectiveness of these DL-based denoising models in mitigating noise and enhancing the quality of retinal OCT images. Thirdly, we will assess the performance of DL-based denoising models against other denoising methods or untreated OCT images used as benchmarks. Finally, our primary objective is to evaluate the effectiveness of DL-based denoising models by analyzing improvements in image quality through the application of widely used image metrics. Our inclusion and exclusion criteria were defined to ensure specific standards were met: studies were excluded for a lack of clear explanation regarding the utilized DL model, the absence of an effective comparative analysis with other state-of-the-art denoising methods, or classification as reviews rather than original research contributions. By meticulously delineating these inclusion and exclusion criteria, our goal is to ensure the robustness and reliability of the systematic review findings. This approach facilitates comprehensive insights into the utility and efficacy of DL in enhancing the quality of retinal OCT imagery.

Search Methods for Identifying Studies
An electronic search was conducted within the following electronic databases: Google Scholar, Scopus, Medline (via PubMed), Embase, and ArXiv, covering entries up to 16 November 2023. Results were filtered to include publications from 2010 onward, as deep learning (DL) for computer vision and image analysis gained prominence following its development in 2012 by Krizhevsky et al. [11]. Language restrictions were not applied. Customized keywords were utilized for each database. Refer to Table 1 for the specific search queries employed. Moreover, additional studies were identified through screening conference proceedings and journal articles. Furthermore, manual cross-referencing of the bibliographies of included papers was performed to ensure comprehensive coverage.

Study Selection
To efficiently manage citations, BibTeX was employed as a tool. Initial screening involved the removal of duplicate entries based on titles and abstracts. Subsequently, a thorough evaluation of articles was conducted to identify eligible studies in accordance with the predetermined inclusion and exclusion criteria. This meticulous process ensured the selection of relevant and appropriate studies for the systematic review. Additionally, any discrepancies or uncertainties during the screening process were resolved through discussion among the research team, ensuring consistency and accuracy in study selection.

Data Collection and Extraction
Data collection was conducted independently from the included studies and meticulously revised to address any discrepancies or disagreements. Comprehensive information was extracted, encompassing various data items vital for analysis. These included bibliographic details such as authors' names and publication years, details regarding the data modality and type of dataset utilized, hardware specifications, and dataset size (including train/validation/test sets, if provided). Moreover, inclusion and exclusion criteria at the image level, if available, were noted, along with the specified objective of the study (e.g., image denoising, speckle reduction, super resolution). Information regarding pre-processing techniques, data augmentation strategies, and the deep learning (DL) approach employed, including the neural network (NN) architecture utilized, was also recorded. Additionally, details regarding the loss function employed and the image quality metrics used for evaluation were documented. The resulting findings from each study were thoroughly examined. In cases where an article compared multiple NN architectures, the most accurate one was reported to ensure clarity and consistency in the analysis. This comprehensive approach to data extraction facilitated a robust and thorough examination of the included studies, contributing to the reliability and validity of the systematic review findings.

Risk of Bias and Applicability
This review focuses specifically on the denoising aspect, and to assess the risk of bias in the included studies, we adapted and employed the QUADAS-2 tool. This modified tool encompasses four main domains addressing the risk of bias: the data selection, index test, reference standard, and flow-and-timing. Additionally, it evaluates three domains regarding the applicability of the study to patient selection, the index test, and the reference standard.
Within the "data selection" domain, we scrutinized papers with vague data-split strategies and limited information on the dataset, which could potentially lead to data leakage, indicating a high risk of bias. Moving on to the "index test" domain, we assessed indicators such as the lack of description of the model and the absence of details regarding test recreation and reproducibility. The "flow-and-timing" domain was evaluated based on indicators such as the implementation of multiple reference standards (i.e., state-of-the-art denoisers) within the same article and the appropriateness of intervals between the index test and reference standard. Finally, within the "reference standard" domain, we considered indicators like inadequate information on reference standard definition and the utilization of only one reference test.
In cases where concerns arose regarding the relevance of the studies, certain factors were meticulously reviewed. These included the dataset used, the procedure employed for creating clean data, the specific deep learning (DL) model utilized, and its performance concerning image quality metrics. Table 2 outlines the key questions utilized in our assessment process, providing a structured framework for evaluating the risk of bias across the included studies. Through this comprehensive approach, we aimed to ensure the rigor and reliability of our review findings.

Data Synthesis and Analysis
Due to the diverse array of study designs and image quality measures utilized, our quantitative synthesis was primarily confined to examining the outcomes related to image denoising and speckle reduction. Given the substantial variability in the image quality metrics employed for quantifying denoising and super resolution, the scope for direct comparison was somewhat constrained; while a minority of studies reported metrics such as the contrast-to-noise ratio (CNR), equivalent number of looks (ENL), and structural similarity index measure (SSIM), the predominant metrics presented were the peak signal-to-noise ratio (PSNR) or signal-to-noise ratio (SNR). Consequently, the comparative quantitative analysis across studies was limited to reporting any or all of the evaluation metrics mentioned, calculated as follows. The PSNR is defined as PSNR = 10 log10(L^2 / MSE), where L denotes the maximum possible pixel value and MSE is the mean squared error of the image. The SNR is defined as SNR = 20 log10(mu / sigma), where mu and sigma are the mean and standard deviation of the pixel values, respectively. Next, the structural similarity index (SSIM) is a well-known image quality metric that focuses on perceived similarity, emphasizing texture, quality degradation, and visible structures. It is defined as SSIM(n, c) = ((2 mu_n mu_c + c1)(2 sigma_nc + c2)) / ((mu_n^2 + mu_c^2 + c1)(sigma_n^2 + sigma_c^2 + c2)), where mu_n, mu_c and sigma_n, sigma_c are the mean values and standard deviations of the noisy (n) and clean (c) images of a pair, respectively, sigma_nc is their covariance, and c1, c2 are small stabilizing constants. The equivalent number of looks (ENL) is a metric assessing the smoothing of the predicted image; it does not require a reference image, since it utilizes a selected homogeneous ROI, and is defined as ENL = mu_b^2 / sigma_b^2, where mu_b and sigma_b are the mean value and standard deviation of the background representation. Lastly, the contrast-to-noise ratio (CNR) utilizes ROIs of background and signal areas to quantify speckle suppression with respect to both areas; it is calculated as CNR = 10 log10(|mu_s - mu_b| / sqrt((sigma_s^2 + sigma_b^2) / 2)), where mu_s and sigma_s are the mean value and standard deviation of the signal representation, and mu_b and sigma_b are those of the background representation.
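For reference, the metrics above can be sketched in a few lines of Python. These are minimal, single-window implementations written to match this review's notation, not code from any included study; in particular, the SSIM here is the global (single-window) variant rather than the usual windowed mean, and CNR conventions vary between papers:

```python
import numpy as np

def psnr(clean, denoised, max_val=255.0):
    """PSNR = 10 log10(L^2 / MSE), in dB."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def snr(img):
    """SNR = 20 log10(mu / sigma) over the pixel values, in dB."""
    img = np.asarray(img, float)
    return 20.0 * np.log10(img.mean() / img.std())

def ssim_global(x, y, max_val=255.0):
    """Single-window SSIM with the usual stabilising constants c1, c2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    return ((2 * x.mean() * y.mean() + c1) * (2 * cov + c2)) / (
        (x.mean() ** 2 + y.mean() ** 2 + c1) * (x.var() + y.var() + c2))

def enl(roi):
    """Equivalent number of looks, mu^2 / sigma^2, over a homogeneous ROI."""
    roi = np.asarray(roi, float)
    return roi.mean() ** 2 / roi.var()

def cnr(signal_roi, background_roi):
    """One common CNR convention:
    10 log10(|mu_s - mu_b| / sqrt((sigma_s^2 + sigma_b^2) / 2)), in dB."""
    s = np.asarray(signal_roi, float)
    b = np.asarray(background_roi, float)
    return 10.0 * np.log10(abs(s.mean() - b.mean()) /
                           np.sqrt(0.5 * (s.var() + b.var())))
```

Note that for identical inputs `ssim_global` returns exactly 1 while the PSNR diverges as the MSE approaches 0, which is why denoising studies report these metrics only between distinct (noisy or denoised) and reference images.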

Study Selection and Study Characteristics
Out of the initial pool of 4399 records identified, a rigorous full-text evaluation was conducted for 95 studies based on the criteria outlined in Table 2. Subsequently, 54 studies were excluded following a thorough assessment of their full texts. The reasons for exclusion were carefully documented and categorized, with detailed explanations provided in Table S1. Ultimately, after meticulous manual screening, a total of 41 studies were deemed eligible for inclusion in our review. Notably, the number of included studies per year exhibited an upward trend over the observation period, as depicted in Figure 2. This trend underscores the increasing interest and attention devoted to the subject matter over time, highlighting the evolving landscape of research in this field.
The studies examined in this review were categorized into two main groups based on the type of optical coherence tomography (OCT) images utilized within the field of ophthalmology: Optic Nerve Head (ONH) images and retinal images. A summary of these studies is presented in Tables 3 and 4, respectively, highlighting the utilization of various deep learning (DL) models for tasks such as image denoising, speckle reduction, and super resolution. Specifically, the retinal image datasets utilized in this review were predominantly public datasets purposely employed for denoising OCT images (n = 37). Notable datasets included DUKE [12], Topcon [13], OPTIMA [14], Cirrus [15], and Heidelberg [16]. Conversely, ONH images (n = 4) primarily comprised private datasets created by researchers. The majority of studies (n = 39) incorporated multiple reference tests to evaluate their proposed methods against previous state-of-the-art denoisers. These reference tests encompassed both traditional programming and DL denoising models. Specifically, 26 studies implemented DL models and traditional programming as reference tests, while 9 studies utilized traditional programming-based state-of-the-art denoisers such as BM3D and NLM. However, five studies did not specify any established reference test.
Regarding the choice of dataset for denoising, the DUKE dataset (n = 22) was the most frequently utilized, followed by Topcon (n = 6), with ONH images being the least employed (n = 4). Various DL models were deployed and integrated into hybrid frameworks, as depicted in Figure 3, which illustrates the distribution of studies implementing each DL model. Notably, 44% of the studies implemented a hybrid generative adversarial network (GAN), with the super-resolution GAN (SR-GAN) being the most utilized (n = 3) alongside the conditional GAN (cGAN) (n = 3). Additionally, 13% of the studies employed a traditional U-Net model (n = 6) for denoising, with only one hybrid framework reported. Evaluation of denoising, speckle reduction, and super-resolution studies primarily relied on metrics such as PSNR and SNR as the major image quality indicators. Furthermore, other widely used image quality metrics included the SSIM (n = 27), CNR (n = 24), and ENL (n = 16). These metrics are comprehensively displayed in Tables 3 and 4, providing insight into the methodologies and outcomes of the reviewed studies.

Risk of Bias and Applicability
Every study included in the review underwent a thorough assessment of risk of bias, with the results meticulously documented in Table S2. Out of the studies evaluated, 10 (27%) were identified as having a low risk of bias across all four domains. Notably, the domain presenting the most challenges was the "index test", with only 22 studies (59.5%) categorized as having a low risk of bias in this area. This finding underscores the importance of critically evaluating the methodology and execution of the index test within each study to ensure the reliability and validity of the findings. Through this rigorous risk-of-bias assessment, we aimed to provide a comprehensive evaluation of the methodological robustness of the included studies, thereby enhancing the credibility and trustworthiness of our review outcomes.

Findings of the Studies
When focusing on the peak signal-to-noise ratio (PSNR) and signal-to-noise ratio (SNR) as metrics for denoising images, a significant variation in deep learning (DL) techniques across different types of optical coherence tomography (OCT) images was observed. Specifically, for studies utilizing Optic Nerve Head (ONH) images, PSNR or SNR values ranged from 8.1 to 25.7 dB, while, for retinal datasets, these metrics ranged from 26.4 to 158.6 dB.
The majority of studies predominantly applied public retinal datasets such as Duke [12], Topcon [13], and Cirrus [15], primarily due to their extensive availability and widespread use. However, it is noteworthy that these datasets typically do not provide "clean" versus "noisy" image pairs required for denoising tasks. Consequently, researchers resorted to generating clean images through alternative means. For instance, clean images were generated by averaging multiple B-scans or by employing traditional programming techniques. In some cases, unsupervised DL techniques were also utilized to generate clean images. These approaches aimed to provide a reliable basis for assessing the effectiveness of DL-based denoising methods, despite the absence of explicitly labeled clean and noisy image pairs within the datasets.
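The B-scan-averaging strategy mentioned above can be illustrated with a toy simulation. This is a sketch under the multiplicative-speckle assumption, not code from any reviewed study: averaging N registered noisy acquisitions of the same slice suppresses speckle by roughly a factor of sqrt(N), which is how several datasets obtain their "clean" references.

```python
import numpy as np

rng = np.random.default_rng(1)
clean = np.full((32, 32), 100.0)  # idealised homogeneous B-scan

def acquire():
    """One speckled acquisition of the same (perfectly registered) slice."""
    return clean * rng.gamma(1.0, 1.0, clean.shape)

def average_of(n):
    """Pseudo ground truth: pixel-wise mean of n repeated acquisitions."""
    return np.mean([acquire() for _ in range(n)], axis=0)

# Speckle std drops roughly as 1/sqrt(N): about 50 for N=4, 12.5 for N=64,
# versus roughly 100 for a single acquisition.
avg4, avg64 = average_of(4), average_of(64)
print(round(avg4.std(), 1), round(avg64.std(), 1))
```

The simulation also shows the limitation noted across studies: the averaged image is only an approximation of a clean reference, and its residual noise floor depends on how many registered scans were available.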

Discussion
Numerous DL tasks, including detection, segmentation, and classification, are challenging in ophthalmology, where OCT is the primary imaging technique. OCT introduces speckle noise that adds artifacts, thereby impairing image quality and confounding correct clinical interpretation [59]. DL has therefore provided solutions for denoising and speckle reduction in OCT images to overcome this problem. In this systematic review, many studies were compiled and analyzed to assess the application of DL for image denoising of OCT images in ophthalmology. Over the past decade, researchers have generated a developing and promising body of evidence supporting DL for this task. Even so, there was a limited supply of quality studies for comparison between traditional programming and DL, and a number of findings require more detailed discussion. As shown in Figure 3, the main DL models implemented were the GAN, DnCNN, autoencoder, U-Net, Noise2Noise, and transformer.
Generative adversarial networks (GANs) exhibit promise in synthesizing synthetic OCT images with reduced noise levels. By training a generator network to generate realistic images and a discriminator network to differentiate between real and generated images, GANs have demonstrated their capability to denoise OCT scans effectively while preserving clinically relevant features. However, GANs can be challenging to train and prone to mode collapse, where the generator produces limited variations in images, potentially limiting their diversity and generalizability in clinical settings. In contrast, Denoising Convolutional Neural Networks (DnCNNs) utilize deep convolutional layers to understand the underlying structure of noisy OCT images and produce clean counterparts. Through iterative training on paired noisy-clean image datasets, DnCNNs efficiently enhance image clarity. However, DnCNNs may struggle with complex noise patterns and require large amounts of labeled data for training, which may be resource-intensive and time-consuming to acquire. The U-Net architecture, characterized by its symmetric encoder-decoder structure with skip connections, has showcased remarkable performance in semantic segmentation tasks, including the denoising of OCT images. By integrating contextual information from various spatial scales, U-Net effectively preserves anatomical structures. Nonetheless, U-Net architectures may suffer from memory inefficiency and computational overhead, particularly when handling high-resolution OCT images. Autoencoders, consisting of an encoder and a decoder network, learn to reconstruct input data from a compressed representation. Trained on noisy OCT images, autoencoders can encode essential features suitable for clinical interpretation. However, autoencoders may struggle with capturing complex image structures and may require careful tuning of hyperparameters to achieve optimal performance. Transformers, initially designed for natural language processing tasks, have recently been applied in image processing, including denoising. Leveraging self-attention mechanisms to capture long-range dependencies, transformers effectively preserve spatial information in OCT images. However, transformers may suffer from scalability issues when applied to large-scale image datasets due to their computational complexity and memory requirements. The Noise2Noise approach involves training deep learning models directly on pairs of noisy images, eliminating the need for clean reference images during training. By exploiting inherent redundancies in noisy data, Noise2Noise denoises OCT images without access to ground-truth clean images, making it suitable for real-world clinical applications where clean reference images may not be readily available. Nevertheless, Noise2Noise may struggle with highly variable noise patterns and may require careful regularization techniques to prevent overfitting to noise artifacts.
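The statistical intuition behind Noise2Noise can be checked numerically: under an L2 loss, the optimal prediction against zero-mean noisy targets coincides with the clean signal, so a clean reference is never needed during training. A minimal sketch follows; the pixel value 42.0 and the noise level are arbitrary illustrative choices, not values from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(2)

clean_pixel = 42.0
# Many independent noisy observations of the same underlying pixel,
# standing in for the noisy *targets* a Noise2Noise model trains against.
noisy_targets = clean_pixel + rng.normal(0.0, 10.0, size=100_000)

# The constant predictor minimising mean squared error against the noisy
# targets is simply their mean, which converges to the clean value.
theta = noisy_targets.mean()
print(round(theta, 2))
```

This is why an L2-trained network fed noisy-input/noisy-target pairs learns to output the (clean) conditional expectation rather than reproducing the target's noise, provided the noise is zero-mean and independent between the two copies.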
These innovative DL techniques have the potential to revolutionize clinical workflows in ophthalmology by automating the denoising process and improving the efficiency and accuracy of OCT image interpretation. By providing clinicians with high-quality, denoised images, these techniques can facilitate more confident diagnoses and treatment decisions, ultimately leading to improved patient outcomes. However, high-quality studies allowing comparison between traditional programming and DL remain scarce, necessitating a closer examination of key findings.
Firstly, a significant proportion of studies (62%) conducted their DL model training and testing on public datasets, validating their reported image quality metrics and comparing them with other state-of-the-art DL models (reference tests). These studies demonstrated relatively high PSNR or SNR values for denoised images, indicating substantial improvement in image quality attributable to DL models. Given the considerable noise inherent in OCT images, DL techniques hold significant promise for enhancing diagnosis, segmentation, and detection tasks.
Secondly, only studies utilizing the Duke [12] public dataset had access to clean versus noisy image pairs for training purposes. Other studies, utilizing datasets like Topcon [13] and Cirrus [15], lacked clean images and thus resorted to creating their own algorithms or applying traditional programming techniques to obtain "ground truths" for their B-scans. This variance in ground-truth generation methods complicates cross-study comparisons and underscores the importance of clearly outlining and validating strategies for creating clean images to ensure the production of robust models.
Thirdly, there was considerable variability in both the reporting and conduct of the studies, posing challenges for comprehensive quantitative and qualitative analyses across the board. Given the wide array of available image quality metrics beyond PSNR or SNR (such as CNR, ENL, and SSIM), it was observed that studies typically reported only two or three metrics. This lack of uniformity in reporting was further exacerbated by the implementation of multiple and varied reference tests. For instance, several studies utilized a diverse range of reference tests, resulting in a vague assessment of the impact on either PSNR or SNR. Consequently, there is a pressing need for a standardized structure for image denoising studies, encompassing specific reference tests (including an appropriate balance of traditional programming and deep learning models) and a predetermined set of specified image quality metrics (such as PSNR, SSIM, CNR, and ENL). The feasibility of conducting a meta-analysis was likewise limited, as only a small subset of studies provided similar image quality metrics beyond PSNR or SNR. This situation underscores the significant consequences stemming from the lack of minimum standards and tools available in the field of image denoising, where diagnostic studies are currently limited to only two tools (STARD-AI and QUADAS-2). Since the studies included in this review solely reported either the PSNR or SNR of the denoised images in comparison to multiple reference tests, it remains uncertain whether DL provides significant assistance in image denoising and speckle reduction. Ideally, the impact of DL for denoising OCT images in ophthalmology should be demonstrated in practice-based settings and validated by its ability to improve further objectives such as detection [60], classification [61], and segmentation [62], which the majority of included studies did not consider.
Fourthly, a significant proportion (74%) of papers were excluded from the review due to the absence of DL methodologies, instead focusing on improving traditional programming methods such as wavelets and shearlets [63], NLM [64], and BM3D [65]. This suggests that DL has not been extensively investigated for OCT image denoising. The remaining papers were excluded due to inadequate descriptions of datasets and their utilization, as well as a lack of reference testing to demonstrate their impact on PSNR and SNR.
Fifthly, and perhaps most importantly, none of the studies incorporated input from clinicians regarding the produced results. This is particularly concerning considering that the primary purpose of denoising OCT images is to assist clinicians in diagnosing various retinal diseases. Ideally, clinicians should provide feedback on the reported denoised images, assessing whether useful data has been removed or added that could significantly impact diagnosis accuracy.
Lastly, it is essential to emphasize that OCT is the primary instrument in ophthalmology for capturing multiple types of images (including ONH and retinal images) crucial for detecting retinal diseases. This systematic review has revealed both strengths and limitations, with a systematic and extensive assessment of studies conducted to compare DL for image denoising, speckle reduction, and super resolution for OCT images in ophthalmology. While the timeline was restricted, this was justified by the focus on DL, and a substantial and diverse body of evidence has nonetheless been presented. Additionally, the absence of a meta-analysis was attributable to the mixed nature of reporting and the lack of quality in comparative result analysis.
Overall, there has been a growing number of studies investigating the denoising of OCT images in ophthalmology using deep learning, with various computational architectures being explored. Among the reported metrics of image quality, the peak signal-to-noise ratio (PSNR) has emerged as a reliable metric for intercomparison, with values spanning from 8.1 to 25.7 dB for ONH images and from 26.4 to 158.6 dB for retinal datasets. Moving forward, it is imperative for future studies to clearly outline reference tests and datasets, relying on a common, extensive, and clinically meaningful outcome basis to drive progress in the field.

Conclusions
In summary, the landscape of research in denoising Optical Coherence Tomography (OCT) images within ophthalmology has seen a notable surge, with a diverse array of studies delving into the application of deep learning techniques. This exploration has encompassed various computational architectures, reflecting a dynamic and evolving field seeking optimal solutions for image enhancement. Among the array of metrics used to evaluate image quality, the peak signal-to-noise ratio (PSNR) has emerged as a consistent benchmark for intercomparison, offering insights into the effectiveness of denoising methodologies. Notably, PSNR values have exhibited substantial ranges, underscoring the complexity and variability inherent in OCT image denoising efforts. Looking ahead, it is essential for future investigations to prioritize transparency and standardization in methodologies, particularly in outlining reference tests and datasets. By establishing a common foundation grounded in clinically meaningful outcomes, researchers can foster more robust advancements and ensure the translation of findings into tangible benefits for clinical practice and patient care.

Figure 1 .
Figure 1. A system diagram of the principle of Michelson interferometry used in OCT [9]. The signal from a collimated light source (LS) is partitioned into two orthogonal beam paths by a beam splitter (BS); one collinear with and the other normal to the LS pointing. The normal beam is reflected off a movable mirror (M1), the collinear beam off a fixed mirror (M2). The reflected beams are recombined at the BS, and the co-propagating superposition is recorded as an interferogram at the detector (D).

Data Selection
1. Are any data imbalances addressed in the article?
2. Was the dataset split explained correctly for training, validation, and testing?
3. Did the study collect sufficient noisy-clean image pairs?
Index Test
1. Was the methodology sufficiently explained for reproducibility?
2. Were the results of deep learning models explained without knowledge of state-of-the-art denoisers?
3. Did the study apply any image quality metrics, sensitivity, or robustness analysis on their model?
Flow and Timing
1. Was the full dataset utilized in the analysis?
2. Did each image have a reference clean image?
3. Were the reference clean images produced similarly?
4. Did the model show a sufficient interval between the reference and index test?
Reference Standard
1. Were state-of-the-art results of denoisers mentioned and utilized for interpretation?
2. Did the study describe the noisy-clean image procedure and minimize bias?
3. Were limitations, biases, and generalization issues reported sufficiently?

Figure 2 .
Figure 2. Flowchart of the search following the PRISMA guidelines.

Figure 3 .
Figure 3. Deep learning model approaches used for denoising OCT images (n = number of studies).

Table 1 .
The results of the electronic search in multiple databases.

Table 2 .
Modified leading questions of QUADAS-2 for critical appraisal.

Table 3 .
Summary of findings in the selected studies utilizing Optic Nerve Head (ONH) datasets for image denoising, speckle reduction, and super-resolution.

Table 4 .
Summary of findings in the selected studies utilizing retinal datasets for image denoising, speckle reduction, and super-resolution.