Article

BanglaOCT2025: A Population-Specific Fovea-Centric OCT Dataset with Self-Supervised Volumetric Restoration Using Flip-Flop Swin Transformers

by Chinmay Bepery 1,*, G. M. Atiqur Rahaman 2,*, Rameswar Debnath 2, Sajib Saha 3, Md. Shafiqul Islam 4, Md. Emranul Islam Abir 5 and Sanjay Kumar Sarker 6

1 Department of Computer Science and Information Technology, Patuakhali Science and Technology University, Patuakhali 8602, Bangladesh
2 Computer Science and Engineering Discipline, Khulna University, Khulna 9208, Bangladesh
3 The Australian e-Health Research Centre, CSIRO, Perth, WA 6151, Australia
4 Department of Ophthalmology, Sher-e-Bangla Medical College Hospital, Barishal 8200, Bangladesh
5 Department of Ophthalmology, Khulna Medical College Hospital, Khulna 9100, Bangladesh
6 Department of Vitreo-Retina, National Institute of Ophthalmology and Hospital, Dhaka 1207, Bangladesh
* Authors to whom correspondence should be addressed.
Diagnostics 2026, 16(3), 420; https://doi.org/10.3390/diagnostics16030420
Submission received: 25 December 2025 / Revised: 22 January 2026 / Accepted: 25 January 2026 / Published: 1 February 2026
(This article belongs to the Special Issue 3rd Edition: AI/ML-Based Medical Image Processing and Analysis)

Abstract

Background: Age-related macular degeneration (AMD) is a major cause of vision loss, yet publicly available Optical Coherence Tomography (OCT) datasets lack demographic diversity, particularly from South Asian populations. Existing datasets largely represent Western cohorts, limiting AI generalizability. Moreover, raw OCT volumes contain redundant spatial information and speckle noise, hindering efficient analysis. Methods: We introduce BanglaOCT2025, a retrospective dataset collected from the National Institute of Ophthalmology and Hospital (NIOH), Bangladesh, using Nidek RS-330 Duo 2 and RS-3000 Advance systems. We propose a novel preprocessing pipeline comprising two stages: (1) A constraint-based centroid minimization algorithm automatically localizes the foveal center and extracts a fixed 33-slice macular sub-volume, robust to retinal tilt and acquisition variability; and (2) A self-supervised volumetric denoising module based on a Flip-Flop Swin Transformer (FFSwin) backbone suppresses speckle noise without requiring paired clean reference data. Results: The dataset comprises 1585 OCT volumes (202,880 B-scans), including 857 expert-annotated cases (54 DryAMD, 61 WetAMD, and 742 NonAMD). Denoising quality was evaluated using reference-free volumetric metrics, paired statistical analysis, and blinded clinical review by a retinal specialist, confirming preservation of pathological biomarkers and absence of hallucination. Under a controlled paired evaluation using the same classifier with frozen weights, downstream AMD classification accuracy improved from 69.08% to 99.88%, interpreted as an upper-bound estimate of diagnostic signal recoverability rather than independent generalization. Conclusions: BanglaOCT2025 is the first clinically validated OCT dataset representing the Bengali population and establishes a reproducible fovea-centric volumetric preprocessing and restoration framework for AMD analysis, with future validation across independent and multi-centre test cohorts.


1. Introduction

Optical Coherence Tomography (OCT) functions as an “optical ultrasound,” utilizing low-coherence interferometry to provide non-invasive, micrometer-scale cross-sectional views of retinal microstructures [1]. In these grayscale tomograms, pixel intensity correlates with tissue backscattering properties; hyper-reflective (bright) regions indicate distinct layers such as the Retinal Nerve Fiber Layer (RNFL) and the Retinal Pigment Epithelium (RPE), while hypo-reflective (dark) regions represent the nuclear layers and vitreous humor [2]. Even minor textural or intensity deviations within these layers can signal early pathological changes associated with sight-threatening conditions.
OCT has become the clinical gold standard for diagnosing age-related macular degeneration (AMD) and other macular disorders [3]. However, the development of reliable automated diagnostic systems is constrained by two fundamental challenges: demographic dataset bias and volumetric image quality degradation.
First, the majority of publicly available OCT datasets, such as Duke SD-OCT, OCTA-500, and AROI, are based on Western (largely Caucasian) or East Asian patient cohorts [4,5,6]. Although these datasets are well curated and widely used, they provide limited representation of South Asian populations, where retinal anatomy, pigmentation, and disease characteristics may differ across ethnic groups [7]. As a result, AI models trained on Western OCT datasets may experience domain shift when applied to South Asian populations, reducing diagnostic reliability. Despite the growing burden of age-related macular degeneration, population-specific OCT datasets for South Asia—particularly Bangladesh—remain scarce. Such anatomical and demographic differences can alter OCT appearance and challenge model generalization, consistent with broader findings on limited cross-population transferability in medical AI [8].
To address this gap, we introduce BanglaOCT2025, a large-scale macular OCT dataset collected at the National Institute of Ophthalmology and Hospital (NIOH), Bangladesh, using NIDEK RS-330 Duo 2 and RS-3000 Advance systems under routine clinical workflows, with clinician-verified annotations and demographic metadata [9,10].
Second, the volumetric nature of OCT introduces redundancy and speckle noise. Standard macular scans contain up to 128 B-scans covering a 6 mm × 6 mm retinal region, many of which lie outside the diagnostically relevant central macula and increase computational cost and noise sensitivity [11]. Reliable 3D analysis, therefore, requires accurate localization of the fovea centralis; however, existing fovea detection methods based on global thickness or intensity heuristics are often sensitive to tilt, motion artifacts, and pathological deformation [12,13,14].
To address this, we propose a constraint-based centroid minimization algorithm that robustly localizes the fovea, even in tilted scans, and extracts a standardized 33-slice sub-volume. This fovea-centric design aligns with clinical practice, where AMD-related biomarkers are concentrated in the central macula, while reducing redundant computation.
Furthermore, raw OCT images are inherently affected by speckle noise [15]; it appears as a granular interference pattern that obscures fine pathological features, including drusen boundaries, subretinal fluid, and outer retinal layers. Conventional two-dimensional filtering methods often reduce this noise at the cost of blurring layer boundaries, thereby compromising diagnostic detail [16,17]. Recent machine-learning approaches, mainly based on 2D autoencoders, denoise OCT images by processing individual B-scans and typically rely on paired or reference images [18,19,20,21]. However, such methods largely ignore the intra- and inter-slice spatial relationships inherent to volumetric OCT data. Consequently, the use of collocated volumetric coherence for self-supervised OCT denoising remains underexplored.
To address this limitation, we introduce a self-supervised volumetric restoration framework based on a Flip-Flop Swin Transformer (FFSwin) backbone. By leveraging intra- and inter-slice anatomical coherence, the model suppresses speckle noise while preserving retinal structure, without requiring noise-free reference images.
In summary, this study makes four main contributions: (i) the introduction of BanglaOCT2025, the first clinically curated macular OCT dataset representing the Bengali population; (ii) a tilt-robust, constraint-based method for automated fovea-centric sub-volume extraction; (iii) a self-supervised volumetric denoising framework using an FFSwin backbone; and (iv) a comprehensive evaluation demonstrating that restoration-driven preprocessing enhances downstream AMD classification under severe class imbalance. Downstream classification is used solely as a task-oriented probe to assess preservation of diagnostically relevant information, while the primary focus of this work remains the dataset, preprocessing pipeline, and volumetric restoration methodology.

2. Materials and Methods

2.1. Dataset: BanglaOCT2025

2.1.1. Dataset Acquisition

BanglaOCT2025 was retrospectively collected from the National Institute of Ophthalmology and Hospital (NIOH), Bangladesh, under routine clinical workflows. All data were de-identified in accordance with institutional data governance practices prior to analysis, with only limited demographic metadata (age and sex) retained for research purposes. No personally identifiable information was used in this study.
OCT examinations were acquired using NIDEK spectral-domain OCT systems, namely the RS-330 Duo 2 and RS-3000 Advance (NIDEK Co., Ltd., Gamagori, Japan), operated through the NAVIS-EX image management software. A total of 1419 patient records were initially retrieved from the NAVIS-EX system, comprising 1071 cases from the RS-330 Duo 2 and 348 cases from the RS-3000 Advance. After excluding incomplete, corrupted, or empty scan folders, 1071 valid patients and 1658 volumetric OCT scans were retained. A detailed breakdown of patient counts, scan validity, and slice statistics is summarized in Table 1.
Each macular OCT volume was acquired using a standard 6 mm × 6 mm fovea-centered protocol with 128 B-scans, following routine clinical practice including patient fixation guidance, optional pupil dilation, automated alignment, and manual confirmation of macular center to ensure consistent alignment and image quality for AMD assessment [2,23,24,25,26]. All B-scans were stored as grayscale images, where pixel intensity represents tissue backscattering and supports layer-wise (e.g., RNFL and RPE) retinal analysis.
Following quality control, a total of 1585 scans (202,880 B-scans) were retained for the BanglaOCT2025 dataset, as summarized in Table 2. Among these, 857 scans from 573 patients were selected for expert annotation. Diagnostic labels were assigned by experienced retina specialists into three clinical categories: DryAMD, WetAMD, and NonAMD. Each case was independently reviewed by multiple clinicians, and final labels were determined through consensus to enhance annotation reliability. This expert-driven labeling strategy is consistent with established best practices in ophthalmic imaging datasets [3,4,27]. Scans without expert labels were retained for unsupervised and restoration-only experiments.
To our knowledge, BanglaOCT2025 is the first large-scale, clinically curated OCT dataset focused on the Bengali population. By offering population-specific data with verified clinical labels, it fills an important gap in existing OCT resources and supports the development of robust ophthalmic AI systems for South Asian populations.

2.1.2. Ground Truth Labeling

To ensure clinical reliability, the dataset underwent a rigorous labeling process involving three retina specialists from Sher-e-Bangla Medical College Hospital (SBMCH), Khulna Medical College Hospital (KMCH), and NIOH. The specialists independently reviewed scans using the Nidek NAVIS-EX software (trial), version 1.12 19702-E201.
Two specialists (SBMCH and KMCH) independently labeled 573 patients (857 scans) as DryAMD, WetAMD, or NonAMD. Most cases showed consistent agreement; diagnostically ambiguous cases—primarily early-stage distinctions between DryAMD and WetAMD—were reviewed by a third specialist from NIOH. The final ground-truth labels were determined by a majority consensus, reflecting standard clinical practice [27,28]. The annotated dataset includes 857 scans (54 DryAMD, 61 WetAMD, and 742 NonAMD). The NonAMD category comprises both normal eyes and non-AMD retinal conditions, consistent with a clinically realistic “AMD vs. non-AMD” screening formulation.
Inter-grader reliability was assessed prior to consensus labeling. Agreement between the two primary graders across all 857 cases yielded a Cohen’s κ of 0.78, indicating substantial agreement. Of these, 812 cases were labeled identically, while 45 challenging cases were adjudicated by the third grader. Fleiss’ κ computed on this adjudicated subset was low (κ ≈ 0.01), reflecting the intentionally difficult nature of these borderline cases rather than poor annotation quality.
The BanglaOCT2025 dataset comprises 1585 scans from 1071 patients aged 10.5–85.5 years, of which 728 scans are unlabeled. Ground-truth labels were assigned to patients aged 48.5–85.5 years, with the youngest WetAMD case observed at 49.5 years. Of the 141 patients aged 46–50.5 years, 81 were included in ground-truth labelling. The patient distribution by age is summarized in Table 3, and the sex-wise distribution of DryAMD and WetAMD cases is presented in Table 4.

2.2. Constraint-Based Fovea-Centric Volume Extraction

Each B-scan was exported as an 8-bit grayscale bitmap image $I^{(i)} \in \{0, \dots, 255\}^{H \times W}$, where $i = 1, 2, \dots, 128$ denotes the slice index, and $H$ and $W$ represent the image height and width, respectively.
Accurate localization of the fovea is a critical prerequisite for reliable macular analysis, as automated fovea detection may be unreliable in the presence of retinal pathology, fixation instability, or scan tilt. Prior studies have shown that pathological deformation and acquisition artifacts can significantly degrade automated fovea localization accuracy [12,13]. Unlike prior fovea detection approaches that rely on global thickness profiles or intensity heuristics, the proposed method introduces a constraint-aware, column-wise centroid formulation that is robust to tilt and pathological deformation. For algorithm validation and benchmarking purposes only, the foveal center was additionally identified manually by locating the B-scan with the deepest foveal pit within ±5 slices of the system-reported fovea, following established clinical protocols [12,13,29]. All preprocessing and downstream experiments in this study rely exclusively on the proposed fully automated constraint-based centroid minimization algorithm, without manual intervention. After foveal localization, a standardized fovea-centered sub-volume of 33 consecutive B-scans (fovea ±16 slices) was extracted to ensure consistent macular coverage across all subjects while reducing computational cost [11]. Although the fovea is typically located near the mid-volume (around slice 64), anatomical variation and patient motion can shift its position between slices 59 and 69 [12]. To ensure robust detection, the algorithm segments retinal tissue in the central region of each slice, computes a column-wise centroid to identify the retinal pit, applies a slice-level pit metric with anatomical penalty constraints, and extracts the foveal slice along with its 16 neighboring slices on each side.
In summary, the proposed constraint-based centroid minimization framework enables reliable, tilt-robust automated foveal localization and standardized 33-slice macular sub-volume extraction (Implementation Link). Algorithm 1 summarizes the proposed constraint-based fovea detection pipeline. Manual foveal identification is used solely for validation, while all reported analyses are based on the fully automated pipeline.
Algorithm 1: Constraint-Based Centroid Minimization
Parameters:
  $N_{\text{adj}} = 16$ (adjacent slices on each side)
  $T_{\text{slices}} = 128$ (total slices per volume)
  $R_{\text{start}} = 59$, $R_{\text{end}} = 69$ (preferred foveal range)
  $P = 200$ (penalty for out-of-range slices)
Procedure:
For each patient folder $F$:
  a. Initialize metric dictionary $M \leftarrow \{\}$
  b. For each slice $i = 1$ to $128$:
     • Load image $I \leftarrow$ oct_c_i.bmp
     • Extract central region (35% width): $I_c \leftarrow I[:,\ 0.325W : 0.675W]$
     • Apply Gaussian blur: $I_b \leftarrow \text{GaussianBlur}(I_c, (7, 7))$
     • Compute Otsu threshold $T_{\text{Otsu}}$
     • Segment tissue: $B \leftarrow (I_b \geq T_{\text{Otsu}})$
     • Compute column centroids $\mathbf{c}_y$
     • $M_{\text{pit}} \leftarrow \min(\mathbf{c}_y)$
     • If $i < 59$ or $i > 69$: $M[i] \leftarrow M_{\text{pit}} + 200$; else $M[i] \leftarrow M_{\text{pit}}$
  c. Find foveal slice: $i_{\text{fovea}} \leftarrow \arg\min(M)$
  d. Calculate range: $k_{\text{start}} \leftarrow \max(1, i_{\text{fovea}} - 16)$, $k_{\text{end}} \leftarrow \min(128, i_{\text{fovea}} + 16)$
  e. If $i_{\text{fovea}} + 16 > 128$: shift the 33-slice window leftward
  f. If $i_{\text{fovea}} - 16 < 1$: shift the 33-slice window rightward
  g. Copy slices $k_{\text{start}}$ through $k_{\text{end}}$ to the output folder

2.2.1. Tilt-Robust Foveal Slice Detection and Macular Extraction Algorithm

A novel fovea detection algorithm was employed that integrates column-wise centroid analysis [30] with clinical range constraints, enabling robust slice localization under image rotation and tilt.
Central Region Extraction: To reduce peripheral noise while preserving the foveal pit, analysis was restricted to the central macular region, consistent with prior macula-focused OCT studies [11,31]. For anatomical relevance and computational efficiency, a central region covering 35% of the image width was extracted. This width was chosen to reliably include the foveal and parafoveal regions while excluding peripheral areas more prone to shadowing and curvature artifacts.
The parameters used for fovea detection (penalty value = 200, central width = 35%, and foveal slice range = 59–69) were chosen based on known macular anatomy and typical OCT acquisition geometry rather than dataset-specific fine-tuning. These settings define broad spatial constraints using relative proportions and slice indices, rather than absolute intensity thresholds. During method development, we observed that moderate parameter variations (e.g., ±5–8% change in central width or small shifts in slice range) did not meaningfully affect fovea localization, indicating that the procedure is not overly sensitive to precise parameter settings. Nevertheless, atypical scanning protocols may require adjustment, and systematic cross-scanner sensitivity analysis is left for future work.
The central region extraction is defined as:
$$I_{\text{center}}^{(i)}(x, y) = I^{(i)}\left[x,\ y_{\text{start}}^{(i)} : y_{\text{end}}^{(i)}\right]$$
where $y_{\text{start}}^{(i)} = W \times 0.325 = W \times (0.5 - 0.175)$ and $y_{\text{end}}^{(i)} = W \times 0.675 = W \times (0.5 + 0.175)$.
Denoising with Gaussian Filtering: Speckle noise [15] was reduced using a 2D Gaussian filter with kernel size $7 \times 7$. A mild Gaussian smoothing ($\sigma \approx 1.0$) was applied to suppress speckle-induced high-frequency fluctuations while preserving retinal layer boundaries and layer geometry, consistent with prior OCT preprocessing studies [32,33].
$$I_{\text{blur}}^{(i)} = I_{\text{center}}^{(i)} * G$$
where $G(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ and $\sigma = \frac{7 - 1}{6} \approx 1.0$.
Adaptive Thresholding with Otsu’s Method: Retinal tissue was segmented using Otsu’s method, which automatically determines the optimal threshold T Otsu by maximizing between class variance to separate retinal tissue from background.
$$B^{(i)}(x, y) = \begin{cases} 1 & \text{if } I_{\text{blur}}^{(i)}(x, y) \geq T_{\text{Otsu}} \\ 0 & \text{otherwise} \end{cases}$$
where T Otsu is computed as:
$$T_{\text{Otsu}} = \arg\max_{0 \leq k < 256} \ \omega_0(k)\,\omega_1(k)\left[\mu_0(k) - \mu_1(k)\right]^2$$
This procedure automatically finds the best intensity value to separate bright retinal tissue from darker background by maximizing the difference between the averages of the two groups of pixels.
Column-Wise (A-Scan) Centroid Analysis: The key innovation is computing centroids for each vertical column (A-scan) independently, rather than a single global centroid. This provides tilt-robustness. Unlike global centroid or thickness-based approaches, column-wise centroid estimation preserves robustness under scan tilt and local deformation [12,31]. For each segmented slice $B^{(i)}$, the $j$-th A-scan is represented by a binary depth profile $b_j(x) = B^{(i)}(x, j)$, $x = 1, 2, \dots, H$, capturing the axial retinal tissue distribution. The zeroth moment (area) for column $j$ is $M_{00}(i, j) = \sum_{x=1}^{H} B^{(i)}(x, j)$, and the first vertical moment is $M_{01}(i, j) = \sum_{x=1}^{H} x\, B^{(i)}(x, j)$.
The column centroid (vertical position) is:
$$C_y(i, j) = \begin{cases} \dfrac{M_{01}(i, j)}{M_{00}(i, j)} & \text{if } M_{00}(i, j) > \epsilon \\ H & \text{otherwise (no tissue)} \end{cases}$$
where $\epsilon = 10^{-6}$ prevents division by zero.
Foveal Pit Metric—Minimum Column Centroid: The foveal pit corresponds to the thinnest point of the retina, which manifests as the highest position in the image (minimum $y$-coordinate). The metric for slice $i$ is:
$$M_{\text{pit}}(i) = \min_{j \in J} C_y(i, j)$$
where $J$ is the set of all columns in the central region. This metric identifies the minimum (highest) centroid among all columns, corresponding to the foveal depression.
Efficient Vectorized Implementation: The algorithm uses NumPy vectorization for computational efficiency. Let $B \in \mathbb{R}^{H \times W_c}$ denote the segmented central retinal region, where $W_c = y_{\text{end}} - y_{\text{start}}$. The column-wise zeroth- and first-order moments are computed as $\mathbf{m}_{00} = B^{\top}\mathbf{1}$ and $\mathbf{m}_{01} = B^{\top}\mathbf{y}$, where $\mathbf{1} \in \mathbb{R}^{H}$ is a vector of ones and $\mathbf{y} = [1, \dots, H]^{\top}$. The column centroids are then defined element-wise as:
$$\mathbf{c}_y = \begin{cases} \mathbf{m}_{01} / \mathbf{m}_{00} & \text{if } \mathbf{m}_{00} > \epsilon \\ H & \text{otherwise} \end{cases}$$
with $\epsilon = 10^{-6}$ to avoid division by zero. Finally,
$$M_{\text{pit}}(i) = \min(\mathbf{c}_y)$$
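For concreteness, the vectorized computation above can be sketched in Python as follows. This is a minimal illustration using OpenCV and NumPy; the function names (pit_metric, find_fovea) and defaults are illustrative and do not reproduce the released implementation linked in Section 2.2.

import cv2
import numpy as np

def pit_metric(img: np.ndarray) -> float:
    """Foveal-pit metric M_pit for one 8-bit grayscale B-scan."""
    H, W = img.shape
    center = img[:, int(0.325 * W):int(0.675 * W)]             # central 35% of width
    blur = cv2.GaussianBlur(center, (7, 7), 1.0)               # 7x7 kernel, sigma ~= 1.0
    _, mask = cv2.threshold(blur, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    B = mask.astype(np.float64)                                # H x W_c tissue mask
    y = np.arange(1, H + 1, dtype=np.float64)[:, None]         # depth coordinates
    m00 = B.sum(axis=0)                                        # zeroth moment per column
    m01 = (B * y).sum(axis=0)                                  # first vertical moment per column
    c_y = np.where(m00 > 1e-6, m01 / (m00 + 1e-6), float(H))   # column centroids (H if no tissue)
    return float(c_y.min())                                    # minimum centroid = pit metric

def find_fovea(slices, lo=59, hi=69, penalty=200.0) -> int:
    """Return the 1-based foveal slice index under the clinical range constraint."""
    scores = [pit_metric(s) + (0.0 if lo <= i <= hi else penalty)
              for i, s in enumerate(slices, start=1)]
    return int(np.argmin(scores)) + 1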
Clinical Range Constraint with Penalty Function: To incorporate anatomical prior knowledge, slices outside the clinically expected foveal range (59–69) were penalized:
$$\tilde{M}_{\text{pit}}(i) = \begin{cases} M_{\text{pit}}(i) & \text{if } 59 \leq i \leq 69 \\ M_{\text{pit}}(i) + 200 & \text{otherwise} \end{cases}$$
A penalty value of 200 was empirically chosen to suppress anatomically implausible slices while retaining strong foveal pit responses, ensuring a balance between robustness and flexibility. These values were chosen to represent anatomically plausible ranges rather than optimized thresholds, and were held fixed across all experiments.
Foveal Slice Identification: The optimal foveal slice index is:
$$i_{\text{fovea}} = \arg\min_{i \in \{1, 2, \dots, 128\}} \tilde{M}_{\text{pit}}(i)$$
Macular Sub-volume Extraction: A standardized macular sub-volume centered on the detected fovea was extracted:
$$V_{\text{macula}} = \left\{ I^{(k)} : k_{\text{start}} \leq k \leq k_{\text{end}} \right\}$$
where $k_{\text{start}} = \max(1, i_{\text{fovea}} - 16)$ and $k_{\text{end}} = \min(128, i_{\text{fovea}} + 16)$.
A fixed 33-slice macular sub-volume centered on the detected fovea was extracted. Boundary conditions were handled by shifting the extraction window to ensure a consistent slice count without exceeding volume limits. Although rare outliers were observed in severely distorted or off-center scans, boundary-aware correction ensured anatomically valid sub-volume extraction in all cases.
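The boundary-aware window shift described above admits a compact sketch (an illustrative helper, assuming 1-based slice indices; not the authors' released code):

def macular_window(i_fovea: int, n_adj: int = 16, total: int = 128) -> tuple:
    """Shift a fixed (2 * n_adj + 1)-slice window so it stays within [1, total]."""
    k_start, k_end = i_fovea - n_adj, i_fovea + n_adj
    if k_start < 1:                      # fovea near the first slice: shift window rightward
        k_start, k_end = 1, 2 * n_adj + 1
    elif k_end > total:                  # fovea near the last slice: shift window leftward
        k_start, k_end = total - 2 * n_adj, total
    return k_start, k_end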

2.2.2. Design Rationale and Robustness Analysis

Tilt-Robustness Analysis: Traditional global centroid methods fail when OCT scans are tilted. Consider a tilted scan in which the retinal surface forms an angle $\theta$ with the horizontal. The true foveal pit depth $\Delta y_{\text{true}}$ is preserved in the column-wise minima but lost in global averaging:
$$C_y^{\text{global}} = \frac{1}{W_c}\sum_{j} C_y(j) \approx \bar{y} + \frac{\Delta y_{\text{true}}}{2}\tan\theta$$
The proposed method avoids this bias by computing $\min_j C_y(j)$, which remains invariant to tilt.
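This behavior can be checked numerically with a synthetic phantom. The sketch below (an illustration constructed for this discussion, not an experiment from the paper) builds a tilted tissue band whose thickness dips at the central column, and shows that the column-wise minimum stays at the pit column while the global mean centroid drifts with tilt:

import numpy as np

def column_centroids(mask: np.ndarray) -> np.ndarray:
    """Column-wise vertical centroids of a binary tissue mask."""
    H = mask.shape[0]
    y = np.arange(1, H + 1, dtype=float)[:, None]
    m00 = mask.sum(axis=0)
    m01 = (mask * y).sum(axis=0)
    return np.where(m00 > 1e-6, m01 / (m00 + 1e-6), float(H))

def synthetic_scan(theta_deg: float, H: int = 256, W: int = 128) -> np.ndarray:
    """Tilted tissue band with a Gaussian thinning (the 'pit') at the central column."""
    cols = np.arange(W)
    top = 80 + np.tan(np.deg2rad(theta_deg)) * cols                      # tilted inner surface
    thick = 100 - 40 * np.exp(-((cols - W // 2) ** 2) / (2 * 8.0 ** 2))  # thinning at the fovea
    rows = np.arange(H)[:, None]
    return ((rows >= top) & (rows < top + thick)).astype(float)

for theta in (0.0, 5.0, 10.0):
    c_y = column_centroids(synthetic_scan(theta))
    print(theta, int(np.argmin(c_y)), round(float(c_y.mean()), 1))
    # argmin stays near the central (pit) column; the global mean drifts with tilt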
Penalty Function: The penalty value $P = 200$ was chosen such that:
$$P > \max_{i \in [59, 69]} M_{\text{pit}}(i) - \min_{i \in [59, 69]} M_{\text{pit}}(i)$$
This ensures the preferred range dominates unless an out-of-range slice has an exceptionally strong foveal signal.
Computational Complexity: The algorithm operates in $O(H W_c)$ per slice, with vectorized operations achieving near-optimal performance. The total processing time for $N$ patients is approximately $O(128\, N\, H\, W_c)$. On a standard workstation (12th Generation Intel® Core™ i7-1255U processor, Intel, Santa Clara, CA, USA), fovea detection required <0.3 s per volume, enabling scalable preprocessing of large datasets.

2.2.3. Parameter Summary

This subsection summarizes the key algorithmic parameters and anatomically motivated constraints used for tilt-robust foveal localization. These settings were selected based on clinical priors and empirical validation on the BanglaOCT2025 dataset. The complete parameter configuration and corresponding rationale are provided in Table 5.
Table 5. Algorithm parameters for tilt-robust foveal detection.

Parameter | Value and Rationale
Total slices ($T_{\text{slices}}$) | 128 (standard NIDEK protocol)
Adjacent slices ($N_{\text{adj}}$) | 16 + fovea + 16 → 33-slice sub-volume covering ≈1.5 mm
Search range | 59–69 (clinically expected foveal location, slice 64 ± 5)
Penalty value ($P$) | 200 (empirically validated on the BanglaOCT2025 dataset)
Central width | 35% of the image width (anatomically relevant region)
Gaussian kernel | 7 × 7 ($\sigma \approx 1.0$; balances noise reduction and edge preservation)
Threshold method | Otsu's adaptive threshold (robust to brightness variation)

2.2.4. Validation and Error Handling

The algorithm incorporates several robustness measures: a fallback mechanism defaults to the mid-volume slice (slice 64) when no valid foveal candidate is detected; boundary checks ensure that the extracted sub-volume remains within the valid slice range (1–128); empty or invalid scan folders are automatically skipped; and numerical stability is maintained using a small constant $\epsilon = 10^{-6}$, which prevents division by zero in centroid calculations. These safeguards ensure reliable operation under real-world clinical conditions, including incomplete scans and acquisition failures.
In rare cases involving severe pathology, motion artifacts, or off-center acquisition, the raw foveal likelihood may peak outside the expected central macular region. For transparency, agreement statistics are reported both with and without a single extreme outlier, following standard practice in anatomical localization evaluation. To prevent such cases from affecting downstream analysis, explicit anatomical constraints are enforced on final foveal selection. When the estimated foveal position lies near the volume boundary, a boundary-aware extraction strategy adaptively shifts the fixed-size sub-volume while preserving valid slice limits. As a result, anatomically implausible detections do not propagate into denoising or classification.

2.3. Self-Supervised Volumetric Denoising Framework Using FFSwin Backbone

OCT imaging is inherently affected by speckle noise [15], which obscures retinal microstructures such as the Inner Segment (IS)/Outer Segment (OS) junction, the Retinal Pigment Epithelium (RPE) band, and drusen morphology. Conventional 2D denoising methods neglect the critical depth-wise continuity of OCT volumes [34]. Following fovea-centric sub-volume extraction (33 slices; Section 2.2), a self-supervised volumetric denoising framework based on a Flip–Flop Swin Transformer (FFSwin) backbone [35,36] was employed (FFSwin Restoration Implementation link) to model intra- and inter-slice contextual relationships without requiring clean reference volumes [37,38]. The FFSwin architecture is used here only as a backbone; a high-level architectural description is provided, sufficient to reproduce the restoration paradigm without exposing proprietary implementation details. An overview of the proposed FFSwin denoising architecture is illustrated in Figure 1.
Inspired by self-supervised denoising principles such as Noise2Self, the method is formulated as a regularized denoising autoencoder that reconstructs OCT volumes from stochastically corrupted inputs without explicit blind-spot masking [39]. The denoising module acts as a pretext task, producing structurally enhanced volumes that support downstream AMD classification, achieving an upper-bound accuracy of 99.88% on BanglaOCT2025.
Figure 1. Block diagram of the FFSwin denoising model.

2.3.1. Theoretical Premise: 3D Spatio-Temporal Consistency

Our approach relies on the distinction between biological tissue and imaging artifacts in 3D space:
Anatomical Continuity: Retinal layers (e.g., RPE, ILM) and pathologies (e.g., drusen) are physically continuous structures. If a feature exists at coordinates $(x, y)$ in slice $z$, it is likely to exist near $(x, y)$ in slices $z - 1$ and $z + 1$.
Noise Independence: Speckle noise is a stochastic interference pattern. A noise granule at $(x, y)$ in slice $z$ has no correlation with the pixel at $(x, y)$ in slice $z + 1$.
We therefore hypothesize that a network forced to predict the content of co-located volumetric neighbors will naturally learn to preserve anatomy while suppressing independent noise.

2.3.2. Network Backbone: The “Flip-Flop” Abstraction

To implement this hypothesis, a custom Flip–Flop Swin Transformer (FFSwin) backbone is used that operates on 3D OCT volumes constructed from stacked 2D B-scans. Unlike standard 3D CNNs, FFSwin employs an alternating attention strategy to capture anisotropic features.
Intra-Slice Attention (Flop Mode): In the intra-slice (flop) mode, attention is applied within the 2D plane (X-Y) to learn local texture and edge information.
Inter-Slice Attention (Flip Mode): The attention window shifts along the depth Z-axis, enabling aggregation of anatomically corresponding features from co-located patches across adjacent slices (z−1, z+1).
This mechanism acts as a volumetric filter, reinforcing features that are spatially consistent across the depth dimension.
Note: The specific architectural micro-design of this backbone is outside the scope of this paper, which focuses on the restoration application and dataset validation.

2.3.3. High-Level Architecture (Decoder-Free Design)

Unlike standard autoencoders, our design does not contain a symmetric decoder. The pipeline consists of 3D Patch Embedding (Conv3D), FFSwin Backbone (encoder-only volumetric attention), Reconstruction Head (ConvTranspose3D), and Sigmoid activation to restore intensities to [0,1]. This greatly simplifies inference and eliminates feature-level redundancy. To improve architectural transparency and support reproducibility, Figure 2 provides the FFSwin denoising architecture. The diagram illustrates the hierarchical volumetric processing pipeline, including patch embedding, multi-stage transformer blocks, feature fusion across depth, and output reconstruction. Although certain low-level architectural parameters cannot be fully disclosed, the provided schematic and accompanying description specify the functional composition, data flow, and hierarchical design principles of the FFSwin backbone. These details are sufficient to reproduce the methodological behavior and experimental setup of the proposed approach, facilitating a meaningful comparison with existing denoising models.
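A minimal PyTorch sketch of this decoder-free pipeline is given below. The backbone is deliberately replaced by a generic placeholder, since the FFSwin micro-design is proprietary; the embedding dimension, patch size, and placeholder layers are illustrative assumptions rather than the published configuration.

import torch
import torch.nn as nn

class VolumetricRestorer(nn.Module):
    """Decoder-free restoration: Conv3d patch embedding -> encoder-only backbone
    (placeholder for FFSwin) -> ConvTranspose3d reconstruction head -> Sigmoid."""
    def __init__(self, dim: int = 48, patch: tuple = (1, 4, 4)):
        super().__init__()
        self.embed = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)   # 3D patch embedding
        # Placeholder: any encoder-only volumetric module with matching shapes can
        # stand in for the alternating flip/flop (inter-/intra-slice) attention blocks.
        self.backbone = nn.Sequential(
            nn.Conv3d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv3d(dim, dim, 3, padding=1), nn.GELU(),
        )
        self.head = nn.ConvTranspose3d(dim, 1, kernel_size=patch, stride=patch)
        self.act = nn.Sigmoid()                               # restore intensities to [0, 1]

    def forward(self, v: torch.Tensor) -> torch.Tensor:      # v: (B, 1, D, H, W) in [0, 1]
        return self.act(self.head(self.backbone(self.embed(v))))

restorer = VolumetricRestorer()
out = restorer(torch.rand(1, 1, 33, 224, 224))                # output shape matches the input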

2.3.4. Self-Supervised Training Strategy

Since perfectly clean ground-truth OCT data cannot be obtained physically, training is formulated as a self-supervised reconstruction task in which the raw volume itself serves as the supervision signal. Let $V_{\text{raw}} \in \mathbb{R}^{D \times H \times W}$ be the input sub-volume extracted from the dataset. During training, a stochastic corruption function generates a noisy volume $V_{\text{corrupt}}$ by injecting additive Gaussian noise $N$:
$$V_{\text{corrupt}} = \text{clamp}\left(V_{\text{raw}} + N(\mu = 0,\ \sigma = 0.15),\ 0,\ 1\right)$$
The denoiser learns a mapping $f_\theta : V_{\text{corrupt}} \rightarrow \hat{V}_{\text{raw}}$, i.e., the network $f_\theta$ maps the corrupted volume back to the original. The objective is to minimize the structural deviation between the reconstruction $\hat{V}_{\text{raw}}$ and the clean target $V_{\text{raw}}$:
$$\min_\theta\ \mathcal{L}\left(\hat{V}_{\text{raw}},\ V_{\text{raw}}\right)$$
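In code, the corruption step reduces to noise injection followed by clamping. A minimal sketch (tensor shapes and the helper name corrupt are illustrative):

import torch

def corrupt(v_raw: torch.Tensor, sigma: float = 0.15) -> torch.Tensor:
    """Stochastic corruption: additive zero-mean Gaussian noise, clamped to [0, 1]."""
    return torch.clamp(v_raw + sigma * torch.randn_like(v_raw), 0.0, 1.0)

# Self-supervised pairing: the raw volume is both the input source and the target.
v_raw = torch.rand(1, 1, 33, 224, 224)
v_corrupt = corrupt(v_raw)          # network input
target = v_raw                      # supervision signal (no clean reference needed)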
The use of additive Gaussian noise in this study is not intended to model the physical speckle characteristics of OCT acquisition. Instead, the Gaussian perturbation is applied as a controlled, self-supervised regularization during training to prevent trivial identity mapping and to encourage the learning of anatomically consistent volumetric representations without clean reference data. Accordingly, the approach is best described as a self-supervised denoising autoencoder inspired by perturbation-based learning, rather than a strict Noise2Self or Noise2Void implementation that relies on explicit blind-spot masking [40,41]. Consistent with this design, additional validation was conducted to confirm preservation of clinically relevant pathological features (Section 3.3).
Although the network reconstructs the original volume, identity mapping is avoided through several mechanisms: stochastic noise injection, aggregation of contextual information across adjacent B-scans, and regularization with constrained model capacity. Together, these factors promote learning of noise-invariant, structure-aware representations rather than direct input replication.
Downstream classification performance is therefore reported only as a secondary, task-oriented indicator of restoration effectiveness. The denoising framework is not generative and does not synthesize new anatomy; instead, it preserves true retinal structures by exploiting volumetric continuity across neighboring slices, while suppressing spatially uncorrelated speckle noise. This design inherently reduces the risk of hallucinating or exaggerating pathological features.

2.3.5. Volumetric Patch Embedding and Context Modeling

The framework decomposes each OCT volume into non-overlapping 3D patches:
$$P_i = \phi\left(V_{\text{raw}}\right), \quad P_i \in \mathbb{R}^{d \times p \times p}$$
where $d$ is the patch depth and $p \times p$ is the spatial size. The FFSwin backbone computes height–width–depth attention across co-located patches:
$$Z = \text{Attention3D}\left(P_i,\ P_j\right)$$
where $P_j$ includes both intra-slice neighbors and inter-slice depth-shifted neighbors, enabling the model to exploit anatomical continuity.
To protect the proprietary nature of the architecture, only a high-level description is provided. Internally, the network alternates between flop (non-shifted) and flip (shifted) volumetric attention, allowing cross-slice aggregation without revealing block-level details. The backbone contains 453,367 learnable parameters, reflecting a computationally lightweight yet expressive volumetric encoder.

2.3.6. Reconstruction Loss Function

To prevent the model from learning the identity mapping, a hybrid objective combining Mean Squared Error (MSE) and an $L_1$ sharpness regularizer [40] is utilized:
$$\mathcal{L} = \underbrace{\left\| \hat{V} - V_{\text{raw}} \right\|^2}_{\text{MSE loss}} + \lambda \cdot \underbrace{\left\| \hat{V} - V_{\text{raw}} \right\|_1}_{\text{sharpness regularizer } (L_1)}$$
where the MSE term ensures global intensity fidelity, and the $L_1$ term ($\lambda = 0.1$) is crucial for preserving sharp layer boundaries.
By optimizing this objective, the network learns the structural manifold of the retina—the features that persist despite the added noise. Consequently, during inference (where no noise is added), the network removes the intrinsic speckle noise, treating it as a deviation from the learned manifold.
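A sketch of this hybrid objective and one optimization step follows; the optimizer choice and learning rate are assumptions (the paper does not specify them), and restorer/corrupt refer to the earlier sketches.

import torch
import torch.nn.functional as F

def hybrid_loss(v_hat: torch.Tensor, v_raw: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """MSE term for global intensity fidelity plus an L1 term weighted by lambda = 0.1."""
    return F.mse_loss(v_hat, v_raw) + lam * F.l1_loss(v_hat, v_raw)

opt = torch.optim.Adam(restorer.parameters(), lr=1e-4)   # illustrative optimizer settings
v_raw = torch.rand(1, 1, 33, 224, 224)
loss = hybrid_loss(restorer(corrupt(v_raw)), v_raw)      # reconstruct raw from corrupted input
opt.zero_grad()
loss.backward()
opt.step()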

2.3.7. Training, Validation Protocol, and Convergence Analysis

A total of 857 expertly labeled OCT volumes were available (54 DryAMD, 61 WetAMD, and 742 NonAMD). To mitigate severe class imbalance, virtual augmentation was applied during training, expanding the dataset to 2226 volumes. Augmentation was used only during training of both the volumetric denoising model and the downstream classifier, while validation was performed exclusively on the original 857 real OCT volumes without augmentation. No independent test split was created due to the limited availability and high clinical cost of volumetric annotation.
During inference, OCT slices were resized and stacked into volumetric inputs for the FFSwin denoising model. Volumes with odd depth were padded to enable valid attention window partitioning and cropped back after reconstruction, as shown in Figure 3, ensuring compatibility with irregular real-world OCT scans. All reported metrics, therefore, reflect validation results under controlled conditions and should not be interpreted as deployment-level generalization.
The denoising model was trained for 50 epochs using automatic mixed precision (AMP). Training showed stable and monotonic convergence, with reconstruction loss decreasing from 0.00867 to 0.00768 as shown in Figure 4. Loss reduction was rapid in early epochs and stabilized by epochs 35–40, indicating effective optimization without instability or overfitting [41].
The modest yet consistent reduction in reconstruction loss (~11.4%) indicates that the model progressively suppresses noise while preserving fine retinal structures during self-supervised training. The absolute loss values are low because of the normalized intensities and the combined MSE + $L_1$ objective; the monotonic convergence confirms that the model progressively enhances reconstruction fidelity.
When integrated into the classification pipeline, denoising had a pronounced effect. Using the same classifier with frozen weights, accuracy on raw OCT volumes was 69.08%, with poor WetAMD recall (0.21). After denoising, the identical classifier achieved an upper-bound accuracy of 99.88%, with class-wise precision and recall ranging from 0.98 to 1.00. This improvement reflects recovery of clinically meaningful features—particularly subtle WetAMD biomarkers—rather than simple smoothing, supporting volumetric denoising as a critical preprocessing step for automated AMD analysis.
All algorithmic evaluations in this study are performed at the OCT volume (scan) level, while clinical labeling and demographic statistics are defined at the patient level. Because multiple scans per patient are available, the present study does not aim to evaluate cross-patient generalization. Instead, a fixed cohort of patients was used throughout training and validation to enable a controlled, paired evaluation of volumetric denoising effects. Both the denoising and classification models were trained and evaluated on OCT volumes derived exclusively from this cohort, without any patient-wise splitting. No patient-wise train/validation/test split was performed, and no claims of deployment-level generalization are made. Accordingly, reported classification results should be interpreted as upper-bound estimates of diagnostic signal recoverability under controlled conditions rather than deployment-level performance.

3. Results

This section presents the experimental validation of the proposed BanglaOCT2025 dataset and the associated preprocessing pipeline. The results are organized into three parts. First, a summary of the statistical characteristics and clinical composition of the BanglaOCT2025 dataset to establish its representativeness and diagnostic relevance (Section 3.1). Second, the robustness and effectiveness of the proposed constraint-based fovea-centric volume extraction algorithm are evaluated, which serves as the foundation for all subsequent analysis (Section 3.2). Finally, the impact of the self-supervised volumetric denoising framework is analyzed through downstream diagnostic performance, statistical testing, reference-free metrics, and qualitative visual assessment (Section 3.3).
All experiments were conducted on automatically extracted fovea-centered 33-slice OCT sub-volumes from raw 128-slice macular cubes, ensuring consistent anatomical alignment across patients and diagnostic categories.

3.1. BanglaOCT2025 Characteristics and Clinical Composition

BanglaOCT2025 is a large-scale OCT dataset representing a South Asian population and addresses a major gap in existing public resources. After quality control, 1585 OCT volume scans (202,880 B-scans) from 1071 patients were retained, including 857 expert-labeled scans for evaluation. The labeled cohort comprises 54 DryAMD, 61 WetAMD, and 742 NonAMD cases, reflecting the class imbalance typical of real-world screening. Patient ages ranged from 48.5 to 85.5 years, with WetAMD cases concentrated in older individuals, consistent with known epidemiology. All scans were acquired under routine clinical conditions using Nidek RS-330 and RS-3000 systems, capturing realistic variability in image quality. Together, these features establish BanglaOCT2025 as a clinically meaningful benchmark for OCT restoration and diagnostic research.

3.2. Evaluation of Constraint-Based Fovea-Centric Volume Extraction

Accurate fovea localization is essential for macular analysis [11], as AMD biomarkers are concentrated near the foveal pit. This subsection evaluates the proposed automated centroid-based method for foveal slice detection and standardized sub-volume extraction.

3.2.1. Robustness of Automated Foveal Slice Detection

The algorithm reliably identified the foveal slice across the dataset without manual input during deployment. By combining column-wise centroid analysis with a clinically motivated penalty constraint, the method remained robust to common acquisition artifacts, such as retinal tilt, uneven illumination, and pathological deformation. No boundary violations or extraction failures were observed after application of the anatomical constraints and boundary-aware windowing strategy.
Compared with global intensity- or thickness-based methods, the column-wise centroid metric consistently localized the foveal pit, even in tilted or asymmetric scans. Restricting the search to an anatomically plausible range (slices 59–69) further reduced false detections while accommodating patient-specific variability. Manual foveal confirmation was used only for validation; all reported results rely on the fully automated pipeline. No systematic failure patterns were observed across the evaluated volumes; rare extreme deviations occurred in isolated cases and were effectively handled by the enforced anatomical constraints and boundary-aware extraction strategy.
To quantitatively evaluate the reliability of automated fovea localization, clinician-selected foveal slices were compared with automated detections on an independent set of 50 anonymized OCT volumes. The mean absolute difference between automated and expert-selected foveal slices was 2.65 ± 3.26 slices (range: 0–23). This distribution indicates close overall agreement between automated and manual localization without evidence of systematic bias. Excluding a single extreme outlier case, the mean absolute slice difference decreased to 2.21 ± 1.54 slices (range: 0–7), confirming stable and consistent localization performance across the remaining cases.

3.2.2. Standardization of Macular Sub-Volumes

After fovea detection, a fixed 33-slice sub-volume (fovea ±16 slices) was extracted for each scan, ensuring anatomically consistent macular coverage while removing peripheral redundancy. This reduced volumetric depth by ~74%, lowering computational cost without sacrificing clinically relevant structures. Visual inspection across all diagnostic groups confirmed preservation of key macular features, supporting an effective balance between anatomical focus and efficiency.

3.2.3. Role of Fovea-Centric Extraction in Downstream Analysis

All denoising, classification, and evaluation experiments (reported in Section 3.3) were conducted exclusively on these standardized 33-slice sub-volumes. This design choice isolates the impact of volumetric denoising and avoids confounding effects from irrelevant peripheral slices.
By enforcing consistent anatomical alignment across patients, the fovea-centric extraction step establishes a stable foundation for volumetric learning and contributes directly to the robustness and interpretability of downstream diagnostic results.

3.3. Self-Supervised Volumetric Restoration Framework Using FFSwin Backbone

It is essential to note that downstream classification performance is not a standalone sufficient metric for denoising quality in this study. Instead, classification results are reported as a complementary, task-oriented indicator that reflects whether clinically relevant features remain discriminative after restoration. Primary evaluation of the denoising framework is based on reference-free volumetric metrics, blinded clinical assessment, and paired statistical analysis.

3.3.1. Classification Performance on BanglaOCT2025 Dataset

Classification performance of the Flip-Flop Swin Transformer (FFSwin) was assessed on BanglaOCT2025 using patient-wise evaluation. The dataset included 857 scans from 573 patients (54 DryAMD, 61 WetAMD, and 742 NonAMD), each represented by 33 fovea-centric B-scans. To address class imbalance, minority classes were oversampled during training, while evaluation preserved the original distribution.
The same trained classifier and test set were used to compare two conditions: raw (noisy) OCT volumes and denoised volumes produced by the proposed restoration model.
Performance on Raw OCT Volumes: When evaluated on raw OCT volumes, the classifier achieved an overall validation accuracy of 69.08%; however, performance varied substantially across classes. Precision for the NonAMD category was high (0.95), indicating reliable identification of eyes without AMD. In contrast, WetAMD detection was markedly limited, with a recall of 0.21, indicating that most WetAMD cases were not correctly identified under noisy input conditions. This result highlights the sensitivity of WetAMD biomarkers to speckle noise and reduced image clarity, as summarized in Table 6.
Here, the NonAMD category encompasses both normal eyes and other non-AMD retinal conditions. Accordingly, these results primarily reflect the system’s ability to distinguish AMD from non-AMD presentations in a screening-oriented setting, rather than fine-grained discrimination among heterogeneous non-AMD pathologies.
The macro-averaged F1-score was 0.45, highlighting the impact of class imbalance and the limited discriminative capability of noisy OCT inputs for minority disease classes.
Performance on Denoised OCT Volumes: All results in this subsection are based on validation using the original set of 857 real OCT volumes. When evaluated on denoised inputs, the trained classifier achieved a validation accuracy of 99.88%. Here, classification performance is used solely as a task-oriented probe of restoration quality, rather than as evidence of independent clinical generalization. The classifier was used as a fixed diagnostic probe with frozen weights and identical hyperparameters for both raw and denoised volumes, ensuring that performance differences reflect the impact of denoising rather than changes in model training. Importantly, because no independent patient-wise hold-out test set was available, the reported near-ceiling classification performance should be interpreted strictly as an upper-bound estimate of diagnostic signal recoverability under controlled, paired evaluation conditions.
As summarized in Table 7, class-wise recall reached 1.00 for DryAMD, 0.98 for WetAMD, and 1.00 for NonAMD, with a macro-averaged F1-score of 0.99, indicating strong class separability despite pronounced class imbalance. In contrast, evaluation on the corresponding raw (noisy) volumes yielded a validation accuracy of 69.08%. Because both evaluations were performed on the same non-augmented dataset, this paired comparison isolates the effect of volumetric denoising on input signal quality.
Downstream classification accuracy is reported here as a secondary, task-oriented indicator of restoration effectiveness rather than a direct measure of image fidelity. The observed improvement reflects enhanced signal-to-noise characteristics and inter-slice consistency, not the creation of new diagnostic features. Together with blinded clinical validation and structure-preservation analysis (Section 3.3.2), these findings suggest that volumetric denoising improves diagnostic visibility while preserving retinal anatomy. The near-ceiling accuracy should therefore be interpreted as an upper-bound estimate of diagnostic signal recoverability within BanglaOCT2025, rather than deployment-level performance.

3.3.2. Blinded Clinical Validation of Denoising

To complement algorithmic evaluation, a blinded qualitative clinical assessment was performed by an experienced retina specialist who was not involved in model development and was unaware of the processing status. A randomly selected subset of 27 OCT volumes spanning DryAMD, WetAMD, and NonAMD cases was reviewed. For each case, raw and denoised volumes were presented side-by-side in randomized order.
The clinician assessed preservation of key anatomical features (retinal layer integrity, foveal contour, RPE continuity, drusen morphology, and fluid-related signs) and the presence of artificial artifacts. The clinician independently scored each volume using a 5-point Likert scale for (i) preservation of pathological features (5 = best preserved, 1 = worst preserved) and (ii) presence of artificial features or artifacts (5 = severe artifacts, 1 = minimal artifacts). Denoised volumes achieved higher scores for pathology preservation (mean 4.39 vs. 3.30) and lower artifact scores (mean 1.31 vs. 2.39) compared to raw images, as shown in Table 8. No artificial structures, exaggerated pathology, or anatomically implausible alterations were observed.
These findings indicate that the proposed volumetric denoising improves visual clarity while preserving clinically relevant retinal anatomy, reducing the likelihood that downstream performance gains arise from artificial feature amplification. We note that Gaussian noise was used as a self-supervised perturbation rather than a physical speckle model; broader quantitative benchmarking against alternative volumetric denoising methods remains an important direction for future work.

3.3.3. Denoising Effectiveness via Downstream Diagnostic Task

In the absence of noise-free ground-truth OCT images, the effectiveness of the proposed denoising approach was evaluated indirectly through its impact on a downstream diagnostic task, following a task-driven evaluation paradigm commonly adopted in medical image analysis [21,39].
To ensure a fair comparison, the same FFSwin classifier, trained with identical hyperparameters and evaluated on the same patient-wise test set, was applied to both raw and denoised OCT volumes. The difference between the two evaluation settings was the quality of the input data, as illustrated in Figure 5. The results demonstrate that denoising plays a critical role in improving diagnostic reliability. While raw OCT data led to substantial misclassification of AMD subtypes, denoised OCT volumes enabled nearly perfect recognition of both DryAMD and WetAMD cases. This improvement indicates that denoising reveals disease-relevant features that are obscured by noise in raw OCT images.
Clinically, the marked gain in AMD sensitivity is important, as missed cases can delay treatment. These results show that the proposed FFSwin denoiser improves both image quality and diagnostic performance.

3.3.4. Class-Imbalance-Aware Analysis

BanglaOCT2025 shows a strong class imbalance: NonAMD cases comprise about 87% of the dataset, while DryAMD and WetAMD together account for 13%. In such settings, overall accuracy can be misleading, masking poor performance on minority classes.
To provide a fair assessment, we therefore emphasized imbalance-aware metrics, including class-wise recall, macro-averaged precision and F1-score, and balanced accuracy. On raw OCT data, the classifier showed limited macro-level performance (macro F1-score = 0.45), reflecting weak recognition of AMD subtypes. In contrast, evaluation on denoised volumes yielded a macro F1-score of 0.99, indicating consistently high performance across all classes.
Notably, the recall for DryAMD and WetAMD improved from 0.78 and 0.21 (raw OCT) to 1.00 and 0.98 (denoised OCT), respectively. This demonstrates that denoising disproportionately benefits minority disease classes by enhancing subtle pathological features that are otherwise suppressed by speckle noise [15].
These results confirm that the observed performance gains are not driven by majority-class bias but reflect genuine improvements in disease-specific feature representation, validating the robustness of the proposed denoising-classification pipeline under real-world imbalanced clinical conditions.

3.3.5. McNemar’s Test for Paired Diagnostic Outcomes

To evaluate whether denoising significantly affected patient-level diagnostic correctness, a paired McNemar test was conducted by comparing classifier predictions on raw (noisy) OCT volumes and their corresponding denoised versions for the same patients. The resulting contingency table is reported in Table 9. Out of 857 cases, 592 were correctly classified under both conditions, 264 cases improved after denoising, 1 case was incorrectly classified under both conditions, and no cases showed degraded performance after denoising (c = 0).
This paired evaluation used the same trained classifier with frozen weights and identical OCT volumes before and after denoising, thereby isolating the effect of input-level signal enhancement. Although the absence of degraded cases may appear unusual in large clinical datasets, it reflects the controlled evaluation setting in which denoising operates purely as a preprocessing step and does not introduce new class-discriminative information. We nevertheless acknowledge that this outcome may indicate a degree of coupling between the denoising and classification stages, and that McNemar’s test in this context evaluates relative consistency rather than independent generalization. Accordingly, the results should be interpreted as an upper-bound estimate of achievable improvement under idealized conditions.
McNemar test:
$$\chi^2 = \frac{\left(\left|b - c\right| - 1\right)^2}{b + c}$$
The McNemar test demonstrated a highly significant difference between the two conditions (continuity-corrected χ 2 = 262 ,     p < 10 6 ; exact binomial p < 10 12 , Table 10). This strong statistical significance arises from the large imbalance between improved and degraded outcomes (264 vs. 0), indicating that denoising consistently increases diagnostic correctness in this paired setting. No evidence of systematic performance degradation was observed.
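The reported statistic can be reproduced directly from the contingency counts (b = 264 improved, c = 0 degraded); a short verification sketch:

from scipy.stats import chi2

b, c = 264, 0                                   # discordant pairs: improved vs. degraded
stat = (abs(b - c) - 1) ** 2 / (b + c)          # continuity-corrected McNemar statistic
p_value = chi2.sf(stat, df=1)                   # chi-square tail probability, 1 dof
print(round(stat, 1), p_value)                  # 262.0, p << 1e-6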
Overall, these findings support that the denoising module enhances diagnostic signal quality without introducing clinically harmful misclassifications, while highlighting the need for future validation using independent test sets and cross-scanner data to further assess robustness and generalizability.
Patient-Level Diagnostic Impact of Denoising: Figure 6a shows a patient-level correctness transition heatmap comparing noisy and denoised OCT-based diagnoses; denoising recovered a substantial number of previously misclassified patients without introducing diagnostic errors. Figure 6b shows a forest plot of the proportions of improved and degraded diagnoses after denoising; all discordant cases favored improvement, with no observed degradation.

3.3.6. Performance Evaluation

Class-Imbalance Aware Diagnostic Performance: To address severe class imbalance in the BanglaOCT2025 dataset (DryAMD = 54, WetAMD = 61, NonAMD = 742), class-wise confusion matrices were aggregated using a one-vs-rest strategy. This analysis reveals that denoising substantially improves sensitivity across all disease categories while maintaining or improving specificity, as shown in Table 11.
The most noticeable gain was observed for WetAMD, where sensitivity increased from 0.213 to 0.984, effectively eliminating missed diagnoses. In this study, no increase in false positives was observed after denoising, confirming that performance gains were not achieved at the expense of diagnostic specificity.
Due to the severe class imbalance, evaluation relied primarily on class-imbalance aware metrics. When noisy OCT volumes were applied to the FFSwin classification model, the model achieved a balanced accuracy of 0.5715, macro F1-score of 0.4496, and Matthews correlation coefficient (MCC) of 0.2912, indicating limited robustness under noise, as shown in Table 12.
After applying the proposed self-supervised denoising pipeline, performance improved dramatically. Balanced accuracy increased to 0.9945, macro F1-score to 0.9942, and MCC to 0.9952. Cohen’s kappa rose from 0.2426 to 0.9952, indicating near-perfect agreement beyond chance, as shown in Table 12.
Notably, WetAMD sensitivity increased from 0.2131 to 0.9836, representing a critical clinical improvement in detecting the most vision-threatening condition, as shown in Table 13. These gains confirm that denoising substantially enhances diagnostic reliability across all disease categories without bias toward the majority NonAMD class.
Interpretation of Performance Metrics: Prior to denoising, the classifier failed to detect the majority of WetAMD cases, representing a high-risk scenario for missed progressive disease. After denoising, the model correctly identified nearly all disease cases across categories, substantially reducing missed diagnoses. This improvement is especially important for WetAMD, where timely detection directly influences treatment outcomes. Notably, the gain in sensitivity did not come at the expense of specificity; false positives were rare, indicating that denoising enhances diagnostic confidence without increasing unnecessary referrals.
Under noisy conditions, positive AMD predictions were often unreliable. Following denoising, precision exceeded 98% for all disease classes, suggesting that positive predictions can be interpreted with high clinical confidence. The marked improvement in F1-score reflects a balanced reduction in both missed cases and false alarms, confirming that performance gains are not driven by class imbalance or trivial predictions.
Improvements in balanced accuracy further demonstrate that denoising enables consistent performance across DryAMD, WetAMD, and NonAMD cases, rather than favoring the majority class. Near-perfect values of MCC and Cohen’s kappa indicate strong agreement with expert annotations and confirm that the observed accuracy is robust and unbiased. Together, these findings support the role of volumetric denoising as a key enabler of reliable OCT-based AMD diagnosis.
Overall, class-imbalance-aware metrics show that the FFSwin-based denoising approach markedly improves diagnostic performance by reducing missed AMD cases while maintaining near-perfect specificity. The close agreement with expert annotations highlights the importance of volumetric denoising for reliable OCT-based AMD diagnosis.

3.3.7. Reference-Free Evaluation of Denoising on Real OCT Volumes

As noise-free OCT references are unavailable in clinical practice, denoising performance was evaluated using patient-wise, reference-free metrics that compare noisy and denoised volumes, following established real-world evaluation protocols [8,37,39].
Noise reduction was assessed using local variance, while structural preservation and volumetric coherence were evaluated through edge strength and inter-slice consistency. All metrics were computed on paired patient scans, with noisy volumes resized to match denoised images, ensuring fair and spatially aligned comparisons.
Evaluation Metrics: Because noise-free OCT references are not available, denoising performance was evaluated using complementary reference-free metrics, following standard practices in biomedical image analysis when clean ground truth data are unavailable [8,21,33,39]. Changes in local noise variance (ΔLNV) were used to measure residual speckle noise, with lower values indicating effective noise reduction without over-smoothing [32,33]. The Edge Strength Preservation Ratio (ESPR), computed from Sobel gradients, was used to assess how well anatomically meaningful boundaries were preserved after denoising [42,43]. Volumetric coherence was evaluated using the change in inter-slice correlation (ΔISC), which measures improvements in structural consistency between adjacent B-scans along the depth axis [11]. Finally, changes in Shannon entropy (Δ Entropy) were used to assess reductions in randomness while preserving meaningful retinal structures [44,45].
All metrics were computed per patient and then aggregated class-wise for DryAMD, WetAMD, and NonAMD cohorts. Class-wise mean reference-free denoising metrics are summarized in Table 14.
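As a concrete illustration, the four reference-free metrics can be computed per volume roughly as follows. This is a minimal sketch under assumed conventions (window size, histogram bins, and the exact delta definitions are ours), not the authors' released implementation.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def local_noise_variance(bscan, k=7):
    # Mean local variance over a k x k window: E[x^2] - (E[x])^2.
    x = bscan.astype(np.float64)
    m = uniform_filter(x, k)
    return float(np.mean(uniform_filter(x * x, k) - m * m))

def edge_strength(bscan):
    # Mean Sobel gradient magnitude as a proxy for boundary sharpness.
    x = bscan.astype(np.float64)
    return float(np.mean(np.hypot(sobel(x, axis=0), sobel(x, axis=1))))

def inter_slice_corr(vol):
    # Mean Pearson correlation between adjacent B-scans along depth.
    return float(np.mean([np.corrcoef(vol[i].ravel(), vol[i + 1].ravel())[0, 1]
                          for i in range(len(vol) - 1)]))

def shannon_entropy(bscan, bins=256):
    # Shannon entropy (bits) of the intensity histogram.
    counts, _ = np.histogram(bscan, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def volume_mean(fn, vol):
    # Average a per-B-scan metric over all slices in a volume.
    return float(np.mean([fn(s) for s in vol]))

def reference_free_metrics(noisy, denoised):
    # noisy, denoised: spatially aligned volumes of shape (slices, H, W).
    return {
        "dLNV": volume_mean(local_noise_variance, noisy)
                - volume_mean(local_noise_variance, denoised),
        "ESPR": volume_mean(edge_strength, denoised)
                / volume_mean(edge_strength, noisy),
        "dISC": inter_slice_corr(denoised) - inter_slice_corr(noisy),
        "dEntropy": volume_mean(shannon_entropy, noisy)
                    - volume_mean(shannon_entropy, denoised),
    }
```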
Visual Distribution Analysis: The patient-wise distributions in Figure 7 show that denoising performance is consistent across diagnostic groups despite strong class imbalance. Similar ΔISC and ESPR patterns across classes indicate stable preservation of inter-slice coherence and anatomical edges, supporting the robustness of the proposed approach under real-world clinical variability.
Clinical Relevance and Downstream Impact: The gains in volumetric coherence and edge preservation are reflected in downstream performance, where the same FFSwin classifier achieves 99.88% accuracy on BanglaOCT2025 after denoising. Reference-free analysis confirms that this improvement arises from meaningful noise suppression and structural stabilization rather than artificial smoothing. Overall, the proposed FFSwin-based denoising framework demonstrates robust behavior across resolutions and disease categories, preserves anatomical integrity, and provides a reliable preprocessing step for clinical OCT analysis.

3.3.8. Qualitative Visual Assessment of Denoising Performance

To complement quantitative analysis, representative OCT B-scans from DryAMD, WetAMD, and NonAMD cases were visually examined. Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 compare the noisy inputs, FFSwin-denoised outputs, absolute difference maps, and zoomed retinal regions of interest. Across all categories, denoising markedly reduces speckle noise—particularly in the vitreous and deeper retinal layers—while preserving clinically relevant retinal anatomy. Difference maps indicate that the restoration primarily targets high-frequency noise with minimal impact on underlying retinal structure.
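The difference maps referenced above are simple absolute residuals between the paired images; a minimal sketch is shown below (the function name is illustrative, not from the released code).

```python
import numpy as np

def difference_map(noisy_bscan, denoised_bscan):
    # Absolute residual |noisy - denoised|: with structure-preserving
    # denoising this should contain mostly speckle, not retinal anatomy.
    a = noisy_bscan.astype(np.float32)
    b = denoised_bscan.astype(np.float32)
    return np.abs(a - b)
```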
DryAMD Cases: As illustrated in Figure 8 and Figure 9, FFSwin-based denoising visibly reduces speckle noise and improves layer contrast in DryAMD scans. Retinal boundaries, particularly in the outer retina, appear more continuous and less fragmented. The difference maps show that intensity changes are mainly confined to homogeneous background regions, indicating effective noise suppression without altering retinal structure. Zoomed views further confirm improved intra-layer uniformity while preserving overall retinal morphology, with no evidence of artificial edges or spurious features.
NonAMD Cases: Figure 10 and Figure 11 show that denoising reduces speckle noise and improves layer uniformity while preserving normal foveal contour and retinal thickness. Difference maps indicate that changes are primarily noise-related rather than structural.
WetAMD Cases: WetAMD examples (Figure 12) represent the most challenging scenario due to the presence of highly reflective lesions, fluid accumulations, and shadowing artifacts. Despite these complexities, the denoised image retains critical pathological features such as hyperreflective regions and subretinal fluid contours, while substantially reducing background speckle.
In the zoomed ROI (Figure 13), the denoised output exhibits improved contrast between lesion regions and surrounding tissue, which may facilitate downstream classification and clinical interpretation. The difference map again indicates that the denoising primarily targets stochastic noise rather than disease-specific signal patterns.
Across all diagnostic categories, the qualitative results show effective speckle noise reduction without over-smoothing, preservation of retinal layers and disease-related structures, absence of visible artifacts, and consistency with improvements observed in reference-free metrics and downstream classification performance.
These visual findings corroborate the quantitative improvements reported earlier and support the suitability of the proposed FFSwin-based denoising framework for real-world OCT analysis, particularly in settings where clean reference images are unavailable.

3.3.9. Why Quantitative Metrics (PSNR, SSIM, MSE) Are Not Included

Reference-based quantitative denoising metrics (PSNR, SSIM, MSE) are intentionally omitted for the following reasons:
No available clean ground truth for real OCT volumes: Self-supervised denoising cannot be directly benchmarked using reference-based metrics.
The purpose of denoising is functional, not comparative: The FFSwin denoiser is used as a preprocessing backbone for AMD classification.
Indirect validation through diagnostic accuracy: Our classifier trained on denoised volumes achieves 99.88% accuracy, which strongly indicates structural preservation and useful noise suppression.
Novel dataset (BanglaOCT2025): No public baselines exist for fair cross-model comparison.
Scope of the study: The goal is diagnostic enhancement, not denoising benchmarking.
Collectively, the results demonstrate that fovea-centric volumetric abstraction combined with self-supervised denoising fundamentally alters the diagnostic utility of OCT data. Across quantitative, statistical, and qualitative evaluations, the proposed pipeline consistently improves structural coherence, suppresses speckle noise without anatomical distortion, and enables near-perfect downstream AMD classification under severe class imbalance.

4. Discussion

For clarity, the discussion is organized into broader thematic sections that integrate methodological contributions, clinical interpretation, and limitations.

4.1. Principal Findings and Methodological Contributions

This study introduces BanglaOCT2025, the first clinically validated OCT dataset representing the Bengali population, along with a fovea-centric volumetric preprocessing strategy and a self-supervised denoising framework based on the FFSwin backbone. The principal findings of this work are threefold.
First, automated extraction of a standardized 33-slice fovea-centered sub-volume substantially reduces volumetric redundancy while preserving diagnostically important macular structures. The fovea localization strategy relies on anatomical priors rather than dataset-specific tuning, supporting robustness across routine clinical scans.
Second, the proposed self-supervised denoising approach effectively suppresses speckle noise and improves volumetric coherence without requiring clean reference data. The method is best interpreted as a self-supervised denoising autoencoder, rather than a strict Noise2Self/Noise2Void implementation, as it does not rely on explicit blind-spot masking.
Third, denoising leads to a statistically and clinically meaningful improvement in downstream AMD classification, particularly under severe class imbalance.
Together, these findings demonstrate that anatomically informed preprocessing combined with volumetric denoising can substantially enhance the diagnostic utility of real-world OCT data, especially in resource-constrained clinical environments.
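As an illustration of the first finding, once the foveal B-scan index has been localized (the constraint-based centroid minimization step itself is not reproduced here), extracting the fixed 33-slice macular stack reduces to a clamped window around that index. The boundary handling below is our assumption, offered as a sketch rather than the authors' implementation.

```python
import numpy as np

def extract_macular_subvolume(volume, fovea_idx, n_slices=33):
    # Fixed-size window centred on the foveal B-scan; a half-width of 16
    # yields 33 slices. Assumes the volume contains at least n_slices
    # B-scans; the window is clamped at the volume boundaries.
    half = n_slices // 2
    start = int(np.clip(fovea_idx - half, 0, volume.shape[0] - n_slices))
    return volume[start:start + n_slices]
```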
Compared with existing public OCT datasets (e.g., Duke OCT, OCTA-500, AROI), which typically retain full or angiography-centric volumes with many peripheral slices of limited macular relevance, BanglaOCT2025 adopts a clinically driven fovea-centric design aligned with routine ophthalmic assessment. By standardizing each scan to a compact 33-slice macular stack, the dataset balances anatomical coverage with computational efficiency and concentrates learning on regions rich in AMD biomarkers. To our knowledge, BanglaOCT2025 is the first population-specific OCT dataset to formalize this fovea-centric volumetric paradigm.

4.2. Clinical Relevance and Interpretation of Diagnostic Performance

Speckle noise is an inherent challenge in OCT imaging, often masking subtle retinal details and limiting both visual assessment and automated analysis. Unlike supervised denoising approaches that require synthetic noise models or clean reference images—which are unavailable in clinical settings—the proposed method performs fully self-supervised denoising using only noisy OCT volumes. Consistent improvements across reference-free metrics, together with qualitative visual evidence, indicate effective noise suppression while preserving retinal layers and pathological features, supporting the clinical usability of the restored images.
The improvement in AMD classification performance observed after denoising is statistically significant and indicates enhanced diagnostic reliability. In particular, WetAMD sensitivity increased from 21.3% on noisy data to 98.4% on denoised volumes, a change that was confirmed by paired McNemar’s testing to be highly significant. This result warrants careful interpretation.
WetAMD contains subtle, spatially localized features—such as subretinal fluid and neovascular changes—that are highly sensitive to speckle noise. The denoising process restores these cues, allowing previously missed cases to be correctly identified. Paired analysis further shows that denoising primarily resolves prior errors without introducing new misclassifications, reducing the likelihood that the observed gains arise from overfitting or data leakage.

4.3. Robustness, Class Imbalance, and Statistical Considerations

Clinical AMD datasets are inherently imbalanced, with NonAMD cases substantially outnumbering disease-positive samples. To address this, performance was assessed using class-aware metrics, including balanced accuracy, Matthews correlation coefficient (MCC), and Cohen’s kappa. Although overall accuracy increased after denoising, larger gains in balanced accuracy and MCC indicate recovery of disease-relevant structural information rather than amplification of majority-class bias. Because the same classifier with fixed weights was applied to identical OCT volumes before and after denoising, these improvements can be attributed to enhanced input signal quality.
The near-ceiling performance observed after denoising should be interpreted cautiously. It reflects recovery of the diagnostically meaningful signal under controlled conditions rather than universal generalization. In the paired analysis, the absence of degraded cases renders odds ratios and confidence intervals ill-defined; therefore, effect magnitude is interpreted using sensitivity changes, balanced accuracy, MCC, and agreement statistics rather than significance testing alone.
Finally, the NonAMD category includes both normal eyes and other non-AMD retinal conditions, which may influence absolute metrics. Accordingly, the results are intended to demonstrate robust AMD detection in screening-oriented settings rather than fine-grained negative-class differentiation. Rare degradation effects may be underestimated and warrant further evaluation in future multi-centre and cross-scanner studies.

4.4. Practical Implications and Limitations

The proposed framework offers practical advantages for real-world OCT analysis. Fovea-centric processing reduces storage and computational demands, and the denoising module can be applied as a standalone preprocessing step without modifying existing classifiers, facilitating integration into current OCT analysis pipelines. These characteristics are particularly relevant for large-scale screening in resource-limited clinical settings. All reported results should be interpreted as upper-bound performance obtained under controlled evaluation conditions, rather than as indicators of deployment-level diagnostic accuracy.
A key limitation of this study is the absence of an independent patient-wise hold-out test set. Owing to the limited availability of clinically labeled volumetric OCT data and the cost of expert annotation, both denoising and classification were evaluated on a fixed patient cohort using a paired design. While this approach enables controlled assessment of restoration effects, it does not support claims of cross-patient or real-world generalization. Accordingly, the near-ceiling classification performance observed after denoising should be interpreted strictly as an upper-bound estimate of diagnostic signal recoverability.
A direct quantitative comparison with conventional denoising baselines (e.g., BM3D or non-local means) was not included. Many established OCT denoising methods operate on individual B-scans and do not model volumetric inter-slice coherence, making direct comparison with the proposed 3D fovea-centric framework methodologically inconsistent. Instead, reference-free volumetric metrics and paired clinical evaluation were prioritized to assess structure preservation without reliance on synthetic noise assumptions. Systematic benchmarking against conventional and recent volumetric denoising approaches under matched protocols remains an important direction for future work.
Additional limitations warrant consideration. Although BanglaOCT2025 represents a significant step toward population-specific OCT resources, further validation using multi-center and multi-vendor datasets is necessary to assess broader generalizability. Moreover, reference-free quantitative metrics, while informative, cannot fully substitute expert clinical judgment. The present study also focuses exclusively on AMD. Despite these constraints, blinded clinical review confirmed that the denoising process preserved retinal anatomy and did not introduce systematic alterations in pathology or hallucinations within the evaluated subset.

4.5. Future Directions

Future work will focus on expanding BanglaOCT2025 with multi-centre and multi-vendor OCT data to further assess generalizability across scanners and acquisition protocols. Because training augmentation was derived from the same set of real OCT volumes used for validation, expanding the pool of labeled volumetric data and constructing independent test cohorts will be essential for rigorous evaluation of deployment-level performance and cross-patient generalization. Public release of anonymized BanglaOCT2025 resources and preprocessing tools is planned to facilitate benchmarking and reproducible research in population-aware ophthalmic AI. In addition, the proposed fovea-centric volumetric denoising framework will be extended to other retinal diseases, such as diabetic macular edema and glaucoma, where subtle structural changes are similarly affected by noise.

5. Conclusions

This study introduces BanglaOCT2025, the first clinically validated OCT dataset representing the Bengali population, together with a fovea-centric volumetric preprocessing strategy and a self-supervised denoising framework tailored for real-world OCT analysis. By combining anatomically guided foveal localization with transformer-based volumetric restoration, the proposed pipeline improves OCT signal quality and inter-slice coherence without requiring clean reference data.
Evaluation using reference-free metrics, paired statistical analysis, and blinded clinical review demonstrates that the denoising process preserves clinically relevant retinal structures and does not introduce anatomically implausible artifacts. When assessed under a controlled paired setting using a fixed classifier, denoising leads to substantial improvements in downstream AMD detection. Importantly, these gains are interpreted as upper-bound estimates of diagnostic signal recoverability within the dataset, rather than evidence of deployment-level or cross-patient generalization.
While the results highlight the potential of anatomically informed volumetric denoising to recover diagnostically meaningful signal from noisy OCT data, independent patient-wise validation is required before clinical generalization can be claimed. Future work will therefore focus on evaluation using independent test cohorts and multi-center, multi-vendor datasets.
Overall, this study underscores the importance of clinically informed preprocessing and volumetric restoration for improving the usability of real-world OCT data, particularly in resource-constrained settings. Beyond AMD, the proposed fovea-centric restoration paradigm provides a reproducible foundation for future investigations into other retinal diseases, subject to continued validation under broader clinical conditions.

Author Contributions

Conceptualization, C.B., G.M.A.R. and R.D.; Methodology, C.B.; Software, C.B.; Validation, C.B., G.M.A.R., S.S. and R.D.; Formal Analysis, C.B.; Investigation, C.B., M.S.I., M.E.I.A. and S.K.S.; Data Curation, C.B., M.S.I., M.E.I.A. and S.K.S.; Writing—Original Draft Preparation, C.B.; Writing—Review and Editing, C.B., G.M.A.R., R.D., S.S., M.S.I., M.E.I.A. and S.K.S.; Supervision, G.M.A.R. and R.D.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki. The OCT data were retrospectively collected from routine clinical examinations at the National Institute of Ophthalmology and Hospital (NIOH), Bangladesh, under formal institutional authorization. All data were fully anonymized prior to analysis, and only non-identifiable demographic information (age and sex) was retained. Ethical approval for this study was obtained from the Institutional Review Board of Sher-e-Bangla Medical College (SBMC), Barishal, Bangladesh (IRB approval Memo no. 59.14.0000.130.99.001.26.134, dated 21 January 2026), including approval for retrospective data use and waiver of informed consent.

Informed Consent Statement

Patient consent was waived due to the retrospective nature of the study and the use of fully de-identified data, in accordance with institutional and national ethical guidelines.

Data Availability Statement

The BanglaOCT2025 dataset is publicly available at https://drive.google.com/drive/folders/1LOMXwcAAyyYc9ZzGt1LbUBdH-dGZJrTr (accessed on 24 January 2026). The tilt-robust foveal slice detection and macular extraction code (GitHub, Version 3.5.4) is available at https://github.com/ChinmayBepery/BanglaOCT2025_Fovea_33_extract_Denoising/tree/4ab719e6409705a099aaa40930e55e3465915d55/Code_Tilt_Roboust_Fovea_extraction (accessed on 24 January 2026), and the FFSwin restoration module at https://github.com/ChinmayBepery/BanglaOCT2025_Fovea_33_extract_Denoising/tree/4ab719e6409705a099aaa40930e55e3465915d55/Code_FFSwin_Denoise (accessed on 24 January 2026). All shared materials are fully anonymized and contain no identifiable patient information.

Acknowledgments

We extend our sincere gratitude to the technicians at the National Institute of Ophthalmology and Hospital for their valuable assistance in data retrieval. We are also grateful to the ICT Division, Ministry of Posts, Telecommunications and Information Technology, Government of the People’s Republic of Bangladesh, for the ICT fellowship (Code No.: 1280101-120008431-3821117). We further acknowledge with thanks the essential technical support provided by Mahmudul Hasan Faisal, Anik Biswas, Surojit Biswas, Shahariar Rahman, Yasin Arafat, Shahidul Islam, Ramkrishna, S. M. Shakhawat Hossain, Farzana, Akhi, Samsuzzaman, and Aminul Islam during the data collection, system implementation, and integration phases of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Huang, D.; Swanson, E.A.; Lin, C.P.; Schuman, J.S.; Stinson, W.G.; Chang, W.; Hee, M.R.; Flotte, T.; Gregory, K.; Puliafito, C.A.; et al. Optical Coherence Tomography. Science 1991, 254, 1178–1181.
  2. Drexler, W.; Fujimoto, J.G. Optical Coherence Tomography: Technology and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008; ISBN 3540775501.
  3. Schmidt-Erfurth, U.; Waldstein, S.M. A Paradigm Shift in Imaging Biomarkers in Neovascular Age-Related Macular Degeneration. Prog. Retin. Eye Res. 2016, 50, 1–24.
  4. Farsiu, S.; Chiu, S.J.; O’Connell, R.V.; Folgar, F.A.; Yuan, E.; Izatt, J.A.; Toth, C.A. Quantitative Classification of Eyes with and without Intermediate Age-Related Macular Degeneration Using Optical Coherence Tomography. Ophthalmology 2014, 121, 162–172.
  5. Li, M.; Huang, K.; Xu, Q.; Yang, J.; Zhang, Y.; Ji, Z.; Xie, K.; Yuan, S.; Liu, Q.; Chen, Q. OCTA-500: A Retinal Dataset for Optical Coherence Tomography Angiography Study. Med. Image Anal. 2024, 93, 103092.
  6. Melinščak, M.; Radmilović, M.; Vatavuk, Z.; Lončarić, S. Annotated Retinal Optical Coherence Tomography Images (AROI) Database for Joint Retinal Layer and Fluid Segmentation. Autom. Časopis Autom. Mjer. Elektron. Računarstvo Komun. 2021, 62, 375–385.
  7. Wagner-Schuman, M.; Dubis, A.M.; Nordgren, R.N.; Lei, Y.; Odell, D.; Chiao, H.; Weh, E.; Fischer, W.; Sulai, Y.; Dubra, A.; et al. Race- and Sex-Related Differences in Retinal Thickness and Foveal Pit Morphology. Investig. Ophthalmol. Vis. Sci. 2011, 52, 625–634.
  8. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A Guide to Deep Learning in Healthcare. Nat. Med. 2019, 25, 24–29.
  9. NIDEK Co., Ltd. Available online: https://www.nidek-intl.com/ (accessed on 7 November 2025).
  10. Logiciel NAVIS-EX—NIDEK France. Available online: https://www.nidek.fr/en/logiciel-navis-ex/logiciel-navis-ex-2/ (accessed on 16 December 2025).
  11. Ishikawa, H.; Stein, D.M.; Wollstein, G.; Beaton, S.; Fujimoto, J.G.; Schuman, J.S. Macular Segmentation with Optical Coherence Tomography. Investig. Ophthalmol. Vis. Sci. 2005, 46, 2012–2017.
  12. Ahlers, C.; Golbaz, I.; Stock, G.; Fous, A.; Kolar, S.; Pruente, C.; Schmidt-Erfurth, U. Time Course of Morphologic Effects on Different Retinal Compartments after Ranibizumab Therapy in Age-Related Macular Degeneration. Ophthalmology 2008, 115, e39–e46.
  13. Mylonas, G.; Ahlers, C.; Malamos, P.; Golbaz, I.; Deak, G.; Schütze, C.; Sacu, S.; Schmidt-Erfurth, U. Comparison of Retinal Thickness Measurements and Segmentation Performance of Four Different Spectral and Time Domain OCT Devices in Neovascular Age-Related Macular Degeneration. Br. J. Ophthalmol. 2009, 93, 1453–1460.
  14. Liu, Y.Y.; Chen, M.; Ishikawa, H.; Wollstein, G.; Schuman, J.S.; Rehg, J.M. Automated Macular Pathology Diagnosis in Retinal OCT Images Using Multi-Scale Spatial Pyramid and Local Binary Patterns in Texture and Shape Encoding. Med. Image Anal. 2011, 15, 748–759.
  15. Goodman, J.W. Some Fundamental Properties of Speckle. J. Opt. Soc. Am. 1976, 66, 1145–1150.
  16. Mehdizadeh, M.; MacNish, C.; Xiao, D.; Alonso-Caneiro, D.; Kugelman, J.; Bennamoun, M. Deep Feature Loss to Denoise OCT Images Using Deep Neural Networks. J. Biomed. Opt. 2021, 26, 046003.
  17. Li, F.; Wu, Q.; Jia, B.; Yang, Z. Speckle Noise Removal in OCT Images via Wavelet Transform and DnCNN. Appl. Sci. 2025, 15, 6557.
  18. Bepery, C.; Rahaman, G.M.A.; Debnath, R.; Saha, S. Forward Autoencoder Approach for Denoising Retinal OCT Images. In Proceedings of the 2025 International Conference on Electrical, Computer and Communication Engineering (ECCE), Chittagong, Bangladesh, 13–15 February 2025.
  19. Ahmed, H.; Zhang, Q.; Donnan, R.; Alomainy, A. Transformer Enhanced Autoencoder Rendering Cleaning of Noisy Optical Coherence Tomography Images. J. Med. Imaging 2024, 11, 34008.
  20. Özkan, A.; Stoykova, E.; Sikora, T.; Madjarova, V. Denoising OCT Images Using Steered Mixture of Experts with Multi-Model Inference. arXiv 2024, arXiv:2402.12735.
  21. Weigert, M.; Schmidt, U.; Boothe, T.; Müller, A.; Dibrov, A.; Jain, A.; Wilhelm, B.; Schmidt, D.; Broaddus, C.; Culley, S.; et al. Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy. Nat. Methods 2018, 15, 1090–1097.
  22. Download NAVIS-EX by NIDEK Co., Ltd. Available online: https://navis-ex.software.informer.com/download/ (accessed on 7 November 2025).
  23. RS-330—NIDEK France. Available online: https://www.nidek.fr/en/oct-angiographie/rs-330/ (accessed on 16 December 2025).
  24. RS-3000 Advance 2—NIDEK France. Available online: https://www.nidek.fr/en/oct-angiographie/rs-3000-advance-2/ (accessed on 16 December 2025).
  25. RS-330 Manual | ManualsLib. Available online: https://www.manualslib.com/manual/3952363/Nidek-Medical-Rs-330.html#manual (accessed on 16 December 2025).
  26. Hee, M.R.; Izatt, J.A.; Swanson, E.A.; Huang, D.; Schuman, J.S.; Lin, C.P.; Puliafito, C.A.; Fujimoto, J.G. Optical Coherence Tomography of the Human Retina. Arch. Ophthalmol. 1995, 113, 325–332.
  27. Ting, D.S.W.; Cheung, C.Y.L.; Lim, G.; Tan, G.S.W.; Quang, N.D.; Gan, A.; Hamzah, H.; Garcia-Franco, R.; Yeo, I.Y.S.; Lee, S.Y.; et al. Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images from Multiethnic Populations with Diabetes. JAMA 2017, 318, 2211–2223.
  28. Landis, J.R.; Koch, G.G. The Measurement of Observer Agreement for Categorical Data. Biometrics 1977, 33, 159–174.
  29. Diabetic Retinopathy Clinical Research Network; Browning, D.J.; Glassman, A.R.; Aiello, L.P.; Beck, R.W.; Brown, D.M.; Fong, D.S.; Bressler, N.M.; Danis, R.P.; Kinyoun, J.L.; et al. Relationship between Optical Coherence Tomography–Measured Central Retinal Thickness and Visual Acuity in Diabetic Macular Edema. Ophthalmology 2007, 114, 525–536.
  30. Chiu, S.J.; Li, X.T.; Nicholas, P.; Izatt, J.A.; Toth, C.A.; Farsiu, S. Automatic Segmentation of Seven Retinal Layers in SDOCT Images Congruent with Expert Manual Segmentation. Opt. Express 2010, 18, 19413–19428.
  31. Garvin, M.K.; Abràmoff, M.D.; Wu, X.; Russell, S.R.; Burns, T.L.; Sonka, M. Automated 3-D Intraretinal Layer Segmentation of Macular Spectral-Domain Optical Coherence Tomography Images. IEEE Trans. Med. Imaging 2009, 28, 1436–1447.
  32. Goodman, J.W. Statistical Properties of Laser Speckle Patterns. In Laser Speckle and Related Phenomena; Springer: Berlin/Heidelberg, Germany, 1975; pp. 9–75.
  33. Schmitt, J.M.; Xiang, S.H.; Yung, K.M. Speckle in Optical Coherence Tomography. J. Biomed. Opt. 1999, 4, 95–105.
  34. Fang, L.; Li, S.; McNabb, R.P.; Nie, Q.; Kuo, A.N.; Toth, C.A.; Izatt, J.A.; Farsiu, S. Fast Acquisition and Reconstruction of Optical Coherence Tomography Images via Sparse Representation. IEEE Trans. Med. Imaging 2013, 32, 2034–2049.
  35. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
  36. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584.
  37. Lehtinen, J.; Munkberg, J.; Hasselgren, J.; Laine, S.; Karras, T.; Aittala, M.; Aila, T. Noise2Noise: Learning Image Restoration without Clean Data. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4620–4631.
  38. Krull, A.; Buchholz, T.-O.; Jug, F. Noise2Void—Learning Denoising From Single Noisy Images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2129–2137.
  39. Batson, J.; Royer, L. Noise2Self: Blind Denoising by Self-Supervision. arXiv 2019, arXiv:1901.11365.
  40. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss Functions for Image Restoration With Neural Networks. IEEE Trans. Comput. Imaging 2017, 3, 47–57.
  41. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1.
  42. Perona, P.; Malik, J. Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 629–639.
  43. Sobel, I.; Feldman, G. A 3×3 Isotropic Gradient Operator for Image Processing; Stanford Artificial Intelligence Project (SAIL): Stanford, CA, USA, 1968.
  44. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  45. Wang, Z.; Bovik, A.C. Mean Squared Error: Love It or Leave It? A New Look at Signal Fidelity Measures. IEEE Signal Process. Mag. 2009, 26, 98–117.
Figure 2. Structure-preserving FFSwin volumetric restoration network architecture.
Figure 3. FFSwin self-supervised denoise autoencoder—(a) Overall denoising pipeline; (b) Self-supervised training flowchart.
Figure 4. Training convergence curve of the Self-Supervised Volumetric Restoration Network.
Figure 5. Performance comparison of FFSwin denoising autoencoder—(a) Confusion matrix for denoised (clean) BanglaOCT2025; (b) Confusion matrix for raw (noisy) BanglaOCT2025.
Figure 6. Patient-level diagnostic impact of denoising—(a) Patient-level “before → after” correctness heatmap; (b) Forest plot: Improvement vs. Degradation.
Figure 7. Class-wise boxplots of the four reference-free metrics.
Figure 8. Qualitative denoising results for a DryAMD case.
Figure 9. Zoomed retinal region of interest (ROI) for the DryAMD case.
Figure 10. Qualitative denoising results for a NonAMD case.
Figure 11. Zoomed retinal region of interest (ROI) for the NonAMD case.
Figure 12. Qualitative denoising results for a WetAMD case.
Figure 13. Zoomed retinal region of interest (ROI) for the WetAMD case.
Table 1. Collected retrospective OCT scans from NIOH.

| OCT Machine Model | Patients in NAVIS-EX * | Valid Patients | Valid Scans ** | Scans for BanglaOCT2025 | Slices in BanglaOCT2025 |
|---|---|---|---|---|---|
| Nidek RS-330 Duo 2 | 1071 | 738 | 1128 | 1147 | 146,816 |
| Nidek RS-3000 Advance | 348 | 333 | 530 | 438 | 56,064 |
| Total | 1419 | 1071 | 1658 | 1585 | 202,880 |

* For this research purpose, we used a trial version of this software (NAVIS-EX V-1.12 19702-E201 software by NIDEK Co., Ltd., Gamagori, Japan [22]). Some scans corresponded to invalid or empty image folders. ** Some patients required scans of both eyes or, in rare cases, multiple scans per eye.
Table 2. Summary of the “BanglaOCT2025”.

| Particulars | Quantity |
|---|---|
| Total patients | 1419 |
| Valid patients | 1071 |
| Scans from both eyes or multiple scans from a single eye | 1658 |
| Discarded scans due to image acquisition issues | 73 |
| Considered scans for BanglaOCT2025 | 1585 |
| Considered 2D OCT slices for BanglaOCT2025 | 202,880 |
| Patients for ground truth labelling in BanglaOCT2025 | 573 |
| Scans in BanglaOCT2025 without ground truth labelling | 728 |
| Scans for doctor labelling in BanglaOCT2025 | 857 |
| Dry AMD | 54 |
| Wet AMD | 61 |
| Non-AMD | 742 |
Table 3. Age-wise patient distribution and ground truth labelling. The last three columns refer to the 573 ground-truth-labelled patients.

| Age Range | No. of Patients | Labelled Patients | Dry AMD | Wet AMD |
|---|---|---|---|---|
| 5–10.5 | 6 | 0 | 0 | 0 |
| 11–20.5 | 45 | 0 | 0 | 0 |
| 21–30.5 | 96 | 0 | 0 | 0 |
| 31–40.5 | 186 | 0 | 0 | 0 |
| 41–45.5 | 105 | 0 | 0 | 0 |
| 46–50.5 | 141 | 81 | 4 | 4 |
| 51–55.5 | 160 | 160 | 11 | 11 |
| 56–60.5 | 125 | 125 | 9 | 9 |
| 61–65.5 | 98 | 98 | 6 | 10 |
| 66–70.5 | 70 | 70 | 17 | 15 |
| 71–75.5 | 26 | 26 | 2 | 9 |
| 76–80.5 | 9 | 9 | 5 | 2 |
| 81–85.5 | 4 | 4 | 0 | 1 |
| Total | 1071 | 573 | 54 | 61 |
Table 4. Gender-wise dry and wet AMD distribution in BanglaOCT2025.

| Gender | Total Patients | Ground Truth Labeling ¹ | Dry AMD | Wet AMD | Total AMD |
|---|---|---|---|---|---|
| Male | 658 | 349 | 31 | 36 | 67 |
| Female | 413 | 224 | 23 | 25 | 48 |
| Total | 1071 | 573 | 54 | 61 | 115 |

¹ Some patients have both eyes’ scans, and in rare cases multiple scans per eye. Ground-truth labeling and classification experiments are conducted scan-wise, with one diagnostic label per scan.
Table 6. Precision, Recall, F1-score, and Support from the raw (noisy) OCT volumes of BanglaOCT2025.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| DryAMD | 0.17 | 0.78 | 0.28 | 54 |
| WetAMD | 0.30 | 0.21 | 0.25 | 61 |
| NonAMD | 0.95 | 0.72 | 0.82 | 742 |
Table 7. Precision, Recall, F1-score, and Support from the denoised (clean) OCT volumes of BanglaOCT2025.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| DryAMD | 0.98 | 1.00 | 0.99 | 54 |
| WetAMD | 1.00 | 0.98 | 0.99 | 61 |
| NonAMD | 1.00 | 1.00 | 1.00 | 742 |
Table 8. Blinded clinician assessment of denoising effects.

| Metric | Raw | Denoised |
|---|---|---|
| Pathology preservation (↑) | 3.30 | 4.39 |
| Artifacts (↓) | 2.39 | 1.31 |

↑ indicates higher scores represent better pathology preservation; ↓ indicates lower scores represent fewer artifacts.
Table 9. McNemar contingency table.

| | Clean Correct: Yes | Clean Correct: No |
|---|---|---|
| Noisy Correct: Yes | 592 (a) | 0 (c) |
| Noisy Correct: No | 264 (b) | 1 (d) |
Table 10. McNemar test results.

| Measurement Parameter | Value |
|---|---|
| b (improved) | 264 |
| c (degraded) | 0 |
| $\chi^2$ (continuity-corrected): $\chi^2 = \frac{(\lvert b - c \rvert - 1)^2}{b + c}$ | 262.0038 |
| p-value (continuity-corrected): $p = P(\chi^2_{df=1} \geq 262.0038)$ | ≈ 0.000000 |
| $\chi^2$ (uncorrected): $\chi^2 = \frac{(b - c)^2}{b + c}$ | 264 |
| p-value (uncorrected): $p = P(\chi^2_{df=1} \geq 264)$ | ≈ 0.000000 |
| Exact binomial p-value | $< 1 \times 10^{-12}$ (essentially 0) |
Table 11. Class-wise aggregated confusion matrix analysis.

| Class | Condition | TP | FN | FP | TN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| DryAMD | Noisy | 42 | 12 | 208 | 595 | 0.7778 | 0.7407 |
| DryAMD | Denoised | 54 | 0 | 1 | 802 | 1.0000 | 0.9988 |
| WetAMD | Noisy | 13 | 48 | 30 | 766 | 0.2131 | 0.9623 |
| WetAMD | Denoised | 60 | 1 | 0 | 796 | 0.9836 | 1.0000 |
| NonAMD | Noisy | 537 | 205 | 27 | 88 | 0.7237 | 0.7652 |
| NonAMD | Denoised | 742 | 0 | 0 | 115 | 1.0000 | 1.0000 |
Table 12. Class-imbalance aware performance comparison before and after denoising.

| Metric | Noisy Data | Denoised Data | Δ Improvement |
|---|---|---|---|
| Overall Accuracy | 0.6908 | 0.9988 | 0.3080 |
| Balanced Accuracy | 0.5715 | 0.9945 | 0.4230 |
| Macro Precision | 0.4742 | 0.9939 | 0.5197 |
| Macro Recall | 0.5715 | 0.9945 | 0.4230 |
| Macro F1-score | 0.4496 | 0.9942 | 0.5446 |
| Weighted F1-score | 0.7472 | 0.9988 | 0.2516 |
| MCC | 0.2912 | 0.9952 | 0.7040 |
| Cohen’s Kappa | 0.2426 | 0.9952 | 0.7526 |
Table 13. Per-class sensitivity (recall) comparison.

| Class | Noisy Recall | Denoised Recall |
|---|---|---|
| DryAMD | 0.7778 | 1.0000 |
| WetAMD | 0.2131 | 0.9836 |
| NonAMD | 0.7237 | 1.0000 |
Table 14. Class-wise mean reference-free denoising metrics.

| Class | Δ LNV ¹ | ESPR ² | Δ ISC ³ | Δ Entropy ⁴ |
|---|---|---|---|---|
| DryAMD | 0.0015 | 0.2690 | 0.2859 | 4.1830 |
| WetAMD | 0.0018 | 0.2684 | 0.2963 | 4.1497 |
| NonAMD | 0.0012 | 0.2829 | 0.2695 | 4.2225 |

Interpretation: Across all disease categories, the denoising framework demonstrates consistent and balanced improvements. ¹ Noise suppression (Δ LNV): low and tightly clustered Δ LNV values (≈0.001–0.002) across all classes indicate effective noise reduction without excessive smoothing; the slightly lower values observed for NonAMD are consistent with its more uniform retinal structure. ² Edge preservation (ESPR): ESPR values are consistent across classes (≈0.27–0.28), showing that the denoising process preserves retinal layer boundaries and lesion edges; this is particularly relevant for WetAMD, where accurate delineation of fluid pockets and neovascular features is clinically important. ³ Volumetric consistency (Δ ISC): positive Δ ISC values for all categories reflect improved coherence between adjacent B-scans after denoising; the higher Δ ISC (0.2963) observed in WetAMD suggests that the model effectively stabilizes structurally variable pathological volumes by leveraging inter-slice context. ⁴ Structural regularization (Δ Entropy): marked reductions in entropy (≈4.15–4.22) indicate suppression of stochastic noise while maintaining meaningful retinal texture; similar entropy changes across classes suggest that the denoising behavior is consistent and not biased toward any specific disease group.
