Article

Development of Stacked Neural Networks for Application with OCT Data, to Improve Diabetic Retinal Health Care Management

1 INEGI—Institute of Science and Innovation in Mechanical and Industrial Engineering, 4200-465 Porto, Portugal
2 Department of Ophthalmology, Centro Hospitalar e Universitário São João, 4200-319 Porto, Portugal
3 Department of Surgery and Physiology, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
4 DEMec, Faculty of Engineering, University of Porto, 4200-319 Porto, Portugal
* Author to whom correspondence should be addressed.
Information 2025, 16(8), 649; https://doi.org/10.3390/info16080649
Submission received: 22 May 2025 / Revised: 14 July 2025 / Accepted: 29 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue AI-Based Biomedical Signal Processing)

Abstract

Background: Retinal diseases are becoming an important public health issue, with early diagnosis and timely intervention playing a key role in preventing vision loss. Optical coherence tomography (OCT) remains the leading non-invasive imaging technique for identifying retinal conditions. However, distinguishing between diabetic macular edema (DME) and macular edema resulting from retinal vein occlusion (RVO) can be particularly challenging, especially for clinicians without specialized training in retinal disorders, as both conditions manifest through increased retinal thickness. Given the limited research exploring the application of deep learning methods to OCT scans, particularly for RVO detection, this study proposes a novel diagnostic approach based on stacked convolutional neural networks. This architecture aims to enhance classification accuracy by integrating multiple neural networks, enabling more robust feature extraction and improved differentiation between retinal pathologies. Methods: The VGG-16, VGG-19, and ResNet50 models were fine-tuned on the Kermany dataset to classify OCT images and were afterwards trained on a private OCT dataset. Four stacked models were then developed from these networks: VGG-16 with VGG-19, VGG-16 with ResNet50, VGG-19 with ResNet50, and all three networks combined. Model performance was assessed using accuracy, precision, recall, the F2-score, and the area under the receiver operating characteristic curve (AUROC). Results: The stacked neural network combining all three models achieved the best results, with an accuracy of 90.7%, a precision of 99.2%, a recall of 90.7%, and an F2-score of 92.3%. Conclusions: This study presents a novel method for distinguishing retinal diseases using stacked neural networks, aiming to provide ophthalmologists with a reliable tool to improve diagnostic accuracy and speed.

1. Introduction

Retinal vein occlusion (RVO) is the second most prevalent retinal vascular condition after diabetic retinopathy (DR) and is a leading cause of vision loss and impairment. This condition arises when a blockage in the retinal venous system leads to thrombus formation, increasing capillary pressure and causing fluid leakage and macular edema—both of which contribute significantly to visual impairment [1,2].
Globally, RVO affects approximately 16 million individuals, with its occurrence influenced by factors such as age, gender, and underlying health conditions. The risk of developing RVO increases with age, particularly in individuals over 60 [3].
RVO is classified into two main types: branch retinal vein occlusion (BRVO) and central retinal vein occlusion (CRVO), depending on the location of the blockage. BRVO typically develops at arteriovenous crossings, whereas CRVO occurs near or at the lamina cribrosa of the optic nerve. The most common cause is the compression of retinal veins by adjacent atherosclerotic arteries. CRVO is associated with features such as optic disc swelling, dilated and tortuous retinal veins, widespread hemorrhages, cotton wool spots, macular edema, and capillary non-perfusion throughout the retina. In contrast, BRVO shares similar characteristics but is limited to the region served by the affected vein. Vision loss in RVO is mainly due to macular edema but can also result from macular ischemia or complications like neovascular glaucoma and vitreous hemorrhage [1]. Although RVO can be classified into these two subtypes, many image-based diagnostic workflows treat both as RVO-associated edema because of the features they share in OCT images.
Diabetic macular edema (DME) is another major cause of vision loss, characterized by retinal thickening and intraretinal fluid accumulation. It remains a leading complication of diabetic retinopathy and is a critical diagnosis in OCT-based screening.
The use of artificial intelligence (AI) in medical imaging for diagnostic support has shown promising results across various medical fields such as ophthalmology. AI can assist in screening optical coherence tomography (OCT) images for disease detection, helping to reduce challenges such as human fatigue, bias, and cognitive limitations [4].
Early approaches to computer-assisted diagnosis (CAD) relied on extracting relevant features, which required domain expertise and varied depending on the dataset. These features often included texture or structural information from the images [5]. More recently, deep learning (DL) techniques have been employed to classify OCT images, with the development of convolutional neural networks (CNNs) significantly enhancing image classification performance [6,7].
CNNs use convolutional operators to extract features directly from image pixels, making them highly effective for image classification tasks. Convolutional, pooling, and activation layers process the images to identify key features, which are then passed through a fully connected network for classification [8]. These networks have been used to classify multiple diseases simultaneously, making them highly practical for real-world disease screening applications. An example is the work of Kermany et al. [9], who developed a DL model capable of classifying OCT scans into four distinct classes: normal eyes, eyes with choroidal neovascularization (CNV), eyes with diabetic macular edema (DME), and eyes with drusen. Extensive research has been conducted using the Kermany dataset, with studies reaching an accuracy of 98.6% using the VGG-16 network [10].
However, a notable research gap remains in distinguishing retinal vein occlusion (RVO) from diabetic macular edema (DME). Most existing studies that classify these two retinal conditions primarily utilize fundus images. For instance, Choi et al. [11] applied random forest transfer learning (TL) using the VGG-19 architecture for fundus image classification, achieving an accuracy of 74.7% in differentiating background diabetic retinopathy (DR) from RVO. Similarly, Abitbol et al. [12] developed a deep learning model for classifying widefield color fundus images, achieving an accuracy of 85.2% for DR and 88.4% for RVO. Other studies have concentrated solely on distinguishing various types of RVO from healthy eyes using fundus images [13,14]. There is limited research on using OCT images for diagnosing patients with RVO, despite this imaging modality being the preferred non-invasive diagnostic tool. Consequently, developing a model that can accurately diagnose RVO using OCT has significant potential for clinical application.
Pin et al. [15] developed a model composed of an ensemble of two TL models, MobileNetV3Large and ResNet50, to classify OCT images into four distinct categories (RVO, age-related macular degeneration (AMD), central serous chorioretinopathy (CSCR), and DME), achieving an overall accuracy of 91.69%. Similarly, Khan et al. [16] modified three models and extracted the features using TL. The best features were selected using ant colony optimization and then classified using k-nearest neighbors and support vector machines, reaching accuracies of 99.1% and 97.4% with and without feature optimization, respectively. More recently, Kulyabin et al. [17] evaluated the performance of VGG-16 and ResNet50 on a publicly available OCT dataset [17], which includes conditions such as AMD, DME, RVO, epiretinal membrane, retinal artery occlusion, and vitreomacular interface disease. The VGG-16 network achieved the highest accuracy, at 89.5%.
This study proposes a DL approach based on an ensemble technique known as stacking for the automated diagnosis of retinal diseases, namely DME, RVO-associated edema, and other generic retinal pathologies, using OCT images. These pathologies can be observed in Figure 1. The model is specifically evaluated for its ability to differentiate RVO-associated edema from DME, a critical distinction for guiding appropriate treatment strategies.

2. Materials and Methods

2.1. Ethical Approval

The Institutional Ethics Review Board of ULS São João approved this study. The protocol conformed to the tenets of the Declaration of Helsinki for research involving human participants, as well as the EU's General Data Protection Regulation.
Informed consent was waived in view of the retrospective nature of the study. The dataset includes OCT images with Normal, DME, RVO, and other pathologies. All personal identifiers were removed to ensure patient anonymity and data confidentiality.

2.2. Datasets

This work used two datasets: the publicly available Kermany dataset and a private dataset from ULS São João in Porto, Portugal. The Kermany dataset is a well-known public dataset for OCT images that consists of four distinct classes: Normal, Drusen, DME, and CNV. It includes over 100,000 images that were obtained with Heidelberg Spectralis, selected from retrospective cohorts of adult patients from multiple institutions.
The ULS São João dataset contains OCT and fundus images from four distinct classes: normal, RVO, DME, and other retinal pathologies.

2.3. Models Fine-Tuning

Three models were used in this study: the VGG-16, VGG-19, and ResNet50 architectures. These models were first trained on the public Kermany OCT dataset so that the networks learned from OCT images. Afterwards, the weights learned on the Kermany dataset were used to initialize the models for the ULS São João dataset.

2.3.1. Kermany

The models were fine-tuned using images from the Kermany dataset. To achieve this, the feature extractor was unfrozen, and transfer learning (TL) was applied by loading pretrained weights of the respective models from the ImageNet dataset.
The Kermany dataset has over one hundred thousand images, although some of these images are duplicates. After removing the duplicates, the dataset had 101,444 images. A total of 250 images from each class from the Kermany dataset were used for the test set. The remaining images were split into training and validation sets, with a ratio of 80%/20%, respectively.
Due to the class imbalance of the Kermany dataset, a weighted random sampler was used to ensure equal sampling probability for each class during training. Table 1 shows the number of images for each class in the Kermany dataset.
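As an illustrative sketch of this balancing step (the paper does not publish code, so the helper name make_balanced_loader and the train_labels variable are assumptions), in PyTorch this is commonly done with WeightedRandomSampler:

```python
from collections import Counter

from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(train_dataset, train_labels, batch_size=32):
    # Weight each image inversely to its class frequency so every
    # class is drawn with equal probability during training.
    counts = Counter(train_labels)
    class_weights = {c: 1.0 / n for c, n in counts.items()}
    sample_weights = [class_weights[y] for y in train_labels]
    sampler = WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(sample_weights),  # one epoch worth of draws
        replacement=True,
    )
    return DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
```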
The loss function used was the cross-entropy loss, defined in Equation (1), where $N$ is the number of training images, $\theta$ represents the model's trainable parameters, $X_i$ is the OCT image, $y_i$ is the image label, and $\hat{y}_i$ is the predicted probability after applying the Softmax activation function.
$$\mathrm{CELoss}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} y_i \log\left(\hat{y}_i(X_i, \theta)\right) \qquad (1)$$
In order to prevent overfitting, L2 regularization was employed and an additional term was added to the loss function:
$$\mathrm{Loss}(\theta) = \mathrm{CELoss}(\theta) + \alpha \lVert \theta \rVert_2^2,$$
where α is the parameter that governs the regularization.
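A minimal sketch of this regularized objective for a PyTorch model follows; the hyperparameter name alpha and its value are illustrative, and in practice the same effect is often obtained through the optimizer's weight_decay argument:

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def regularized_loss(model, logits, targets, alpha=1e-4):
    # Cross-entropy term plus an explicit L2 penalty on all trainable weights.
    ce = criterion(logits, targets)
    l2 = sum(p.pow(2).sum() for p in model.parameters() if p.requires_grad)
    return ce + alpha * l2
```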
An early stopping criterion was also employed to prevent overfitting during training by monitoring the validation F2-score. If the metric did not increase for 10 epochs, training was stopped.
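A small sketch of this patience mechanism; the class name and interface are assumptions, not the authors' code:

```python
class EarlyStopping:
    """Stop training when the monitored metric stalls for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, val_f2):
        # Returns True when training should stop.
        if val_f2 > self.best:
            self.best = val_f2
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```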
Two different TL strategies were analyzed: one where only the classifier parameters were trained (the feature extractor was frozen) and one where all parameters, including the feature extractor, were trained starting from the weights of the pretrained model.
The VGG16 and VGG19 pretrained models have two hidden layers with 4096 neurons each, while the ResNet50 pretrained model has a fully connected layer with 2048 input neurons and 1000 output neurons. In this work, a third hidden layer with 2048 neurons, followed by an output layer with 4 neurons, was added to the VGG16 and VGG19 classifiers. The ResNet50 classifier was also modified, with a hidden layer of 1024 neurons and an output layer with 4 neurons. Table 2 summarizes the added hidden layers.
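The sketch below reconstructs these modified heads from the layer sizes above and the dropout value in Table 2, together with the optional feature-extractor freezing from the first TL strategy; it is an interpretation built on standard torchvision models, not the authors' exact code:

```python
import torch.nn as nn
from torchvision import models

def build_vgg16(num_classes=4, freeze_features=False):
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    if freeze_features:  # strategy 1: train only the classifier
        for p in model.features.parameters():
            p.requires_grad = False
    model.classifier = nn.Sequential(
        nn.Linear(25088, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 2048), nn.ReLU(), nn.Dropout(0.5),  # added hidden layer
        nn.Linear(2048, num_classes),
    )
    return model

def build_resnet50(num_classes=4):
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Sequential(
        nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(0.5),  # added hidden layer
        nn.Linear(1024, num_classes),
    )
    return model
```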

2.3.2. ULS São João

The ULS São João dataset comprises 5360 images with DME, RVO, normal, and other pathologies. For the test set, 125 images of each class were used, for a total of 500 images. The remaining images were split using the same 80%/20% ratio as for the Kermany dataset. The number of images per class and the demographic distribution are shown in Table 3. The other pathologies class was created to include pathologies other than DME and RVO, to enhance the model's generalization and applicability.
To improve model generalization, the ULS São João dataset was divided using patient-wise separation, ensuring that images from the same patient were confined to the same set and thereby avoiding data leakage, a strategy already adopted in previous works. The models were initialized with the weights previously learned by the respective model on the Kermany dataset.
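One common way to enforce such a patient-wise split is sketched below, assuming each image is associated with a patient identifier; the helper name is hypothetical:

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(image_paths, labels, patient_ids, val_fraction=0.2, seed=0):
    # Grouping by patient ID guarantees all images from one patient
    # land in the same split, preventing data leakage.
    splitter = GroupShuffleSplit(n_splits=1, test_size=val_fraction,
                                 random_state=seed)
    train_idx, val_idx = next(splitter.split(image_paths, labels,
                                             groups=patient_ids))
    return train_idx, val_idx
```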
No preprocessing was applied to the images from the ULS São João dataset, since the models had been trained on the Kermany dataset, which likewise received no preprocessing.
Data augmentation was employed during training on this dataset to enhance the models' generalization. The techniques used were Horizontal Flip, Gaussian blur, and a combination of both.
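A minimal sketch of such an augmentation pipeline with torchvision follows; the blur kernel size and the application probabilities are assumptions, as the paper does not report them:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # horizontal flip
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=5)], p=0.5),  # Gaussian blur
    transforms.ToTensor(),
])
```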
As in the Kermany dataset, L2 regularization was applied to the models to prevent overfitting. The loss function used was the cross-entropy loss function, the same as Equation (1). Cross-validation with five folds was also employed to prevent overfitting on the dataset. The early stop criterion was also maintained with a patience of 10 epochs.

2.4. Stacked Model

The stacked model is based on an ensemble technique called stacking, in which the features of two or more models are extracted at an intermediate layer of each network and concatenated to form a large feature set. This feature set is then used to train a classifier. An overview of this model is represented in Figure 2.
In this work, the stacked model is based on the models previously studied using the ULS São João dataset. The model parameters with the best results on the Kermany dataset were combined to initialize the weights of the new model.
An overview of the classifier used in the stacked model is presented in Figure 3. The classifier architecture was based on the classifiers of the VGG models. The number of features differs for each combination of architectures used in the stacked model, as depicted in Table 4.
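A hedged sketch of this stacking architecture is shown below; the 4096-neuron hidden layers are inferred from the statement that the classifier follows the VGG design, and the exact sizes are not confirmed by the paper:

```python
import torch
import torch.nn as nn

class StackedModel(nn.Module):
    def __init__(self, backbones, feature_dims, num_classes=4):
        super().__init__()
        # e.g., the convolutional feature extractors of VGG16 and VGG19,
        # giving 25,088 + 25,088 = 50,176 concatenated features (Table 4).
        self.backbones = nn.ModuleList(backbones)
        self.classifier = nn.Sequential(
            nn.Linear(sum(feature_dims), 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        # Extract features from every backbone and concatenate them.
        feats = [torch.flatten(b(x), start_dim=1) for b in self.backbones]
        return self.classifier(torch.cat(feats, dim=1))
```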
The dataset used for the stacked model was the ULS São João dataset. As when a single model was used, five-fold cross-validation was performed, L2 regularization was applied, and an early stopping criterion with a patience of 10 epochs was used to prevent overfitting.

2.5. Metric Indices

The performance of the models in this study was assessed with various metrics, including accuracy, precision, recall, and the F2-score.
Accuracy is a commonly used evaluation metric that reflects the overall performance of a classification model. It is defined as the ratio between the number of correct predictions and the total number of predictions, providing a general measure of how often the model is correct [18,19].
Precision is a quantitative metric that assesses the correctness of positive predictions by quantifying the proportion of true positive instances among all instances classified as positive. It is calculated by dividing the number of true positives by the sum of true positives and false positives [19,20].
Recall is a quantitative metric that evaluates the model’s ability to identify all relevant positive cases. It is defined as the ratio of true positives to the sum of true positives and false negatives. The concept of this metric emphasizes the model’s ability to correctly identify correct instances from the overall amount of actual true positive instances. This measure is crucial in scenarios where failing to detect positive instances can lead to significant consequences [19,21].
The F2-score derives from the general Fβ-score. When β is 1, the metric is the F1-score, which balances precision and recall equally. The F2-score, however, places a higher emphasis on recall than on precision. This weighting suits medical scenarios, where detecting positive instances is essential and missing them can have serious consequences [22].
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100$$
$$F_2\text{-score} = (1 + 2^2)\,\frac{\mathrm{Precision} \times \mathrm{Recall}}{(2^2 \times \mathrm{Precision}) + \mathrm{Recall}} \times 100$$
where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
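For reference, these metrics can be computed with scikit-learn as sketched below, using the weighted averaging across classes described in the Discussion; y_true and y_pred are assumed arrays of true and predicted labels:

```python
from sklearn.metrics import (accuracy_score, fbeta_score, precision_score,
                             recall_score)

def evaluate(y_true, y_pred):
    # Weighted averages account for class support, matching the
    # evaluation described in the paper.
    return {
        "accuracy": 100 * accuracy_score(y_true, y_pred),
        "precision": 100 * precision_score(y_true, y_pred, average="weighted"),
        "recall": 100 * recall_score(y_true, y_pred, average="weighted"),
        "f2": 100 * fbeta_score(y_true, y_pred, beta=2, average="weighted"),
    }
```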

3. Results

The Kermany dataset was used only to obtain initial weights for networks operating on OCT images, while the stacked neural networks (SNNs) were trained on the ULS São João dataset. As such, detailed results are reported only for the ULS São João dataset.
All models were trained for a maximum of 500 epochs. The Adam optimizer was used, and early stopping prevented overfitting by monitoring the F2-score: if this metric did not increase for 10 epochs, training stopped immediately. The training of all models was performed on an NVIDIA RTX A6000 GPU workstation.
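An assumed training-loop skeleton tying these settings together with the earlier sketches (regularized_loss, EarlyStopping); the learning rate and the validate_f2 helper are illustrative, not reported values:

```python
import torch

def train(model, train_loader, val_loader, device="cuda", max_epochs=500):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr
    stopper = EarlyStopping(patience=10)
    for epoch in range(max_epochs):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = regularized_loss(model, model(images), targets)
            loss.backward()
            optimizer.step()
        # `validate_f2` is a hypothetical helper returning the validation F2-score.
        if stopper.step(validate_f2(model, val_loader, device)):
            break
```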

3.1. Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) is a widely used technique for interpreting deep convolutional neural networks by highlighting the regions of an image most relevant to the model's decision. In this implementation, the process begins by selecting the last layer of the feature extractor, whose feature maps are captured via a forward hook during the model's forward pass.
Following the prediction, the gradient of the target class score is back-propagated with respect to the final layer’s activations. These gradients are globally averaged to produce importance weights that reflect each feature map’s contribution to the class prediction. A weighted sum of the feature maps is then computed to generate a class activation map (CAM), which is subsequently rectified and normalized.
Finally, the CAM is resized and overlaid on the original input image using a heatmap. This visual representation reveals the spatial focus of the model’s attention, offering insight into which regions influenced the classification—supporting ophthalmologists in understanding model-driven diagnostic decisions.
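A condensed sketch of this procedure for a PyTorch model follows; target_layer is assumed to be the last layer of the feature extractor, as described above:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image.unsqueeze(0))         # forward pass captures activations
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()            # back-propagate the class score

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # global-average gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear",
                        align_corners=False)   # resize to the input image
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize
    h1.remove(); h2.remove()
    return cam.squeeze()
```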
In Figure 4, Grad-CAM visualizations generated using the VGG16, VGG19, and ResNet50 models on the ULS São João OCT dataset are presented. These heatmaps highlight the regions of the images that each model focuses on to make its predictions.
Although each model was shown different images, the Grad-CAM results for ResNet50 indicate a more concentrated focus on a smaller region of the image, whereas VGG16 and VGG19 distribute their attention across broader areas of the input image.

3.2. ULS São João Dataset

The results obtained for the three selected models using the ULS São João dataset are presented in Table 5. The confusion matrices for the best fold of each model are shown in Figure 5, and the respective ROC curves in Figure 6.

3.3. Stacked Neural Networks

Regarding the SNNs, four different combinations of the three selected models were used (Table 4). The confusion matrices for the best fold of each SNN are shown in Figure 7, and the respective ROC curves in Figure 8. The metric results are presented in Table 6, where SNN 1 = VGG16 & ResNet50; SNN 2 = VGG19 & ResNet50; SNN 3 = VGG16 & VGG19; and SNN 4 = VGG16 & VGG19 & ResNet50.

4. Discussion

The use of AI in the medical field, specifically in retinal disease diagnosis, can significantly increase the capabilities of healthcare professionals, enabling them to deliver fast and accurate diagnoses. This improved efficiency leads to better patient outcomes through earlier detection and treatment, and allows healthcare personnel to manage a higher volume of cases, thereby expanding their reach and overall impact.
The results from the public dataset by Kermany et al. [9] are not presented, as this dataset was used solely for pretraining the convolutional layers of the models applied to the OCT images.
Regarding the results from the private ULS São João dataset, the three models performed similarly, achieving accuracies of approximately 85% and F2-scores of approximately 88%. Analysis of the confusion matrices and ROC curves reveals that the models were most effective at classifying normal eyes, while also demonstrating strong performance in identifying other ocular pathologies.
The confusion matrices reveal misclassifications between DME and RVO pathologies. Given the limited research on classifying RVO using OCT images, and the existing gap in distinguishing RVO from DME, such misclassifications are to be expected. There is also a recurring pattern of misclassifying pathological cases as normal. This can be attributed to the nature of the OCT images used, as not all scans from a patient with a given pathology visibly exhibit signs of the disease—leading to some being mistaken for images of healthy eyes.
Building on the previous observations, the SNNs used in this study show that combining two models, when one of them is ResNet50, yields results similar to those of a single-model approach. The recurring issue of misclassifying DME and RVO is also evident in these SNNs. Notably, the SNN that combines the VGG16 and VGG19 models demonstrates improved performance over single-model usage, with accuracy rising to approximately 90% and the F2-score to 91%. It also performs better at classifying the DME and RVO pathologies, although multiple misclassifications remain. The SNN incorporating all three models achieves results comparable to the VGG16 and VGG19 combination, with slightly better metrics.
A possible explanation for the similar performance between the SNNs using the ResNet50 model and single models, as well as between the SNN combining VGG16 and VGG19 and the one using all three models, lies in the number of features fed into the classifier. The convolutional layers of ResNet50 extract 2048 features, significantly fewer than the 25,088 features extracted by both VGG16 and VGG19. As a result, incorporating the ResNet50 features does not substantially increase the total feature set, which may explain the lack of significant improvement in performance in both cases. This is illustrated in Figure 9 and can also be observed in the Grad-CAM images in Figure 4, which show that the VGG-16 and VGG-19 networks attend to a broader area of the image than the ResNet50 architecture.
The performance metrics were calculated using a weighted average across all categories to ensure a balanced evaluation. For consistency and comparability with the existing literature, the metrics from previous studies were inferred from their reported confusion matrices. It is important to note that discrepancies may exist between the figures reported here and those in the original publications due to differing methods of computing average values. The top-performing model in this study demonstrated superior overall accuracy compared to the results of Abitbol et al. [12], achieved accuracy comparable to that of Pin et al. [15] and Kulyabin et al. [17], and performed slightly below the models proposed by Khan et al. [16] and Barbosa et al. [23]. In terms of individual class performance, particularly for RVO, the proposed model attained higher accuracy and recall than all referenced studies except those by Khan and Barbosa, although, in the case of Barbosa et al., the model used inputs beyond OCT images, such as infrared images and clinical data from the patients.
This model provides a viable clinical tool for differentiating between RVO and DME and reinforces the importance of incorporating multimodal imaging data and patient-specific clinical information in the development of AI-based diagnostic systems. The use of stacked neural networks enhances the model’s ability to distinguish between OCT images of DME and RVO pathologies more effectively.
This study has several limitations. Notably, no image preprocessing was applied to the OCT scans used for training the model. Introducing preprocessing techniques, such as contrast enhancement, could potentially improve the model's performance by highlighting relevant features more clearly. Additionally, while the model is capable of classifying patients into four categories, i.e., normal, DME, RVO, or other pathologies, it does not specify the nature of the "other" conditions. In such cases, clinical expertise is still required to determine the exact diagnosis. Future work should aim to expand the number of diagnostic categories included in the training process to enable more comprehensive and precise classification. Future studies should also consider removing the ResNet50 model from the SNN and adding fundus image classification, as in the multimodal algorithm developed by Barbosa et al. [23], or hyperspectral imaging, as in the work of Wang et al. [24]. To improve the model's generalization, future training should use a dataset containing OCT images acquired with different devices.

5. Conclusions

The integration of artificial intelligence into medical diagnostics, as illustrated in this study, presents significant advantages. AI-based models enable remote diagnostic capabilities, which can decrease the frequency of hospital visits and contribute to reducing overall healthcare costs. This approach not only improves access to specialist consultations but also supports the regular updating of medical records, ultimately enhancing the quality and continuity of patient care.
This study presents a novel approach for diagnosing retinal diseases using stacked neural networks. The results show that incorporating features from multiple models and stacking them enhances the model’s performance, especially in distinguishing between DME and RVO—two retinal diseases that share similar visual characteristics. Another important observation is that the use of the ResNet50 model in the stacking algorithm only marginally improves performance, primarily due to the limited number of features extracted by this model. Therefore, when employing a stacking algorithm, it is crucial that the model extracts a sufficient number of features to produce a meaningful impact on performance.
Although the four-class model introduces greater complexity by incorporating a category for other pathologies, it shows strong potential for clinical use. Including this additional class is essential, as it more accurately reflects real-world clinical settings, where patients may present with a variety of retinal conditions.

Author Contributions

Conceptualization, P.R. and M.P.; methodology, P.R., M.P., G.B. and E.C.; software, P.R.; validation, G.B., E.C., S.T.-C. and M.F.; writing—original draft preparation, P.R.; writing—review and editing, G.B., E.C., B.A., A.G., S.T.-C., N.R., M.F. and M.P.; Supervision, A.G., N.R., M.F. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Centro Hospitalar Universitário of São João (protocol code number 329/2023, approval date 29 February 2024).

Informed Consent Statement

Due to the retrospective nature of this study and the use of deidentified data, the requirement for written informed consent was waived.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study.

Acknowledgments

The authors would like to acknowledge the support from LAETA.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI    Artificial Intelligence
AUC   Area Under Curve
CAD   Computer-Assisted Diagnosis
CAM   Class Activation Map
CNN   Convolutional Neural Network
CNV   Choroidal Neovascularization
DL    Deep Learning
DME   Diabetic Macular Edema
DR    Diabetic Retinopathy
OCT   Optical Coherence Tomography
ROC   Receiver Operating Characteristic
RVO   Retinal Vein Occlusion
SNN   Stacked Neural Network
TL    Transfer Learning

References

  1. Nicholson, L.; Talks, S.J.; Amoaku, W.; Talks, K.; Sivaprasad, S. Retinal vein occlusion (RVO) guideline: Executive summary. Eye 2022, 36, 909–912. [Google Scholar] [CrossRef]
  2. Rogers, S.; McIntosh, R.L.; Cheung, N.; Lim, L.; Wang, J.J.; Mitchell, P.; Kowalski, J.W.; Nguyen, H.; Wong, T.Y. The prevalence of retinal vein occlusion: Pooled data from population studies from the United States, Europe, Asia, and Australia. Ophthalmology 2010, 117, 313–319.e1. [Google Scholar] [CrossRef]
  3. Khayat, M.; Williams, M.; Lois, N. Ischemic retinal vein occlusion: Characterizing the more severe spectrum of retinal vein occlusion. Surv. Ophthalmol. 2018, 63, 816–850. [Google Scholar] [CrossRef]
  4. Schmitz-Valckenberg, S.; Göbel, A.P.; Saur, S.C.; Steinberg, J.S.; Thiele, S.; Wojek, C.; Russmann, C.; Holz, F.G.; for the MODIAMD-Study Group. Automated Retinal Image Analysis for Evaluation of Focal Hyperpigmentary Changes in Intermediate Age-Related Macular Degeneration. Transl. Vis. Sci. Technol. 2016, 5, 3. [Google Scholar] [CrossRef]
  5. Hussain, M.A.; Bhuiyan, A.; Luu, C.D.; Theodore Smith, R.; Guymer, R.H.; Ishikawa, H.; Schuman, J.S.; Ramamohanarao, K. Classification of healthy and diseased retina using SD-OCT imaging and Random Forest algorithm. PLoS ONE 2018, 13, e0198281. [Google Scholar] [CrossRef]
  6. Lee, C.S.; Baughman, D.M.; Lee, A.Y. Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration. Ophthalmol. Retin. 2017, 1, 322–327. [Google Scholar] [CrossRef]
  7. Wang, D.; Wang, L. On OCT Image Classification via Deep Learning. IEEE Photonics J. 2019, 11, 1–14. [Google Scholar] [CrossRef]
  8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  9. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.e9. [Google Scholar] [CrossRef] [PubMed]
  10. Li, F.; Chen, H.; Liu, Z.; Zhang, X.; Wu, Z. Fully automated detection of retinal disorders by image-based deep learning. Graefes Arch. Clin. Exp. Ophthalmol. 2019, 257, 495–505. [Google Scholar] [CrossRef] [PubMed]
  11. Choi, J.Y.; Yoo, T.K.; Seo, J.G.; Kwak, J.; Um, T.T.; Rim, T.H. Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database. PLoS ONE 2017, 12, e0187336. [Google Scholar] [CrossRef] [PubMed]
  12. Abitbol, E.; Miere, A.; Excoffier, J.B.; Mehanna, C.J.; Amoroso, F.; Kerr, S.; Ortala, M.; Souied, E.H. Deep learning-based classification of retinal vascular diseases using ultra-widefield colour fundus photographs. BMJ Open Ophthalmol. 2022, 7, e000924. [Google Scholar] [CrossRef] [PubMed]
  13. Zhang, G.; Sun, B.; Zhang, Z.; Wu, S.; Zhuo, G.; Rong, H.; Liu, Y.; Yang, W. Hypermixed Convolutional Neural Network for Retinal Vein Occlusion Classification. Dis. Markers 2022, 2022, 1730501. [Google Scholar] [CrossRef]
  14. Xu, W.; Yan, Z.; Chen, N.; Luo, Y.; Ji, Y.; Wang, M.; Zhang, Z. Development and Application of an Intelligent Diagnosis System for Retinal Vein Occlusion Based on Deep Learning. Dis. Markers 2022, 2022, 4988256. [Google Scholar] [CrossRef] [PubMed]
  15. Pin, K.; Nam, Y.; Ha, S.; Han, J. Deep Learning Based on Ensemble to Diagnose of Retinal Disease using Optical Coherence Tomography. In Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2021; pp. 661–664. [Google Scholar] [CrossRef]
  16. Khan, A.; Pin, K.; Aziz, A.; Han, J.W.; Nam, Y. Optical Coherence Tomography Image Classification Using Hybrid Deep Learning and Ant Colony Optimization. Sensors 2023, 23, 6706. [Google Scholar] [CrossRef]
  17. Kulyabin, M.; Zhdanov, A.; Nikiforova, A.; Stepichev, A.; Kuznetsova, A.; Ronkin, M.; Borisov, V.; Bogachev, A.; Korotkich, S.; Constable, P.A.; et al. OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods. Sci. Data 2024, 11, 365. [Google Scholar] [CrossRef]
  18. Arisholm, E.; Briand, L.C.; Johannessen, E.B. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J. Syst. Softw. 2010, 83, 2–17. [Google Scholar] [CrossRef]
  19. Lin, T.L.; Lu, C.T.; Karmakar, R.; Nampalley, K.; Mukundan, A.; Hsiao, Y.P.; Hsieh, S.C.; Wang, H.C. Assessing the Efficacy of the Spectrum-Aided Vision Enhancer (SAVE) to Detect Acral Lentiginous Melanoma, Melanoma In Situ, Nodular Melanoma, and Superficial Spreading Melanoma. Diagnostics 2024, 14, 1672. [Google Scholar] [CrossRef]
  20. Gray, D.; Bowes, D.; Davey, N.; Sun, Y.; Christianson, B. Further thoughts on precision. In Proceedings of the 15th Annual Conference on Evaluation & Assessment in Software Engineering (EASE 2011), Durham, UK, 11–12 April 2011; IET: Hertfordshire, UK, 2011; pp. 129–133. [Google Scholar]
  21. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef]
  22. Sasaki, Y. The truth of the F-measure. Teach Tutor Mater 2007, 1, 1–5. [Google Scholar]
  23. Barbosa, G.; Carvalho, E.; Guerra, A.; Torres-Costa, S.; Ramião, N.; Parente, M.L.P.; Falcão, M. Deep Learning to Distinguish Edema Secondary to Retinal Vein Occlusion and Diabetic Macular Edema: A Multimodal Approach Using OCT and Infrared Imaging. J. Clin. Med. 2025, 14, 1008. [Google Scholar] [CrossRef] [PubMed]
  24. Wang, C.Y.; Mukundan, A.; Liu, Y.S.; Tsao, Y.M.; Lin, F.C.; Fan, W.S.; Wang, H.C. Optical Identification of Diabetic Retinopathy Using Hyperspectral Imaging. J. Pers. Med. 2023, 13, 939. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Example OCT images of (a) a healthy patient, (b) a patient with DME, (c) a patient with RVO, and (d) a patient with another pathology.
Figure 2. Overview of the stacked model architecture.
Figure 3. Overview of the classifier used for the stacked model.
Figure 4. Grad-CAM heatmaps overlaid on OCT images, highlighting the regions of interest used by each model for prediction. From left to right: VGG16, VGG19, and ResNet50.
Figure 5. Confusion matrices for the three models used: (a) VGG16; (b) VGG19; (c) ResNet50 (DME—diabetic macular edema; RVO—retinal vein occlusion).
Figure 6. ROC curves for the three models used: (a) VGG16; (b) VGG19; (c) ResNet50 (0: Normal; 1: DME—diabetic macular edema; 2: RVO—retinal vein occlusion; 3: other).
Figure 7. Confusion matrices for the four stacked model networks: (a) VGG16 & ResNet50; (b) VGG19 & ResNet50; (c) VGG16 & VGG19; (d) VGG16 & VGG19 & ResNet50 (DME—diabetic macular edema; RVO—retinal vein occlusion).
Figure 8. ROC curves for the four stacked model networks: (a) VGG16 & ResNet50; (b) VGG19 & ResNet50; (c) VGG16 & VGG19; (d) VGG16 & VGG19 & ResNet50 (0: Normal; 1: DME—diabetic macular edema; 2: RVO—retinal vein occlusion; 3: other).
Figure 9. Number of features extracted in the different combinations.
Table 1. Number of images by class of the Kermany dataset.

Class     DME      Drusen   CNV      Normal
Images    11,171   8118     31,838   50,317
Table 2. Added hidden layer parameters.

Parameter             VGG16 & VGG19   ResNet50
Neurons               2048            1024
Activation Function   ReLU            ReLU
Dropout               0.5             0.5
Table 3. Number of images by class of the ULS São João dataset.

                  DME             RVO             Normal          Other
Training set      980             948             968             992
Validation set    245             237             242             248
Age (years)       65.14 ± 11.41   70.13 ± 11.60   56.16 ± 17.14   59.93 ± 20.77
Gender, n (%)
   Male           120 (55.05)     104 (50.24)     85 (41.67)      101 (50.50)
   Female         98 (44.96)      103 (49.76)     119 (58.33)     99 (49.50)
Table 4. Number of features of the different combinations.

Combination                 Number of Features
VGG16 & VGG19               50,176
VGG16 & ResNet50            27,136
VGG19 & ResNet50            27,136
VGG16 & VGG19 & ResNet50    52,224
Table 5. Average and best metrics for the selected models using the ULS São João dataset.

Metric (%)   VGG16 (Average / Maximum)   VGG19 (Average / Maximum)   ResNet50 (Average / Maximum)
Accuracy     84.57 ± 0.72 / 85.63        84.87 ± 0.73 / 85.54        85.71 ± 4.19 / 88.90
Precision    98.82 ± 0.39 / 98.35        99.04 ± 0.58 / 98.61        99.00 ± 0.73 / 99.72
Recall       84.57 ± 0.72 / 85.63        84.87 ± 0.73 / 85.54        85.71 ± 4.19 / 88.90
F2-score     87.08 ± 0.56 / 87.90        87.37 ± 0.62 / 87.87        88.05 ± 3.65 / 90.70
Table 6. Average and best metrics for the stacked models using the ULS São João dataset.

Metric (%)   SNN 1   SNN 2   SNN 3   SNN 4
Accuracy     86.5    85.7    89.6    90.7
Precision    99.4    98.3    98.6    99.2
Recall       86.5    85.7    89.6    90.7
F2-score     88.8    88.0    91.3    92.3

