Article

Development of Stacked Neural Networks for Application with OCT Data, to Improve Diabetic Retinal Health Care Management

1 INEGI—Institute of Science and Innovation in Mechanical and Industrial Engineering, 4200-465 Porto, Portugal
2 Department of Ophthalmology, Centro Hospitalar e Universitário São João, 4200-319 Porto, Portugal
3 Department of Surgery and Physiology, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
4 DEMec, Faculty of Engineering, University of Porto, 4200-319 Porto, Portugal
* Author to whom correspondence should be addressed.
Information 2025, 16(8), 649; https://doi.org/10.3390/info16080649
Submission received: 22 May 2025 / Revised: 14 July 2025 / Accepted: 29 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue AI-Based Biomedical Signal Processing)

Abstract

Background: Retinal diseases are becoming an important public health issue, with early diagnosis and timely intervention playing a key role in preventing vision loss. Optical coherence tomography (OCT) remains the leading non-invasive imaging technique for identifying retinal conditions. However, distinguishing between diabetic macular edema (DME) and macular edema resulting from retinal vein occlusion (RVO) can be particularly challenging, especially for clinicians without specialized training in retinal disorders, as both conditions manifest through increased retinal thickness. Given the limited research exploring the application of deep learning methods to OCT scans, particularly for RVO detection, this study proposes a novel diagnostic approach based on stacked convolutional neural networks. This architecture aims to enhance classification accuracy by integrating multiple neural networks, enabling more robust feature extraction and improved differentiation between retinal pathologies. Methods: The VGG-16, VGG-19, and ResNet50 models were fine-tuned on the Kermany dataset to classify OCT images and were afterwards trained on a private OCT dataset. Four stacked models were then developed from these networks: VGG-16 with VGG-19, VGG-16 with ResNet50, VGG-19 with ResNet50, and all three networks combined. Model performance was assessed using accuracy, precision, recall, the F2-score, and the area under the receiver operating characteristic curve (AUROC). Results: The stacked neural network combining all three models achieved the best results, with an accuracy of 90.7%, a precision of 99.2%, a recall of 90.7%, and an F2-score of 92.3%. Conclusions: This study presents a novel method for distinguishing retinal diseases using stacked neural networks, aiming to provide ophthalmologists with a reliable tool to improve diagnostic accuracy and speed.

1. Introduction

Retinal vein occlusion (RVO) is the second most prevalent retinal vascular condition after diabetic retinopathy (DR) and is a leading cause of vision loss and impairment. This condition arises when a blockage in the retinal venous system leads to thrombus formation, increasing capillary pressure and causing fluid leakage and macular edema—both of which contribute significantly to visual impairment [1,2].
Globally, RVO affects approximately 16 million individuals, with its occurrence influenced by factors such as age, gender, and underlying health conditions. The risk of developing RVO increases with age, particularly in individuals over 60 [3].
RVO is classified into two main types: branch retinal vein occlusion (BRVO) and central retinal vein occlusion (CRVO), depending on the location of the blockage. BRVO typically develops at arteriovenous crossings, whereas CRVO occurs near or at the lamina cribrosa of the optic nerve. The most common cause is the compression of retinal veins by adjacent atherosclerotic arteries. CRVO is associated with features such as optic disc swelling, dilated and tortuous retinal veins, widespread hemorrhages, cotton wool spots, macular edema, and capillary non-perfusion throughout the retina. In contrast, BRVO shares similar characteristics but is limited to the region served by the affected vein. Vision loss in RVO is mainly due to macular edema but can also result from macular ischemia or complications like neovascular glaucoma and vitreous hemorrhage [1]. Although RVO can be classified into these two subtypes, many image-based diagnostic workflows treat both as RVO-associated edema because of the features they share in OCT images.
Diabetic macular edema (DME) is another major cause of vision loss, characterized by retinal thickening and intraretinal fluid accumulation. It remains a leading complication of diabetic retinopathy and is a critical diagnosis in OCT-based screening.
The use of artificial intelligence (AI) in medical imaging for diagnostic support has shown promising results across various medical fields such as ophthalmology. AI can assist in screening optical coherence tomography (OCT) images for disease detection, helping to reduce challenges such as human fatigue, bias, and cognitive limitations [4].
Early approaches to computer-assisted diagnosis (CAD) relied on extracting relevant features, which required domain expertise and varied depending on the dataset. These features often included texture or structural information from the images [5]. More recently, deep learning (DL) techniques have been employed to classify OCT images, with the development of convolutional neural networks (CNNs) significantly enhancing image classification performance [6,7].
CNNs use convolutional operators to extract features directly from image pixels, making them highly effective for image classification tasks. Convolutional, pooling, and activation layers process the images to identify key features, which are then passed through a fully connected network for classification [8]. These networks have been used to classify multiple diseases simultaneously, making them highly practical for real-world disease screening applications. An example is the work of Kermany et al. [9], who developed a DL model capable of classifying OCT scans into four distinct classes: normal eyes, eyes with choroidal neovascularization (CNV), eyes with diabetic macular edema (DME), and eyes with drusen. Extensive research has been conducted using the Kermany dataset, with studies reaching an accuracy of 98.6% using the VGG-16 network [10].
However, a notable research gap remains in distinguishing retinal vein occlusion (RVO) from diabetic macular edema (DME). Most existing studies that classify these two retinal conditions primarily utilize fundus images. For instance, Choi et al. [11] applied random forest transfer learning (TL) using the VGG-19 architecture for fundus image classification, achieving an accuracy of 74.7% in differentiating background diabetic retinopathy (DR) from RVO. Similarly, Abitbol et al. [12] developed a deep learning model for classifying widefield color fundus images, achieving an accuracy of 85.2% for DR and 88.4% for RVO. Other studies have concentrated solely on distinguishing various types of RVO from healthy eyes using fundus images [13,14]. There is limited research on using OCT images for diagnosing patients with RVO, despite this imaging modality being the preferred non-invasive diagnostic tool. Consequently, developing a model that can accurately diagnose RVO using OCT has significant potential for clinical application.
Pin et al. [15] developed a model composed of an ensemble of two TL models, MobileNetV3Large and ResNet50, to classify OCT images into four distinct categories (RVO, age-related macular degeneration (AMD), central serous chorioretinopathy (CSCR), and DME), achieving an overall accuracy of 91.69%. Similarly, Khan et al. [16] modified three models and extracted the features using TL. The best features were selected using ant colony optimization and then classified using k-nearest neighbors and support vector machines, reaching accuracies of 99.1% and 97.4% with and without feature optimization, respectively. More recently, Kulyabin et al. [17] evaluated the performance of VGG-16 and ResNet50 on a publicly available OCT dataset [17], which includes conditions such as AMD, DME, RVO, epiretinal membrane, retinal artery occlusion, and vitreomacular interface disease. The VGG-16 network achieved the highest accuracy, at 89.5%.
This study proposes a DL approach based on an ensemble technique known as stacking for the automated diagnosis of retinal diseases, namely DME, RVO-associated edema, and other generic retinal pathologies, using OCT images. These pathologies can be observed in Figure 1. The model is specifically evaluated for its ability to differentiate RVO-associated edema from DME, a critical distinction for guiding appropriate treatment strategies.

2. Materials and Methods

2.1. Ethical Approval

The Institutional Ethics Review Board of ULS São João approved this study. The protocol conformed to the tenets of the Declaration of Helsinki for research involving human participants, as well as the EU's General Data Protection Regulation.
Informed consent was waived in view of the retrospective nature of the study. The dataset includes OCT images with Normal, DME, RVO, and other pathologies. All personal identifiers were removed to ensure patient anonymity and data confidentiality.

2.2. Datasets

This work used two datasets: the publicly available Kermany dataset and a private dataset from ULS São João in Porto, Portugal. The Kermany dataset is a well-known public dataset for OCT images that consists of four distinct classes: Normal, Drusen, DME, and CNV. It includes over 100,000 images that were obtained with Heidelberg Spectralis, selected from retrospective cohorts of adult patients from multiple institutions.
The ULS São João dataset contains OCT and fundus images from four distinct classes: normal, RVO, DME, and other retinal pathologies.

2.3. Models Fine-Tuning

Three models were used in this study: the VGG-16, VGG-19, and ResNet50 architectures. These models were first trained on the public Kermany OCT dataset so that the networks learned from OCT images. Afterwards, the weights learned on the Kermany dataset were used to initialize the models for the ULS São João dataset.

2.3.1. Kermany

The models were fine-tuned using images from the Kermany dataset. To achieve this, the feature extractor was unfrozen, and transfer learning (TL) was applied by loading pretrained weights of the respective models from the ImageNet dataset.
The Kermany dataset has over one hundred thousand images, although some of these images are duplicates. After removing the duplicates, the dataset had 101,444 images. A total of 250 images from each class from the Kermany dataset were used for the test set. The remaining images were split into training and validation sets, with a ratio of 80%/20%, respectively.
Due to the class imbalance of the Kermany dataset, a weighted random sampler was used to ensure equal sampling probability for each class during training. Table 1 shows the number of images for each class in the Kermany dataset.
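As an illustrative sketch of this balancing step (the paper does not publish code, so the helper name make_balanced_loader and the train_labels variable are assumptions), in PyTorch this is commonly done with WeightedRandomSampler:

```python
from collections import Counter

from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(train_dataset, train_labels, batch_size=32):
    # Weight each image inversely to its class frequency so every
    # class is drawn with equal probability during training.
    counts = Counter(train_labels)
    class_weights = {c: 1.0 / n for c, n in counts.items()}
    sample_weights = [class_weights[y] for y in train_labels]
    sampler = WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(sample_weights),  # one epoch worth of draws
        replacement=True,
    )
    return DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
```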
The loss function used was the cross-entropy loss, defined in Equation (1), where $N$ is the number of training images, $\theta$ represents the model's trainable parameters, $X_i$ is the OCT image, $y_i$ is the image label, and $\hat{y}_i$ is the predicted probability after applying the Softmax activation function.
$$\mathrm{CELoss}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} y_i \log\left(\hat{y}_i(X_i, \theta)\right) \qquad (1)$$
In order to prevent overfitting, L2 regularization was employed and an additional term was added to the loss function:
$$\mathrm{Loss}(\theta) = \mathrm{CELoss}(\theta) + \alpha \lVert \theta \rVert_2^2,$$
where α is the parameter that governs the regularization.
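A minimal sketch of this regularized objective for a PyTorch model follows; the hyperparameter name alpha and its value are illustrative, and in practice the same effect is often obtained through the optimizer's weight_decay argument:

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def regularized_loss(model, logits, targets, alpha=1e-4):
    # Cross-entropy term plus an explicit L2 penalty on all trainable weights.
    ce = criterion(logits, targets)
    l2 = sum(p.pow(2).sum() for p in model.parameters() if p.requires_grad)
    return ce + alpha * l2
```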
An early stopping criterion was also employed to prevent overfitting during training by monitoring the validation F2-score. If the metric did not increase for 10 epochs, training was stopped.
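A small sketch of this patience mechanism; the class name and interface are assumptions, not the authors' code:

```python
class EarlyStopping:
    """Stop training when the monitored metric stalls for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.stale_epochs = 0

    def step(self, val_f2):
        # Returns True when training should stop.
        if val_f2 > self.best:
            self.best = val_f2
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```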
Two different TL strategies were analyzed: one where only the classifier parameters were trained (the feature extractor was frozen) and one where all parameters, including the feature extractor, were trained starting from the weights of the pretrained model.
The VGG16 and VGG19 pretrained models have two hidden layers with 4096 neurons each, while the ResNet50 pretrained model has a fully connected layer with 2048 input neurons and 1000 output neurons. In this work, a third hidden layer with 2048 neurons, followed by an output layer with 4 neurons, was added to the VGG16 and VGG19 classifiers. The ResNet50 classifier was also modified, with a hidden layer of 1024 neurons and an output layer with 4 neurons. Table 2 summarizes the added hidden layers.
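The sketch below reconstructs these modified heads from the layer sizes above and the dropout value in Table 2, together with the optional feature-extractor freezing from the first TL strategy; it is an interpretation built on standard torchvision models, not the authors' exact code:

```python
import torch.nn as nn
from torchvision import models

def build_vgg16(num_classes=4, freeze_features=False):
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    if freeze_features:  # strategy 1: train only the classifier
        for p in model.features.parameters():
            p.requires_grad = False
    model.classifier = nn.Sequential(
        nn.Linear(25088, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 2048), nn.ReLU(), nn.Dropout(0.5),  # added hidden layer
        nn.Linear(2048, num_classes),
    )
    return model

def build_resnet50(num_classes=4):
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Sequential(
        nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(0.5),  # added hidden layer
        nn.Linear(1024, num_classes),
    )
    return model
```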

2.3.2. ULS São João

The ULS São João dataset comprises 5360 images with DME, RVO, normal, and other pathologies. For the test set, 125 images of each class were used, for a total of 500 images. The remaining images were split using the same 80%/20% ratio as for the Kermany dataset. The number of images per class and the demographic distribution are shown in Table 3. The other pathologies class was created to include pathologies other than DME and RVO, to enhance the model's generalization and applicability.
To improve model generalization, the ULS São João dataset was divided using patient-wise separation, ensuring that images from the same patient were confined to the same set and thereby avoiding data leakage, a strategy already adopted in previous works. The models were initialized with the weights previously learned by the respective model on the Kermany dataset.
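One common way to enforce such a patient-wise split is sketched below, assuming each image is associated with a patient identifier; the helper name is hypothetical:

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(image_paths, labels, patient_ids, val_fraction=0.2, seed=0):
    # Grouping by patient ID guarantees all images from one patient
    # land in the same split, preventing data leakage.
    splitter = GroupShuffleSplit(n_splits=1, test_size=val_fraction,
                                 random_state=seed)
    train_idx, val_idx = next(splitter.split(image_paths, labels,
                                             groups=patient_ids))
    return train_idx, val_idx
```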
No preprocessing was applied to the images from the ULS São João dataset, since the models had been trained on the Kermany dataset, which likewise received no preprocessing.
Data augmentation was employed during training on this dataset to enhance the models' generalization. The techniques used were Horizontal Flip, Gaussian blur, and a combination of both.
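A minimal sketch of such an augmentation pipeline with torchvision follows; the blur kernel size and the application probabilities are assumptions, as the paper does not report them:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # horizontal flip
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=5)], p=0.5),  # Gaussian blur
    transforms.ToTensor(),
])
```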
As in the Kermany dataset, L2 regularization was applied to the models to prevent overfitting. The loss function used was the cross-entropy loss function, the same as Equation (1). Cross-validation with five folds was also employed to prevent overfitting on the dataset. The early stop criterion was also maintained with a patience of 10 epochs.

2.4. Stacked Model

The stacked model is based on an ensemble technique called stacking, in which the features of two or more models are extracted at an intermediate layer of each network and concatenated to form a large feature set. This feature set is then used to train a classifier. An overview of this model is represented in Figure 2.
In this work, the stacked model is based on the models previously studied using the ULS São João dataset. The model parameters with the best results on the Kermany dataset were combined to initialize the weights of the new model.
An overview of the classifier used in the stacked model is presented in Figure 3. The classifier architecture was based on the classifiers of the VGG models. The number of features differs for each combination of architectures used in the stacked model, as depicted in Table 4.
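A hedged sketch of this stacking architecture is shown below; the 4096-neuron hidden layers are inferred from the statement that the classifier follows the VGG design, and the exact sizes are not confirmed by the paper:

```python
import torch
import torch.nn as nn

class StackedModel(nn.Module):
    def __init__(self, backbones, feature_dims, num_classes=4):
        super().__init__()
        # e.g., the convolutional feature extractors of VGG16 and VGG19,
        # giving 25,088 + 25,088 = 50,176 concatenated features (Table 4).
        self.backbones = nn.ModuleList(backbones)
        self.classifier = nn.Sequential(
            nn.Linear(sum(feature_dims), 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        # Extract features from every backbone and concatenate them.
        feats = [torch.flatten(b(x), start_dim=1) for b in self.backbones]
        return self.classifier(torch.cat(feats, dim=1))
```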
The dataset used for the stacked model was the ULS São João dataset. As when a single model was used, five-fold cross-validation was performed, L2 regularization was applied, and an early stopping criterion with a patience of 10 epochs was used to prevent overfitting.

2.5. Metric Indices

The performance of the models in this study was assessed with various metrics, including accuracy, precision, recall, and the F2-score.
Accuracy is a commonly used evaluation metric that reflects the overall performance of a classification model. It is defined as the ratio between the number of correct predictions and the total number of predictions, providing a general measure of how often the model is correct [18,19].
Precision is a quantitative metric that assesses the correctness of positive predictions by quantifying the proportion of true positive instances among all instances classified as positive. It is calculated by dividing the number of true positives by the sum of true positives and false positives [19,20].
Recall is a quantitative metric that evaluates the model’s ability to identify all relevant positive cases. It is defined as the ratio of true positives to the sum of true positives and false negatives. The concept of this metric emphasizes the model’s ability to correctly identify correct instances from the overall amount of actual true positive instances. This measure is crucial in scenarios where failing to detect positive instances can lead to significant consequences [19,21].
The F2-score derives from the general Fβ-score. When β is 1, the metric is the F1-score, which balances precision and recall equally. The F2-score, however, places a higher emphasis on recall than on precision. This weighting suits medical scenarios, where detecting positive instances is essential and missing them can have serious consequences [22].
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100$$
$$F_2\text{-score} = (1 + 2^2)\,\frac{\mathrm{Precision} \times \mathrm{Recall}}{(2^2 \times \mathrm{Precision}) + \mathrm{Recall}} \times 100$$
where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.
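For reference, these metrics can be computed with scikit-learn as sketched below, using the weighted averaging across classes described in the Discussion; y_true and y_pred are assumed arrays of true and predicted labels:

```python
from sklearn.metrics import (accuracy_score, fbeta_score, precision_score,
                             recall_score)

def evaluate(y_true, y_pred):
    # Weighted averages account for class support, matching the
    # evaluation described in the paper.
    return {
        "accuracy": 100 * accuracy_score(y_true, y_pred),
        "precision": 100 * precision_score(y_true, y_pred, average="weighted"),
        "recall": 100 * recall_score(y_true, y_pred, average="weighted"),
        "f2": 100 * fbeta_score(y_true, y_pred, beta=2, average="weighted"),
    }
```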

3. Results

The Kermany dataset was used only to obtain initial weights for networks operating on OCT images, while the stacked neural networks (SNNs) were trained on the ULS São João dataset. As such, detailed results are reported only for the ULS São João dataset.
All models were trained for a maximum of 500 epochs. The Adam optimizer was used, and early stopping prevented overfitting by monitoring the F2-score: if this metric did not increase for 10 epochs, training stopped immediately. The training of all models was performed on an NVIDIA RTX A6000 GPU workstation.
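An assumed training-loop skeleton tying these settings together with the earlier sketches (regularized_loss, EarlyStopping); the learning rate and the validate_f2 helper are illustrative, not reported values:

```python
import torch

def train(model, train_loader, val_loader, device="cuda", max_epochs=500):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr
    stopper = EarlyStopping(patience=10)
    for epoch in range(max_epochs):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = regularized_loss(model, model(images), targets)
            loss.backward()
            optimizer.step()
        # `validate_f2` is a hypothetical helper returning the validation F2-score.
        if stopper.step(validate_f2(model, val_loader, device)):
            break
```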

3.1. Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) is a widely used technique for interpreting deep convolutional neural networks by highlighting the regions of an image most relevant to the model's decision. In this implementation, the process begins by selecting the last layer of the feature extractor, whose feature maps are captured via a forward hook during the model's forward pass.
Following the prediction, the gradient of the target class score is back-propagated with respect to the final layer’s activations. These gradients are globally averaged to produce importance weights that reflect each feature map’s contribution to the class prediction. A weighted sum of the feature maps is then computed to generate a class activation map (CAM), which is subsequently rectified and normalized.
Finally, the CAM is resized and overlaid on the original input image using a heatmap. This visual representation reveals the spatial focus of the model’s attention, offering insight into which regions influenced the classification—supporting ophthalmologists in understanding model-driven diagnostic decisions.
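A condensed sketch of this procedure for a PyTorch model follows; target_layer is assumed to be the last layer of the feature extractor, as described above:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image.unsqueeze(0))         # forward pass captures activations
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()            # back-propagate the class score

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # global-average gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear",
                        align_corners=False)   # resize to the input image
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize
    h1.remove(); h2.remove()
    return cam.squeeze()
```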
In Figure 4, Grad-CAM visualizations generated using the VGG16, VGG19, and ResNet50 models on the ULS São João OCT dataset are presented. These heatmaps highlight the regions of the images that each model focuses on to make its predictions.
Although each model was shown different images, the Grad-CAM results for ResNet50 indicate a more concentrated focus on a smaller region of the image, whereas VGG16 and VGG19 distribute their attention across broader areas of the input image.

3.2. ULS São João Dataset

The results obtained for the three selected models using the ULS São João dataset are presented in Table 5. The confusion matrices for the best fold of each model are shown in Figure 5, and the respective ROC curves in Figure 6.

3.3. Stacked Neural Networks

Regarding the SNNs, four different combinations of the three selected models were used (Table 4). The confusion matrices for the best fold of each SNN are shown in Figure 7, and the respective ROC curves in Figure 8. The metric results are presented in Table 6, where SNN 1 = VGG16 & ResNet50; SNN 2 = VGG19 & ResNet50; SNN 3 = VGG16 & VGG19; and SNN 4 = VGG16 & VGG19 & ResNet50.

4. Discussion

The use of AI in the medical field, specifically in retinal disease diagnosis, can significantly increase the capabilities of healthcare professionals, enabling them to deliver fast and accurate diagnoses. This improved efficiency leads to better patient outcomes through earlier detection and treatment, and allows healthcare personnel to manage a higher volume of cases, thereby expanding their reach and overall impact.
The results from the public dataset by Kermany et al. [9] are not presented, as this dataset was used solely for pretraining the convolutional layers of the models applied to the OCT images.
Regarding the results from the private ULS São João dataset, the three models performed similarly, achieving accuracies of approximately 85% and F2-scores of approximately 88%. Analysis of the confusion matrices and ROC curves reveals that the models were most effective at classifying normal eyes, while also demonstrating strong performance in identifying other ocular pathologies.
The confusion matrices reveal misclassifications between DME and RVO pathologies. Given the limited research on classifying RVO using OCT images, and the existing gap in distinguishing RVO from DME, such misclassifications are to be expected. There is also a recurring pattern of misclassifying pathological cases as normal. This can be attributed to the nature of the OCT images used, as not all scans from a patient with a given pathology visibly exhibit signs of the disease—leading to some being mistaken for images of healthy eyes.
Building on the previous observations, the SNNs used in this study show that combining two models, when one of them is ResNet50, yields results similar to those of a single-model approach. The recurring issue of misclassifying DME and RVO is also evident in these SNNs. Notably, the SNN that combines the VGG16 and VGG19 models demonstrates improved performance over single-model usage, with accuracy rising to approximately 90% and the F2-score to 91%. It also performs better at classifying the DME and RVO pathologies, although multiple misclassifications remain. The SNN incorporating all three models achieves results comparable to the VGG16 and VGG19 combination, with slightly better metrics.
A possible explanation for the similar performance between the SNNs using the ResNet50 model and single models, as well as between the SNN combining VGG16 and VGG19 and the one using all three models, lies in the number of features fed into the classifier. The convolutional layers of ResNet50 extract 2048 features, significantly fewer than the 25,088 features extracted by both VGG16 and VGG19. As a result, incorporating the ResNet50 features does not substantially increase the total feature set, which may explain the lack of significant improvement in performance in both cases. This is illustrated in Figure 9 and can also be observed in the Grad-CAM images in Figure 4, which show that the VGG-16 and VGG-19 networks attend to a broader area of the image than the ResNet50 architecture.
The performance metrics were calculated using a weighted average across all categories to ensure a balanced evaluation. For consistency and comparability with the existing literature, the metrics from previous studies were inferred from their reported confusion matrices. It is important to note that discrepancies may exist between the figures reported here and those in the original publications due to differing methods of computing average values. The top-performing model in this study demonstrated superior overall accuracy compared to the results of Abitbol et al. [12], achieved accuracy comparable to that of Pin et al. [15] and Kulyabin et al. [17], and performed slightly below the models proposed by Khan et al. [16] and Barbosa et al. [23]. In terms of individual class performance, particularly for RVO, the proposed model attained higher accuracy and recall than all referenced studies except those by Khan and Barbosa, although, in the case of Barbosa et al., the model used inputs beyond OCT images, such as infrared images and clinical data from the patients.
This model provides a viable clinical tool for differentiating between RVO and DME and reinforces the importance of incorporating multimodal imaging data and patient-specific clinical information in the development of AI-based diagnostic systems. The use of stacked neural networks enhances the model’s ability to distinguish between OCT images of DME and RVO pathologies more effectively.
This study has several limitations. Notably, no image preprocessing was applied to the OCT scans used for training the model. Introducing preprocessing techniques, such as contrast enhancement, could potentially improve the model's performance by highlighting relevant features more clearly. Additionally, while the model is capable of classifying patients into four categories, i.e., normal, DME, RVO, or other pathologies, it does not specify the nature of the "other" conditions. In such cases, clinical expertise is still required to determine the exact diagnosis. Future work should aim to expand the number of diagnostic categories included in the training process to enable more comprehensive and precise classification. Future studies should also consider removing the ResNet50 model from the SNN and adding fundus image classification, as in the multimodal algorithm developed by Barbosa et al. [23], or hyperspectral imaging, as in the work of Wang et al. [24]. To improve the model's generalization, future training should use a dataset containing OCT images acquired with different devices.

5. Conclusions

The integration of artificial intelligence into medical diagnostics, as illustrated in this study, presents significant advantages. AI-based models enable remote diagnostic capabilities, which can decrease the frequency of hospital visits and contribute to reducing overall healthcare costs. This approach not only improves access to specialist consultations but also supports the regular updating of medical records, ultimately enhancing the quality and continuity of patient care.
This study presents a novel approach for diagnosing retinal diseases using stacked neural networks. The results show that incorporating features from multiple models and stacking them enhances the model’s performance, especially in distinguishing between DME and RVO—two retinal diseases that share similar visual characteristics. Another important observation is that the use of the ResNet50 model in the stacking algorithm only marginally improves performance, primarily due to the limited number of features extracted by this model. Therefore, when employing a stacking algorithm, it is crucial that the model extracts a sufficient number of features to produce a meaningful impact on performance.
Although the four-class model introduces greater complexity by incorporating a category for other pathologies, it shows strong potential for clinical use. Including this additional class is essential, as it more accurately reflects real-world clinical settings, where patients may present with a variety of retinal conditions.

Author Contributions

Conceptualization, P.R. and M.P.; methodology, P.R., M.P., G.B. and E.C.; software, P.R.; validation, G.B., E.C., S.T.-C. and M.F.; writing—original draft preparation, P.R.; writing—review and editing, G.B., E.C., B.A., A.G., S.T.-C., N.R., M.F. and M.P.; Supervision, A.G., N.R., M.F. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Centro Hospitalar Universitário of São João (protocol code number 329/2023, approval date 29 February 2024).

Informed Consent Statement

Due to the retrospective nature of this study and the use of deidentified data, the requirement for written informed consent was waived.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study.

Acknowledgments

The authors would like to acknowledge the support from LAETA.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI    Artificial Intelligence
AUC   Area Under Curve
CAD   Computer-Assisted Diagnosis
CAM   Class Activation Map
CNN   Convolutional Neural Network
CNV   Choroidal Neovascularization
DL    Deep Learning
DME   Diabetic Macular Edema
DR    Diabetic Retinopathy
OCT   Optical Coherence Tomography
ROC   Receiver Operating Characteristic
RVO   Retinal Vein Occlusion
SNN   Stacked Neural Network
TL    Transfer Learning

References

  1. Nicholson, L.; Talks, S.J.; Amoaku, W.; Talks, K.; Sivaprasad, S. Retinal vein occlusion (RVO) guideline: Executive summary. Eye 2022, 36, 909–912. [Google Scholar] [CrossRef]
  2. Rogers, S.; McIntosh, R.L.; Cheung, N.; Lim, L.; Wang, J.J.; Mitchell, P.; Kowalski, J.W.; Nguyen, H.; Wong, T.Y. The prevalence of retinal vein occlusion: Pooled data from population studies from the United States, Europe, Asia, and Australia. Ophthalmology 2010, 117, 313–319.e1. [Google Scholar] [CrossRef]
  3. Khayat, M.; Williams, M.; Lois, N. Ischemic retinal vein occlusion: Characterizing the more severe spectrum of retinal vein occlusion. Surv. Ophthalmol. 2018, 63, 816–850. [Google Scholar] [CrossRef]
  4. Schmitz-Valckenberg, S.; Göbel, A.P.; Saur, S.C.; Steinberg, J.S.; Thiele, S.; Wojek, C.; Russmann, C.; Holz, F.G.; for the MODIAMD-Study Group. Automated Retinal Image Analysis for Evaluation of Focal Hyperpigmentary Changes in Intermediate Age-Related Macular Degeneration. Transl. Vis. Sci. Technol. 2016, 5, 3. [Google Scholar] [CrossRef]
  5. Hussain, M.A.; Bhuiyan, A.; Luu, C.D.; Theodore Smith, R.; Guymer, R.H.; Ishikawa, H.; Schuman, J.S.; Ramamohanarao, K. Classification of healthy and diseased retina using SD-OCT imaging and Random Forest algorithm. PLoS ONE 2018, 13, e0198281. [Google Scholar] [CrossRef]
  6. Lee, C.S.; Baughman, D.M.; Lee, A.Y. Deep learning is effective for the classification of OCT images of normal versus Age-related Macular Degeneration. Ophthalmol. Retin. 2017, 1, 322–327. [Google Scholar] [CrossRef]
  7. Wang, D.; Wang, L. On OCT Image Classification via Deep Learning. IEEE Photonics J. 2019, 11, 1–14. [Google Scholar] [CrossRef]
  8. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  9. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell 2018, 172, 1122–1131.e9. [Google Scholar] [CrossRef] [PubMed]
  10. Li, F.; Chen, H.; Liu, Z.; Zhang, X.; Wu, Z. Fully automated detection of retinal disorders by image-based deep learning. Graefes Arch. Clin. Exp. Ophthalmol. 2019, 257, 495–505. [Google Scholar] [CrossRef] [PubMed]
  11. Choi, J.Y.; Yoo, T.K.; Seo, J.G.; Kwak, J.; Um, T.T.; Rim, T.H. Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database. PLoS ONE 2017, 12, e0187336. [Google Scholar] [CrossRef] [PubMed]
  12. Abitbol, E.; Miere, A.; Excoffier, J.B.; Mehanna, C.J.; Amoroso, F.; Kerr, S.; Ortala, M.; Souied, E.H. Deep learning-based classification of retinal vascular diseases using ultra-widefield colour fundus photographs. BMJ Open Ophthalmol. 2022, 7, e000924. [Google Scholar] [CrossRef] [PubMed]
  13. Zhang, G.; Sun, B.; Zhang, Z.; Wu, S.; Zhuo, G.; Rong, H.; Liu, Y.; Yang, W. Hypermixed Convolutional Neural Network for Retinal Vein Occlusion Classification. Dis. Markers 2022, 2022, 1730501. [Google Scholar] [CrossRef]
  14. Xu, W.; Yan, Z.; Chen, N.; Luo, Y.; Ji, Y.; Wang, M.; Zhang, Z. Development and Application of an Intelligent Diagnosis System for Retinal Vein Occlusion Based on Deep Learning. Dis. Markers 2022, 2022, 4988256. [Google Scholar] [CrossRef] [PubMed]
  15. Pin, K.; Nam, Y.; Ha, S.; Han, J. Deep Learning Based on Ensemble to Diagnose of Retinal Disease using Optical Coherence Tomography. In Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2021; pp. 661–664. [Google Scholar] [CrossRef]
  16. Khan, A.; Pin, K.; Aziz, A.; Han, J.W.; Nam, Y. Optical Coherence Tomography Image Classification Using Hybrid Deep Learning and Ant Colony Optimization. Sensors 2023, 23, 6706. [Google Scholar] [CrossRef]
  17. Kulyabin, M.; Zhdanov, A.; Nikiforova, A.; Stepichev, A.; Kuznetsova, A.; Ronkin, M.; Borisov, V.; Bogachev, A.; Korotkich, S.; Constable, P.A.; et al. OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods. Sci. Data 2024, 11, 365. [Google Scholar] [CrossRef]
  18. Arisholm, E.; Briand, L.C.; Johannessen, E.B. A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. J. Syst. Softw. 2010, 83, 2–17. [Google Scholar] [CrossRef]
  19. Lin, T.L.; Lu, C.T.; Karmakar, R.; Nampalley, K.; Mukundan, A.; Hsiao, Y.P.; Hsieh, S.C.; Wang, H.C. Assessing the Efficacy of the Spectrum-Aided Vision Enhancer (SAVE) to Detect Acral Lentiginous Melanoma, Melanoma In Situ, Nodular Melanoma, and Superficial Spreading Melanoma. Diagnostics 2024, 14, 1672. [Google Scholar] [CrossRef]
  20. Gray, D.; Bowes, D.; Davey, N.; Sun, Y.; Christianson, B. Further thoughts on precision. In Proceedings of the 15th Annual Conference on Evaluation & Assessment in Software Engineering (EASE 2011), Durham, UK, 11–12 April 2011; IET: Hertfordshire, UK, 2011; pp. 129–133. [Google Scholar]
  21. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef]
  22. Sasaki, Y. The truth of the F-measure. Teach Tutor Mater 2007, 1, 1–5. [Google Scholar]
  23. Barbosa, G.; Carvalho, E.; Guerra, A.; Torres-Costa, S.; Ramião, N.; Parente, M.L.P.; Falcão, M. Deep Learning to Distinguish Edema Secondary to Retinal Vein Occlusion and Diabetic Macular Edema: A Multimodal Approach Using OCT and Infrared Imaging. J. Clin. Med. 2025, 14, 1008. [Google Scholar] [CrossRef] [PubMed]
  24. Wang, C.Y.; Mukundan, A.; Liu, Y.S.; Tsao, Y.M.; Lin, F.C.; Fan, W.S.; Wang, H.C. Optical Identification of Diabetic Retinopathy Using Hyperspectral Imaging. J. Pers. Med. 2023, 13, 939. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Example OCT images of (a) a healthy patient, (b) a patient with DME, (c) a patient with RVO, and (d) a patient with another pathology.
Figure 2. Overview of the stacked model architecture.
Figure 3. Overview of the classifier used for the stacked model.
Figure 4. Grad-CAM heatmaps overlaid on OCT images, highlighting the regions of interest used by each model for prediction. From left to right: VGG16, VGG19, and ResNet50.
Figure 5. Confusion matrices for the three models used: (a) VGG16; (b) VGG19; (c) ResNet50 (DME—diabetic macular edema; RVO—retinal vein occlusion).
Figure 6. ROC curves for the three models used: (a) VGG16; (b) VGG19; (c) ResNet50 (0: Normal; 1: DME—diabetic macular edema; 2: RVO—retinal vein occlusion; 3: other).
Figure 7. Confusion matrices for the four stacked model networks: (a) VGG16 & ResNet50; (b) VGG19 & ResNet50; (c) VGG16 & VGG19; (d) VGG16 & VGG19 & ResNet50 (DME—diabetic macular edema; RVO—retinal vein occlusion).
Figure 8. ROC curves for the four stacked model networks: (a) VGG16 & ResNet50; (b) VGG19 & ResNet50; (c) VGG16 & VGG19; (d) VGG16 & VGG19 & ResNet50 (0: Normal; 1: DME—diabetic macular edema; 2: RVO—retinal vein occlusion; 3: other).
Figure 9. Number of features extracted in the different combinations.
Table 1. Number of images by class of the Kermany dataset.

Class     DME      Drusen   CNV      Normal
Images    11,171   8118     31,838   50,317
Table 2. Added hidden layer parameters.

Parameter             VGG16 & VGG19   ResNet50
Neurons               2048            1024
Activation Function   ReLU            ReLU
Dropout               0.5             0.5
Table 3. Number of images by class of the ULS São João dataset.

                  DME             RVO             Normal          Other
Training set      980             948             968             992
Validation set    245             237             242             248
Age (years)       65.14 ± 11.41   70.13 ± 11.60   56.16 ± 17.14   59.93 ± 20.77
Gender, n (%)
   Male           120 (55.05)     104 (50.24)     85 (41.67)      101 (50.50)
   Female         98 (44.96)      103 (49.76)     119 (58.33)     99 (49.50)
Table 4. Number of features of the different combinations.

Combination                 Number of Features
VGG16 & VGG19               50,176
VGG16 & ResNet50            27,136
VGG19 & ResNet50            27,136
VGG16 & VGG19 & ResNet50    52,224
Table 5. Average and best metrics for the selected models using the ULS São João dataset.

Metric (%)   VGG16 (Average / Maximum)   VGG19 (Average / Maximum)   ResNet50 (Average / Maximum)
Accuracy     84.57 ± 0.72 / 85.63        84.87 ± 0.73 / 85.54        85.71 ± 4.19 / 88.90
Precision    98.82 ± 0.39 / 98.35        99.04 ± 0.58 / 98.61        99.00 ± 0.73 / 99.72
Recall       84.57 ± 0.72 / 85.63        84.87 ± 0.73 / 85.54        85.71 ± 4.19 / 88.90
F2-score     87.08 ± 0.56 / 87.90        87.37 ± 0.62 / 87.87        88.05 ± 3.65 / 90.70
Table 6. Average and best metrics for the stacked models using the ULS São João dataset.

Metric (%)   SNN 1   SNN 2   SNN 3   SNN 4
Accuracy     86.5    85.7    89.6    90.7
Precision    99.4    98.3    98.6    99.2
Recall       86.5    85.7    89.6    90.7
F2-score     88.8    88.0    91.3    92.3

