Article

Multi-View Machine Learning with an Optic Disc Localization for Glaucoma Diagnosis

by Parichat Siying 1, Thitima Muangphara 1, Aphinan Photun 1, Siwakon Suppalap 1, Thitiphat Klinsuwan 1, Chatmongkol Phruancharoen 2, Sirinan Treeyawedkul 2, Tanate Chira-adisai 2, Ying Supattanawong 2 and Rabian Wangkeeree 1,*

1 Department of Mathematics, Faculty of Science, Naresuan University, Phitsanulok 65000, Thailand
2 Department of Ophthalmology, Faculty of Medicine, Naresuan University, Phitsanulok 65000, Thailand
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(7), 3158; https://doi.org/10.3390/app16073158
Submission received: 28 January 2026 / Revised: 7 March 2026 / Accepted: 13 March 2026 / Published: 25 March 2026

Abstract

Glaucoma affects a significant proportion of people worldwide, and if it progresses to a severe stage, it can lead to blindness. Screening and accurately diagnosing glaucoma remain challenging for ophthalmologists. Early detection of glaucoma is crucial because it allows for timely treatment, potentially preventing the severe complications that lead to blindness. Typically, ophthalmologists diagnose glaucoma by analyzing eye fundus photographs to assess the cup-to-disc ratio (CDR). Machine learning algorithms can assist in glaucoma detection by classifying fundus images. This study introduces image preprocessing techniques for optic disc localization, combined with an integrated multi-view network for accurate glaucoma classification. The dataset used in this research was obtained from Naresuan University Hospital. EfficientNet backbones were trained using the Adam optimizer at a fixed learning rate of 0.0001. The multi-view network achieved an accuracy of 90.48%, AUC of 95.14%, precision of 81.95%, recall of 75.90%, and F1-score of 78.72%. This study presents an effective approach to assist ophthalmologists in detecting glaucoma, including at an early stage, thereby improving diagnostic efficiency.

1. Introduction

The World Health Organization (WHO) reports that more than 2.2 billion individuals are visually impaired [1], with glaucoma being a major cause of global blindness. In its initial phase, glaucoma shows no symptoms, so patients are often unaware they have the disease. If left unmanaged for an extended period, it gradually becomes more severe until vision is eventually lost. Glaucoma is related to the circulation of fluid within the eyeball; if this process malfunctions, intraocular pressure (IOP) can increase [2]. Although IOP measurement is one method ophthalmologists use to diagnose glaucoma, this approach has limitations due to the high variability of IOP levels. Therefore, ophthalmologists also base the diagnosis on evaluation of the cup-to-disc ratio (CDR). In healthy eyes, the CDR typically ranges from 0.2 to 0.3; a higher value may indicate the presence of glaucoma [3]. However, while measuring intraocular pressure and evaluating the CDR are fundamental methods for diagnosing glaucoma, these approaches have limitations: patients often face long waits for the ophthalmologist's diagnosis, and results may depend on the level of expertise of each ophthalmologist. With the rapid advancement of artificial intelligence, introducing machine learning into the preliminary diagnosis of glaucoma has emerged as a promising way to reduce diagnosis time and alleviate the workload of ophthalmologists. Although machine learning is now widely used for glaucoma classification, most existing research still relies on single-view image approaches, which may not fully capture the complex structure of the optic disc region of interest (ROI) in fundus images. Recent deep learning approaches have achieved promising performance in glaucoma detection using convolutional neural networks on retinal fundus images. For example, Ref.
[4] employed ConvNeXt with attention mechanisms for glaucoma detection in highly myopic populations, while Ref. [5] developed a fully automated CNN-based system achieving high AUC values across multiple backbone architectures. Furthermore, Ref. [6] proposed a multimodal fusion framework integrating fundus and OCT images to enhance diagnostic performance. Several studies have also incorporated optic disc segmentation or focused on either full fundus images or cropped optic disc regions to improve classification accuracy. Despite these encouraging results, most existing approaches rely on a single-view representation, using either global fundus images or localized optic disc regions independently. The complementary contribution of global retinal structures and localized optic disc features has not been systematically investigated. Moreover, the effectiveness of preprocessing techniques and optic disc localization strategies is rarely quantified through controlled ablation experiments and statistical significance analysis, leaving the relative contribution of different visual perspectives unclear. To address this limitation, the study [7] introduced a multi-view learning approach to enhance the detection of glaucoma-related abnormalities: learning from a more diverse set of images or features can improve the accuracy and effectiveness of glaucoma classification. The most important retinal structure for diagnosing glaucoma is the optic disc region in fundus images, which has a clear boundary, so the effectiveness of the analysis heavily depends on accurate localization of the optic disc (OD). Research studies [8,9] used thresholding to identify the location of the OD, a segmentation technique that separates the ROI from other parts of a retinal image. However, knowledge gaps persist about the accuracy of this optic disc localization. Some research focuses on traditional machine learning models, while others use handcrafted features.
Conversely, research [10,11,12] has utilized deep learning to facilitate feature extraction from fundus imagery. U-Net was implemented to identify the optic disc [13], and AU-Net was employed to separate the OD from regions of no interest within the fundus photograph. However, these models still require expert ophthalmologists to manually annotate the optic disc margins, which is time-consuming given the substantial clinical demands on ophthalmologists and the varying levels of expertise and professional tenure across specialists.
Therefore, this research proposes a method for classifying glaucoma by combining original images (full retinal photographs) and cropped images of the optic disc, using a weighting strategy for both perspectives; we call this method multi-view machine learning. The proposed framework systematically evaluates the complementary contributions of each view through ablation analysis and statistical validation. Although the improvement of the multi-view model over the cropped optic disc model alone is modest and does not reach statistical significance (mean AUC difference = 0.0061, p = 0.0707), the proposed framework offers clear scientific contributions and practical advantages beyond the marginal AUC gain. By integrating two complementary visual perspectives, the global retinal context captured in the full fundus image and the localized fine-grained morphological features of the optic disc region, the multi-view framework constructs a more complete representation of glaucomatous pathology than either view can provide independently. From a practical standpoint, this design also reduces dependency on successful optic disc localization: when localization is imperfect due to poor image quality or atypical disc morphology, the original fundus image retains diagnostic value as a complementary source. Furthermore, the integrated framework better reflects actual ophthalmic practice, in which clinicians simultaneously consider both the overall retinal architecture and specific optic disc characteristics during glaucoma evaluation.
The cropping of the optic disc is performed using a traditional method because, given the high clinical volume ophthalmologists face, manually delineating the OD from each fundus image would be labor-intensive. Since the original image may not show lesion details as clearly as the cropped optic disc image, which can reveal glaucomatous damage more distinctly, leveraging a multi-view method broadens the model's field of view beyond a single perspective. This also improves the efficiency of glaucoma classification, and we expect the model to learn the characteristics of glaucomatous damage more comprehensively and accurately.

2. Methods

This research introduces a multi-view network approach that combines the original image and a cropped image of the optic disc using weighting methods. The multi-view network’s operation is shown in Figure 1. This will improve the efficiency of glaucoma diagnosis by incorporating multiple viewing perspectives rather than a single view, thus enhancing the ability to classify glaucoma types.

2.1. Data Description

This dataset comprises 14,255 retinal images with a resolution of 512 × 457 pixels in RGB color format, including both right and left retinal images. These retinal images were collected from Naresuan University Hospital, acquired with a single fundus photography device (EIDON, Centervue®, Padova, Italy) between May 2021 and November 2023. Image annotation was performed by a team of five ophthalmologists (three glaucoma specialists and two retina specialists), with Label Studio used to standardize labeling. Unevaluable or ambiguous retinal images were excluded from this study. Of the remaining retinal image dataset, 11,071 images were labeled as non-glaucoma, while the remaining 3183 images were labeled as glaucoma. Figure 2 illustrates the difference between the two groups of retinal images. The first image (Figure 2a) shows an enlarged optic nerve head cup with a high CDR and thinning of the OD contours, a key characteristic commonly found in glaucoma. Conversely, Figure 2b shows a normal CDR and no cup excavation, indicating that the image is non-glaucoma. All screening and diagnosis were verified and confirmed by a licensed ophthalmologist to ensure the reliability and accuracy of labeling.
This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of Naresuan University Hospital (approval number: [IRB No. P3-0101/2566]). Written informed consent was waived due to the retrospective and anonymized nature of the data.

2.2. Data Preprocessing

In this process, we perform optic disc localization, motivated by how ophthalmologists diagnose glaucoma: the ophthalmologist assesses the CDR, the ratio between the optic cup and the optic disc. This step is illustrated as the first stage in Figure 3, beginning with the conversion of the original color fundus image (RGB: Red, Green, Blue) into the LAB color space. Following this, only the luminance (L) channel is extracted, which represents the brightness of the image in grayscale, as shown in Figure 4. We chose the L channel because it efficiently separates luminance from the other color components. Since the optic disc is brighter than other areas in the retinal image, selecting the L channel enables efficient and accurate identification of the optic disc boundaries.
The L channel is subsequently processed with contrast enhancement via the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique, as shown in Figure 5. This technique enhances image sharpness, allowing for clearer visualization of the retinal structures compared to images without this technique. Since the OD exhibits the most luminous area, CLAHE can further enhance its clarity [14,15].
Next, we utilized Principal Component Analysis (PCA) to extract key characteristics inherent in the processed fundus images. We focus on high-intensity pixel regions, as PCA helps to enhance the optic disc ROI, which is the brightest region within the retinal photograph. After the PCA transformation is complete [16], we apply thresholding at the 99% intensity level to separate the brightest regions. Thereafter, the image is denoised [16,17] by applying a morphological closing operation, as shown in Figure 6, to enhance the quality and clarity of the optic disc area. This procedure fills in small gaps within the thresholded region, resulting in a more complete and cleaner mask.
Finally, as shown in Figure 7, contour detection and centroid calculation using image moments are performed. The contours of the optic disc are first detected, and the largest contour is selected. Then, image moments, which are mathematical summaries of the geometric properties of a shape, are used to compute the centroid of the optic disc. The centroid calculation follows the mathematical formulation presented in the following equation:
M_ij = Σ_x Σ_y x^i y^j N(x, y)
where M_ij is the moment of order (i, j), x is the horizontal coordinate (abscissa) of a pixel within the image frame, y is the vertical coordinate (ordinate) of a pixel, i and j are the orders of the moment, and N(x, y) is the intensity value of the pixel at position (x, y). The centroid position is then computed using the following equation:
c_X = M_10 / M_00,   c_Y = M_01 / M_00
where M_10 is the sum of pixel intensities multiplied by the x-coordinate, M_01 is the sum of pixel intensities multiplied by the y-coordinate, and M_00 is the area of the contour (the zeroth-order moment).
By this calculation, we can determine the centroid of the region with the highest brightness, which is most likely the optic disc, ensuring that our target area is precisely segmented and identified. Once the center of the optic disc has been identified, we crop the surrounding region to a size of 224 × 224 pixels. If the crop falls outside the image or is smaller than 224 × 224 pixels, it is immediately discarded. The goal of optic disc cropping is to focus on the optic disc area and enhance the visibility of its internal features. Figure 8 shows the entire process leading to the cropped image of the optic disc.

2.3. Data Partitioning

Section 2.1 described the selection of the retinal image dataset. After this step, we discarded retinal images in which the optic disc boundaries were not visible, a prevalent issue that resulted in cropped images not matching the desired size. After filtering, the remaining dataset consisted of 12,165 images: 9332 labeled as non-glaucoma and 2833 labeled as glaucoma. The dataset was then divided into two parts: 90% (10,948 images) for the training dataset and 10% (1217 images) for the test set. A 5-fold cross-validation method was then used on the training dataset to adjust the model parameters: in each fold, 20% of the training dataset was reserved as the validation partition, and the remaining 80% was used for model training.
To prevent potential data leakage arising from multiple images belonging to the same patient, the dataset was partitioned at the patient level rather than the image level. Since the dataset includes both right and left eye fundus images from the same individuals, all images from a given patient were assigned exclusively to either the training set or the test set, ensuring that no patient appeared in both partitions simultaneously. This patient-level stratification guarantees that the reported performance metrics genuinely reflect the model’s ability to generalize to unseen patients, and are not artificially inflated by correlations between images from the same individual.
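The patient-level partitioning described above can be sketched with scikit-learn's group-aware splitters. This is a sketch under assumptions: the paper does not publish code, and the patient-ID bookkeeping here is a hypothetical stand-in for the hospital's actual identifiers.

```python
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

def patient_level_split(image_paths, labels, patient_ids, seed=42):
    """Hold out ~10% of patients for testing, then build 5 CV folds on
    the remainder, so no patient's images appear in both partitions."""
    outer = GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=seed)
    train_idx, test_idx = next(outer.split(image_paths, labels, patient_ids))
    # 5-fold cross-validation on the training portion, still grouped
    # by patient so both eyes of one patient stay in the same fold.
    inner = GroupKFold(n_splits=5)
    folds = list(inner.split(
        [image_paths[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [patient_ids[i] for i in train_idx],
    ))
    return train_idx, test_idx, folds
```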
Figure 8. Preprocessing process to obtain the cropped optic disc region images.

2.4. Multi-View Network

To enhance glaucoma classification from retinal images, we introduce a framework utilizing a Multi-View Network Architecture (see Figure 1). Our framework is informed by the GARDNet architecture [7], a study that presents the integration of data from multiple views. This research utilizes two image views and extracts the features of each: the original retinal image view, which is the entire retinal image showing the overall anatomical structure of the eye (see Section 2.4.1), and a cropped optic disc view, which is a zoomed-in image of the optic disc and a small surrounding area, since the optic disc view reveals the CDR, a hallmark feature of glaucoma (see Section 2.4.2). During training, each perspective is processed separately using pretrained deep neural networks. Model evaluation is performed using systematic 5-fold cross-validation, extracting the class probability predicted by the model in each fold. The best model for each perspective in each fold is then selected based on the maximal mean validation accuracy or the minimal validation loss across the 5-fold iterations. After this step, the predictions from both perspectives are combined using a weighted probability calculation, as shown in the equation below.
P_multiview = (P_original · W_1 + P_crop · W_2) / (W_1 + W_2)
In this equation, P_original denotes the class probability obtained from the original image model, and P_crop denotes the class probability obtained from the cropped optic disc model. The weights W_1 and W_2 are assigned numerical values such that W_1 + W_2 = 2. In this research, we assign weights to each viewpoint manually, based on the relative performance of each model: the view with higher classification performance receives a greater weight, so it influences the final decision more strongly. The weights are systematically determined from the classification results across all folds to ensure the combined probability (P_multiview) achieves maximum accuracy. Finally, the combined probability is passed through an argmax function to determine the class with the highest probability. This result is then used to assess the glaucoma status of patients from fundus images, increasing the precision and reliability of the automatic screening system (Figure 9 shows an example of applying weights in the multi-view network).
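The weighted probability combination and argmax decision can be sketched as follows; the default weights use the 0.5/1.5 split reported in Section 3, and the function name is illustrative.

```python
import numpy as np

def fuse_views(p_original, p_crop, w1=0.5, w2=1.5):
    """Weighted average of class-probability vectors from the two views
    (P_multiview = (P_original*W1 + P_crop*W2) / (W1 + W2)),
    followed by argmax to pick the predicted class."""
    p_original = np.asarray(p_original, dtype=float)
    p_crop = np.asarray(p_crop, dtype=float)
    p_multiview = (p_original * w1 + p_crop * w2) / (w1 + w2)
    return p_multiview, int(np.argmax(p_multiview))
```

For example, with per-class probabilities [0.8, 0.2] from the original view and [0.3, 0.7] from the cropped view, the fused vector is [0.425, 0.575] and class 1 (glaucoma) is predicted, since the cropped view carries three times the weight of the original view.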

2.4.1. Original Fundus Image View

For the original fundus images, data augmentation was employed before training the model to enhance dataset diversity and improve the model's generalization capabilities. The augmentation pipeline consisted of random horizontal flipping with a 50% probability and vertical flipping with a 20% probability. Additionally, the images were randomly rotated within a range of −15 to +15 degrees. Color-based augmentations were also applied, consisting of random adjustments to brightness (−0.2, 0.2), contrast (−0.2, 0.2), saturation (−0.1, 0.1), and hue (−0.05, 0.05). Following augmentation, we experimented with various candidate pre-trained models to identify the optimal framework. The best-performing model was EfficientNet_V2_M [18,19,20], utilizing weights pre-trained on IMAGENET1K_V1. The model was trained with the Adam optimizer at an initial learning rate of 1 × 10⁻⁴ and a dropout rate of 0.3 to mitigate overfitting. Binary cross-entropy loss was adopted as the optimization criterion. Training was carried out for 100 epochs, with an early stopping mechanism that terminated training if no improvement was observed for 3 successive epochs. The best model from each fold was saved, and the final architecture contained approximately 52,859,637 trainable parameters. The evaluation results are shown in Table 1.

2.4.2. Cropped Optic Disc View

The same augmentation techniques applied to the original images were used for the cropped optic disc region. After augmentation, the images were fed into the model. The best-performing pre-trained model for this task was EfficientNet_V2_S [18,19,20], utilizing IMAGENET1K_V1 weights. All training settings were identical to those used with the original images. The model had approximately 20,178,769 trainable parameters. The evaluation results are shown in Table 1.

2.4.3. Computing Environment

We conducted this research using a high-performance computing system equipped with an NVIDIA A100 GPU (x86_64 architecture) and an Intel(R) Xeon(R) Gold 5318Y CPU running at 2.10 GHz. The system had 10 GB of RAM and operated on Windows 11 Pro. All model development and experiments were implemented using the PyTorch version 2.5.1 + CUDA 12.4 framework.

3. Results

This section describes the experimental framework used to evaluate the performance of the multi-view network using both the original fundus images and the cropped optic disc images. Different weights were assigned to the two perspectives: the original image was weighted at 0.5, while the cropped image of the area of interest was weighted at 1.5. Because the cropped image view evaluated more accurately than the original image view, we gave it a higher weight. Table 1 details the criteria used to evaluate model performance, including Loss, Accuracy, AUC-ROC, Recall, Precision, and F1-score. For the confusion matrix, we present the best-performing fold in Figure 10a, where we aim to minimize the misclassification of glaucoma cases as non-glaucoma. Meanwhile, the ROC curve in Figure 10b illustrates how accurately the model can distinguish between glaucoma and non-glaucoma cases. To interpret the model's results, Grad-CAM (Gradient-weighted Class Activation Mapping) [21] is applied, as shown in Figure 11, to illustrate the areas of focus for glaucoma classification. The results show that, in cases where the model correctly predicts a healthy (non-glaucoma) fundus image, the heatmap appears around the outer region of the OD. In contrast, for glaucomatous fundus images, the heatmap appears on the optic disc area, which is the focal point ophthalmologists use for the diagnosis of glaucoma.
Table 1. Evaluation of model performance (mean ± SD) for original, cropped optic disc images and multiview network.
Views | Loss | Accuracy | AUC | Precision | Recall | F1-Score
Original | 0.2633 ± 0.0084 | 0.8874 ± 0.0029 | 0.9391 ± 0.0027 | 0.7881 ± 0.0234 | 0.7081 ± 0.0312 | 0.7450 ± 0.0097
Cropped | 0.2547 ± 0.0259 | 0.9011 ± 0.0071 | 0.9452 ± 0.0065 | 0.8064 ± 0.0231 | 0.7590 ± 0.0547 | 0.7802 ± 0.0230
Multiview | 0.2314 ± 0.0136 | 0.9048 ± 0.0053 | 0.9514 ± 0.0045 | 0.8195 ± 0.0156 | 0.7590 ± 0.0401 | 0.7872 ± 0.0178
Figure 10. (a) Confusion matrix and (b) ROC curve of the Multi-View Network.
Figure 11. Grad-CAM heatmaps for glaucoma detection on (a) original images and (b) cropped optic disc images.
Since our work involves multiple perspectives, we demonstrate the contribution of each perspective in Table 2. Ablation analysis based on the AUC values from five-fold cross-validation reveals that the multi-view model achieved the highest AUC, with an average of 0.9514 ± 0.0049; by comparison, the cropped optic disc image achieved an average AUC of 0.9452 ± 0.0071, and the original fundus image an average AUC of 0.9391 ± 0.0031. Therefore, a single viewpoint cannot be as effective as combining both, demonstrating the value of the multi-view concept proposed in this research. We performed a paired t-test on the 5-fold cross-validation results to assess the statistical significance of the performance differences. The results showed that the multi-view model significantly improved over the original image, with a mean AUC difference of 0.0123 (p = 0.0061). However, although the multi-view model had a higher average AUC than the cropped image (mean difference of 0.0061), the improvement was not statistically significant (p = 0.0707), reflecting that the cropped optic disc image already provides strong glaucoma-differentiating characteristics, as shown in Table 3. This research uses 5-fold cross-validation, with results reported as mean ± standard deviation (SD): the mean is the average AUC (Area Under the ROC Curve) over the 5 folds, and the SD indicates the dispersion of results across folds, with a smaller SD suggesting a more stable model. Compared to previous studies, as shown in Table 4, our research achieved comparable or superior AUC performance while maintaining methodological simplicity.
Unlike other studies that only performed segmentation or cropping, or those using more than one view, our results indicate that combining individual image views through multi-view analysis significantly improves glaucoma screening efficiency.
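The paired t-test on per-fold AUC values can be sketched with SciPy. The fold-level AUCs in the usage example are illustrative placeholders, not the study's actual per-fold values.

```python
import numpy as np
from scipy import stats

def paired_auc_test(auc_a, auc_b):
    """Two-sided paired t-test on per-fold AUC values obtained from the
    same 5-fold cross-validation splits; returns the mean difference
    and the p-value."""
    auc_a = np.asarray(auc_a, dtype=float)
    auc_b = np.asarray(auc_b, dtype=float)
    t_stat, p_value = stats.ttest_rel(auc_a, auc_b)
    return float(np.mean(auc_a - auc_b)), float(p_value)
```

A paired test is appropriate here because both models are evaluated on identical fold splits, so fold-to-fold variation is shared and cancels out of the comparison.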

4. Discussion

4.1. Analysis of PCA

For preprocessing, we employed Principal Component Analysis (PCA) as one of the processing steps. In this study, PCA was applied to fundus images to help identify and select the optic disc region based on changes in luminance that PCA can distinguish. Since the optic disc generally has high intensity values in the fundus image, it is saliently discernible. As shown in Figure 12, at thresholds from 99% down to 95%, all configurations in cases 1 and 3 cropped the optic disc area equally proficiently. In case 2, the 99–97% thresholds find the optic disc area well and crop the image with the optic disc perfectly centered, while the 96% and 95% thresholds fail to crop the optic disc area. In case 4, every threshold can crop the optic disc area, but some parts of the optic disc are missing. In case 5, all thresholds crop the optic disc accurately, but only the 99% threshold places the optic disc at the center of the image. Therefore, we chose the 99% threshold for cropping the optic disc area because it is the most efficient. The data loss after cropping is attributable to localization failure, which prevents the desired area from being cropped; some images can be cropped, but with incomplete captures.

4.2. Data Imbalance

Our dataset contains approximately 22% glaucoma cases, indicating an imbalance in our study. Therefore, this research used additional indicators to evaluate the model's performance beyond Accuracy alone, which was 0.9048. The additional metrics were AUC, Recall, Precision, and F1-score; the values obtained in this study were AUC 0.9514, Recall 0.7590, Precision 0.8195, and F1-score 0.7872. Incorporating these metrics provides a more comprehensive and rigorous assessment than using Accuracy in isolation. Because the dataset is dominated by non-glaucoma retinal images, relying exclusively on accuracy may fail to capture the model's true performance. AUC indicates the model's discriminative power between the normal and glaucomatous groups; Recall represents the model's proficiency in detecting glaucoma patients, which is crucial in medicine; Precision reflects the proportion of positive predictions that are correct, and thus how many false positives occur; and the F1-score reflects the balance between recall and precision. Even with imbalanced data, these indicators allow us to evaluate our models effectively. In medicine, Recall and F1-score are even more important than raw accuracy.
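The metric suite used alongside accuracy can be computed with scikit-learn; the function name and the toy labels in the usage example are illustrative, not values from the study.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """Report the complementary metrics used alongside accuracy for an
    imbalanced glaucoma/non-glaucoma dataset. y_score holds predicted
    probabilities for the positive (glaucoma) class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),   # discriminative power
        "recall": recall_score(y_true, y_pred),  # sensitivity to glaucoma
        "precision": precision_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),          # recall/precision balance
    }
```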
Figure 12. The experiment assessed the performance of PCA across variance retention thresholds of 95% to 99%.

4.3. Data Augmentation

The data augmentation settings applied in this project consisted of 8 parameters: Vertical Flip (VF), Horizontal Flip (HF), Rotation (R), Brightness (B), Contrast (C), Saturation (S), Hue (H), and Dropout (D). Distinct scenarios were evaluated, with the AUC value serving as the primary metric for the discriminative performance of each case. Data augmentation was performed on both the original and cropped images, as shown in Table 5. The experiment indicates that case 7 was optimal relative to the alternative configurations, for both the original and cropped optic disc images. Case 7 performed best because its combination of augmentations preserved the original structure while increasing data variety: the images became sharper, revealing the distinct boundaries of the optic disc and its surrounding areas. Although this case was the best in our experiment, it is subject to certain constraints; when using heterogeneous datasets with varying image fidelity, fine-tuning of the augmentation settings may be necessary.

4.4. Multiview Weight

Table 6 shows a multi-view experiment in which weighting factors were assigned to the original image and the cropped optic disc image. We gave more weight to the cropped image than the original image to find the optimal weighting ratio for effective glaucoma classification. Experimental results indicate that a weighting ratio of 0.5 for the original view and 1.5 for the cropped view provided the most robust results in the multi-view analysis, achieving Accuracy 0.9048, AUC 0.9514, Precision 0.8195, Recall 0.7590, and F1-score 0.7872, compared to other weighting schemes. However, on certain metrics the 0.5/1.5 weighting underperformed relative to alternative schemes. Overall, we required that every weighted combination exceed the performance of each individual viewing angle. This shows that assigning well-balanced weights to each viewing angle allows our model to achieve better efficacy than weights that are either extremely disparate or excessively similar. A limitation of our work is that we manually assigned a weight to each viewing angle; this process could be automated to yield more optimal results. In summary, this study develops and evaluates a multi-view framework for detecting glaucoma, comprising 14,255 retinal images. A key difference from other approaches is the inclusion of both original images and images cropped to target the optic disc. Combining these two sets of data enhances the model's ability to detect glaucoma both in the full image context and in images specifically cropped to the optic disc, the region associated with glaucoma. This study utilized an experimental weighting method to predict overall results by combining each perspective. The weighting strategy considered the validation performance in each cross-validation fold, aiming to balance the individual perspectives so that they reinforce each other when combined.
Although this approach does not utilize learned or optimized fusion, we chose it as a practical and interpretable data integration strategy: the use of fixed weights allows for clear interpretation of the contribution of each perspective. However, we recognize that adaptive or attention-based fusion strategies may be even more effective by dynamically learning the importance of each perspective. Future work will explore such fusion strategies to enhance methodological strength and applicability. Although the dataset is relatively large, comprising 14,255 images, the data was collected from only one institution. As a result, it is not yet possible to fully confirm the model's ability to diagnose glaucoma in other populations using different imaging equipment or in different environments, because retinal images from different institutions may reflect different disease prevalences and population distributions, which could affect the model's performance. Future work will focus on testing this research against datasets from multiple institutions and publicly available standard datasets to better assess its robustness and external validity. Such evaluations will provide more relevant evidence for the model's application in broader clinical environments.
While the performance gain of the multi-view model over the cropped optic disc model alone was not statistically significant under the controlled experimental conditions of this study, the multi-view approach is expected to provide more meaningful benefits in specific clinical scenarios. First, in cases where optic disc localization fails or produces an imperfect crop—for example, in low-quality fundus images affected by poor illumination, motion blur, or media opacity—the original full fundus image provides complementary contextual information that preserves overall diagnostic utility. Second, when glaucomatous damage extends beyond the immediate optic disc boundary, such as diffuse retinal nerve fiber layer defects or peripapillary atrophy, the full fundus view captures these peripheral features that the tightly cropped image cannot include. Third, in heterogeneous real-world clinical environments where image quality is variable and optic disc boundaries may be indistinct, the multi-view framework is expected to demonstrate greater resilience and consistency than a single-view model. Therefore, even when the marginal statistical benefit is small under the relatively controlled conditions of this study, the combined framework provides a more robust, clinically comprehensive diagnostic tool that reduces single-point failure risk and better mirrors the holistic visual assessment performed by ophthalmologists in practice.
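The weighted fusion evaluated in Table 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the 0.5 decision threshold are assumptions, since the text specifies only the per-view weights (0.5 for the original view, 1.5 for the cropped view) and a weighted combination of the two sigmoid outputs.

```python
def fuse_predictions(p_original, p_cropped, w_original=0.5, w_cropped=1.5):
    """Weighted fusion of per-view glaucoma probabilities.

    Each input is a list of sigmoid outputs in [0, 1], one per image.
    Dividing by the weight sum keeps the fused score in [0, 1].
    """
    total = w_original + w_cropped
    return [(w_original * po + w_cropped * pc) / total
            for po, pc in zip(p_original, p_cropped)]

def classify(fused_scores, threshold=0.5):
    """Binarize fused probabilities into glaucoma (1) / non-glaucoma (0)."""
    return [1 if s >= threshold else 0 for s in fused_scores]

# Example: one image whose original-view score is 0.8 and cropped-view
# score is 0.6 fuses to (0.5*0.8 + 1.5*0.6) / 2 = 0.65 -> glaucoma.
fused = fuse_predictions([0.8], [0.6])
labels = classify(fused)
```

Sweeping `w_original` from 0.9 down to 0.1 (with `w_cropped = 2 - w_original`) reproduces the grid of weight combinations explored in Table 6.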

4.5. Fine-Tuning and Backbone

The models we used are EfficientNetV2-S and EfficientNetV2-M, with the architectures shown in Figure 13 and Figure 14 [24,25]. Our experimental protocol involved gradually unfreezing the models to determine which configuration yielded the most accurate results. We first set all layers to non-trainable and then unfroze them, beginning from the terminal layers of the model, at specific proportions: 5%, 10%, and 100% of all layers. Our findings reveal that unfreezing all layers was the most effective, provided that overfitting is controlled. Freezing all layers and then unfreezing only 5% may result in suboptimal performance, impeding the detection of subtle morphological features and the capacity for feature recalibration on our dataset. In glaucoma, the optic disc shows abnormalities compared with a healthy eye: glaucoma can be diagnosed from an increased cup-to-disc ratio (the central indentation of the optic disc enlarges, resulting in a higher than normal cup-to-disc ratio), as well as from thinning of the optic disc rim and notching of the disc margin. Petechiae (hemorrhagic spots) at the edge of the optic disc may also be visible.
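The gradual-unfreezing schedule above can be sketched framework-agnostically. The helper below is a hypothetical stand-in for toggling per-layer trainability (e.g. `requires_grad` flags in a deep learning framework), assuming layers are unfrozen from the end of the network backward, as described in the protocol:

```python
def set_trainable(flags, fraction):
    """Freeze everything, then unfreeze the last `fraction` of layers.

    `flags` is a list of booleans, one per layer (True = trainable),
    standing in for the per-parameter trainability flags of a real
    framework. Returns the new flag list: only the final
    round(n * fraction) layers are trainable.
    """
    n = len(flags)
    k = round(n * fraction)
    return [i >= n - k for i in range(n)]

# Example: a 20-layer backbone under the three schedules tested.
backbone = [True] * 20
unfreeze_5pct = set_trainable(backbone, 0.05)   # only the last layer trains
unfreeze_10pct = set_trainable(backbone, 0.10)  # last two layers train
unfreeze_all = set_trainable(backbone, 1.00)    # full fine-tuning
```

In a real run, each schedule would be paired with early stopping (Section 4.6) to keep full unfreezing from overfitting.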

4.6. Model Training Setup

This stage configures model training. We considered key variables that affect model efficacy, including the optimizer, learning rate, activation function, and loss function. We chose the Adam optimizer, which adaptively adjusts the learning rate of each parameter. We experimented with several learning rates and found that 0.0001 was optimal, as it reduces the loss oscillation caused by an excessively high learning rate. We used Binary Cross-Entropy (BCE) loss with a Sigmoid activation in the final layer to predict whether an image shows glaucoma or non-glaucoma; Sigmoid activation is suitable for two-class prediction. We also employed 5-fold cross-validation to evaluate the model on diverse data splits, which reduces bias from any single split. Finally, we implemented early stopping to prevent overfitting and added dropout in the classifier layer.
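A minimal sketch of the early-stopping rule and the BCE loss used in this setup. The `patience` and `min_delta` values are illustrative assumptions, as they were not reported in the text; only the use of early stopping and BCE with a sigmoid output is stated.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a sigmoid output p and label y in {0, 1}.

    Clamping p away from 0 and 1 avoids log(0) overflow.
    """
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

class EarlyStopping:
    """Stop training once validation loss stops improving."""

    def __init__(self, patience=5, min_delta=1e-4):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # minimum change that counts as progress
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss    # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1    # no meaningful improvement this epoch
        return self.bad_epochs >= self.patience
```

In the 5-fold protocol, one `EarlyStopping` instance would be created per fold and `step` called after each validation pass.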

4.7. Grad-CAM Analysis

To demonstrate which areas of the image the model focuses on, the Grad-CAM technique was used to increase the model's interpretability and reliability. The resulting heatmaps indicate the areas the model prioritized when determining whether a person had glaucoma. In Figure 11, the original images (a) produce heatmaps that are distributed across the model's region of interest and span the entire retinal image. In contrast, the images cropped to the optic disc (b) produce heatmaps that are clearly focused on the area of interest, with specificity directed at the optic disc. In glaucoma diagnosis, significant pathological features such as optic cup enlargement and neuroretinal rim thinning are considered; here, the heatmaps show high-activation zones primarily in the optic disc, aligning with established diagnostic criteria. These observations support the model's learning efficiency and its use of clinically important structural information for accurate diagnosis. Conversely, in non-glaucoma cases, the heatmaps often show decreased intensity or dispersion to areas other than the optic disc, which are not directly related to glaucoma diagnosis. Comparing the original images (a) with the cropped optic disc images (b) shows that the original image requires a spatial search to locate the optic disc within a larger field, resulting in a wider heatmap distribution, while the cropped image yields a refined, more precise heatmap focused on the anatomical markers of the optic disc. In summary, analysis of both heatmap formats not only supports the model's validity but also suggests that cropped optic disc images may improve predictive accuracy and the precision with which Grad-CAM highlights critical pathological markers, supporting the development of more reliable decision support systems in the future.
Furthermore, the model can learn the characteristics of glaucoma-related retinal damage by more accurately capturing the ratio between the optic disc and the optic cup, and it can also consider abnormal blood vessel angles in the retina and the thickness of the disc rim.
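A minimal sketch of the Grad-CAM computation described above, following Selvaraju et al. [21]: channel weights are the spatially averaged gradients of the class score, and the heatmap is the ReLU of the weighted sum of activation maps. Nested lists stand in for framework tensors, and upsampling to image resolution and normalization are omitted; extracting the activations and gradients from the backbone is assumed to have been done already.

```python
def grad_cam(activations, gradients):
    """Compute a raw Grad-CAM heatmap.

    `activations` and `gradients` are [C][H][W] nested lists: the
    feature maps of the final conv layer and the gradients of the
    class score with respect to those maps.
    """
    C = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # alpha_k: global average pool of the gradients of channel k
    weights = [sum(sum(row) for row in gradients[k]) / (H * W)
               for k in range(C)]
    # Weighted sum of activation maps over channels
    cam = [[0.0] * W for _ in range(H)]
    for k in range(C):
        for i in range(H):
            for j in range(W):
                cam[i][j] += weights[k] * activations[k][i][j]
    # ReLU keeps only features with a positive influence on the class
    return [[max(0.0, v) for v in row] for row in cam]

# Toy example: one 2x2 channel with uniformly positive gradients keeps
# the activation pattern; negative gradients would zero it out.
heatmap = grad_cam([[[1.0, 0.0], [0.0, 1.0]]],
                   [[[1.0, 1.0], [1.0, 1.0]]])
```

For the cropped optic disc view, the same computation concentrates high values near the disc because the input already excludes peripheral retina.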

5. Conclusions

This study introduces a multi-view framework that uses full fundus images and localized optic disc regions to classify glaucoma. Optic disc cropping was achieved through Lab color space transformation, CLAHE enhancement, PCA-based localization, and morphological processing. Experimental results demonstrated that EfficientNetV2-M and EfficientNetV2-S yielded the best results for the original and cropped views, respectively. To weight the optic disc view more heavily in the combined model, the original fundus images were assigned a weight of 0.5 and the cropped optic disc views a weight of 1.5. The multi-view network achieved Accuracy 90.48%, AUC 95.14%, Precision 81.95%, Recall 75.90%, and F1-score 78.72%. Ablation experiments and statistical validation further confirmed that integrating complementary visual perspectives yields consistent improvements, which were statistically significant relative to the original single-view model. These results highlight the value of integrating complementary perspectives to enhance diagnostic reliability and support early glaucoma detection, potentially reducing the burden on clinical experts. However, this study has limitations, including its single-institution dataset and the need for external validation with clinical datasets from multiple institutions to better assess general applicability. Future work will focus on scaling up the dataset, exploring adaptive weighting strategies, and incorporating multimodal information to further improve classification performance.

Author Contributions

Conceptualization, P.S. and R.W.; methodology, P.S., T.M. and A.P.; software, P.S. and A.P.; validation, P.S., R.W. and Y.S.; formal analysis, P.S.; investigation, P.S., T.M., A.P., S.S., T.K., C.P., S.T., T.C.-a. and Y.S.; resources, R.W. and Y.S.; data curation, P.S., T.M. and A.P.; writing—original draft preparation, P.S.; writing—review and editing, R.W. and Y.S.; visualization, P.S.; supervision, R.W.; project administration, R.W.; funding acquisition, R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Council of Thailand (NRCT) in collaboration with Naresuan University (Grant No. N42A670566).

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy, ethical, and institutional restrictions.

Acknowledgments

This project was supported by the National Research Council of Thailand (NRCT) and Naresuan University under Grant No. N42A670566. This work was also partially supported by the Frontier Research and Innovation Cluster Fund, Naresuan University, Grant No. R2569C008.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. World Health Organization. World Report on Vision; World Health Organization: Geneva, Switzerland, 2019; Available online: https://www.who.int/publications/i/item/world-report-on-vision (accessed on 16 April 2025).
  2. Lee, D.A.; Higginbotham, E.J. Glaucoma and its treatment: A review. Am. J. Health-Syst. Pharm. 2005, 62, 691–699. [Google Scholar] [CrossRef] [PubMed]
  3. Gopi, V.P.; Anjali, M.S.; Niwas, S.I. PCA-based localization approach for segmentation of optic disc. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 2195–2204. [Google Scholar] [CrossRef] [PubMed]
  4. Chiang, C.C.; Lin, P.Y.; Lin, Y.C.; Chen, H.Y. Deep learning-based glaucoma detection in highly myopic populations using fundus images. Biomedicines 2024, 12, 1394. [Google Scholar] [CrossRef] [PubMed]
  5. Saha, S.; Islam, M.S.; Hossain, M.A.; Rahman, M.M. A fully automated deep learning-based system for glaucoma detection using fundus images. Sci. Rep. 2023, 13, 18607. [Google Scholar] [CrossRef] [PubMed]
  6. Islam, M.M.; Rahman, M.A.; Saha, S.; Aktar, S. Multimodal deep learning framework for glaucoma detection using fundus and OCT images. Sci. Rep. 2025, 15, 12034. [Google Scholar]
  7. Al-Mahrooqi, A.; Medvedev, D.; Muhtaseb, R.; Yaqub, M. GARDNet: Robust Multi-View Network for Glaucoma Classification in Color Fundus Images. In Proceedings of the International Workshop on Ophthalmic Medical Image Analysis; Springer: Cham, Switzerland, 2022; pp. 152–161. [Google Scholar]
  8. Issac, A.; Sarathi, M.P.; Dutta, M.K. An Adaptive Threshold Based Image Processing Technique for Improved Glaucoma Detection and Classification. Comput. Methods Programs Biomed. 2015, 122, 229–244. [Google Scholar] [CrossRef] [PubMed]
  9. Soorya, M.; Issac, A.; Dutta, M.K. An Automated and Robust Image Processing Algorithm for Glaucoma Diagnosis from Fundus Images Using Novel Blood Vessel Tracking and Bend Point Detection. Int. J. Med. Inform. 2018, 110, 52–70. [Google Scholar]
  10. Shanmugam, P.; Raja, J.; Pitchai, R. An Automatic Recognition of Glaucoma in Fundus Images Using Deep Learning and Random Forest Classifier. Appl. Soft Comput. 2021, 109, 107512. [Google Scholar] [CrossRef]
  11. Yin, P.; Xu, Y.; Zhu, J.; Liu, J.; Yi, C.A.; Huang, H.; Wu, Q. Deep Level Set Learning for Optic Disc and Cup Segmentation. Neurocomputing 2021, 464, 330–341. [Google Scholar] [CrossRef]
  12. Hervella, Á.S.; Rouco, J.; Novo, J.; Ortega, M. End-to-end multi-task learning for simultaneous optic disc and cup segmentation and glaucoma classification in eye fundus images. Appl. Soft Comput. 2022, 116, 108347. [Google Scholar] [CrossRef]
  13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015); Springer: Cham, Switzerland, 2015; Part III; pp. 234–241. [Google Scholar]
  14. Mora, A.R. Deep Learning-Based Prediction of Multiple Ocular Diseases Using CLAHE Enhancement. Bachelor's Thesis, Tilburg University, Tilburg, The Netherlands, 2024. [Google Scholar]
  15. Wanling, W.; Mohamed Shah, N. Fundus Image Enhancement Using CLAHE. New Explor. Electr. Eng. 2025, 1, 68–79. [Google Scholar] [CrossRef]
  16. Mittapalli, P.S.; Kande, G.B. Segmentation of Optic Disk and Optic Cup from Digital Fundus Images for the Assessment of Glaucoma. Biomed. Signal Process. Control 2016, 24, 34–46. [Google Scholar] [CrossRef]
  17. Berndt-Schreiber, M. Morphological Operations in Fundus Image Analysis. J. Med. Inform. Technol. 2007, 11, 79–86. [Google Scholar]
  18. Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 10096–10106. [Google Scholar]
  19. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  20. Tan, M.; Le, Q.V. MixConv: Mixed Depthwise Convolutional Kernels. arXiv 2019, arXiv:1907.09595. [Google Scholar] [CrossRef]
  21. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 618–626. [Google Scholar]
  22. Chakravarty, A.; Sivswamy, J. Joint optic disc and cup segmentation and glaucoma classification using multi-task convolutional neural networks. IEEE Trans. Med. Imaging 2018, 37, 2496–2507. [Google Scholar]
  23. Hemelings, R.; Elen, B.; Barbosa-Breda, J.; Stalmans, I.; Van Keer, K. Deep learning for glaucoma detection and segmentation beyond the optic disc region. IEEE J. Biomed. Health Inform. 2021, 25, 1419–1427. [Google Scholar]
  24. Pacal, I.; Celik, O.; Bayram, B.; Cunha, A. Enhancing EfficientNetv2 with global and efficient channel attention mechanisms for accurate MRI-Based brain tumor classification. Clust. Comput. 2024, 27, 11187–11212. [Google Scholar] [CrossRef]
  25. Gang, S.; Fabrice, N.; Chung, D.; Lee, J. Character recognition of components mounted on printed circuit board using deep learning. Sensors 2021, 21, 2921. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Workflow of Multi-View Glaucoma Classification Network.
Figure 2. (a) Fundus image of a glaucoma-affected eye showing optic disc cupping and increased cup-to-disc ratio. (b) Fundus image of a normal eye with no signs of glaucoma.
Figure 3. Image preprocessing for cropping the optic disc region.
Figure 4. Example of transforming a fundus image from RGB to LAB color space (a) original RGB image, (b) LAB image, (c) L-channel (grayscale brightness).
Figure 5. Example of applying CLAHE (a) L-channel, (b) enhanced image with improved detail and contrast.
Figure 6. Example of applying PCA in the image pipeline (a) image after CLAHE, (b) image after PCA for spatial feature extraction.
Figure 7. Example of optic disc ROI extraction: (a) contour detection and center point, (b) cropped optic disc region.
Figure 9. Multi-view network framework.
Figure 13. EfficientNetV2-S architecture.
Figure 14. EfficientNetV2-M architecture.
Table 2. Ablation study based on five-fold cross-validation AUC scores for original, cropped optic disc images and multiview network.

| Views | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean ± SD |
|---|---|---|---|---|---|---|
| Original | 0.9356 | 0.9409 | 0.9377 | 0.9378 | 0.9435 | 0.9391 ± 0.0028 |
| Cropped | 0.9516 | 0.9426 | 0.9341 | 0.9468 | 0.9510 | 0.9452 ± 0.0065 |
| Multiview | 0.9556 | 0.9489 | 0.9447 | 0.9508 | 0.9568 | 0.9514 ± 0.0045 |
Table 3. Statistical comparison of AUC values using paired t-test across five folds.

| Comparison | Mean AUC Difference | p-Value | Significance |
|---|---|---|---|
| Multiview vs. Original | 0.0123 | 0.0061 | Significant (p < 0.01) |
| Multiview vs. Cropped | 0.0061 | 0.0707 | Not significant (p > 0.05) |
Table 4. Comparison of recent glaucoma detection studies using fundus images.

| Study | Method/Model | Dataset | AUC |
|---|---|---|---|
| Mahrooqi et al. (2022) [7] | Multi-view (GARDNet) | EyePACS/RIM-ONE DL | 0.92–0.93 |
| Chakravarty & Sivswamy (2018) [22] | Joint OD/OC segmentation + CNN | REFUGE | 0.95 |
| Hemelings et al. (2021) [23] | Cropping-based CNN | UZL/REFUGE | 0.94 |
| Chiang et al. (2024) [4] | Deep learning model | 3088 clinical images | 0.894 |
| Proposed (This Work) | Multi-view network | 14,255 clinical images | 0.951 |
Table 5. Optimal augmentation parameter settings and AUC performance for each configuration. VF = vertical flip (probability p_vf), HF = horizontal flip (p_hf), R = rotation (angle range θ), B/C/S/H = brightness/contrast/saturation/hue jitter (Δb, Δc, Δs, Δh), D = dropout (p_drop). Blank cells indicate that the augmentation was not applied.

| Setting | VF (p_vf) | HF (p_hf) | R (θ) | B (Δb) | C (Δc) | S (Δs) | H (Δh) | D (p_drop) | Original AUC | Cropped AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| No Aug | | | | | | | | | 0.9130 ± 0.0080 | 0.9213 ± 0.0057 |
| 1 | 0.5 | 0.5 | | | | | | | 0.9250 ± 0.0063 | 0.9356 ± 0.0063 |
| 2 | | | (−15, 15) | | | | | | 0.9211 ± 0.0042 | 0.9366 ± 0.0073 |
| 3 | 0.5 | 0.5 | (−15, 15) | | | | | | 0.9297 ± 0.0058 | 0.9392 ± 0.0050 |
| 4 | | | | ±0.2 | ±0.2 | ±0.1 | ±0.05 | | 0.9130 ± 0.0080 | 0.9177 ± 0.0063 |
| 5 | | | | | | | | 0.3 | 0.9074 ± 0.0096 | 0.9246 ± 0.0030 |
| 6 | 0.5 | 0.5 | (−15, 15) | ±0.2 | ±0.2 | ±0.1 | ±0.05 | | 0.9111 ± 0.0035 | 0.9447 ± 0.0036 |
| 7 | 0.5 | 0.5 | (−15, 15) | ±0.2 | ±0.2 | ±0.1 | ±0.05 | 0.3 | 0.9391 ± 0.0027 | 0.9452 ± 0.0065 |
Table 6. Experimental results for different multiview weight combinations.

| Original | Cropped | Accuracy | AUC | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|
| 0.9 | 1.1 | 0.9022 ± 0.0046 | 0.9518 ± 0.0042 | 0.8176 ± 0.0188 | 0.7477 ± 0.0402 | 0.7800 ± 0.0164 |
| 0.8 | 1.2 | 0.9032 ± 0.0051 | 0.9519 ± 0.0042 | 0.8186 ± 0.0143 | 0.7512 ± 0.0411 | 0.7825 ± 0.0180 |
| 0.7 | 1.3 | 0.9039 ± 0.0049 | 0.9519 ± 0.0043 | 0.8197 ± 0.0148 | 0.7534 ± 0.0409 | 0.7842 ± 0.0176 |
| 0.6 | 1.4 | 0.9047 ± 0.0043 | 0.9517 ± 0.0044 | 0.8214 ± 0.0147 | 0.7555 ± 0.0396 | 0.7861 ± 0.0163 |
| 0.5 | 1.5 | 0.9048 ± 0.0053 | 0.9514 ± 0.0045 | 0.8195 ± 0.0156 | 0.7590 ± 0.0401 | 0.7872 ± 0.0178 |
| 0.4 | 1.6 | 0.9034 ± 0.0044 | 0.9509 ± 0.0045 | 0.8150 ± 0.0153 | 0.7576 ± 0.0423 | 0.7842 ± 0.0174 |
| 0.3 | 1.7 | 0.9025 ± 0.0048 | 0.9502 ± 0.0046 | 0.8114 ± 0.0148 | 0.7583 ± 0.0434 | 0.7829 ± 0.0182 |
| 0.2 | 1.8 | 0.9021 ± 0.0063 | 0.9493 ± 0.0047 | 0.8099 ± 0.0177 | 0.7583 ± 0.0499 | 0.7818 ± 0.0222 |
| 0.1 | 1.9 | 0.9009 ± 0.0067 | 0.9480 ± 0.0050 | 0.8068 ± 0.0198 | 0.7509 ± 0.0537 | 0.7794 ± 0.0237 |