3.4. Classification Using Machine Learning Algorithms
To identify the most suitable classification strategy for our OCT-based multifractal feature set, we conducted a comparative evaluation of eight supervised classifiers: Logistic Regression (LR), Support Vector Machine (SVM) with a polynomial kernel, Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), XGBoost, LightGBM, and a Multi-Layer Perceptron (MLP). These models were chosen because the dataset is of moderate size and the extracted feature space is low-dimensional, comprising 9 multifractal descriptors per image. Deep learning approaches such as convolutional neural networks (CNNs) were deliberately excluded because they require large training sets and operate on raw image inputs. In contrast, our handcrafted feature-based pipeline yields interpretable descriptors and can be trained efficiently with limited computational resources. This strategy supports clinical scalability and preserves interpretability, which is essential for translational deployment in real-world ophthalmic workflows.
A comprehensive comparison of conventional and advanced machine learning classifiers was carried out to assess the discriminative power of multifractal descriptors extracted from OCT images. A 5-fold cross-validation procedure was employed to ensure robustness of evaluation and reduce variance due to data partitioning. The classification performance of each model was evaluated based on four key metrics: accuracy, precision, recall, and F1-score.
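The 5-fold protocol can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the stratified (class-preserving) fold assignment, the random seed, the nearest-centroid stand-in classifier, and the toy data are all assumptions introduced only to make the example runnable. In practice each of the eight models would take the place of the hypothetical `fit_predict` argument, and precision, recall, and F1-score would be averaged over the folds in the same way as accuracy.

```python
import random
from statistics import mean

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall class ratio (assumed stratification)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: label each test vector with the closest class mean."""
    centroids = {}
    for cls in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X_test]

def cross_validate_accuracy(X, y, fit_predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold; return per-fold and mean accuracy."""
    fold_scores = []
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        y_pred = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(y_pred, test_idx))
        fold_scores.append(correct / len(test_idx))
    return fold_scores, mean(fold_scores)

# Synthetic toy data (not study data): 20 "normal" (0) and 20 "NPDR" (1)
# vectors of 9 descriptors each, deliberately well separated.
X = [[0.1 * i] * 9 for i in range(20)] + [[5 + 0.1 * i] * 9 for i in range(20)]
y = [0] * 20 + [1] * 20
scores, avg = cross_validate_accuracy(X, y, nearest_centroid)
```

Stratifying the folds is a design choice worth stating explicitly, since with an imbalanced normal/NPDR split an unstratified fold could contain very few positive cases and distort per-fold recall.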
Table 6 summarizes the detailed results obtained across the five validation folds for each classifier.
The results presented in Table 6 highlight several important observations. First, all tested models clearly outperform the random-chance baseline ( ), confirming that the multifractal descriptors contain relevant information for distinguishing between normal and non-proliferative diabetic retinopathy (NPDR) cases.
Among classical classifiers, the Support Vector Machine (SVM) and Decision Tree (DT) exhibit promising results, with average accuracies of and , respectively. These results underline the capability of basic discriminative and rule-based models to capture some degree of structure in the multifractal feature space.
Ensemble-based methods such as Random Forest (RF) and Gradient Boosting (GB) yield improved results, achieving average accuracies around and , respectively. Random Forest reduces variance by averaging many decorrelated trees, while Gradient Boosting sequentially adds weak learners that correct the errors of the current ensemble; both mechanisms make these models more robust to overfitting than a single tree, which likely explains their improved performance.
The largest gains among the tree-based models come from the optimized gradient-boosting frameworks XGBoost and LightGBM, which achieve accuracies of and , respectively. Although these frameworks are best known for their speed and their ability to scale to large, high-dimensional datasets, their regularized boosting also performs well on the compact 9-dimensional feature space used here, making them suitable for complex medical classification tasks.
The Multi-Layer Perceptron (MLP), a feedforward artificial neural network, achieves the highest performance across all metrics, with an average accuracy of and an F1-score of . These results suggest that neural networks are particularly adept at learning the complex, nonlinear relationships embedded in the multifractal descriptors, and they support the use of neural network classifiers in automated DR screening systems. While classical machine learning algorithms provide a solid baseline, ensemble techniques and neural networks lead to substantial improvements in detection accuracy. These findings support the feasibility of combining multifractal analysis with expressive classifiers such as the MLP for the accurate diagnosis of NPDR.
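To make the "nonlinear relationships" argument concrete, the forward pass of a one-hidden-layer MLP can be sketched as below. The layer sizes, the ReLU and sigmoid activations, and the hand-set weights are illustrative assumptions; they are not the architecture or the trained network behind Table 6.

```python
import math

def dense(x, W, b):
    """Affine layer: returns W.x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp_predict_proba(x, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output unit giving P(NPDR | x)."""
    hidden = [max(0.0, a) for a in dense(x, W1, b1)]   # ReLU nonlinearity
    z = sum(w * h for w, h in zip(w2, hidden)) + b2    # output logit
    return 1.0 / (1.0 + math.exp(-z))

# Hand-set, hypothetical weights for 9 multifractal inputs and 2 hidden units.
W1 = [[1.0] * 9, [-1.0] * 9]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = -5.0

p_up = mlp_predict_proba([1.0] * 9, W1, b1, w2, b2)     # all descriptors shifted upward
p_down = mlp_predict_proba([-1.0] * 9, W1, b1, w2, b2)  # all descriptors shifted downward
p_ref = mlp_predict_proba([0.0] * 9, W1, b1, w2, b2)    # reference profile
```

With these weights the network assigns high probability to deviations in either direction (about 0.98) and low probability to the reference profile (below 0.01). A logistic-regression model cannot reproduce this pattern: its logit is linear, so the logit at the midpoint of two inputs is the average of their logits, and if both endpoints score above 0.5 the midpoint must as well.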
Figure 6 illustrates the comparative performance of various classical and ensemble-based classifiers across five folds of cross-validation using four evaluation metrics: accuracy, precision, recall, and F1-score.
From Figure 6a, it is evident that the MLP Neural Network and LightGBM consistently outperformed the other models in terms of accuracy, maintaining values close to 0.96 and 0.95, respectively. Conversely, traditional classifiers such as Logistic Regression and Decision Tree exhibited the lowest accuracy scores, suggesting limited capacity to capture the complexity of the multifractal features.
Figure 6b highlights the performance in terms of precision. Again, the MLP Neural Network achieved near-perfect precision across all folds, indicating a strong ability to avoid false positives. Ensemble models such as XGBoost and LightGBM also demonstrated relatively high and stable precision across folds, reinforcing their robustness.
In Figure 6c, which presents recall, Random Forest slightly outperformed the other models in some folds. However, the MLP and the ensemble-based models (LightGBM, XGBoost, Gradient Boosting) generally maintained higher recall than the traditional classifiers, indicating superior sensitivity in identifying true positives.
Figure 6d summarizes the balanced performance via the F1-score. The MLP Neural Network consistently attained the highest F1-scores, followed by LightGBM, showing their ability to maintain a good trade-off between precision and recall. In contrast, Logistic Regression and Decision Tree were less effective, displaying notable fluctuations and lower mean F1-scores.
In summary, while classical models provide a basic level of performance, ensemble learning techniques and neural network-based models substantially improve classification when applied to multifractal features extracted from retinal images. The MLP Neural Network, in particular, stands out as the most consistent classifier across all four metrics.
Figure 7 presents the confusion matrices for eight supervised learning models applied to the classification of diabetic retinopathy (DR) and normal retinal images. These matrices detail the counts of true positives (correctly identified DR), true negatives (correctly identified normal cases), false positives, and false negatives.
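The metrics reported in Tables 6 and 7 follow directly from these four counts. The helper below is a minimal sketch of the standard definitions, treating DR as the positive class as in the text and assuming every denominator is nonzero; the counts in the usage line are hypothetical and chosen only to make the arithmetic easy to check.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary metrics from confusion-matrix counts (DR = positive class)."""
    precision = tp / (tp + fp)          # share of predicted DR cases that are truly DR
    sensitivity = tp / (tp + fn)        # recall: share of true DR cases that are detected
    specificity = tn / (tn + fp)        # share of true normal cases correctly cleared
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)    # equals the harmonic mean of precision and sensitivity
    return {"accuracy": accuracy, "precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Hypothetical counts for illustration only (not taken from Figure 7).
m = confusion_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these counts, sensitivity is 0.90 and specificity 0.95. Because false negatives appear only in the sensitivity denominator and false positives only in the specificity denominator, the two error types discussed for each model below map directly onto these two metrics.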
The MLP Neural Network (Figure 7h) achieved the best classification results, with only seven DR cases misclassified as normal (false negatives) and three normal cases misclassified as DR (false positives). This indicates high accuracy, sensitivity, and specificity, reflecting the model’s capacity to generalize well when fed with discriminative features.
A qualitative analysis of the misclassified images revealed that false negatives often corresponded to OCT scans with minimal or subtle structural disruption, potentially below the detection threshold of multifractal descriptors. Conversely, the few false positives were frequently associated with normal images showing noise-like or irregular texture patterns that resembled early pathological changes, leading to over-sensitive detection.
Both XGBoost (Figure 7f) and LightGBM (Figure 7g) also demonstrated strong classification performance, with few misclassifications (13 false negatives and 23–25 false positives). These gradient-boosting algorithms benefit from their ensemble nature, which enables them to capture complex patterns and interactions within the multifractal features.
The Random Forest model (Figure 7b) also performed well, with 14 false negatives and 42 false positives, outperforming the Decision Tree (Figure 7c), which showed higher misclassification rates. This highlights the benefit of ensemble methods in reducing overfitting and variance compared with a single tree classifier.
Although the SVM with a polynomial kernel (Figure 7d) achieved reasonable results, it produced a relatively high number of false positives (71). Because the model was evaluated with its best-performing configuration after thorough hyperparameter optimization, this behavior is unlikely to stem from poor tuning; rather, it suggests a limitation of this kernel type in separating the two classes in the multifractal feature space.
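For reference, the polynomial kernel is commonly parameterized as below; this is, for example, the form used by scikit-learn's `SVC` with `kernel="poly"`. The section does not state which implementation was used or which degree d, scale γ, and offset r were selected, so these symbols are placeholders, and the sum runs over the support vectors x_i with dual coefficients α_i and labels y_i.

```latex
K(\mathbf{x}, \mathbf{x}') = \left(\gamma \, \mathbf{x}^{\top}\mathbf{x}' + r\right)^{d},
\qquad
f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\Big)
```

Because each kernel term is a polynomial of degree d in x, the decision surface f(x) = 0 is a polynomial of degree at most d in the 9 multifractal descriptors. A fixed degree therefore bounds how curved the separating surface can be, which is one concrete form the kernel limitation mentioned above can take.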
Logistic Regression (Figure 7a) achieved balanced but modest results, with 50 false negatives and 53 false positives. As a linear model, it has limited ability to capture the nonlinear structure of the data despite the informativeness of the multifractal features.
From a clinical standpoint, minimizing false negatives is crucial to avoid missing early-stage diabetic retinopathy (DR) cases, while false positives may lead to unnecessary referrals or patient anxiety. Despite the strong overall performance, the few remaining misclassified cases are likely attributable to ultra-early DR manifestations, in which structural changes are minimal or subclinical. These borderline cases highlight the limitations of relying solely on OCT-based structural descriptors and underscore the potential benefit of integrating complementary functional modalities. As a future direction, combining our multifractal OCT approach with functional biomarkers such as Electroretinography (ERG) could enhance sensitivity to early neuroretinal dysfunction and reduce diagnostic uncertainty in ambiguous presentations.
Overall, these findings emphasize the importance of pairing expressive classifiers with informative feature extraction. In this study, multifractal analysis played a central role in characterizing the spatial and textural complexity of retinal images. By capturing multi-scale structural variations, it provided a rich set of discriminative features on which all evaluated models performed well.
The consistently high accuracy obtained, particularly by the ensemble and neural network models, underlines the effectiveness of multifractal analysis in transforming retinal data into a more learnable representation. This supports its role as a key step in the diagnostic pipeline, contributing to better generalization and more reliable detection of diabetic retinopathy.
Table 7 presents the performance metrics obtained for each classification model, including accuracy, precision, sensitivity (recall), specificity, F1-score, and the 95% confidence interval for the accuracy.
The results in Table 7 highlight the strong discriminative power achieved by all classifiers. Among them, the MLP Neural Network attained the highest overall accuracy ( ), demonstrating its superior ability to learn complex nonlinear representations from the extracted multifractal features. It also exhibited excellent precision ( ), sensitivity ( ), and specificity ( ), although its confidence interval was not reported because of deterministic results across the cross-validation folds. Tree-based ensemble methods also performed very competitively. LightGBM achieved an accuracy of with a narrow confidence interval [–], indicating robust and reliable performance, and XGBoost and Gradient Boosting followed closely, with accuracies of and , respectively.
Random Forest ( ) and Logistic Regression ( ) also delivered good results, but with slightly lower precision and sensitivity than the boosting methods and the neural network. The Decision Tree and the SVM with polynomial kernel achieved accuracies above , but they exhibited wider confidence intervals, suggesting less stability across folds.
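The section does not state how the 95% confidence intervals in Table 7 were computed. One common choice, assumed here purely for illustration, is a Student-t interval over the five per-fold accuracies (k - 1 = 4 degrees of freedom). Under that assumption, folds that all return the same accuracy yield a zero-width interval, one situation in which an interval carries no information. The fold accuracies below are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t with 4 degrees of freedom (5 folds).
T_975_DF4 = 2.7764451051977987

def fold_accuracy_ci(fold_accuracies, t_crit=T_975_DF4):
    """Assumed method: t-interval for the mean accuracy over k folds."""
    k = len(fold_accuracies)
    m = mean(fold_accuracies)
    half_width = t_crit * stdev(fold_accuracies) / math.sqrt(k)  # t * s / sqrt(k)
    return m - half_width, m + half_width

# Hypothetical per-fold accuracies, for illustration only.
lo, hi = fold_accuracy_ci([0.94, 0.95, 0.96, 0.95, 0.95])
flat_lo, flat_hi = fold_accuracy_ci([0.98] * 5)   # identical folds: zero-width interval
```

The first call gives roughly 0.950 ± 0.009. With only five folds the t critical value (about 2.78) is much larger than the normal value 1.96, so wider intervals such as those noted for the Decision Tree and SVM partly reflect the small number of folds as well as genuine fold-to-fold variability.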
Overall, these findings confirm that multifractal-based feature extraction, combined with modern machine learning models, can achieve highly accurate and reliable diabetic retinopathy detection from OCT images. Ensemble learning and neural networks provided notable improvements over simpler models, particularly in terms of generalization and sensitivity to pathological variations.
The experimental findings demonstrate the strong discriminative ability of multifractal features extracted from OCT images for the early detection of diabetic retinopathy. Among the classifiers evaluated, the MLP Neural Network achieved the best overall scores on all evaluation metrics, highlighting the effectiveness of combining multifractal analysis with neural network classifiers to model the complex structural alterations associated with early retinal pathology.
Ensemble-based methods such as LightGBM and XGBoost also exhibited excellent performance, further validating the suitability of gradient-boosting strategies for multifractal descriptors. Overall, the proposed pipeline underscores the robustness, scalability, and practical applicability of multifractal analysis on OCT images. Unlike previous studies that primarily used OCTA datasets, this work shows that standard OCT imaging, when combined with advanced mathematical modeling, can achieve performance competitive with OCTA-based approaches while remaining a more accessible tool for early diabetic retinopathy screening.
3.5. Performance Comparison with State-of-the-Art Methods
To evaluate the effectiveness of the proposed multifractal-based MLP classification approach, we conducted a comparative analysis with several state-of-the-art machine learning and deep learning methods previously applied to retinal image classification.
Table 8 summarizes the key performance metrics—accuracy, sensitivity, specificity—as well as dataset characteristics and imaging modalities used in these studies.
As shown in Table 8, our proposed method achieves competitive or superior performance compared with recent approaches, with 98.02% accuracy, 97.80% sensitivity, and 98.84% specificity on 6000 OCT images, although differences in datasets, imaging modalities, and evaluation protocols limit the directness of such comparisons. While some methods demonstrate marginally higher sensitivity (e.g., SVM + GLCM [41]), they often underperform in terms of overall accuracy or specificity. Additionally, approaches relying on OCTA [12] or massive annotated datasets [39] may not be feasible in routine clinical settings.
In contrast, our model operates on standard OCT images and uses mathematically grounded multifractal features, combining strong performance with interpretability. This makes it well suited for integration into real-world ophthalmic workflows, including primary care and resource-limited settings where early intervention is critical. Unlike end-to-end deep learning models that often act as “black boxes,” the proposed method relies on transparent structural descriptors; although the MLP classifier itself is not inherently interpretable, its inputs are, which facilitates clinical validation and enhances trust among ophthalmologists.
Moreover, the model’s low computational requirements enable deployment on existing diagnostic platforms without specialized hardware or extensive data labeling. By detecting subtle structural changes in retinal layers, which are often invisible in fundus images, our framework supports early-stage diabetic retinopathy screening. Future development will focus on an integrated, user-friendly interface and explainable AI tools that visually highlight pathologically relevant regions, further bridging the gap between algorithmic decision-making and clinical practice.
While the present study establishes that multifractal features extracted from OCT images offer a robust structural biomarker framework for early diabetic retinopathy (DR) detection, future work should advance toward integrating complementary functional modalities to achieve a more comprehensive and biologically faithful diagnostic model. Optical Coherence Tomography (OCT) remains an indispensable structural imaging technique, providing high-resolution visualization of retinal microarchitecture, including the inner retinal layers and retinal pigment epithelium. However, as a purely anatomical modality, OCT cannot interrogate the functional state of retinal neurons, which may be compromised in the earliest neurodegenerative stages of DR, well before visible vascular or tissue disruption occurs. In contrast, Electroretinography (ERG) offers objective, layer-specific assessment of retinal function by recording bioelectrical responses to photic stimuli. Notably, recent studies have demonstrated that ERG abnormalities, particularly reduced oscillatory potentials and an attenuated photopic negative response (PhNR), can be detected in diabetic patients, revealing latent inner retinal dysfunction [43,44]. This structural–functional dissociation highlights a critical limitation of relying solely on OCT-based morphological features and underscores the need for multimodal approaches that capture both tissue integrity and neurophysiological performance.
Future research should explore the fusion of OCT-derived multifractal descriptors with ERG-derived electrophysiological biomarkers within integrated machine learning pipelines. Such hybrid models may yield more sensitive and specific detection of subclinical DR, enabling earlier therapeutic intervention. Moreover, this multimodal paradigm aligns with the goals of precision ophthalmology, offering a path toward personalized screening, disease staging, and longitudinal monitoring in diabetic eye care.