This discussion synthesizes the comparative findings from all evaluated models, focusing on learning dynamics, generalization patterns, and the interpretability of model performance across internal and external splits.
4.1. Key Observations and Takeaways
The comparative analysis of model performance across LightGBM (via PyCaret), the regular CNN, the augmented CNN, and the DNN reveals distinct patterns in learning dynamics and generalization behavior. Although LightGBM achieved a strong cross-validation score (best CV accuracy: 91.0%), its learning curves (Figure 3) exposed problematic training dynamics. Training accuracy remained at 100% throughout, while cross-validation accuracy improved only marginally from 87.0% to 91.0%, with a persistent and widening gap between training and validation. This divergence, coupled with early stagnation of the validation curve, is indicative of overfitting or possible data leakage. Such behavior undermines confidence in the model's ability to generalize, justifying the decision not to advance the PyCaret models to external validation. The regular (unaugmented) CNN also exhibited severe overfitting (Figure 5). Training accuracy rapidly saturated at 100%, while validation accuracy plateaued at 79%, resulting in a substantial 21-point gap. The validation curve also displayed instability, with fluctuations and a notable dip around epoch 8, further signaling poor generalization and memorization of the training set rather than learning of robust features.
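For reproducibility of this diagnostic, a learning curve of the kind shown in Figure 3 can be generated directly from cross-validated scores. The following is a minimal sketch assuming a scikit-learn-compatible LightGBM classifier and in-memory features X and labels y; it is illustrative and not the exact PyCaret pipeline used in this study.

```python
# Minimal sketch: plotting training vs. cross-validation accuracy to expose
# the widening gap discussed above. Assumes features X and labels y are
# available as NumPy arrays; not the exact PyCaret/LightGBM setup of this study.
import numpy as np
import matplotlib.pyplot as plt
from lightgbm import LGBMClassifier
from sklearn.model_selection import learning_curve

def plot_lgbm_learning_curve(X, y, cv=5):
    sizes, train_scores, cv_scores = learning_curve(
        LGBMClassifier(random_state=42), X, y,
        train_sizes=np.linspace(0.1, 1.0, 8),
        cv=cv, scoring="accuracy", n_jobs=-1,
    )
    train_mean, cv_mean = train_scores.mean(axis=1), cv_scores.mean(axis=1)

    plt.plot(sizes, train_mean, "o-", label="Training accuracy")
    plt.plot(sizes, cv_mean, "o-", label="Cross-validation accuracy")
    plt.xlabel("Training set size")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.show()

    # A training curve pinned near 1.0 with a flat CV curve well below it
    # (as in Figure 3) points to overfitting or possible leakage.
    return train_mean[-1] - cv_mean[-1]
```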
In contrast, the augmented CNN demonstrated healthier learning dynamics. Both training and validation accuracies started at realistic baselines (47%) and converged smoothly, with the validation curve closely tracking the training curve and ending at 84.5% (validation) versus 86% (training). This small, expected gap, in which training accuracy remains slightly higher than validation accuracy, reflects robust generalization and effective regularization. Overall, the augmented CNN's learning curves were markedly better than those of both LightGBM and the regular CNN. The DNN's learning curves (Figure 4) also showed smooth, parallel convergence of training and validation accuracy, with no abrupt divergence. Accuracy improved steadily from 37% to 80% (training) and from 29% to 75% (validation) as the training set size increased.
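To make the distinction between the regular and augmented CNNs concrete, the sketch below shows one way to attach on-the-fly augmentation layers in front of a small CNN. It assumes a TensorFlow/Keras workflow; the input shape, number of classes, transforms, and their parameters are illustrative assumptions rather than the exact configuration trained here.

```python
# Minimal sketch: an on-the-fly augmentation block feeding a small CNN.
# Input shape, class count, and transform parameters are illustrative
# assumptions, not the exact architecture or settings used in this study.
import tensorflow as tf
from tensorflow.keras import layers

augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),       # mirror images
    layers.RandomRotation(0.05),           # small rotations
    layers.RandomZoom(0.1),                # mild scale jitter
    layers.RandomTranslation(0.05, 0.05),  # small vertical/horizontal shifts
])

def build_augmented_cnn(input_shape=(224, 224, 1), n_classes=8):
    inputs = tf.keras.Input(shape=input_shape)
    x = augmentation(inputs)               # active during training, identity at inference
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)             # regularization alongside augmentation
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Because such augmentation layers are active only in training mode, validation and test passes see unmodified images, which is consistent with the close tracking of the two curves described above.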
Confusion matrices (Figure 6) revealed that most errors were concentrated among classes 2, 4, and 6, with notable off-diagonal confusion: class 4 in particular was frequently misclassified into classes 2, 6, and 8. This pattern may reflect clinical or feature overlap among these categories.
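The class-level error analysis summarized here follows from a standard confusion matrix; a minimal sketch using scikit-learn is shown below, with the label arrays and class names assumed to be available from the evaluation split.

```python
# Minimal sketch: confusion matrix and row-normalized error rates for a split.
# Assumes integer label arrays y_true and y_pred are passed in; class names
# are optional placeholders, not the study's exact label mapping.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

def per_class_confusion(y_true, y_pred, class_names=None):
    cm = confusion_matrix(y_true, y_pred)
    ConfusionMatrixDisplay(cm, display_labels=class_names).plot(cmap="Blues")
    plt.show()
    # Row-normalizing shows what fraction of each true class is diverted to
    # other classes (e.g., class 4 spreading into classes 2, 6, and 8 in Figure 6).
    return cm / cm.sum(axis=1, keepdims=True)
```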
When examining performance across splits (Figure 7), the DNN was the only model to exhibit the expected monotonic decrease in most performance metrics from training to validation to test to external. The only exception was precision, which increased on the external set, likely reflecting a more conservative prediction strategy in response to distributional shift: the model made fewer positive predictions, but those it made were more often correct. In contrast, the augmented CNN showed non-monotonic behavior, with external-set metrics (e.g., accuracy up to 90.5%, precision up to 92%) exceeding those on the test and even training sets. This is highly unusual and suggests a mismatch in data splits, potential leakage, or an external set that was inadvertently easier for the model.
While such unexpectedly elevated external performance might initially appear favorable, it fundamentally contradicts the expected generalization hierarchy, wherein performance should decrease monotonically as the data become increasingly unfamiliar. External validation datasets, by definition, originate from different sources with distinct acquisition protocols, patient populations, and imaging characteristics, and therefore represent the most challenging evaluation scenario. When a model's external performance paradoxically exceeds its internal validation or test performance, the anomaly typically indicates that the external dataset inadvertently simplified the classification task: through fortuitous class distributions that align with the model's learned biases, systematically clearer image quality that facilitates feature extraction, exclusion of the diagnostically ambiguous cases that challenged internal validation, or statistical artifacts arising from small sample sizes. Such performance improvements cannot be interpreted as evidence of superior generalization capability; rather, they signal that the external validation failed to provide a genuinely independent or representative assessment of real-world clinical performance. Consequently, models exhibiting non-monotonic performance patterns must be regarded with heightened skepticism regardless of their numerical metrics, as these results likely reflect evaluation artifacts or dataset peculiarities rather than reliable diagnostic competence that would translate to diverse clinical deployment scenarios.
The regular CNN, meanwhile, showed a dramatic drop from 100% training accuracy to 79% (validation), 78% (test), and as low as 73% (external), further confirming its overfitting and lack of generalization.
Building on these observations, the DNN's performance metrics decreased gradually from the training to the validation, test, and external sets, as expected for a well-generalizing model. Specifically, accuracy dropped from 82% (train) to 81.5% (validation), 81% (test), and 76% (external). F1-score followed a similar trend: 82% (train), 81.5% (validation), 81% (test), and 80.5% (external). Recall also decreased monotonically across splits. However, precision notably increased on the external set, rising from 84% on the training set to 86% on the external set. This counterintuitive trend suggests that the model became more conservative in its positive predictions when faced with new data. Such an increase in precision, especially when recall drops, typically indicates that the model is making fewer positive predictions overall, but that those it does make are more likely to be correct. This behavior is commonly observed under distributional shift between training and external data, where the model effectively behaves as though its threshold for positive classification has risen. As a result, it reduces false positives (increasing precision) at the expense of missing more true positives (lowering recall), reflecting a typical trade-off in response to increased uncertainty or changes in data distribution.
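For transparency about how such split-wise figures are obtained, the sketch below computes accuracy, precision, recall, and F1 per split with scikit-learn. The dictionary layout and the weighted averaging are illustrative assumptions, not necessarily the exact evaluation code used in this study.

```python
# Minimal sketch: accuracy, precision, recall, and F1 for each data split.
# Assumes dictionaries of true labels and predictions keyed by split name;
# the "weighted" averaging is an assumption, not a confirmed study setting.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def metrics_by_split(y_true_by_split, y_pred_by_split):
    results = {}
    for split, y_true in y_true_by_split.items():
        y_pred = y_pred_by_split[split]
        results[split] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
            "recall": recall_score(y_true, y_pred, average="weighted", zero_division=0),
            "f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
        }
    return results  # e.g., keys: "train", "validation", "test", "external"
```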
To synthesize these findings, Table 3 summarizes how each model satisfied two key criteria: (1) exhibiting healthy learning curves, and (2) demonstrating a monotonic decrease in performance metrics across data splits. As shown, both LightGBM (via PyCaret) and the regular CNN failed to meet either criterion, reflecting their poor generalization and overfitting tendencies. The augmented CNN achieved healthy learning curves but did not maintain a monotonic decrease in performance metrics, primarily due to its unexpected improvement on both the test and external sets. Notably, only the DNN satisfied both criteria, highlighting its robust learning dynamics and consistent generalization behavior across all evaluation splits.
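The second criterion in Table 3 can also be verified mechanically once per-split metrics are available. The helper below is a simple illustrative check that assumes the split order train → validation → test → external and a small rounding tolerance; it is not part of the study's reported pipeline.

```python
# Minimal sketch: checking the expected monotonic decrease across splits.
# The split order and tolerance are assumptions for illustration only.
SPLIT_ORDER = ["train", "validation", "test", "external"]

def is_monotonic_decrease(metric_by_split, tolerance=0.005):
    values = [metric_by_split[s] for s in SPLIT_ORDER]
    return all(later <= earlier + tolerance
               for earlier, later in zip(values, values[1:]))

# DNN accuracies reported in Section 4.1 satisfy the criterion:
dnn_accuracy = {"train": 0.82, "validation": 0.815, "test": 0.81, "external": 0.76}
print(is_monotonic_decrease(dnn_accuracy))       # True

# A hypothetical non-monotonic pattern (illustrative values only) fails it:
non_monotonic = {"train": 0.86, "validation": 0.84, "test": 0.83, "external": 0.90}
print(is_monotonic_decrease(non_monotonic))      # False: external exceeds internal splits
```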
The DNN’s external validation accuracy of 76%, while representing a decline from internal performance metrics, warrants careful interpretation within the broader context of clinical ML development. First, the presence of external validation itself distinguishes this work from the majority of published OCT classification studies, which typically report only internal test set performance without independent dataset assessment. Second, the monotonic performance decrease from training (82%) through validation (81.5%), test (81%), to external validation (76%) represents a fundamental indicator of model reliability and methodological integrity. This gradual, consistent decline demonstrates that the model has learned generalizable patterns rather than memorizing dataset-specific artifacts. In contrast, models exhibiting inflated internal accuracies (approaching 95–100%) without external validation evidence, or those showing non-monotonic performance patterns, may achieve superficially impressive metrics while masking serious generalization failures that would only become apparent upon clinical deployment. The 76% external accuracy, though modest, provides an honest, conservative estimate of real-world performance expectations. Such transparency is essential for responsible clinical translation, as overestimating model capabilities based on inflated internal metrics poses direct risks to patient safety. This work prioritizes methodological rigor and honest performance reporting over the pursuit of artificially elevated accuracy figures. The established baseline of 76% external accuracy, achieved through demonstrably healthy training dynamics and rigorous multi-tier validation, provides a reliable foundation for future model refinements. Incremental improvements built upon this methodologically sound framework will yield clinically trustworthy systems, whereas models reporting higher accuracies without equivalent validation rigor cannot be safely translated to patient care regardless of their numerical performance claims.