You are currently viewing a new version of our website. To view the old version click .
Informatics
  • This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

22 November 2025

Learning Dynamics Analysis: Assessing Generalization of Machine Learning Models for Optical Coherence Tomography Multiclass Classification

,
,
and
Department of Osteopathic Manipulative Medicine, New York Institute of Technology, College of Osteopathic Medicine, Old Westbury, Nassau County, Long Island, NY 11568, USA
*
Author to whom correspondence should be addressed.
Informatics2025, 12(4), 128;https://doi.org/10.3390/informatics12040128 
(registering DOI)

Abstract

This study evaluated the generalization and reliability of machine learning models for multiclass classification of retinal pathologies using a diverse set of images representing eight disease categories. Images were aggregated from two public datasets and divided into training, validation, and test sets, with an additional independent dataset used for external validation. Multiple modeling approaches were compared, including classical machine learning algorithms, convolutional neural networks with and without data augmentation, and a deep neural network using pre-trained feature extraction. Analysis of learning dynamics revealed that classical models and unaugmented convolutional neural networks exhibited overfitting and poor generalization, while models with data augmentation and the deep neural network showed healthy, parallel convergence of training and validation performance. Only the deep neural network demonstrated a consistent, monotonic decrease in accuracy, F1-score, and recall from training through external validation, indicating robust generalization. These results underscore the necessity of evaluating learning dynamics (not just summary metrics) to ensure model reliability and patient safety. Typically, model performance is expected to decrease gradually as data becomes less familiar. Therefore, models that do not exhibit these healthy learning dynamics, or that show unexpected improvements in performance on subsequent datasets, should not be considered for clinical application, as such patterns may indicate methodological flaws or data leakage rather than true generalization.

Article Metrics

Citations

Article Access Statistics

Article metric data becomes available approximately 24 hours after publication online.