1. Introduction
Artificial intelligence solutions are gradually becoming permissible in medical diagnostics. For example, the Polish Code of Medical Ethics [1], in force since 2025, allows the use of artificial intelligence algorithms when certain conditions are met. The patient must be informed and must give informed consent to the use of artificial intelligence during the diagnostic and therapeutic process. Only solutions approved and certified for medical use may be employed. It is also important that the final diagnostic and therapeutic decision is always made by a doctor.
The use of artificial intelligence (AI) techniques makes it possible to analyze the influence of various factors on the prediction of disease changes. One example of the capabilities of deep learning (DL) techniques is the integration of multimodal data (radiology, pathology, epigenetics, and clinical data) to predict the overall survival of patients with glioma [2].
Machine learning, and especially deep neural networks (DNNs), is widely applicable to OCT image analysis in ophthalmology [3]. Several application areas can be distinguished: segmentation of retinal layers (using U-Net-type networks [4]); analysis of changes over time using hybrid models (e.g., combining CNNs with sequential recurrent NN models) [5]; and detection and classification of eye diseases such as age-related macular degeneration (AMD), glaucoma, diabetic retinopathy, or macular holes [6].
OCT devices for ophthalmological diagnostics often include extensive software for the segmentation of retinal layers and parametric visualization to assess abnormal changes. The accuracy of deep learning solutions is comparable to manual image analysis by an ophthalmology specialist. The use of innovative NN models such as the BioImagingLab/INESC TEC model [7] allows precise segmentation of the retinal layers in AMD patients and has demonstrated a moderate to strong correlation between computer and human metrics [8].
Based on a B-scan OCT of the macular area, an experienced ophthalmologist can easily classify retinal pathological changes. However, for screening tests, computer classification using artificial intelligence techniques can be helpful. Datasets of OCT B-scans prepared in recent years have enabled classifiers that achieve an accuracy above 96% [3]. It should be noted that the software of OCT devices does not yet typically offer such functionality, on the assumption that the diagnosis will be made by a specialist.
Modern smartphones have increasingly greater computing capabilities, and their built-in cameras take high-resolution photos. Research progress can be observed in smartphone-based optical imaging biosensors in the fields of colorimetry, fluorescence, and microscopic imaging for genetic testing [9]. The widespread availability of smartphones and broadband transmission has enabled the development of telemedical consultations, a trend that can also be observed in ophthalmology [10]. The ability to send self-taken photos of the anterior segment of the eye and eyelids, e.g., for conditions such as eyelid chalazion (also called an eyelid cyst or a meibomian cyst), allows for a quick teleconsultation without traveling to an ophthalmological clinic or hospital. With the right smartphone camera attachments [11], it is also possible to photograph the fundus of the eye, thus providing imaging used in diabetic retinopathy.
A number of smartphone applications are available that are dedicated to diagnosis, treatment, and symptom management in ophthalmology. The review of solutions presented in [12] shows that mobile applications can be successfully used for the following tasks: visual acuity assessment, dry eye diagnostics, color recognition, strabismus assessment, detection of metamorphopsia, pupillometry, and education for ophthalmologists and optometrists. More advanced diagnostics related to glaucoma and diabetes are also possible, in which case the parametric data are entered manually. In general, these mobile applications do not use artificial intelligence techniques such as deep neural networks.
For OCT images, smartphone ophthalmic applications are emerging that can support the diagnostic process. Of course, a smartphone cannot acquire OCT images on its own, but it has enough computing power to analyze B-scans saved in the device's memory. The work in [13] presents a mobile application that allows the classification of four types of B-scan (CNV, DME, DRUSEN, NORMAL). The entire image and a relatively simple CNN consisting of four blocks of 2D convolution and max-pooling layers were used for classification. The authors also used YOLOv4-tiny to mark pathological changes in the image. The application and its source code are not available.
In the context of studies on the development of ophthalmic telediagnostic screening, our article presents a comprehensive study of the applicability of different neural network architectures. A publicly available application has also been prepared that allows the user to check the probability of a B-scan belonging to each of five classes of pathological changes using three previously selected models. The B-scan is assumed to be available in the memory of the mobile device or to be captured with the smartphone camera from the screen of the OCT device or from a printout received by the patient after OCT imaging. The presented analysis and solutions may be helpful for quick evaluation of OCT B-scans, including those archived on older-generation OCT devices.
The main contributions of the authors presented in the article are as follows:
a comparison of the effectiveness of six neural network architectures for the automatic classification of OCT B-scans into five groups of lesions;
a comprehensive comparison of three solutions for converting neural network models to meet the constraints of mobile devices;
the source code of the mobile application for OCT B-scan classification, released on GitHub (version 1.0).
The outline of the paper is as follows. Section 2 describes the dataset and the neural networks used in the experiments. The advantages and disadvantages of various approaches to NN implementation in a mobile application are discussed in Section 3; this section also includes a description of the mobile application's design. In Section 4, the experimental setup and classification results are presented.
3. Android-Based Application for OCT Classification
The classification of human eye OCT images was implemented in a mobile application for Android phones. Development was performed in Android Studio version 2023.1.1 using the Kotlin programming language.
3.1. Implementing Neural Networks in a Mobile Application
Three different methods of model conversion were analyzed to implement neural networks for image classification on mobile devices: TensorFlow Lite (TF Lite), ONNX, and Firebase ML. Their advantages and disadvantages are discussed below.
TensorFlow Lite is a solution that converts TensorFlow models to a smaller, more efficient machine learning format. In this solution, one can:
use pre-trained TensorFlow Lite models;
modify existing models;
build one's own TensorFlow models and then convert them to the TensorFlow Lite format.
The TensorFlow Lite [19] model is represented in an efficient, portable FlatBuffers format (files with the .tflite extension). This allows for reduced size and faster inference (via direct access to data without additional parsing/unpacking steps). This solution enables efficient operation on devices with limited computational and memory resources. The advantages and disadvantages of this method are discussed in Table 4.
The generation of a network model in the .tflite format can be performed using the TensorFlow Lite Model Maker library. This library uses transfer learning, which reduces the amount of training data required and shortens training time. Another option is to build a TensorFlow model and use the TensorFlow Lite Converter. In this case, it is possible to optimize the model, for example through quantization, to adapt it for operation on a mobile device.
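As an illustration, a minimal sketch of the second path is shown below: a trained Keras model is converted with the TensorFlow Lite Converter and post-training quantization is applied. The model path and file names are assumptions for illustration, not the exact configuration used in this work.

```python
import tensorflow as tf

# Load a trained Keras model (the path is illustrative).
model = tf.keras.models.load_model("oct_classifier.h5")

# Create a converter directly from the in-memory Keras model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Optional post-training quantization to shrink the model
# and speed up inference on mobile hardware.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

# Write the FlatBuffers model to a .tflite file that can be
# bundled in the Android application's assets.
with open("oct_classifier.tflite", "wb") as f:
    f.write(tflite_model)
```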
ONNX (Open Neural Network Exchange) [20] is an open standard for representing machine learning models, used to export or convert models from multiple platforms (such as TensorFlow 2.0, PyTorch 2.1, Keras 1.6, MATLAB 7, etc.) to the standard ONNX format, which can be run on different platforms and devices (cloud, edge devices, CPUs/GPUs, etc.). The advantages and disadvantages of this method are discussed in Table 5.
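For example, a TensorFlow/Keras model can be exported to ONNX with the tf2onnx package and then executed with ONNX Runtime. The sketch below is illustrative only; the model path, file names, and the 224 x 224 x 3 input shape are assumptions.

```python
import numpy as np
import tensorflow as tf
import tf2onnx
import onnxruntime as ort

# Load a trained Keras model (the path is illustrative).
model = tf.keras.models.load_model("oct_classifier.h5")

# Export the model to the ONNX format; the input shape is assumed.
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="oct_classifier.onnx")

# Run the exported model with ONNX Runtime on a dummy B-scan tensor.
session = ort.InferenceSession("oct_classifier.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)
probabilities = session.run(None, {input_name: dummy})[0]
print(probabilities)
```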
Firebase ML [21] is a mobile Software Development Kit (SDK) that can be used for Android and Apple iOS apps. In this solution, TensorFlow Lite models are used for inference. Inference is the phase in which the deployed model makes predictions, most often on production data. Firebase hosts and serves the model to the app. Inference can be performed in the cloud (Google Cloud) or on the user's mobile device. The advantages and disadvantages of this method are discussed in Table 6.
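With Firebase ML, a converted TensorFlow Lite model is typically uploaded and published through the Firebase Admin SDK so that the mobile app can download it at runtime. A minimal Python sketch is shown below; the model name, file path, bucket, and service-account credentials are placeholders, not values used in this project.

```python
import firebase_admin
from firebase_admin import credentials, ml

# Initialize the Admin SDK with a service account and a storage bucket
# (both values are placeholders).
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"storageBucket": "my-project.appspot.com"})

# Upload the .tflite file to Cloud Storage and register it as a model.
source = ml.TFLiteGCSModelSource.from_tflite_model_file("oct_classifier.tflite")
model = ml.Model(
    display_name="oct_classifier",  # name the app uses to request the model
    model_format=ml.TFLiteFormat(model_source=source),
)
created = ml.create_model(model)

# Publish the model so client apps can download it.
ml.publish_model(created.model_id)
```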
After analyzing the advantages and disadvantages of the discussed methods for implementing neural networks for image classification on mobile devices, TensorFlow Lite was chosen. The prepared TensorFlow Lite model must then be integrated with the application code that loads it on the device and makes predictions based on input data. For neural networks requiring significant resources, hardware acceleration via the Android Neural Networks API is possible.
3.2. Graphical User Interface
When launching the mobile application, the user sees the application title OCT Classifier on the screen. Below the title, two buttons offer the following options: taking a picture using the phone's rear camera or selecting an image from the gallery (screenshot in Figure 2):
Touching the button labelled Take a photo takes the user to the application module that allows taking a picture, editing it, and later classifying it (this interface is shown in Figure 3).
Touching the button labelled Choose image from gallery navigates to a module (whose interface is shown in Figure 4), where the user can select an image from the gallery and then submit the selected image for classification.
The application must be granted access to the phone's photos to allow the user to select photos from the phone's gallery. At this stage, the application also needs permission to write files so that the bitmap of the classified image can be saved in the user's gallery. Additionally, the user has the option of editing a photo captured with the mobile device's built-in camera before sending it for classification.
The part of the application where the main image classification occurs can be seen in Figure 5. At the top of the screen, the user sees an image selected from the gallery or a photo taken earlier with the phone's camera. Below the image is a button, Classify image. Touching it runs the built-in NN models converted to TF Lite to classify the selected image into one of five categories: CNV, DME, DRUSEN, NORMAL, or VMT. The classification results (class labels with probabilities) and the classification times for each network are presented below the button in a scrollable text field. The result with the highest probability for each network is written in bold to immediately draw the user's attention to the most likely outcome.
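The on-device classification is implemented in Kotlin with the TensorFlow Lite Interpreter; the sketch below reproduces the same logic in Python for clarity. The file names, the 224 x 224 input size, and the [0, 1] normalization are assumptions, not the exact preprocessing used by the app.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

CLASSES = ["CNV", "DME", "DRUSEN", "NORMAL", "VMT"]

# Load the converted model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="oct_classifier.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Preprocess a B-scan: resize to the assumed model input and scale to [0, 1].
image = Image.open("b_scan.png").convert("RGB").resize((224, 224))
x = np.expand_dims(np.asarray(image, dtype=np.float32) / 255.0, axis=0)

# Run inference and read the class probabilities.
interpreter.set_tensor(input_details["index"], x)
interpreter.invoke()
probabilities = interpreter.get_tensor(output_details["index"])[0]

# Report every class probability and highlight the most likely one.
for label, p in zip(CLASSES, probabilities):
    print(f"{label}: {p:.3f}")
print("Most likely:", CLASSES[int(np.argmax(probabilities))])
```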
4. Experimental Selection of Neural Networks
4.1. Training and Test of the Selected NN Models
The CNN was run with the data split described in Section 2.1, and training was performed for 100 epochs. The run utilized the Google Colab platform, the Google Compute Engine backend with Python 3.10, and a V100 GPU with 16 GB RAM. Table 7 lists the setup parameters for the conducted initial experiments. Furthermore, the following data augmentation techniques (applied randomly) proved to have a positive impact during training only for the Joint Attention Network + MobileNetV2 and OpticNet-71 architectures: horizontal flip, rotation, translation, shear, and zoom.
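A minimal sketch of such an augmentation pipeline using the Keras ImageDataGenerator is shown below. The parameter ranges, directory layout, and input size are illustrative placeholders, not the exact values used in the experiments.

```python
import tensorflow as tf

# Randomly applied augmentations: horizontal flip, rotation,
# translation, shear, and zoom (all ranges are illustrative).
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255.0,
    horizontal_flip=True,
    rotation_range=10,        # degrees
    width_shift_range=0.05,   # fraction of image width
    height_shift_range=0.05,  # fraction of image height
    shear_range=0.05,
    zoom_range=0.1,
)

# Stream augmented training batches from a directory with one
# sub-folder per class (CNV, DME, DRUSEN, NORMAL, VMT).
train_gen = datagen.flow_from_directory(
    "dataset/train",          # assumed directory layout
    target_size=(224, 224),   # assumed model input size
    batch_size=32,
    class_mode="categorical",
)

# model.fit(train_gen, epochs=100) would then consume these batches.
```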
The results of this experiment are shown in Table 8. The best result of 89% across all metrics is achieved by the Joint Attention Network with the MobileNetV2 model. The lowest performance of 64% was obtained with the three-layer CNN architecture. Its low efficiency, resulting from its simple model structure, confirms the need for advanced methods in the task of classifying human eye OCT images. All models had the highest prediction accuracy for the VMT class and the lowest for the DRUSEN and DME classes. The models' overall insufficient accuracy can be attributed to the small size and imbalance of the dataset (only 4467 images) and to data characteristics differing between OCT manufacturers (e.g., image brightness and contrast).
4.2. Network Tests After Conversion to the TensorFlow Lite Model
All networks were converted to the TensorFlow Lite format and tested on the same set of images. The resulting Accuracy, Precision, Recall, and F1-Score values are presented in Table 9.
Figure 6 shows the confusion matrices obtained for all network architectures.
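A minimal sketch of how a converted .tflite model can be evaluated on a held-out test set, producing the kinds of metrics reported in Table 9, is shown below. The file paths, input size, and normalization are assumptions, not the exact evaluation pipeline used here.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Load the test set; the directory layout and input size are assumptions.
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/test", image_size=(224, 224), batch_size=1, shuffle=False
)

interpreter = tf.lite.Interpreter(model_path="oct_classifier.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

y_true, y_pred = [], []
for images, labels in test_ds:
    x = tf.cast(images, tf.float32) / 255.0   # assumed normalization
    interpreter.set_tensor(inp["index"], x.numpy())
    interpreter.invoke()
    probs = interpreter.get_tensor(out["index"])[0]
    y_true.append(int(labels[0]))
    y_pred.append(int(np.argmax(probs)))

# Macro-averaged metrics over the five classes.
acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"Accuracy={acc:.3f} Precision={prec:.3f} Recall={rec:.3f} F1={f1:.3f}")
```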
Comparing the results for the network with three convolutional layers before and after conversion, the accuracy dropped by 10%, and the F1-score dropped by as much as 13%. Analyzing the confusion matrix in Figure 6a, it can be observed that the VMT class has the greatest number of correct predictions. However, the CNV and DRUSEN classes have a significant number of incorrect predictions, which are assigned to the DME category. The DRUSEN class remains the one with the fewest correct predictions.
For the VGG16 network, a 13% decrease in both accuracy and F1-score can be observed. After conversion to the TensorFlow Lite format, it can be seen in Figure 6b that the NORMAL class obtained the largest number of correct predictions, whereas before the conversion it was the VMT class. The number of correct predictions for the DME class increased compared to the results before conversion. The VMT and DRUSEN classes obtained the smallest number of correct predictions.
When comparing the results for both formats, the accuracy and F1-score of the InceptionV3 network dropped by 7% and 8%, respectively. The confusion matrix in Figure 6c shows that the network in the TensorFlow Lite format obtained the largest number of correct predictions for the VMT class and performed significantly worse on the NORMAL and DRUSEN classes.
In the case of the Xception network, both accuracy and F1-score decreased by 11%. Analyzing the confusion matrix in Figure 6d, the greatest number of correct predictions was observed for the VMT class and the fewest for the DRUSEN class. It is worth noting that before conversion the DME class obtained the smallest number of correct predictions, while after conversion to the TensorFlow Lite format its number of correct predictions increased.
The results before and after conversion for the Joint Attention Network + MobileNetV2 and OpticNet-71 networks remained the same. Such results may stem from the architecture of these neural networks. Like the other networks, OpticNet-71 recognizes VMT disease with the greatest accuracy while having difficulty distinguishing between the NORMAL and DRUSEN classes, as seen in the confusion matrix in Figure 6f. The confusion matrix for the Joint Attention Network + MobileNetV2, shown in Figure 6e, confirms the best classification performance obtained with this model.
In summary, the results from testing the networks before and after conversion to the TensorFlow Lite format allowed the selection of the three networks with the highest scores for implementation in the mobile application: Joint Attention Network + MobileNetV2, OpticNet-71, and VGG16.
4.3. Prediction on Mobile Device
This part of the study examined the performance of the mobile application on the three different mobile devices listed in Table 10.
The aim of the research was to observe the influence of a given mobile device's camera and of the image display method during image acquisition on the classification results and classification time. For each device, application tests were performed under the following three conditions:
An OCT scan selected directly from the mobile device's gallery (control test).
An OCT scan displayed from a projector, followed by image acquisition using the mobile device's rear camera; image editing consisted of cropping to show only the OCT scan, excluding non-tomogram elements of the photo.
An OCT scan displayed on a matte monitor, followed by image acquisition using the mobile device's rear camera; image editing consisted of cropping to show only the OCT scan, excluding non-tomogram elements of the photo.
For each class to which a given image can be assigned, one OCT scan was selected, giving five images in total. These scans were used in tests under the three different conditions on all mobile devices. The selected images had not been used during the training of the neural networks. The obtained results are listed in Table 11.
The above classification results show that the control sample (i.e., scans selected from the gallery) provided identical results regardless of the device. Only the duration of the classification varied, depending on the available computing power. In all situations, the Joint Attention Network + MobileNetV2 network operated fastest and VGG16 slowest.
For photos taken with a phone camera, a decrease in classification effectiveness was observed in most cases compared to the control group. Significant differences sometimes occurred between images displayed on the projector and on the monitor; however, no consistent pattern could be found. A potential reason for the better classification of OCT scans displayed on the monitor compared to those displayed on the projector is the better contrast of the retinal layers.
Although the test sample was small, it can be assumed that the type and resolution of the phone camera should not have a significant impact on the results, since all images are resized to the network's fixed input dimensions. Disturbances in the images, resulting from reflections or shadows created while photographing the displayed scan, seem to be of greater importance.