Early Detection of Diabetic Retinopathy Using PCA-Firefly Based Deep Learning Model

Abstract: Diabetic retinopathy is a major cause of vision loss and blindness affecting millions of people across the globe. Although there are established screening methods for detecting the disease, such as fluorescein angiography and optical coherence tomography, in the majority of cases patients remain unaware of their condition and fail to undergo such tests at an appropriate time. Early detection of the disease plays an extremely important role in preventing the vision loss that results when diabetes mellitus remains untreated for a prolonged period. Various machine learning and deep learning approaches have been applied to diabetic retinopathy datasets for classification and prediction of the disease, but the majority of them have neglected data pre-processing and dimensionality reduction, leading to biased results. The dataset used in the present study is a diabetic retinopathy dataset collected from the UCI machine learning repository. At the outset, the raw dataset is normalized using the StandardScaler technique, and then Principal Component Analysis (PCA) is used to extract the most significant features in the dataset. Further, the Firefly algorithm is implemented for dimensionality reduction. This reduced dataset is fed into a Deep Neural Network model for classification. The results generated from the model are evaluated against prevalent machine learning models, and they justify the superiority of the proposed model in terms of accuracy, precision, recall, sensitivity and specificity.


Introduction
In 2017 there was also another group of 325 million people at risk of Type II diabetes, and their numbers are progressively increasing throughout the world [1]. Most of the people in this category belong to the age group of 40 to 59 years, wherein 1 out of 2 people, amounting to 212 million, are completely unaware and uninformed of their disease. Hence it is quite evident that diabetic retinopathy may soon become a major health issue throughout the world. Obesity, an unhealthy diet and physical inactivity are the primary factors responsible for Type 2 diabetes. It is important to understand that diabetic retinopathy develops only when a patient has had diabetes for at least 10 years and remains unaware and untreated without proper eye examination. Diabetic retinopathy can be prevented if it is detected early enough through health check-ups and systematic treatment of diabetes [2,3]. The duration of diabetes acts as one of the primary causes of retinopathy: the longer the duration, the higher the probability of occurrence of the disease. Retinopathy thus sets in when the patient has had diabetes for a long time while remaining completely unaware, uninformed and untreated regarding the possibility of diabetic retinopathy as a natural progression from diabetes [4]. The onset of diabetes occurs when there is an abnormal fluctuation in the blood sugar level. Normally the glucose in the body is transformed into energy that supports regular human activities. In an adverse situation, however, when the blood sugar level shoots up abnormally, the excess blood sugar accumulates in the blood vessels of various organs of the body, including the human eye [3]. This phenomenon is called hyperglycemia. Diabetic eye disease, or retinopathy, has two stages, namely Non-proliferative Diabetic Retinopathy (NPDR) and Proliferative Diabetic Retinopathy (PDR). In NPDR the retina becomes swollen, a case of macular edema, due to the accumulation of glucose leading to blood vessel leakages in the eyes. The swelling can become so severe that the vessels get completely blocked, resulting in macular ischemia. In all of these instances the patient loses vision partially or completely, or sometimes suffers from blurred vision.
PDR occurs at a much more advanced stage of diabetes, when new blood vessels start growing in the retina, a condition known as neovascularization. These new blood vessels are extremely thin and fragile and hence more prone to haemorrhage. Blood from such haemorrhages leads to partial or complete vision loss. The newly created blood vessels also form scar tissue, which can detach the retina and result in loss of central or peripheral vision. The symptoms of NPDR and PDR include blurred vision, haemorrhages, cotton wool spots, double vision, corneal abnormalities, intra-retinal abrasions, microvascular abnormalities, microaneurysms and increased retinal permeability [5].
Popular diagnostic methods for diabetic retinopathy include fluorescein angiography and optical coherence tomography. In fluorescein angiography, the physician injects a dye into a vein in the patient's arm, and pictures are taken as the dye flows through the blood vessels in the eyes, revealing cases of blockage, leakage and haemorrhage. In optical coherence tomography, cross-sectional images of the retina are taken, which help to identify issues pertaining to fluid leakage or damage in the retinal tissue [6].
It is thus evident that early detection of the disease plays a major role in saving patients from vision loss. The longer the disease lingers unnoticed or untreated, the more severe the consequences can be. Machine learning algorithms have been a prevalent choice in the prediction of various diseases. The concept of machine learning was framed by Arthur Samuel in 1959 as a technique for computers to learn automatically without programming interventions and to make decisions from the experience of learning. Deep neural networks are based on the concepts of machine learning and artificial neural networks [7][8][9][10]. DNNs have successfully contributed towards analysis and decision making in the fields of computer vision, speech recognition, drug design, medical image processing and many others. The implementation of such advanced machine learning approaches has significantly contributed towards pathological screening and disease prediction, thereby reducing the burden of human interpretation. Given these results in other fields of healthcare, applying DNNs and machine learning to the detection of diabetic retinopathy was a natural point of interest, with the objective of reducing the occurrence of this disease [11,12]. The motivations of the present study were thus: 1. Early detection of diabetic retinopathy, giving medical practitioners the opportunity to treat and cure the disease at an early stage with higher accuracy. 2. Focus on the most significant factors of the disease, eliminating the irrelevant ones to ensure more accurate classification.
A deep neural network model is used in the present study in combination with Principal Component Analysis (PCA) and the Firefly algorithm for the classification of a diabetic retinopathy dataset. The dataset is collected from the publicly available UCI machine learning repository. Being collected from the public domain, the data includes attributes which are irrelevant, and their inclusion would only increase the burden on the ML model. Hence the Principal Component Analysis (PCA) algorithm is implemented for feature extraction from the DR dataset. To further improve the classification results, the Firefly algorithm is implemented for dimensionality reduction. The resultant reduced dataset is fed into the deep neural network model, generating enhanced classification of the diabetic retinopathy dataset. The result of the proposed model is evaluated against traditional state-of-the-art models to establish its superiority in terms of accuracy, specificity, precision, recall and sensitivity.
The rest of the paper is organized as follows: Section 2 presents the literature review, Section 3 discusses the preliminaries and experimental setting, Section 4 describes the methodology, Section 5 highlights the results, and Section 6 provides the conclusion and scope for future work.

Related Work
The study in [13] developed a deep learning system for the identification of diabetic retinopathy with higher accuracy than existing studies. The analysis was performed on a small percentage of images with higher resolutions. The results highlighted the ability of deep learning models to diagnose the disease at the desired performance level while also considering cost limitations.
The study in [14] implemented adjudication for the quantification of errors in diabetic retinopathy (DR) grading using a deep learning algorithm. The kappa score was measured, and the performance of the model was compared based on sensitivity, accuracy and area under the curve (AUC).
The research work in [15] developed a data-oriented deep learning model for DR detection wherein coloured fundus images [16] of the disease were processed and the classification model helped to segregate diseased images from healthy ones.
In [17] a deep learning model was designed to detect diabetic retinopathy and macular edema from retinal fundus images. A deep convolutional neural network [18] was trained on a retinal image dataset consisting of 128,175 images. The sensitivity and specificity scores of the study helped to detect referable diabetic retinopathy (RDR) among diabetic patients using a deep neural network model. The study by Swapna et al. [19] designed a deep learning model for the classification of diabetes using HRV data. The dynamic features of the HRV data were extracted using a combination of long short-term memory (LSTM) and convolutional neural networks. The model achieved prominent accuracy in detecting diabetes from HRV data.
The study in [20] presented a hybrid technique incorporating image processing and deep learning for the detection and classification of diabetic retinopathy. The model was validated on a retinal fundus dataset consisting of 400 images from the MESSIDOR database, yielding good results.
The study in [21] developed a computer-aided screening system to analyse fundus images with different illuminations and views. The study detects the severity level of DR using ML models. The model used the AdaBoost classifier for feature extraction and analysed the data using a Gaussian Mixture Model, KNN and SVM to distinguish retinopathy lesion cases from non-lesions.
The studies in [22,23] developed FFBAT-based algorithms for the classification of diabetes. The unique contribution of these studies was the use of the LPP algorithm with fuzzy rules for feature extraction and the FFBAT-ANN model for classification. This combination helped to achieve better classification results, yielding improved accuracy.
Various studies have also used Probabilistic Neural Networks (PNN), Bayesian classification and Support Vector Machines (SVM) for the classification of the NPDR and PDR types of diabetic retinopathy. Images of haemorrhages [24] in the blood vessels are analysed using image processing techniques, and the extracted features, when fed into the classifiers, help to classify the types of DR [25][26][27].
It is quite evident from the related work that the majority of work on diabetic retinopathy detection revolves around the use of various machine learning models and the comparison of their performance. It is also observed that less emphasis has been given to improving the quality of the diabetic retinopathy dataset, which could lead to more accurate results. It is important to highlight that the reliability of the results generated by a machine learning model depends on the features of the dataset. Extraction of the most significant features in the dataset and the use of appropriate dimensionality reduction techniques help to enhance the accuracy of the prediction results of machine learning models. The present study focuses on this aspect and adopts a two-layered dimensionality reduction approach followed by the use of a deep neural network model for classification. The unique contributions of the proposed work are: 1. A rigorous three-layered pre-processing approach is adopted to enhance the quality of the dataset and include only relevant and significant attributes for training the proposed model. 2. The implementation of PCA+Firefly significantly reduces the training time of the ML-based models.

Preliminaries and Experimental Setting
This section discusses the methodologies used in the proposed model, namely the PCA and Firefly algorithms. The detailed architecture of the proposed model is also presented.

Principal Component Analysis
The concept of PCA is based on the objective of reducing the dimensionality of a dataset consisting of multiple correlated variables while retaining the maximum variability in the dataset [28,29]. The algorithm transforms the variables in the dataset into a new set of orthogonal principal components, ordered such that the variation of the original variables retained decreases while traversing down the order. Hence the first principal component retains the maximum variation present in the original variables. The principal components are the orthogonal eigenvectors of the covariance matrix. The dataset used in PCA needs to be scaled, and the results the method generates are sensitive to relative scaling. A principal component is defined as a "linear combination of optimally weighted observed variables". The output generated from PCA is a set of principal components whose number is less than or equal to the number of original variables. The steps involved in implementing PCA on a dataset are: 1. Normalize the data by subtracting the respective mean from each column, producing a dataset with zero mean. 2. Calculate the covariance matrix. 3. Compute the eigenvalues and eigenvectors of the covariance matrix. 4. Order the eigenvalues in descending order to rank the components by significance, and reduce the dimensionality by choosing the first set of eigenvalues and ignoring the rest. 5. Form a matrix of the chosen eigenvectors to create a feature vector. 6. Form the principal components by left-multiplying the transpose of the feature vector with the transpose of the scaled dataset. Dimensionality reduction via PCA finds use in facial recognition, computer vision and image compression. It also has a wide spectrum of applications in pattern identification for high-dimensional data in the fields of finance, data mining, bio-informatics and psychology [30][31][32][33].
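The steps above can be sketched in NumPy; this is a minimal illustration on random data, not the paper's implementation (which used scikit-learn's PCA on the DR features):

```python
import numpy as np

def pca(X, n_components):
    # Step 1: centre each column so the dataset has zero mean
    Xc = X - X.mean(axis=0)
    # Step 2: covariance matrix of the centred data
    cov = np.cov(Xc, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of the covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    # Step 4: order components by decreasing eigenvalue (variance retained)
    order = np.argsort(vals)[::-1]
    # Step 5: feature vector = matrix of the leading eigenvectors
    W = vecs[:, order[:n_components]]
    # Step 6: project the centred data onto the principal components
    return Xc @ W

X = np.random.RandomState(0).rand(100, 5)   # toy stand-in data
Z = pca(X, 2)
print(Z.shape)  # → (100, 2)
```

By construction, the first returned column carries at least as much variance as the second, matching the ordering property described above.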

Firefly Algorithm
The Firefly algorithm is a "nature-inspired" algorithm based on the behaviour of fireflies. Nature-inspired algorithms are extensively used in several stages of the machine learning process [34,35]. Fireflies emit natural light from their bodies that helps them attract or find flying mates [36][37][38]. It also helps them to catch prey and protect themselves from predators. The algorithm is based on three primary assumptions [39]: 1. The artificial fireflies are unisex, and their attraction is not dependent on gender. 2. The attractiveness of a firefly is proportional to the brightness of the light it emits, and it decreases as fireflies move away from each other due to absorption of the light by air. Since all fireflies emit light, the one emitting the brightest light attracts most of its neighbours; if no brighter firefly exists, the fireflies move around in a haphazard fashion in random directions. 3. The brightness of the flashing light, being the criterion for attraction, is the objective function to be optimized by the algorithm.
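The three assumptions translate into a short optimization loop; the sketch below minimizes a toy sphere function (all parameter values are illustrative choices, not the paper's settings):

```python
import numpy as np

def firefly_minimize(obj, dim=2, n_fireflies=15, iters=60,
                     alpha=0.5, beta0=1.0, gamma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, (n_fireflies, dim))   # random initial swarm
    light = np.array([obj(p) for p in pos])            # lower objective = brighter firefly
    best_pos, best_val = pos[light.argmin()].copy(), light.min()
    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] < light[i]:                # j is brighter, so i moves toward j
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)  # attractiveness decays with distance
                    pos[i] += (beta * (pos[j] - pos[i])
                               + alpha * (rng.random(dim) - 0.5))  # random-walk term
                    light[i] = obj(pos[i])
                    if light[i] < best_val:            # keep the best solution seen so far
                        best_val, best_pos = light[i], pos[i].copy()
        alpha *= 0.95                                  # damp the random walk over time
    return best_pos, best_val

# minimise the sphere function x^2 + y^2 (optimum 0 at the origin)
best_x, best_f = firefly_minimize(lambda x: float(np.sum(x ** 2)))
```

In the proposed model the objective function would instead score candidate feature subsets of the DR dataset, with brightness corresponding to classification quality.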

Experimental Setting
The experimental setting of the proposed methodology is illustrated in Figure 1. The dataset used in this work has 19 contributing attributes whose values fall in different ranges. This variation in ranges may lead to varied weights for some instances, which may result in biased prediction results. To avoid such heterogeneity, a StandardScaler method is used as part of pre-processing in the proposed work. The StandardScaler method normalizes the data, converting it to a common range to eliminate bias in the prediction results. The Principal Component Analysis algorithm is then applied to this normalized data. The main reason for using PCA is to eliminate insignificant attributes from consideration when training the DNN. To further strengthen the feature engineering process, one of the popular nature-inspired algorithms, the Firefly optimization algorithm, is used in this work. The main strength of the Firefly algorithm is that it chooses optimal parameters with a very fast convergence rate while avoiding local minima. This property makes it an ideal choice for feature engineering, selecting the optimal parameters which influence the classification positively and thereby reducing training time. The dimensionally reduced dataset is then fed to the DNN for classification of the diabetic retinopathy dataset. The Adam optimizer and the softsign activation function were used at each layer except the output layer. The output layer used the sigmoid activation function, since the task is binary classification. For backpropagation, the root mean square propagation (RMSprop) error was used. The dataset was split in an 8:2 ratio for training and testing, respectively. Instead of training on the entire 80 percent of the data and then testing the model on the remaining 20 percent in one go, for every epoch a batch of 64 records was fed to the model, out of which 80 percent were used to train the model and the remaining 20 percent were used to test it. The proposed model is summarized as follows:

Results and Discussion
This section discusses the dataset used, the experimental framework, the metrics used and the experimental results.
The diabetic retinopathy dataset used in this study has 1151 instances and 20 attributes. The attributes are described in Table 1. The softsign activation function was used in all layers except the output layer.
The experimentation was carried out on the Diabetic Retinopathy Debrecen dataset from the UCI machine learning repository [40]. The attributes in this dataset are features extracted from the MESSIDOR image dataset. A personal computer with 8 GB RAM was used for the experimentation, using Python. Among the attributes are the binary result of amplitude-modulation/frequency-modulation (AM/FM)-based classification and, as attribute 20, the class label, where 1 indicates signs of DR and 0 indicates no signs of DR.

Metrics for Evaluation of the Model
The following metrics are used to evaluate the proposed model. Accuracy: It is the percentage of correct predictions that a classifier has made compared to the actual value of the label in the testing phase. Equivalently, it is the ratio of the number of correct assessments to the number of all assessments. Accuracy is calculated using Equation (1): Accuracy = (TP + TN) / (TP + TN + FP + FN), (1) where TP is true positives, TN is true negatives, FP is false positives and FN is false negatives.
If the class label of a record in a dataset is positive, and the classifier predicts the class label for that record as positive, then it is called as true positive.If the class label of a record in a dataset is negative, and the classifier predicts the class label for that record as negative, then it is called as true negative.If the class label of a record in a dataset is positive, but the classifier predicts the class label for that record as negative, then it is called as false negative.If the class label of a record in a dataset is negative, but the classifier predicts the class label for that record as positive, then it is called as false positive.
Sensitivity: It is the percentage of true positives that are correctly identified by the classifier during testing. It is calculated using Equation (2): Sensitivity = TP / (TP + FN). (2)
Specificity: It is the percentage of true negatives that are correctly identified by the classifier during testing. It is calculated using Equation (3): Specificity = TN / (TN + FP). (3)
Precision: Precision is a significant measure of exactness; it states what percentage of the instances the classifier labelled as positive are truly positive, with respect to the total predicted positive instances, as shown in Equation (4): Precision = TP / (TP + FP). (4)
Recall: Recall determines completeness, i.e., the percentage of positive instances identified by the classifier as positive. Recall is the performance metric used to select the best model when there is a high cost associated with false negatives, as shown in Equation (5): Recall = TP / (TP + FN). (5)
F1-measure: The F1-score represents the harmonic mean of precision and recall, as shown in Equation (6): F1 = 2 × (Precision × Recall) / (Precision + Recall). (6)
The F1-score is used to find a balance between precision and recall. Accuracy is mainly driven by a large number of true negatives, whereas false negatives and false positives usually carry business costs (tangible and intangible). Thus the F1-score may be a better measure when a balance between precision and recall is needed under an uneven class distribution (a large number of actual negatives).
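The definitions above map directly onto confusion-matrix counts; a minimal sketch, using a small hypothetical pair of label vectors:

```python
def classification_metrics(y_true, y_pred):
    # confusion-matrix counts, following the TP/TN/FP/FN definitions above
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (1)
    sensitivity = tp / (tp + fn)                         # Eq. (2)
    specificity = tn / (tn + fp)                         # Eq. (3)
    precision = tp / (tp + fp)                           # Eq. (4)
    recall = sensitivity                                 # Eq. (5), same formula as sensitivity
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (6)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision,
                recall=recall, f1=f1)

m = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Here TP = 2, TN = 2, FP = 1 and FN = 1, so every metric works out to 2/3 for this toy example.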

Performance Analysis
For evaluating the proposed model, a sequential model was used to build the DNN-PCA model. For cross-validation, the dataset was split into two parts: 80% of the dataset was used for training and 20% for validating/testing, for every batch of 64 records. To identify the activation function best suited for the dataset, experimentation was performed with several activation functions, namely relu, elu, tanh, softmax, selu, softplus and softsign, with 50 epochs and a batch size of 64. The results of this experimentation are shown in Figure 2. As observed from Figure 2, the softsign activation function gave the best average training and testing accuracy. Hence the softsign activation function was chosen for the dense layers when evaluating the model.
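An activation-function sweep of this kind can be sketched with scikit-learn as follows; note that this is only an illustration on synthetic data, since scikit-learn's MLPClassifier exposes only relu, tanh and logistic, whereas the paper's Keras setup also covered elu, selu, softmax, softplus and softsign:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in data (the paper swept activations on the DR dataset)
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

scores = {}
for act in ("relu", "tanh", "logistic"):   # activations available in scikit-learn
    clf = MLPClassifier(hidden_layer_sizes=(32,), activation=act,
                        max_iter=500, random_state=1)
    clf.fit(X_tr, y_tr)
    scores[act] = clf.score(X_te, y_te)    # held-out accuracy per activation

best_activation = max(scores, key=scores.get)
```

The activation with the highest held-out accuracy is then fixed for all subsequent hyperparameter experiments, mirroring the procedure in the paper.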

Figure 2. Performance Evaluation of Activation Functions
To choose the best optimizer for the layers of the deep neural network, experimentation was conducted on the dataset using several optimizers, namely Adam, Nadam, SGD, RMSprop, Adagrad, Adadelta and Adamax, with 50 epochs and a batch size of 64. The results of this experimentation are shown in Figure 3. As per Figure 3, the Adam optimizer provided the best accuracy. Hence the Adam optimizer was chosen for the input and other dense layers, and the sigmoid activation function was chosen for the output layer. To choose the number of layers in the deep neural network, the DR dataset was experimented on using several layer counts with the softsign activation function, the Adam optimizer at the input and dense layers, the sigmoid activation function at the output layer, 50 epochs and a batch size of 64. As shown in Figure 4, the model had the best training and testing accuracy with 5 layers, with the accuracy level starting to dip at 6 layers. Hence a deep neural network with 5 layers was used for the experimentation. To choose the number of epochs, the DR dataset was experimented on using 5 intermediate layers with the softsign activation function, the Adam optimizer at the input and dense layers, the sigmoid activation function at the output layer, and a batch size of 64. As shown in Figure 5, the model provided the best average training and testing accuracy with 600 epochs, with the testing accuracy starting to dip at 650 epochs. Hence the deep neural network was trained for 600 epochs. In the experimental work, the number of components chosen for the PCA was 0.99, i.e., retaining 99 percent of the variance. Figures 6-11 illustrate the performance evaluation of the ML models based on accuracy, precision, recall, sensitivity and specificity. It is evident from these figures that the PCA-Firefly based ML models outperform the other two cases, namely ML with PCA alone and ML without dimensionality reduction. Considering the inclusion and non-inclusion of dimensionality reduction and feature engineering with the ML algorithms, it is observed that the proposed DNN-PCA-Firefly model performs better than the other hybrid ML algorithms considered. The results obtained from the experimentation are tabulated in Table 2. The highlights of the results for the proposed model are: 1. The DNN-PCA-Firefly model outperforms the other popular hybrid ML models considered for comparison. 2. Applying PCA alone to the DNN and the other ML algorithms results in a slight deterioration in the performance measures, but the training time is reduced. 3. The implementation of PCA+Firefly, on the contrary, enhances the performance of the ML algorithms with a further reduction in training time, as illustrated in Figure 12. 4. When the original dataset was used, the model succumbed to overfitting, which had a negative effect on the testing accuracy. However, when the number of records in the dataset was increased by resampling, the performance improved with higher testing accuracy.

Conclusions and Future Work
In the present study, a hybrid Principal Component Analysis (PCA)-Firefly based deep neural network model is used for the classification of a diabetic retinopathy dataset. The dataset was collected from the publicly available UCI machine learning repository and in its raw state had redundant and irrelevant attributes. Rigorous pre-processing was a prime focus of the study, and hence a three-layered pre-processing framework was adopted. At the outset, the StandardScaler technique was employed to normalize the dataset, and then Principal Component Analysis (PCA) was used for feature selection. Further, the Firefly algorithm was used for dimensionality reduction. This reduced dataset was fed into the deep neural network (DNN), which generated classification results with enhanced accuracy. The results of the model were also evaluated against predominant machine learning approaches, and they defended the superiority of the model in terms of accuracy, precision, recall, sensitivity and specificity. A major benefit of the model is its potential to be applied to high-dimensional datasets in various other domains. However, the same performance may not be observed on low-dimensional datasets, where the model may overfit, which acts as its limitation. As part of future work, the proposed model could be utilized on datasets in other domains. The performance of the proposed model thus motivates similar studies in various other domains with high-dimensional data. This approach can also be used for eliminating noisy data in Magnetoencephalography (MEG) data analysis, contributing towards better prediction in healthcare.

Input: Diabetic Retinopathy Dataset
Output: Classification of the class label
1. Data Transformation: Normalize the input dataset using StandardScaler.
2. Dimensionality Reduction: Input the transformed dataset to PCA for dimensionality reduction; to further refine the feature engineering, use the Firefly optimization algorithm.
3. Classification: Feed the extracted features to the DNN for classifying the Diabetic Retinopathy dataset.
4. Evaluation: Evaluate the performance of the model using measures such as Accuracy, Precision, Recall, Sensitivity and Specificity.
5. Comparison: Compare the experimental results of the proposed model with traditional ML algorithms.
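The steps above can be sketched as a scikit-learn pipeline; this is a simplified illustration on synthetic stand-in data with a hypothetical class label, omitting the Firefly selection step, and it substitutes an MLPClassifier for the paper's Keras DNN (Adam/softsign):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# synthetic stand-in for the 19 input attributes of the UCI Debrecen dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(1151, 19))
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # hypothetical binary class label

# 8:2 train/test split, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),               # step 1: normalise every attribute
    PCA(n_components=0.99),         # step 2: keep components explaining 99% of variance
    MLPClassifier(hidden_layer_sizes=(32, 32, 32, 32, 32),  # 5 hidden layers, as chosen
                  max_iter=400, random_state=0),            # step 3: classifier
)
pipeline.fit(X_tr, y_tr)
accuracy = pipeline.score(X_te, y_te)   # step 4: one of the evaluation measures
```

Step 5 would then repeat the fit/score cycle with Decision Tree, Naive Bayes, XGBoost and the other baseline models for comparison.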

Figure 3. Performance Evaluation of Optimizers

Figure 4. Performance Evaluation Based on Number of Layers

Figure 5. Performance Evaluation Based on Number of Epochs

Figure 6. Performance Evaluation of DNN Based Models

Figure 7. Performance Evaluation of Decision Tree Based Models
Figure 8.

Figure 9. Performance Evaluation of Naive Bayes Based Models
Figure 10.

Figure 11. Performance Evaluation of XGBoost Based Models

Figure 12. Comparison of Training Time for DNN Based Models

Table 2. Summary of the Experimental Results.