Diabetic Retinopathy Classification Using CNN and Hybrid Deep Convolutional Neural Networks

Diabetic Retinopathy (DR) is an eye condition that mainly affects individuals who have diabetes and is one of the leading causes of blindness in adults. As the disease progresses, it may lead to permanent loss of vision. Diagnosing diabetic retinopathy manually with the help of an ophthalmologist is a tedious and laborious procedure. This paper focuses not only on diabetic retinopathy detection but also on the analysis of the different DR stages, performed with the help of Deep Learning (DL) and transfer learning algorithms. A custom CNN, a hybrid CNN with ResNet and a hybrid CNN with DenseNet-121 are applied to a dataset of around 3662 training images to automatically detect the stage to which DR has progressed. Five DR stages, 0 (No DR), 1 (Mild DR), 2 (Moderate DR), 3 (Severe DR) and 4 (Proliferative DR), are processed in the proposed work. The patient's eye images are fed as input to the models, which extract the features of the eye for effective classification. The models achieved accuracies of 75.61%, 93.18% and 96.22% respectively. The paper concludes with a comparative study of the CNN, hybrid CNN with ResNet and hybrid CNN with DenseNet architectures that highlights the hybrid CNN with DenseNet as the best-performing deep learning classification model for automated DR detection.


Introduction
People with diabetes are prone to an eye disease called diabetic retinopathy. Diabetic retinopathy is considered a serious eye condition, as it can cause vision loss and blindness in people who have diabetes. Very high blood sugar levels cause significant damage to the blood vessels in the retina. Blood vessels in the eye begin to leak fluid, causing the macula to swell or thicken and preventing blood from passing through. Sometimes there is an abnormal growth of new blood vessels on the retina. All of these conditions can cause permanent loss of vision. Diabetic retinopathy shows no symptoms at first but can eventually cause vision loss; diagnosing it at an early stage can help patients save their vision. In the early stages it might cause trouble reading or seeing faraway objects. As the disease progresses, the symptoms include: spots floating in the vision (floaters) and an increased number of floaters, cloudy vision, poor night vision, fluctuating vision, impaired color vision (inability to distinguish colors), dark or empty areas in the vision (shadows cast by specks floating in the eye) and complete vision loss.

Non-Proliferative Diabetic Retinopathy
A person diagnosed with non-proliferative diabetic retinopathy (NPDR) has tiny blood vessels that leak and make the retina swell. Macular edema, a condition in which the macula swells, is the main cause of vision loss in diabetic patients. The other condition that can affect vision is macular ischemia, in which the blood vessels in the retina close off. This stops blood from reaching the macula and leads to the formation of tiny deposits called exudates. NPDR can further be classified into three types based on the severity of symptoms.

• Mild NPDR: presence of microaneurysms only.
• Moderate NPDR: more findings than microaneurysms alone, but less severe than severe NPDR.
• Severe NPDR: presence of intraretinal hemorrhaging in four quadrants of the eye, two quadrants with venous beading, or one quadrant with an intraretinal microvascular abnormality.

Proliferative Diabetic Retinopathy
When left untreated, diabetic retinopathy progresses to a more serious stage called proliferative diabetic retinopathy (PDR). In this stage, new blood vessels start growing in the retina at an abnormal pace. Pressure builds in the eyeball when the newly grown blood vessels interfere with the fluid flow, causing the retina to detach from the back of the eye. Blood also leaks into the vitreous, the jelly-like substance in the center of the eye. As a result, the optic nerve, which passes through the blind spot carrying inverted images from the eye to the brain, is damaged, resulting in vision loss.
To speed up the process and make precise predictions, a CNN approach is developed. CNN has already been applied for effective predictions in various fields like healthcare [1,2] and intelligent automation [3]. Recognizing its strength, CNN is applied in this work to diagnose diabetic retinopathy from eye images and classify them accurately based on severity. This system diagnoses diabetic retinopathy automatically, without user intervention. The proposed models are evaluated on the publicly available Kaggle dataset [4] to demonstrate their results.
The objectives of the paper are:
i. Automated models for diabetic retinopathy detection have proven to be time-saving and efficient compared to the manual method; hence a custom CNN model and transfer learning are analysed to automate the prediction of DR.
ii. An enhanced hybrid CNN with DenseNet is developed to detect blood vessels and to efficiently identify hemorrhages and exudates.
iii. The proposed model uses image augmentation to solve the problem of class imbalance in order to attain high accuracy.

Related Work
H. Jiang [5] used three deep learning models, Inception V3, ResNet151 and Inception-ResNet-V2, which individually achieved accuracies of 87.91%, 87.20% and 86.18% respectively. When these models were integrated using the AdaBoost algorithm, accuracy improved to 88.21%. A filter-based retinal vessel extraction method using fuzzy C-means for exudate detection, with a convex hull used for detection and removal of the optic disk, was proposed by A. Roy and D. Dutta [6]. The Support Vector Machine (SVM) algorithm was used to classify images into NPDR and PDR, and the proposed system achieved an efficacy rate of 91.23%. "AD2Net", built by Z. Qian [7], speeds up the diagnosis process and improves the efficiency of treatment. The AD2Net model combines the advantages of ResNet and DenseNet, and further uses an attention mechanism that encourages the model to pay more attention to useful features so as to improve classification considerably; it achieved an accuracy of 83.2%. The paper [8] described a hybrid deep learning technique called the E-DenseNet model for diagnosing different DR stages. E-DenseNet combines EyeNet and DenseNet and makes use of transfer learning. By combining the two models and customizing the embedded dense blocks of the EyeNet architecture, the researchers obtained a model that could accurately classify images with less training time and memory, achieving an accuracy of 91.6% and a Kappa score of 0.883. The approach proposed in [9] achieved an area under the curve of 93.4% using the "SOFT-MAX BoVW" method. S. Dua [10] focused on a blood vessel detection technique based on quadtrees and post-filtration of edges; anomalies were detected by comparing information on retinal blood vessel morphology to the diameters of blood vessels in a normal eye. Various fusion techniques have been discussed in [11] to integrate different classifiers to accurately classify images with diabetic retinopathy, which proved advantageous.
An SVM-based kernel combined with a finite mixture of Scaled Dirichlet Distributions (SDD) was developed by Bourouis, Sami [12]; the model offered flexibility in classification. Support Vector Machine (SVM) and KNN classifiers were used in [13] to classify images into the two classes NPDR and PDR by detecting the presence of microaneurysms and lesions; the SVM algorithm was observed to perform better than KNN. A. P. Bhatkar and G. U. Kharat [14] focused on building a Multi-Layer Perceptron Neural Network to detect diabetic retinopathy in retinal images. The classifier sorted retinal images into two categories (DR and No DR) using a feature vector formed with the help of the Discrete Cosine Transform (DCT), but could not predict the severity of diabetic retinopathy. A comparative study between two CNN architectures, DenseNet and VGG16, was made in [15], where the DenseNet model achieved an accuracy of 96.11%. In [16], a background subtraction methodology was used to detect lesions, and a de-correlation stretch based method was used to remove falsely detected lesions. When tested on the DiaretDB database, the algorithm achieved a sensitivity of 0.87 and an F-score of 0.78. To detect the stage of diabetic retinopathy, one must detect exudates and microaneurysms.
Prasad et al. [17] used various morphological and segmentation techniques to detect blood vessels, exudates and microaneurysms. The image was divided into four sub-images, and Haar wavelet transformations were applied to the extracted features. Techniques like principal component analysis and linear discriminant analysis were applied to select important features, and a back-propagation neural network was used to classify the images as diabetic or non-diabetic. Deep learning models like the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) were used in [18]: the CNN detected lesions and the LSTM generated descriptive sentences based on those lesions, with the CNN output fed as input to the LSTM. This algorithm achieved an accuracy of 90%. A. T. Nagi [19] proposed a novel two-stage classifier, an ensemble technique that combines various machine learning algorithms for classification and was observed to perform better in terms of parallelism and accuracy.
Jayant Yadav [20] used computer vision to detect diabetic retinopathy and neural networks to provide satisfactory results. In [21], a principal component analysis based deep neural network model using the Grey Wolf Optimization (GWO) algorithm was used to classify the extracted features of a diabetic retinopathy dataset. D. Jude Hemanth [22] proposed a Modified Hopfield Neural Network (MHNN) model to classify abnormalities in retinal images; unlike the traditional algorithm, the weights in the proposed system keep changing. A Multipath Convolutional Neural Network (M-CNN) was used for feature extraction in [23] and gave better accuracy with the J48 classifier. A comparative study between algorithms such as SVM, AlexNet, VGG16 and LSTM was made in [24], where LSTM proved to give comparatively more accurate results.

Convolutional Neural Network
A convolutional neural network follows the feed-forward mechanism and is generally used to analyze images. It is quite useful for object detection and classification. In CNN, every image is represented as an array of pixels. The convolution operation is the main operation performed and forms the basis of the convolutional neural network. The layers present in a deep CNN network are detailed as follows:
Convolutional layer: It is the first layer of every CNN model [25]. A convolution layer has several filters of size MxM, each of which performs the convolution operation. The dot product is taken between the filter and the parts of the input image it covers as the filter slides over the image. The resulting output is called the feature map. Through this process, corners and edges can be identified. This feature map is then carried forward and fed to other layers to learn more complicated features which were not identified initially.
Pooling layer: The convolution layer is followed by the pooling layer [25]. This layer is introduced to reduce the size of the feature map, which in turn reduces the computational cost. There are a number of pooling operations: max pooling takes the largest element from the portion of the feature map overlapped by the filter, while average pooling calculates the average of the elements overlapped by the filter. This layer acts as a bridge between the convolution layer and the fully connected layers. An example of a max pooling layer with sample values is shown in Figure 1.
Fully Connected Layer: The Fully Connected (FC) layer is made up of weights, biases and neurons. It connects the neurons between the hidden layers and the output layer, and these layers are placed just before the output layer. The feature map from the previous layer is flattened to create a single long feature vector, which is passed on to the fully connected layers where many mathematical operations are performed on it. The classification process begins at this stage. An example of the flattening process that converts the features to a single dimension is shown in Figure 2.
Dropout Layer: Overfitting is caused when all the features are connected to the fully connected layers. Overfitting occurs when a model fits the training data so closely that it fails on the test data. To solve this issue, a dropout layer is introduced: to reduce the size and complexity of the model, neurons are dropped according to the dropout rate parameter. In the proposed model, a dropout rate of 0.2 is set for attaining better accuracy, which implies that 20% of the neurons are dropped during the training process.
Activation function: The activation function is considered one of the necessary parameters of the CNN model. Activation functions are used to capture the correlations between the input variables of the network; it is the activation function that decides which information should be carried forward and which is not very useful. It contributes the non-linearity of the network and therefore lets the model learn complex functions. Examples of activation functions include ReLU, softmax, tanh and sigmoid, each designed for a specific purpose. Sigmoid is used for binary classification, whereas softmax is preferred for datasets that contain more than two classes.
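As a small illustration of the pooling operations described above, the following sketch applies a 2x2 window with stride 2 to a hypothetical 4x4 feature map, for both max and average pooling (the sample values are illustrative, not those of Figure 1):

```python
import numpy as np

# Hypothetical 4x4 feature map, pooled with a 2x2 window and stride 2.
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]], dtype=float)

def pool(x, size=2, stride=2, op=np.max):
    # slide the window over the map and apply the pooling operation
    h, w = x.shape
    out = np.empty((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = op(x[i:i + size, j:j + size])
    return out

print(pool(fmap, op=np.max))   # max pooling: [[6. 8.] [3. 4.]]
print(pool(fmap, op=np.mean))  # average pooling: [[3.75 5.25] [2. 2.]]
```

Each 2x2 block contributes one output value, halving both spatial dimensions exactly as the text describes.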

Transfer Learning
Transfer learning is an advanced machine learning method in which a model that has already been trained for one task is reused as the starting point for a second model. It is commonly used along with deep learning architectures, where the weights from pre-trained models serve as the starting point, and is mainly applied in computer vision and image processing tasks. There are two popular approaches: the develop-model approach and the pre-trained-model approach. The proposed work includes two hybrid models, CNN with ResNet and CNN with DenseNet, for diabetic retinopathy classification.
Pre-Trained Model Approach:
Source Model: A pre-trained model is chosen from the available pool of candidate models.
Reuse Model: The pre-trained model can then be used as the starting point for a second model that requires training. Depending on the requirements and the techniques used, one can choose to use all or parts of the model.
Tune Model: The model might need to be tuned or refined slightly to accommodate the output labels of the dataset being used.
Advantages of transfer learning:
1. Higher start: the initial skill on the source model is higher than in a model where transfer learning is not used.
2. Higher slope: the skill improves at a faster rate while training the model, which means that performance is better.
3. Higher asymptote: the converged skill is better than that of a model that does not make use of transfer learning.

DenseNet
DenseNet aims at making deep learning networks go deeper while keeping the training process efficient by using short connections between the layers: each layer in the model is connected to every other layer. As all the layers are connected to each other, the flow of information between the layers of the network is maximal. The feed-forward mechanism of the neural network is preserved by ensuring that each layer obtains its inputs from all the preceding layers and passes its own feature map to all subsequent layers. DenseNet concatenates features, whereas ResNet combines features through summation. So, the lth layer has l inputs, consisting of the concatenated feature maps obtained from the blocks beneath it, and the feature maps produced by the lth layer are carried forward to the remaining L − l layers, where L is the total number of layers [26]. As all layers are strongly connected, there is a total of L(L + 1)/2 connections in the network, unlike in traditional deep learning architectures.
The two important building blocks of a DenseNet network are the dense blocks and the transition layers. DenseNet begins with a convolution and pooling layer, followed by alternating dense blocks and transition layers: every dense block is followed by a transition layer. Finally, there is a dense layer followed by a classification layer where the classification takes place.
Each convolutional block follows a particular sequence: Batch Normalization -> ReLU activation -> Conv2D layer. X0, X1, X2 and X3 are the feature maps taken into consideration; they are transformed into X4 by performing the processes shown in Figure 3, where k is the growth rate. The size of a DenseNet is smaller than that of a ResNet. DenseNet also makes use of low-complexity features: features of all complexity levels are used, which gives smooth decision boundaries and is the reason DenseNet performs well even when training data is insufficient.
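The channel growth of a dense block can be sketched as follows. Each layer sees the concatenation of all earlier feature maps and adds k new ones, so after L layers the block holds k0 + L*k channels and the layers share L(L+1)/2 direct connections. The initial channel count, growth rate and layer count below are illustrative assumptions, not the paper's actual configuration:

```python
# Sketch: channel counts inside one dense block. Every layer's input is the
# concatenation of all preceding feature maps (BatchNorm -> ReLU -> Conv2D
# in the real network), and each layer contributes k new feature maps.
def dense_block_channels(k0, k, num_layers):
    channels = [k0]                      # channels entering the block
    for _ in range(num_layers):
        channels.append(channels[-1] + k)  # concatenation adds k maps
    return channels

# hypothetical block: 64 input channels, growth rate k = 32, 4 layers
print(dense_block_channels(k0=64, k=32, num_layers=4))  # [64, 96, 128, 160, 192]

L = 4
print(L * (L + 1) // 2)  # L(L+1)/2 = 10 direct connections among the layers
```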

Proposed System
Image augmentation is a technique that applies optical and grid distortion, piecewise affine transforms, horizontal and vertical flips, rotation through an angle, linear shifts, random scaling, shifts from one color space to another, equalization of brightness and contrast, additive Gaussian noise, blurring, smoothing and sharpening, and grey scaling, creating new images instead of merely over-sampling the ones already present. This was done to make the model generalize and classify without overfitting; the new data was generated by flipping, cropping and padding using the Keras ImageDataGenerator class. Image resizing and cropping were used to expose intricate features of the eye images apart from microaneurysms. The problem of class imbalance is solved by using image augmentation. Transfer learning, wherein the weights obtained from a pre-trained model are used to train on the data, is applied in the CNN with DenseNet and CNN with ResNet models. The trained model then classifies the fed image into one of the five categories (no DR, mild DR, moderate DR, severe DR, proliferative DR). Figure 4 shows the overview of the proposed Diabetic Retinopathy Classification Framework. A comparative study of models with and without transfer learning is performed to show which model outperformed the others.
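A minimal sketch of a few of the augmentations listed above, written by hand in NumPy rather than with the Keras ImageDataGenerator the paper uses, so the idea is self-contained; the noise level and image size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    # generate several new images from one original instead of over-sampling
    out = [img]
    out.append(np.fliplr(img))                 # horizontal flip
    out.append(np.flipud(img))                 # vertical flip
    out.append(np.rot90(img))                  # rotation through an angle
    noisy = img + rng.normal(0, 5, img.shape)  # additive Gaussian noise
    out.append(np.clip(noisy, 0, 255))
    return out

# hypothetical 50x50 RGB fundus image with random pixel values
image = rng.integers(0, 256, size=(50, 50, 3)).astype(float)
augmented = augment(image)
print(len(augmented))  # 5 images produced from one original
```

Each transform preserves the class label, which is why augmentation can rebalance the under-represented DR stages without collecting new images.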

Image Pre-Processing
Taking a look at one image from each class, we see that the blood vessels, exudates and cotton wool spots, the important features for classification, are not highlighted. It is also observed that the images differ in size, contrast and brightness. Image pre-processing becomes a must to solve these issues. Figure 5 shows one image from each class before pre-processing. To highlight the features and to bring uniformity in contrast and brightness, various image processing techniques like cropping, resizing, applying masks, image smoothing and blending were applied. The Wiener filter provides an optimal trade-off between correcting the effects of unequal brightness and contrast and smoothing noise in the DR images [27]. Taking a look at the same images from each class after pre-processing, the features are clearly visible and the unwanted regions are cropped off. Figure 6 shows one image from each class after pre-processing.
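A minimal pre-processing sketch under stated assumptions: a dark border surrounds the fundus, resizing uses nearest-neighbour sampling, and brightness is normalised to a common mean. The helper names and thresholds are hypothetical; the paper additionally applies masks, smoothing, blending and a Wiener filter:

```python
import numpy as np

def crop_dark_border(img, threshold=10):
    # keep only rows/columns that contain pixels brighter than the border
    mask = img.mean(axis=2) > threshold
    rows, cols = np.where(mask)
    return img[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def resize_nn(img, size):
    # nearest-neighbour resize to a square of the given size
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[ys][:, xs]

def normalise_brightness(img, target_mean=128.0):
    # scale pixel values so every image has the same mean brightness
    return np.clip(img * (target_mean / img.mean()), 0, 255)

# toy image: a bright "retina" region on a dark border
img = np.zeros((100, 120, 3))
img[20:80, 30:90] = 200.0
out = normalise_brightness(resize_nn(crop_dark_border(img), 50))
print(out.shape)  # (50, 50, 3)
```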

Convolutional Nueral Network Model
The size of each input image is (50, 50, 3).Keras generally processes images in batches of fixed size.So, an extra dimension is added for this purpose.Since batch size is treated as a variable, the value changes depending on the size of the dataset.So, its size is represented by None.Therefore, the input shape becomes (None, 50, 50, 3).Convolving a (50, 50) image with a (2, 2) filter, with strides and dilation rate of 1, and 'same' padding, results in an output of size (50, 50).Since there are 16 such filters, the output shap becomes (50, 50, 16).The MaxPooling layer with stride as 2 takes the output of the convolution layer as input.The pooling layer divides the size of the image by 2 thus leaving the output shape of this layer to (25,25,16).This pattern can be extended to all Conv2D and MaxPooling layers.The Flatten layer converts the pixels into a long one-dimensional vector.Therefore, an input of (6, 6, 64) is flattened to (6 * 6 * 64) resulting 2304 parameters.The number of parameters for a Conv2D layer is given by the following Equation ( 1 Figure 7 shows the model summary of the created CNN network without any transfer learning.

Convolutional Nueral Network Model
The size of each input image is (50, 50, 3).Keras generally processes images in batches of fixed size.So, an extra dimension is added for this purpose.Since batch size is treated as a variable, the value changes depending on the size of the dataset.So, its size is represented by None.Therefore, the input shape becomes (None, 50, 50, 3).Convolving a (50, 50) image with a (2, 2) filter, with strides and dilation rate of 1, and 'same' padding, results in an output of size (50, 50).Since there are 16 such filters, the output shap becomes (50, 50, 16).The MaxPooling layer with stride as 2 takes the output of the convolution layer as input.The pooling layer divides the size of the image by 2 thus leaving the output shape of this layer to (25,25,16).This pattern can be extended to all Conv2D and MaxPooling layers.The Flatten layer converts the pixels into a long one-dimensional vector.Therefore, an input of (6, 6, 64) is flattened to (6 * 6 * 64) resulting 2304 parameters.The number of parameters for a Conv2D layer is given by the following Equation ( 1): parameter = (kernel height * kernel width * input channel * output channel) + (output channels that use bias) (1)   Figure 7 shows the model summary of the created CNN network without any transfer learning.

Convolutional Nueral Network Model
The size of each input image is (50, 50, 3).Keras generally processes images in batches of fixed size.So, an extra dimension is added for this purpose.Since batch size is treated as a variable, the value changes depending on the size of the dataset.So, its size is represented by None.Therefore, the input shape becomes (None, 50, 50, 3).Convolving a (50, 50) image with a (2, 2) filter, with strides and dilation rate of 1, and 'same' padding, results in an output of size (50, 50).Since there are 16 such filters, the output shap becomes (50, 50, 16).The MaxPooling layer with stride as 2 takes the output of the convolution layer as input.The pooling layer divides the size of the image by 2 thus leaving the output shape of this layer to (25,25,16).This pattern can be extended to all Conv2D and MaxPooling layers.The Flatten layer converts the pixels into a long one-dimensional vector.Therefore, an input of (6, 6, 64) is flattened to (6 * 6 * 64) resulting 2304 parameters.The number of parameters for a Conv2D layer is given by the following Equation ( 1 Figure 7 shows the model summary of the created CNN network without any transfer learning.The CNN model is trained for 10 epochs with 46 observations in each epoch.Accuracy and loss for each epoch is calculated.After the training process is done, the model is tested on the test images.A confusion matrix with true positives, true negatives, false positives and false negatives is created.Figure 8 shows the obtained confusion matrix.The accuracy of this CNN model is found to be 75.61% which is not very impressive.

CNN with ResNet Model
The model summary of the hybrid CNN with ResNet neural network can be viewed in Figure 9. The model is trained on a dataset of size 3662 for 20 epochs with 366 observations in each epoch, which results in an accuracy of 93.18%. The confusion matrix was also generated to show the accuracy obtained by the proposed analysis.
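The overall accuracy can be read off such a confusion matrix as the diagonal (correct predictions) over the total sample count. A minimal sketch for the five-class case; the matrix values below are illustrative, not the paper's results:

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy = correctly classified (diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Rows: true class, columns: predicted class (illustrative counts)
cm = [
    [90, 3, 2, 0, 0],   # No DR
    [4, 70, 5, 1, 0],   # Mild
    [2, 6, 80, 4, 1],   # Moderate
    [0, 1, 5, 60, 3],   # Severe
    [0, 0, 2, 4, 55],   # Proliferative
]
print(accuracy_from_confusion(cm))
```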



CNN with DenseNet Model
The model summary of the hybrid CNN with DenseNet neural network can be viewed in Figure 10.

Transfer learning is applied and the model is built using DenseNet-121. The dropout rate is set to 0.5, so 50% of the neurons are dropped.




Comparison of Performance
Diabetic retinopathy disease is predicted using deep learning, machine learning and image processing algorithms, and various functional classifiers were used. To improve the brightness and contrast of the images, image processing and augmentation techniques were applied. The performance is compared and analyzed by tabulating the accuracies obtained by the various methods. Deep learning classifiers combined with transfer learning give better accuracy for predicting diabetic retinopathy even with a minimal number of records. The accuracies obtained, compared with existing research work, are shown in Table 2. The pre-trained DenseNet architecture also outperformed other state-of-the-art networks: the AdaBoost algorithm (a combination of Inception V3, ResNet151 and Inception-ResNet-V2) with 88.21% accuracy, CNN with LSTM with 90% accuracy and E-DenseNet with 91.6% accuracy, against 96.22% for the proposed CNN with DenseNet model. The proposed CNN with DenseNet architecture is simultaneously simple, accurate and efficient in terms of computational time.
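The tabulated comparison above can be reproduced with a short snippet; the accuracies are the ones quoted in this section, and the ranking step is purely illustrative:

```python
# Accuracies (%) reported in the comparison with existing work (Table 2)
results = {
    "AdaBoost (InceptionV3 + ResNet151 + Inception-ResNet-V2)": 88.21,
    "CNN with LSTM": 90.0,
    "E-DenseNet": 91.6,
    "Proposed CNN with DenseNet": 96.22,
}

# Print the models ranked from highest to lowest accuracy
for model, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<58} {acc:>6.2f}%")
```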

Conclusions
The goal of this work is to analyze various machine learning and deep learning algorithms to detect diabetic retinopathy along with its stage. Extensive image processing has helped highlight the exudates, blood vessels and cotton wool spots. Based on the analysis and comparison of various methods, it can be concluded that deep learning algorithms combined with transfer learning have huge scope in predicting diabetic retinopathy. Traditional machine learning classifiers such as Support Vector Machine (SVM), Decision Tree (DT), Naïve Bayes (NB) and Random Forest (RF) have failed to classify the images accurately. Although CNN worked better than the traditional algorithms, only when transfer learning models such as ResNet and DenseNet were used was the desired accuracy achieved without any overfitting. Therefore, the model built using a custom CNN with pre-trained models, along with proper image processing and image augmentation, succeeded in predicting the presence of diabetic retinopathy. The hybrid CNN with DenseNet attained an accuracy of 96.22%, outperforming the others. With the help of the developed model, doctors can suggest preventive measures at a much earlier stage, which would prevent people from losing their eyesight.

Future Enhancement
Technology in the medical field will continue to advance. At present, people often cannot get medication and treatment at an early stage; when disease prediction is automated, the time required is minimized and people can take preventive measures beforehand. In the future, the parameters of the algorithms can be finely tuned for better results, and the accuracy of the model can be improved using other efficient optimization techniques. It is also planned to apply these methods to predicting other diseases, such as brain tumor detection, which likewise requires clinicians to study scanned reports of the brain. These deep learning algorithms could be integrated with electronic health record systems in clinics, easing the burden on doctors. The project can also be enhanced with a user interface implementation to make it available to users in real time.

Figure 1 .
Figure 1. Max pooling layer of the CNN.
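The max-pooling operation of Figure 1 can be sketched in a few lines of plain Python (a 2 × 2 window with stride 2, the configuration used in this work; the function name is ours):

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2: each output pixel is the maximum of a
    non-overlapping 2x2 window, halving both spatial dimensions."""
    h, w = len(image), len(image[0])
    return [
        [max(image[i][j], image[i][j + 1],
             image[i + 1][j], image[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

patch = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 1, 4, 2],
]
print(max_pool_2x2(patch))  # [[6, 4], [7, 9]]
```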


Figure 3 .
Figure 3. Composition layer resulting in the transformation of features. Advantages of DenseNet: 1. Strong gradient flow: the propagation of error down the DenseNet neural network is easier, because the earlier layers are directly connected to the final classification layer. 2. Parameters: the number of parameters in DenseNet is directly proportional to l × k × k, where k is the growth rate; the size of a DenseNet is smaller than that of a ResNet. 3. Low-complexity features: in DenseNet, features of all complexity levels are used, which gives smooth decision boundaries. This is why DenseNet performs well even when the training data are insufficient.
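The growth-rate bookkeeping behind these advantages can be made concrete: within a dense block, each composition layer appends k feature maps to the concatenated input, so layer l sees k0 + l·k channels. This is a sketch of the standard DenseNet arithmetic under assumed values (k = 32 is DenseNet-121's usual growth rate), not the paper's exact configuration:

```python
def dense_block_channels(k0, k, layers):
    """Channel counts inside a dense block with growth rate k.

    Entry l is the number of feature maps entering layer l: the block input
    (k0 maps) concatenated with the k maps produced by each earlier layer.
    The final entry is the block's output width."""
    return [k0 + l * k for l in range(layers + 1)]

# Growth rate k = 32, a 6-layer block, 64 input feature maps
print(dense_block_channels(64, 32, 6))  # [64, 96, 128, 160, 192, 224, 256]
```

The linear growth in k is what keeps DenseNet smaller than a comparable ResNet while still exposing low- and high-complexity features to the classifier.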
Figure 4 shows the overview of the proposed diabetic retinopathy classification framework. A comparative study of models with and without transfer learning is performed to show which model performs best.


Figure 5 .
Figure 5. Image from each class before pre-processing.


Figure 6 .
Figure 6. Image from each class after pre-processing.

Figure 7 .
Figure 7. Model summary of the CNN Network.


Figure 8 .
Figure 8. Confusion matrix of the proposed CNN architecture.

Figure 9 .
Figure 9. Accuracy after training the hybrid CNN with ResNet model for 20 epochs.

Figure 10 .
Figure 10. Model summary of the hybrid CNN with DenseNet Network.

The dropout rate is set to 0.5, and so 50% of the neurons are dropped. The model is trained on a dataset of size 3662 for 15 epochs with 97 observations in each epoch, and the accuracy on the validation set was found to be 96.22%. The model is trained with almost 6,958,981 trainable parameters and 83,648 non-trainable parameters. The model was then tested on a dataset of size 1928, classifying each fed image into one of the five categories (No DR, Mild DR, Moderate DR, Severe DR, Proliferative DR). A few predictions made by the model on the testing dataset are given in Figure 11. The learning parameters with respect to the loss and accuracy of the proposed CNN, hybrid CNN with ResNet and hybrid CNN with DenseNet are shown in Table 1.
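Dropout at rate 0.5, as used here, can be sketched as follows. This shows the common inverted-dropout scheme: during training each activation is kept with probability 1 − rate and rescaled so the expected activation is unchanged, while at inference the layer is the identity. The names and values are illustrative, not the framework's internals:

```python
import random

def dropout(activations, rate=0.5, training=True, seed=0):
    """Inverted dropout: zero each unit with probability `rate` during
    training and scale the survivors by 1/(1-rate); identity at inference."""
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8]
out = dropout(acts, rate=0.5)
print(out)  # roughly half the units are zeroed; survivors are doubled
```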


Table 1 .
Plots of epoch vs. loss and epoch vs. accuracy for the CNN, CNN with DenseNet and CNN with ResNet models.



Table 2 .
Comparison of the proposed work with existing research works.
