Deep Learning for the Detection and Classification of Diabetic Retinopathy with an Improved Activation Function

Diabetic retinopathy (DR) is an eye disease caused by diabetes that may lead to blindness. To prevent diabetic patients from going blind, early diagnosis and accurate detection of DR are vital. Deep learning models, such as convolutional neural networks (CNNs), are widely used in DR detection by classifying blood vessel pixels against the remaining pixels. In this paper, an improved activation function is proposed for diagnosing DR from fundus images that reduces both loss and processing time. The DIARETDB0, DRIVE, CHASE, and Kaggle datasets were used to train and test the enhanced activation function in different CNN models. The ResNet-152 model achieved the highest accuracy of 99.41% on the Kaggle dataset. The enhanced activation function is therefore suitable for DR diagnosis from retinal fundus images.


Introduction
Abnormally high blood sugar levels in the human body accumulate in the blood vessels as glucose is converted into energy. Diabetic retinopathy (DR) typically develops when a patient has had diabetes for more than ten years. DR occurs due to high blood pressure and damages the retina and its vascularization, which may cause blindness and even death. Ophthalmologists can only observe retinal vascular swelling by conducting fundoscopy tests, which are time-consuming and expensive. By 2030, there are estimated to be 552 million diabetic patients worldwide, and DR is a leading cause of blindness [1,2].
Detecting and treating visual loss early is the key to preventing visual loss [3]. In severe cases, the vessels swell, leak fluid, or block blood vessels, which results in abnormal blood vessel growth and complete blindness. Microaneurysms, hemorrhages, and exudates are the main symptoms of DR on the retina. A lesion's shape, size, and overall appearance determine its severity. Fundus photography is an ophthalmologic screening method for DR [4]. Preventing diabetes-related blindness is clinically effective and cost-effective with an automated assessment technique [5].
Ophthalmologists diagnose the presence and severity of DR through visual assessment, by direct examination and evaluation of the eyes. For the large number of diabetic patients globally, this process is expensive and time-consuming [6]. Grading DR severity and diagnosing the disease early remain challenging, with assessments varying substantially even among trained ophthalmologists [7]. Moreover, 75% of DR patients live in underdeveloped regions where sufficient ophthalmologists and the infrastructure for detection are unavailable [8]. Global screening programs have been created to counter the proliferation of preventable eye diseases, but DR exists at too large a scale to be detected and treated efficiently on an individual basis.
There is therefore a need to identify DR automatically by examining retinal fundus images. Deep learning models have been reported to be a practical approach for DR detection and can identify DR better than ophthalmologists [9].
The convolutional neural network (CNN) is one of the main models of deep learning used to detect, predict, and classify medical images. This study aims to automatically detect DR by implementing the updated activation function for the CNN model. The proposed new activation function is compared with other activation functions on the publicly available datasets DIARETDB0, DRIVE, CHASE, and Kaggle. The current CNN version has been improved by adding a unique activation function, which provides excellent results.
Our contribution is to identify DR efficiently and accurately by examining retinal fundus images. In addition, the enhanced CNN model is evaluated and its performance demonstrated. The proposed model does not require any specialized, inaccessible, or costly equipment to grade the fundus images; it can be run on a PC or laptop with an average processor. In addition to detection and classification, the proposed model accurately visualizes abnormal regions in the fundus images, enabling clinical review and verification of the automated diagnosis. This matters because microaneurysms are difficult for ophthalmologists to detect due to their small size.

Research Background
Millions of individuals worldwide experience vision impairment without proper predictive diagnosis and eye care. To address the shortcomings of current diagnostic practice, an automated solution for retinal disease diagnosis from fundus images has been proposed [10]. This technique could relieve the workloads of trained ophthalmologists, allowing technicians to screen and process DR patients without depending on clinicians.
Some studies adopted a CNN model with dropout regularization, augmentation, and preprocessing on different datasets and achieved 94% accuracy [11]. In another study, a CNN model classified five-stage DR on a publicly available dataset and achieved high specificity but low sensitivity [12]. Another approach uses three networks to categorize DR images as normal or abnormal and referable or non-referable: the first network implements the Inception model, the second recognizes lesions, and the third crops the DR images [13].
CNN models such as Inception V3, DenseNet-121, Xception, DenseNet-169, and ResNet-50 can automatically diagnose DR and its corresponding phases [14,15]. In [16], the authors reported that the VGGNet model had the highest accuracy in DR classification. Using the EYEPACS dataset, three additional deep learning models successfully classified DR [17]. In addition, the CNN models AlexNet and VGGNet-16 achieved 83.68% accuracy on DR stage classification that was not explicitly graded [18].
Activation functions activate the neurons of a neural network: these mathematical functions, attached to the neurons, decide whether a neuron fires. The activation function introduces nonlinearity into the neuron outputs; a model without activation functions behaves like linear regression. By transforming its input nonlinearly, the activation function makes the network capable of learning complex datasets with high accuracy. Several existing activation functions are summarized in Table 1.
Based on the activation functions listed in Table 1, we aim to implement a new activation function for the CNN. The performance of the proposed activation function was compared with that of the other activation functions on the publicly available DIARETDB0 dataset. The goal is to provide a highly effective, low-cost solution for DR detection that does not depend on clinicians examining and grading images manually.

Table 1. Existing activation functions in neural networks.

Function | Definition | Equation | Limitations
Linear | The activation of the final layer is just a linear function of the first layer's input; it can be used in the output layer. | y = x, range −∞ to +∞ | Nonlinearity is difficult to achieve.
Binary step | Used mainly for binary classification: the output is one when the input exceeds a threshold and zero otherwise. | y = 1 if x > θ, otherwise 0 | —
Swish | Deals with the vanishing gradient problem and helps normalize the output; the output does not saturate to a maximum value, i.e., the gradient does not become zero. | f(x) = x · sigmoid(x) | —
Mish | Continuously differentiable and nonmonotonic; used in the hidden layers. | f(x) = x · tanh(softplus(x)) | —
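For reference, the standard functions in Table 1 can be sketched in a few lines of NumPy (the definitions below are the textbook forms; the paper's own improved function is defined separately in Equation (5) and is not reproduced here):

```python
import numpy as np

def linear(x):
    # Identity activation: output equals input, range (-inf, +inf).
    return x

def binary_step(x, threshold=0.0):
    # Outputs 1 when the input exceeds the threshold, otherwise 0.
    return np.where(x > threshold, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish: x * sigmoid(x); smooth and non-saturating for positive inputs.
    return x * sigmoid(x)

def mish(x):
    # Mish: x * tanh(softplus(x)); continuously differentiable, nonmonotonic.
    return x * np.tanh(np.log1p(np.exp(x)))
```

All five are elementwise, so they apply unchanged to scalars or whole activation maps.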
A fully automated CNN model could accurately process thousands of heterogeneous fundus images for DR detection. In other words, it eliminates the need for resource-intensive manual fundus image analysis across clinical settings and guides high-risk patients to further care. We present an improved activation-function-based CNN model applied to the publicly available diabetic retinopathy datasets DIARETDB0, DRIVE, CHASE, and Kaggle.

Dataset
In this study, we used the DIARETDB0, DRIVE, CHASE, and Kaggle datasets. The DIARETDB0 [19] dataset contains 130 images, of which 110 are used for training and 20 for testing. In DRIVE [20], we selected 34 images for training and eight for testing from a set of 40 color fundus images; this set included 33 images without DR and seven with early DR signs. In CHASE [21], 28 retinal fundus images were used for training and four for testing. The Kaggle [22] dataset contains 88,702 images; we used 75,397 for training and 13,305 for testing, comprising 25,810 with no DR, 2443 with mild DR, 5292 with moderate DR, 873 with severe DR, and 708 with proliferative DR. The dataset thus consists of five different classes, which are visualized in Figure 1.
In the present study, DR fundus images are classified into various severity levels with high accuracy. DR severity can be assessed using an automated model, and the modified CNN architecture increases the accuracy of categorizing diabetic retinopathy. The experimental framework is shown in Figure 2.

Image Preprocessing
Four datasets, namely DIARETDB0, DRIVE, CHASE, and Kaggle, were considered to classify DR images. The preprocessing phase removes imperfections from retinal images, improves image quality, and allows spatial domain techniques to operate on pixels. Spatial domain techniques are computationally efficient and require little processing power. The pixel values were directly used as input information in the pixel-based approach, and the enhancement relies on the grey levels to sharpen the high-contrast image this approach produces. To prepare the image for processing in the next stage, spatial domain techniques were therefore applied in the preprocessing phase [23]. To improve image quality, fuzzy set type II was applied in the preprocessing step, and the image was fuzzified by

μ(g) = (g − g_min)/(g_max − g_min) (1)

The upper and lower ranges of the type II fuzzy membership functions were assessed as follows. The upper membership function is

μ_upper(g) = [μ(g)]^α (2)

and the lower membership function is

μ_lower(g) = [μ(g)]^(1/α) (3)

where g is the image color level, ranging from 0 to max − 1, and g_max and g_min are the maximum and minimum image color levels. The contrast-enhanced image depends on the value of α: as α increases, the image contrast also increases. If α = 0.9, the lesions are brighter and the enhanced image has a shadier background. Higher α values and membership values therefore improve the enhanced image. To combine the membership values, the Hamacher T co-norm was applied:

μ_enhanced(g) = (μ_upper + μ_lower + (λ − 2) μ_upper μ_lower) / (1 + (λ − 1) μ_upper μ_lower) (4)

where λ is the average of the image.
Healthcare 2023, 11, 97
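A minimal NumPy sketch of this type-II fuzzy enhancement, assuming the standard fuzzification μ(g) = (g − g_min)/(g_max − g_min), the α-exponent upper/lower memberships, and the Hamacher T co-norm family with λ set to the image average; the paper's exact formulation may differ in detail:

```python
import numpy as np

def type2_fuzzy_enhance(img, alpha=0.9):
    """Contrast enhancement via type-II fuzzy sets (reconstruction sketch)."""
    g = img.astype(float)
    g_min, g_max = g.min(), g.max()
    mu = (g - g_min) / (g_max - g_min)        # fuzzification
    mu_upper = mu ** alpha                    # upper membership
    mu_lower = mu ** (1.0 / alpha)            # lower membership
    lam = mu.mean()                           # lambda = average of the image
    # Hamacher T co-norm combining the upper and lower memberships.
    num = mu_upper + mu_lower + (lam - 2.0) * mu_upper * mu_lower
    den = 1.0 + (lam - 1.0) * mu_upper * mu_lower
    return (255 * num / den).astype(np.uint8)
```

With α = 0.9 the lesion (bright) pixels are pushed upward while the background stays dark, matching the behavior described above.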

Improved CNN Model Training
The retinal fundus images were resized to 32 × 32 pixels to reduce computational complexity. Following feature extraction, the CNN is trained until convergence, and the DR classification is then tested to determine its accuracy. Based on lesion detection and segmentation, the convolution layers extract features for correlated tasks and improve DR classification performance [24,25]. Figure 3 shows the improved CNN model architecture. When training on the DR fundus images, the hyperparameters must be tuned to enhance performance. The first layer learns the edges of the fundus image, while the second layer learns the classification of the fundus image. Using the updated, improved activation function, the max pooling layer reduces overfitting, with a kernel size of 3 × 3 and a stride of 1 × 1 feeding the dense layers. By applying its filter at different spatial positions, each convolution layer generates a single feature map, with the weights learned via backpropagation during training.
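The spatial dimensions produced by 3 × 3 convolutions (stride 1) and 3 × 3 max pooling (stride 1) on a 32 × 32 input can be checked with the standard output-size formula; the two-stage trace below is an illustrative configuration, not the paper's exact layer stack:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula: floor((size + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

# Trace a 32 x 32 input through two conv (3x3, stride 1) plus
# max-pool (3x3, stride 1) stages, per the hyperparameters quoted above.
size = 32
trace = []
for _ in range(2):
    size = conv_out(size, kernel=3)            # 3x3 convolution
    size = conv_out(size, kernel=3, stride=1)  # 3x3 max pooling, stride 1
    trace.append(size)
# trace == [28, 24]
```

Each stage shaves two pixels per operation off each spatial dimension, so even small inputs survive several layers.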
We trained the bias and weight using the average coefficient in the subsampling layer. Because the CNN extracts implicit, distortion-invariant features by default and has a low computational time during the training phase, it is well suited to DR classification. For testing, we applied four convolution layers, four pooling layers, and two fully connected layers with the improved activation function. Several filters with specific coefficient values were employed in every convolution layer, and maximum pooling was used in the pooling layers.

Convolution Layer
The fundus image matrix and a filter are the inputs to the convolution layer. CNNs recognize images using receptive fields and shared weights: a convolution layer detects features by extracting parts of the fundus image through its receptive fields. The feature maps of a CNN share the same weights and biases, although how they are generated differs from application to application, and these shared values represent the same features across fundus images. The activation map was used to extract the features of the fundus images.
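The shared-weight convolution described here can be sketched as a plain NumPy loop; the edge-detecting kernel in the usage example is purely illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution with a single shared-weight kernel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output pixel is the receptive field weighted by the
            # same (shared) kernel weights.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Sliding a [1, −1] kernel over an image responds only at vertical intensity edges, which is the kind of low-level feature the first layer learns.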

Pooling Layer
A max-pooling layer was applied, a nonlinear down-sampling technique that divides the activation map into windows and keeps the maximum value in each window. This layer discards information outside the most responsive areas of the image based on the generated features. The pooling layer reduces the parameters and computation in the network, which helps prevent overfitting.
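A minimal NumPy max-pooling sketch (a 2 × 2 window with stride 2 is used here for illustration; the paper's pooling configuration may differ):

```python
import numpy as np

def max_pool(x, pool=2, stride=2):
    """Nonlinear down-sampling: keep the maximum of each pooling window."""
    oh = (x.shape[0] - pool) // stride + 1
    ow = (x.shape[1] - pool) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i * stride:i * stride + pool,
                       j * stride:j * stride + pool]
            out[i, j] = window.max()
    return out
```

A 2 × 2/stride-2 pool quarters the activation map while retaining each region's strongest response.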

Activation Function
The proposed improved activation function induces more sparsity in the hidden units; this allows the CNN to be trained more efficiently than with Sigmoid and the remaining activation functions. During the testing phase, we observed a greater loss reduction and a lower processing time than with the standard activation functions. The proposed activation function and its first derivative are presented in Equation (5).

Fully Connected Layer
A fully connected layer follows all the convolution and pooling layers. This layer takes all the neurons from the last pooling layer and flattens them into a one-dimensional layer. After multiple layers, the final layer applies the proposed activation function, followed by the fully connected layer. The properties of the proposed activation function are as follows: it induces more sparsity in the hidden units, which allows the CNN to be trained more efficiently than with Sigmoid and the remaining activation functions; it avoids saturation, so the gradient does not become zero; and it normalizes the input during training. During the testing phase, the loss and processing time were lower than with the standard activation functions.

Accuracy Comparison of Different Activation Functions
We tested various activation functions, such as ReLU, SoftMax, Swish, and Mish, on the DIARETDB0 diabetic retinopathy dataset with 5000 epochs, a learning rate of 0.01, a batch size of 64, and the Nadam optimizer. Experiments were conducted with the proposed activation function on different hidden and dense layers, with a batch size of 64. As shown in Tables 2–4, we compared the different activation functions with the proposed one across epochs, learning rates, and batch sizes. We implemented the updated activation function using the Keras backend. Experiments with the proposed activation function were conducted over different epoch numbers; with high epoch counts, the proposed activation function provided the highest accuracy, and it continues to perform well even for many epochs.
Table 3. Accuracy comparison of the proposed function with others on different learning rates.

With a fixed learning rate, the performance of the proposed model was tabulated in terms of accuracy. With a learning rate of 1 × 10−2, Tanh yields 91% and ReLU yields 93%. With a learning rate of 1 × 10−3, ELU achieves 95% accuracy, SELU records 97%, and Sigmoid records 91%.
The activation function used in this study had an epoch size of 5000, a learning rate of 1 × 10−2, and batch sizes of 8, 16, 32, 64, 128, 256, 512, 1024, and 2048 (Table 4). Multiple experiments were conducted with different hyperparameters on the dataset during the training process. The accuracy comparison of the different activation functions is displayed in Figure 4. In the diabetic retinopathy model, the proposed activation function gives the most accurate results for the dense layers. Compared to ReLU, LReLU, Sigmoid, and Softplus, the Mish and Swish activation functions provide a near-consistent improvement.
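The hyperparameter sweep described above amounts to a grid over epochs, learning rates, and batch sizes; the epoch values other than 5000 below are illustrative assumptions, while the learning rates and batch sizes are those quoted in the text:

```python
from itertools import product

epochs = [1000, 2500, 5000]          # epoch settings (only 5000 is quoted)
learning_rates = [1e-2, 1e-3]        # learning rates used in Table 3
batch_sizes = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # Table 4

# Every (epochs, lr, batch) combination evaluated in the sweep.
grid = list(product(epochs, learning_rates, batch_sizes))
```

Enumerating the grid up front makes it easy to log one accuracy per configuration, as in Tables 2–4.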

CNN Model Performance Evaluations
As mentioned, five-class DR images graded by severity were fed to CNN models including Inception-v3, VGG-19, ResNet-50, AlexNet, GoogleNet, SqueezeNet, and ResNet-152, using the existing activation functions SELU, ReLU, Sigmoid, and ELU. The performance of the enhanced CNN with the proposed activation function was compared to the other adopted models. Table 5 presents the distinct model performance metrics on the four adopted datasets; the proposed model outperforms the others in terms of testing accuracy. VGG-19 has 19 layers, ResNet-50 has 50, SqueezeNet has 18, GoogleNet has 22, AlexNet has 8, and Inception V3 has 48. For the benchmark datasets DIARETDB0, DRIVE, CHASE, and Kaggle, our proposed model had the lowest model loss, and the results show that the enhanced CNN can detect and classify DR with an acceptable testing loss. The proposed model also outperforms the others in testing accuracy, model loss, and processing time on the Kaggle dataset, detecting DR with a low loss and less processing time. Table 6 compares the loss and processing time of the proposed activation function across the different CNN models using the different activation functions.
Based on the enhanced CNN, the prediction output reflects the probability and accuracy of the correct predictions. Figure 5 shows the ground-truth images alongside the enhanced CNN predictions. The different activation functions were tested on the DR datasets with 5000 epochs, a learning rate of 1 × 10−2, a batch size of 64, and the Nadam optimizer. The proposed activation function outperforms the existing activation functions with an accuracy of 96.64%, a sensitivity of 97.96%, and a specificity of 98.79% on the DIARETDB0 dataset, and it achieved a reduced loss of 0.0010. Table 4 tabulates the loss values for the proposed activation function with the various pre-trained networks on the DIARETDB0, DRIVE, CHASE, and Kaggle datasets. The ResNet-152 network performs best, achieving a loss of 0.0013 on DIARETDB0, 0.0015 on DRIVE, 0.0017 on CHASE, and 0.0010 on Kaggle.
We compared our proposed model with some existing methodologies. On the DIARETDB0 dataset, the proposed activation function achieved the highest AUC score of 0.93 compared with existing works [26,27]. On the DRIVE dataset, it achieved a 0.94 AUC score, better than the functions described in [28,29]. On the CHASE dataset, it reached a higher AUC score of 0.97 than [30,31], and on the Kaggle dataset it achieved a maximum AUC of 0.99, exceeding previous studies [32,33]. On the Kaggle dataset, the proposed activation function achieved the highest accuracy of 99.41%, with a sensitivity of 98.28% and a specificity of 99.94%. This is followed by AlexNet with an accuracy of 96.27%, a sensitivity of 87.64%, and a specificity of 96.89%, while SqueezeNet has the lowest accuracy of 87.85%.
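The accuracy, sensitivity, and specificity figures reported here follow the standard confusion-matrix definitions, sketched below (the counts in the usage example are made up for illustration):

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (recall), and specificity from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts: 90 true positives, 95 true negatives,
# 5 false positives, 10 false negatives.
acc, sen, spe = metrics(90, 95, 5, 10)
```

Reporting sensitivity and specificity alongside accuracy matters for DR screening, since the class balance in datasets such as Kaggle is heavily skewed toward non-DR images.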

Conclusions
We evaluated the performance of the enhanced CNN model using the DIARETDB0, DRIVE, CHASE, and Kaggle datasets. The image-processing-based enhancement was performed with the improved CNN model. The DIARETDB0 dataset yielded 96.6% classification accuracy, 97.96% sensitivity, 99.5% precision, and a 99.1% F1 score; the DRIVE dataset yielded 97.84% classification accuracy, 98.45% sensitivity, 99.68% precision, and a 99.57% F1 score; the CHASE dataset yielded 99.05% classification accuracy, 98.45% sensitivity, 99.94% precision, and a 99.89% F1 score; and the Kaggle dataset yielded 99.41% classification accuracy, 98.28% sensitivity, 99.89% precision, and a 99.93% F1 score. Using retina images, the proposed model efficiently diagnoses diabetic retinopathy. Comparing the proposed activation function with traditional deep learning models, we found that it improved diagnosis and classification performance. Compared to previous classification techniques, the proposed improved activation function in the CNN model improves both accuracy and processing time. Due to the enhanced activation function in the enhanced CNN model, the model's processing time is reduced by approximately 7 ms by avoiding the inseparable classification of nonlinear data. Compared to existing methods, the proposed activation function raised the classification accuracy for diabetic retinopathy to 99.41%.