Algorithms | Article | Open Access | 7 March 2023

Detectron2 for Lesion Detection in Diabetic Retinopathy

Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Deep Learning for Healthcare Applications and Analysis

Abstract

Hemorrhages in the retinal fundus are a common symptom of both diabetic retinopathy and diabetic macular edema, making their detection crucial for early diagnosis and treatment. To address this task, this study evaluates the performance of two pre-trained and additionally fine-tuned models from the Detectron2 model zoo: Faster R-CNN (R50-FPN) and Mask R-CNN (R50-FPN). Experiments show that the Mask R-CNN (R50-FPN) model provides highly accurate segmentation masks for each detected hemorrhage, with an accuracy of 99.34%, while the Faster R-CNN (R50-FPN) model detects hemorrhages with an accuracy of 99.22%. The results of both models are compared on a publicly available image database with ground truth marked by experts. Overall, this study demonstrates that current detection models are valuable tools for the early diagnosis and treatment of diabetic retinopathy and diabetic macular edema.

1. Introduction

Diabetes is a significant contributor to blindness among people aged 20 to 74 in the United States, according to a study conducted by the National Health and Nutrition Examination Survey (NHANES) at the Centers for Disease Control and Prevention (CDC) [1]. The study, published in the Journal of the American Medical Association [2], found a clear link between diabetes and deteriorating eyesight among people with the disease.
Diabetes can lead to two serious eye conditions: diabetic retinopathy (DR) and diabetic macular edema (DME). DR is assessed by grading the presence and severity of retinopathy in the macula and peripheral retina of each eye [3]. DR is divided into non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR) and is graded based on the presence of microaneurysms, hemorrhages, cotton wool spots, and hard exudates, as illustrated in Figure 1 [4]. Meanwhile, DME is identified by the presence of blot hemorrhages and hard exudates within a 2-disc diameter from the center of the macula.
Figure 1. The retinal fundus can exhibit various types of lesions, each with a distinct appearance. Soft exudates appear as white, feathery or fluffy spots, whereas hard exudates appear denser and uneven, with a white or yellowish color. Microaneurysms are small, round and red, with well-defined borders, measuring less than 3 mm in diameter. Hemorrhages are indications of bleeding in the retina and can manifest as dots, blots or flames. The macula and optic disc are also indicated.
The treatment for DR and DME is determined by the severity of the condition [5]. With mild or moderate DR, progression can often be prevented through good blood sugar control. Early detection of DR and DME is crucial, as evidence suggests that appropriate management at an early stage, including regulation of blood pressure, glucose levels, and lipid profiles, can greatly slow the progression of DR and even reverse moderate NPDR to a DR-free stage. DME is treated with anti-VEGF medications and focal laser treatment. For PDR, the main therapy is panretinal photocoagulation (PRP) [6].
In recent years, the utilization of deep learning models has gained popularity in the analysis of retinal images and detection of DR and DME. Despite its effectiveness, previous research in the automatic detection of DR and DME from digital fundus images has faced criticisms due to its black-box approach that relies on various representations for predictions without explicitly displaying the diabetic retinopathy lesions like microaneurysms and retinal hemorrhages [5]. This has raised concerns among physicians regarding the acceptability of the method for clinical use. However, the new deep learning-based screening software presents an opportunity to address these concerns by filling the gap in detecting both DR and DME simultaneously, while providing a more transparent method for predictions.
This paper develops a prototype of a deep learning-based screening software for the early detection and localization of diabetic retinopathy (DR) and diabetic macular edema (DME) from digital fundus images. The prior research in this field, which encompasses classical machine learning, deep learning, and deep neural networks, is critically reviewed and discussed in Section 2. The research methodology, which includes the dataset used, the baseline models (Faster R-CNN (R50-FPN) and Mask R-CNN (R50-FPN)), the process flow, and the training and loss functions, is detailed in Section 3. The results of the research, including the evaluation metrics, the analysis of accuracy, false negative rate, and convergence speed, and the analysis of the baseline models in object detection and instance segmentation, are presented in Section 4. Finally, Section 5 discusses the findings and future work related to this research.

3. Methodology

3.1. Process Flow Explanation

Figure 3 provides a visual representation of the process of using the models to detect hemorrhages within an input image and determine the presence of either DME or DR. The input image is fed into the base deep learning models trained to detect hemorrhages and locate them within the image. The models output a prediction on the presence of hemorrhages within the image. If a hemorrhage is detected, the location of the hemorrhage with respect to the macula is calculated. The location of the hemorrhage is then used to classify the images. If the hemorrhage is found within the macula, it is determined that DME is present. Otherwise, DR is present.
Figure 3. The step-by-step process of detecting hemorrhages and locating DME/DR with Faster R-CNN (R50-FPN) and Mask R-CNN (R50-FPN) models.
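The decision logic illustrated in Figure 3 can be summarized in a short sketch. This is a minimal illustration, assuming that the detector returns hemorrhage bounding boxes and that the macular region is already known; the helper names and box format are hypothetical, not taken from the paper's code.

```python
# Minimal sketch of the DME/DR decision step in Figure 3 (hypothetical helpers).
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes intersect."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def classify_image(hemorrhage_boxes: List[Box], macula_box: Box) -> str:
    """Map detected hemorrhages to a DME/DR/no-finding label based on location."""
    if not hemorrhage_boxes:
        return "no hemorrhage detected"
    if any(boxes_overlap(box, macula_box) for box in hemorrhage_boxes):
        return "DME suspected (hemorrhage within the macular region)"
    return "DR suspected (hemorrhage outside the macular region)"
```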

3.2. Dataset

The study used 89 color fundus images from the publicly available DIARETDB1 database [16], of which 84 images show at least mild non-proliferative diabetic retinopathy (NPDR) and 5 images show normal eyes. Medical experts marked the affected regions in the images to provide the ground truth. Annotations were created using Labelme and then converted to the Common Objects in Context (COCO) [17] format using a local script or the tool Roboflow [18]. The original data set was divided into a training set of 28 images and a test set of 61 images, with different confidence levels marking the affected areas of the images. The training set had 18 images with hard exudates, 6 images with soft exudates, 19 images with microaneurysms, and 21 images with hemorrhages, while the test set had 20 images with hard exudates, 9 images with soft exudates, 20 images with microaneurysms, and 18 images with hemorrhages. The ground truth confidence level in the DIARETDB1 data set was set to 0.75.
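Once the annotations are in COCO format, they can be registered with Detectron2 using its standard helper for COCO-style datasets. The sketch below is illustrative: the dataset names and file paths are assumptions, not the ones used in the study.

```python
# Registering the COCO-format DIARETDB1 annotations with Detectron2
# (dataset names and paths are illustrative assumptions).
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "diaretdb1_train", {},
    "annotations/train_coco.json",  # COCO JSON exported from Labelme/Roboflow
    "images/train",                 # directory containing the training fundus images
)
register_coco_instances(
    "diaretdb1_test", {},
    "annotations/test_coco.json",
    "images/test",
)
```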
Publicly available datasets such as MESSIDOR [19], APTOS 2019 Blindness Detection [20], and the Kaggle diabetic-retinopathy-detection database [21] are often utilized for detecting diabetic retinopathy (DR) and diabetic macular edema (DME). However, one drawback of these datasets is that they solely offer a severity rating ranging from 0 to 4 for each image and do not include precise annotations of the lesions present in the images. This can pose difficulties when utilizing these datasets for tasks that require pinpointing the location of the lesions.

3.3. Baseline Models

The study utilizes two models, Faster R-CNN (R50-FPN) and Mask R-CNN (R50-FPN), which are both based on the Detectron2 platform developed by Facebook AI Research (FAIR) [22]. This platform provides a flexible environment for developing and deploying computer vision algorithms and includes various object detection techniques, such as Mask R-CNN [23], RetinaNet [24], Faster R-CNN [25], and RPN. The Faster R-CNN (R50-FPN) [26] is a Faster R-CNN model with a ResNet50+FPN backbone, and the Mask R-CNN (R50-FPN) [27] is a Mask R-CNN model with a ResNet50+FPN backbone. These models will be further explained in Section 3.3.1 and Section 3.3.2.
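Both baselines correspond to standard configurations in the Detectron2 model zoo [26,27]. A minimal sketch for loading the pre-trained Faster R-CNN (R50-FPN) configuration is shown below; the single-class head is an assumption reflecting the hemorrhage-detection focus of this work, and the Mask R-CNN variant is loaded analogously from COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml.

```python
# Loading a pre-trained model zoo configuration in Detectron2.
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
)
# Start from the COCO-pre-trained checkpoint and adapt the ROI head
# to a single class (hemorrhage) -- an assumption for this sketch.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
```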

3.3.1. Faster R-CNN (R50-FPN) Architecture

The architecture of Faster R-CNN (R50-FPN) [28] is depicted in Figure 4 and is made up of three main parts: the Backbone Network, the Region Proposal Network, and the Box Head. The Backbone Network is a ResNet+FPN backbone that extracts feature maps from the input image. The ResNet part of the backbone consists of residual blocks stacked on top of one another, which are simpler to optimize and improve accuracy compared to traditional deep networks. The Feature Pyramid Network (FPN) [29] part of the backbone creates proportionally scaled feature maps from a single-scale input image of any size.
Figure 4. The architecture of Faster R-CNN (R50-FPN) consists of three stages: (1) the Backbone Network, (2) the Region Proposal Network (RPN), and (3) the Box Head, which corresponds to the Fast R-CNN detection head used for object identification. Mask R-CNN (R50-FPN) employs the same three-step procedure as Faster R-CNN (R50-FPN), but in the third stage it additionally produces a binary mask for each ROI alongside the class and box offset predictions.
The Region Proposal Network (RPN) [28] is a deep learning network used for object detection that generates rectangular object proposals with corresponding objectness scores from the input image. It shares its convolutional layers with the Fast R-CNN object identification network to save computation time. The Box Head is a type of region of interest head that uses fully connected layers to refine box placements and classify objects. It takes the feature maps and region proposals generated by the RPN and performs computations on each region, cutting and warping the feature maps with proposal boxes to create multiple fixed-size features. The final output is limited to 100 boxes after non-maximum suppression.
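In Detectron2, the 100-box cap and the NMS threshold of the Box Head correspond to configuration keys; the brief sketch below shows the default values, which we assume were left unchanged in this study.

```python
# Post-processing settings of the Box Head in Detectron2 (default values shown).
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.TEST.DETECTIONS_PER_IMAGE = 100        # keep at most 100 boxes per image after NMS
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.5  # IoU threshold used by non-maximum suppression
```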

3.3.2. Mask R-CNN (R50-FPN) Architecture

Mask R-CNN [23] is an extension of Faster R-CNN, with the main difference being the inclusion of an additional output branch for generating object masks. While Faster R-CNN outputs class labels and bounding-box offsets, Mask R-CNN also generates binary masks for each region of interest (ROI). The procedure for Mask R-CNN (R50-FPN) [28] is similar to that of Faster R-CNN (R50-FPN), with the first two stages being the backbone network and RPN. In the third stage, in addition to class and box offset predictions, a binary mask is generated for each ROI. This allows for more precise spatial arrangement of objects, as it involves pixel-to-pixel alignment.
Mask Representation: Mask R-CNN [23] predicts binary masks for objects in an image by using a fully convolutional network (FCN). The FCN generates an m × m mask for each region of interest (ROI), preserving the pixel-to-pixel correspondence through convolutions. This allows for precise extraction of the object’s spatial structure. The RoIAlign layer in Mask R-CNN maintains the accuracy of small RoI features by aligning them with the input, which preserves the per-pixel spatial correspondence and is crucial for good mask prediction performance.
RoIAlign: The RoIAlign layer in Mask R-CNN improves the accuracy of the features extracted from regions of interest (RoIs) compared to the standard RoIPool operation. RoIPool quantizes the RoI to the granularity of the feature map and divides it into spatial bins, which can result in inaccuracies. RoIAlign, on the other hand, aligns the retrieved features with the input, removing the harsh quantization introduced by RoIPool and providing improved alignment. This improved alignment allows for more accurate bounding box regression, resulting in better object detection performance compared to models like Fast R-CNN.
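The practical difference between RoIPool and RoIAlign can be illustrated with the corresponding operators in TorchVision, which Detectron2 builds on; the toy feature map and RoI below are illustrative and not related to the paper's data.

```python
# Comparing RoIPool and RoIAlign on a toy feature map (illustrative only).
import torch
from torchvision.ops import roi_align, roi_pool

features = torch.rand(1, 256, 50, 50)  # (batch, channels, H, W) feature map
# One RoI in (batch_index, x1, y1, x2, y2) format, in input-image coordinates.
rois = torch.tensor([[0, 13.7, 9.2, 36.1, 28.4]])

# RoIPool quantizes the RoI to the feature-map grid before pooling.
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1.0 / 4)

# RoIAlign samples the features with bilinear interpolation, avoiding the
# harsh quantization and preserving alignment with the input.
aligned = roi_align(features, rois, output_size=(7, 7),
                    spatial_scale=1.0 / 4, sampling_ratio=2, aligned=True)

print(pooled.shape, aligned.shape)  # both: torch.Size([1, 256, 7, 7])
```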

3.4. Training and Loss Function

The RPN is trained to classify the anchor boxes as objects or not objects by applying back-propagation and Stochastic Gradient Descent (SGD) [30]. The RPN training uses the “image-centric” sampling method, where each mini-batch originates from a single image containing a mix of positive and negative example anchors. The loss function is computed by randomly selecting 256 anchors from the image, with a positive-to-negative anchor ratio of up to 1:1. If an image has fewer than 128 positive samples, the mini-batch is padded with negative samples.
The entire architecture is trained using a four-step alternating training method [28]. First, the RPN is initialized with ImageNet-pre-trained backbone weights and fine-tuned to generate region proposals. Second, a separate Fast R-CNN detection network, whose backbone is also initialized with ImageNet weights, is trained using these proposals; at this stage, the two networks do not yet share convolutional layers. Third, the detector network is used to initialize RPN training, with the shared convolutional layers fixed so that only the RPN-specific layers are fine-tuned. Finally, keeping the shared layers fixed, only the layers specific to the Fast R-CNN detector are fine-tuned, yielding a unified network.
Both models were trained on an Intel CPU with 4 GB of RAM, using Python 3.7, PyTorch 1.8 or later, TorchVision, and TensorBoard 1.6.0, on a Linux system. With a learning rate of 0.00025, the models were trained for 30,000 iterations, using data augmentations such as ResizeShortestEdge to maintain image aspect ratios [31]. This prevents images from being distorted, which is especially important for image classification and object detection tasks where the shape and size of objects are critical features for determining their class or location.
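A minimal training sketch using Detectron2's DefaultTrainer with the hyperparameters reported above (learning rate 0.00025, 30,000 iterations) is shown below. The dataset names, the single-class head, and the RPN sampling values are assumptions for illustration; ResizeShortestEdge is applied by the default training data loader.

```python
# Fine-tuning sketch with Detectron2's DefaultTrainer (hyperparameters from the text;
# dataset names and class count are illustrative assumptions).
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("diaretdb1_train",)
cfg.DATASETS.TEST = ("diaretdb1_test",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1        # assuming a single hemorrhage class
cfg.SOLVER.BASE_LR = 0.00025               # learning rate reported in the text
cfg.SOLVER.MAX_ITER = 30000                # 30,000 iterations
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 256   # 256 sampled anchors per image
cfg.MODEL.RPN.POSITIVE_FRACTION = 0.5      # up to a 1:1 positive-to-negative ratio

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```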

4. Results

4.1. Evaluation Metrics

The performance of the detector is evaluated using sensitivity, specificity, and accuracy [32]. These performance measures are commonly used in medical diagnosis and are defined as follows:
Sensitivity is the ratio of true positive (TP) cases to the sum of true positive and false negative (FN) cases. Specificity is the ratio of true negative (TN) cases to the sum of true negative and false positive (FP) cases [33]. Accuracy is the ratio of the sum of true positive and true negative cases to the overall sample.
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
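As a worked example, the three measures follow directly from the confusion-matrix counts; the counts below are illustrative and not taken from the study.

```python
# Sensitivity, specificity, and accuracy from confusion-matrix counts.
def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative counts (not the study's data): 390 TP, 10 FN, 290 TN, 10 FP.
print(sensitivity(390, 10))        # 0.975
print(specificity(290, 10))        # ~0.967
print(accuracy(390, 290, 10, 10))  # ~0.971
```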
In the evaluation of object detection and instance segmentation models in Section 4.3, the key metrics utilized to assess the performance are Intersection over Union (IoU) and the standard metrics provided by the Common Objects in Context (COCO) dataset [17]. IoU measures the overlap between the predicted and ground-truth bounding boxes and is calculated as the ratio of their intersection to their union. A high IoU score indicates a good match between the two bounding boxes. Example IoU scores are shown pictorially in Figure 5.
Figure 5. Example IoU scores for the detected bounding box.
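A minimal implementation of IoU for two axis-aligned bounding boxes is shown below; the (x1, y1, x2, y2) box format is a common convention assumed here.

```python
# Intersection over Union (IoU) for two boxes in (x1, y1, x2, y2) format.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```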
In addition to IoU, COCO provides standard metrics, including Average Precision (AP) and Average Recall (AR) [34]. These metrics are listed in Table 2. AP measures the accuracy of the model’s object detection, while AR measures the model’s ability to recall objects. AP is calculated across different scales and at different IoU thresholds (IoU = 0.50, 0.75) and is divided into categories based on object size (small, medium, large). These categories are represented as APsmall (for small objects with an area < 32² pixels), APmedium (for medium objects with an area between 32² and 96² pixels), and APlarge (for large objects with an area > 96² pixels) [35].
Table 2. Evaluation metrics from COCO [35] used to assess the performance of object detection models on the COCO dataset. These evaluation metrics include Average Precision (AP) and Average Recall (AR) for bounding box detection, as well as AP and AR for different object sizes and IoU thresholds.
Similarly, average recall is calculated at different limits on the maximum number of detections per image (AR at 1, 10, and 100 detections) and divided into categories based on object size (ARsmall, ARmedium, ARlarge). These metrics provide a comprehensive evaluation of the performance of object detection models and help assess the strengths and weaknesses of the approach, specifically with respect to object size and detection count.
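In practice, these AP and AR values can be produced with Detectron2's built-in COCO evaluator. The sketch below assumes a registered test split named diaretdb1_test and a fine-tuned checkpoint path; both names are illustrative.

```python
# Computing COCO AP/AR metrics with Detectron2's evaluator
# (dataset name and checkpoint path are illustrative assumptions).
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import build_detection_test_loader
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.WEIGHTS = "output/model_final.pth"  # fine-tuned checkpoint (assumed path)

predictor = DefaultPredictor(cfg)
evaluator = COCOEvaluator("diaretdb1_test", output_dir="./coco_eval")
test_loader = build_detection_test_loader(cfg, "diaretdb1_test")
results = inference_on_dataset(predictor.model, test_loader, evaluator)
print(results)  # bbox AP, AP50, AP75, APsmall/APmedium/APlarge, and AR at 1/10/100 detections
```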

4.2. Analysis of Accuracy, False Negative Rate, and Convergence Speed

Figure 6a illustrates the accuracy of the Mask R-CNN (R50-FPN), which was 99.34% with a sensitivity of 97.5% and a specificity of 96.6%. This indicates that the model accurately classified the majority of positive and negative cases, with a low number of false positive and false negative results. For the Faster R-CNN (R50-FPN), the accuracy was 99.22% with a sensitivity of 97.37% and a specificity of 96.49%. Although the accuracy is slightly lower than that of the Mask R-CNN (R50-FPN), the sensitivity and specificity are still high, meaning the model still performs well in classifying positive and negative cases.
Figure 6. Faster R-CNN (R50-FPN) in red and Mask R-CNN (R50-FPN) in blue over 30k iterations. (a) Illustrates accuracy, with Mask R-CNN (R50-FPN) having a higher overall accuracy of 99.34%. (b) Compares false negative rate, with Mask R-CNN (R50-FPN) performing better with a final false negative rate of 0.8%. (c) Demonstrates convergence of loss value, with Faster R-CNN (R50-FPN) demonstrating faster convergence and a lower minimum loss value.
Figure 6b compares the false negative rates of the two models: Mask R-CNN (R50-FPN) reaches a final false negative rate of approximately 0.8%, while Faster R-CNN (R50-FPN) reaches approximately 0.6%. Figure 6c illustrates the convergence of the loss value for both models during the training process. Both models converge to a minimum value, with Faster R-CNN (R50-FPN) demonstrating faster convergence and a lower minimum loss value compared to Mask R-CNN (R50-FPN).

4.3. Analysis of Baseline Models in Object Detection and Instance Segmentation

In the context of this work, ‘detection’ refers specifically to the identification and localization of hemorrhages in retinal images, and should not be confused with the more general concept of object detection. For Faster R-CNN (R50-FPN), Table 3 reveals that the highest Average Precision (AP) is obtained at an Intersection over Union (IoU) threshold of 0.50:0.95 with all maxdets set to 100, with a value of 0.477. However, the Average Recall (AR) is low at 0.032 under the same setting. The model performs well for large objects, with an AP of 0.812 and an AR of 0.830 at IoU 0.50:0.95 with all maxdets set to 100. But it performs poorly on small objects, with an AP of 0.180 and an AR of 0.182 under the same setting.
Table 3. COCO evaluation metrics for object detection using Faster R-CNN (R50-FPN) baseline. The metric maxdets specifies the maximum number of detections per image for calculating the average precision (AP) and average recall (AR).
Table 4 indicates that Mask R-CNN (R50-FPN) performs well in both object detection and instance segmentation, with slightly better results in object detection than in instance segmentation. The AP for object detection is 0.475 at IoU 0.50:0.95 with all maxdets set to 100, while the AR is 0.031 at the same setting. The AP for instance segmentation is 0.424 at the same setting. The AP is highest for medium-sized objects, with a value of 0.612 in object detection and 0.543 in instance segmentation.
Table 4. COCO Evaluation Metrics for object detection and instance segmentation using Mask R-CNN (R50-FPN) baseline.
The evaluation of the accuracy and sensitivity of the Faster R-CNN (R50-FPN) model can be facilitated by visually comparing the model’s output to the ground truth annotations, as demonstrated in Figure 7. This comparison reveals discrepancies or errors in the model’s predictions and provides insight into the degree of agreement with the expert annotations.
Figure 7. The results of Faster R-CNN (R50-FPN) can be visually compared to the ground truth to assess the performance of the model. (a) The output image from Faster R-CNN (R50-FPN); (b) the ground truth overlaid on the output image; (c) the ground truth image marked by experts. The visual comparison highlights any discrepancies or errors in the model’s predictions and indicates the degree of agreement between the model’s predictions and the expert annotations.
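Such overlays can be produced with Detectron2's Visualizer. The sketch below is illustrative: the image path, checkpoint path, and dataset name are assumptions rather than the files used in the study.

```python
# Overlaying model predictions on a fundus image with Detectron2's Visualizer
# (file paths and the dataset name are illustrative assumptions).
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import MetadataCatalog
from detectron2.engine import DefaultPredictor
from detectron2.utils.visualizer import Visualizer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.WEIGHTS = "output/model_final.pth"    # fine-tuned checkpoint (assumed path)
predictor = DefaultPredictor(cfg)

image = cv2.imread("images/test/image005.png")  # BGR image as read by OpenCV (assumed path)
outputs = predictor(image)

viz = Visualizer(image[:, :, ::-1], MetadataCatalog.get("diaretdb1_test"), scale=1.0)
out = viz.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("prediction_overlay.png", out.get_image()[:, :, ::-1])
```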
In the context of hemorrhage detection and segmentation, baseline models were evaluated, and the results are presented in Figure 8. The first image demonstrates early stages of DR without DME, with a marked area on the top right that indicates a failure in identification. The second image shows successful hemorrhage detection in a case with both DR and DME. These results showcase the potential of baseline models for detecting and segmenting hemorrhages, which can aid in the diagnosis and monitoring of DR in patients.
Figure 8. Results of hemorrhage detection and segmentation from baseline models. (a) Hemorrhage detection indicating early stages of DR without DME. The marked area on the top right represents a failure in identification. (b) Hemorrhage detection indicating both DR and a clear case of DME.
In conclusion, the summary of results shown in Table 5 indicates that both models have a relatively high average precision, with Mask R-CNN (R50-FPN) performing slightly better overall than Faster R-CNN (R50-FPN). However, the performance of both models can be improved by adjusting the IoU and maximum detection settings.
Table 5. Summary of object detection performance reported as Average Precision (AP) at different IoU thresholds, using different backbones. “BB” denotes the bounding box-based approach, while “IS” denotes the instance segmentation-based approach.

5. Discussion and Future Work

In recent years, deep learning models have been utilized to analyze retinal images and detect the presence of diabetic retinopathy and diabetic macular edema. Previous work in this field has relied on multiple levels of representation to make predictions, without explicitly displaying the diabetic retinopathy lesions such as microaneurysms and retinal hemorrhages. This black-box approach, while effective, has raised concerns about its acceptability for clinical use among physicians. These studies have primarily focused on detecting diabetic retinopathy and diabetic macular edema at later stages, such as in individuals with referable or advanced DR, which indicates that these patients require closer follow-up or treatment from ophthalmologists. However, early-stage detection is essential to achieving the best possible outcomes for patients with diabetic retinopathy and diabetic macular edema. Research shows that with appropriate care at an early stage, including optimal control of blood pressure, glucose levels, and lipid profiles, it may be possible to significantly slow the development of DR and even reverse moderate NPDR to a DR-free stage. In addition, advanced DR is incurable, which underscores the importance of early detection.
The evaluation of the two deep learning models, Faster R-CNN (R50-FPN) and Mask R-CNN (R50-FPN), in detecting early stages of DR and DME by focusing on the presence of hemorrhages in the retinal fundus shows promising results. Both models achieved a high accuracy, with the Mask R-CNN (R50-FPN) reaching 99.34% compared to 99.22% for the Faster R-CNN (R50-FPN). The Mask R-CNN (R50-FPN) also had a higher sensitivity of 97.5% and a higher specificity of 96.6%, while the Faster R-CNN (R50-FPN) had a sensitivity of 97.37% and a specificity of 96.49%. This indicates that both models accurately classified the majority of positive and negative cases, with a low number of false positive and false negative results.
In the future, the accuracy of these models can be improved through fine-tuning with a larger and more diverse dataset that includes small objects. The impact of alternative backbone architectures, such as InceptionNet and DenseNet, on the models’ performance should also be investigated. The ultimate goal is to make the detection of DR and DME fully automatic by training the model to identify all pathologies, the macula, and the optic disc based on their distinctive features. It is important to note that most of the publicly available datasets commonly used for diabetic retinopathy and diabetic macular edema detection have limitations in terms of annotations and lesion localization. These datasets only provide a severity score for each image on a scale of 0 to 4, without specific annotations of the lesions on the images, which can make it challenging to use them for tasks that require lesion localization. Thus, it is crucial to develop and use datasets with more specific annotations for lesion detection and localization when training deep learning models for accurate and efficient detection of diabetic retinopathy and diabetic macular edema.

Author Contributions

Conceptualization, F.C.; software, F.C.; data curation, F.C.; writing—original draft preparation, F.C.; writing—review and editing, F.C.; supervision, H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

A publicly available dataset is used in this work which is available at https://www.it.lut.fi/project/imageret/diaretdb1/ (accessed on 1 July 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Diabetic Retinopathy: Cdc.gov-visionhealth-factsheet. Available online: https://www.cdc.gov/visionhealth/pdf/factsheet.pdf (accessed on 1 September 2022).
  2. Zhang, X.; Saaddine, J.B.; Chou, C.F.; Cotch, M.F.; Cheng, Y.J.; Geiss, L.S.; Gregg, E.W.; Albright, A.L.; Klein, B.E.K.; Klein, R. Prevalence of Diabetic Retinopathy in the United States, 2005–2008. JAMA 2010, 304, 649–656. [Google Scholar] [CrossRef] [PubMed]
  3. Zachariah, S.E.A. Grading diabetic retinopathy (DR) using the Scottish grading protocol. Community Eye Health 2015, 28, 72–73. [Google Scholar] [PubMed]
  4. Diabetes Retinal Screening, Grading and Management Guideline. Available online: https://www.worlddiabetesfoundation.org/sites/default/files/WDF08-386%20Pacific%20Island%20Ret%20Screen%20Guidelines.pdf (accessed on 1 September 2022).
  5. Ting, D.S.W.; Cheung, C.Y.L.; Lim, G.; Tan, G.S.W.; Quang, N.D.; Gan, A.; Hamzah, H.; Garcia-Franco, R.; San Yeo, I.Y.; Lee, S.Y.; et al. Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes. JAMA 2017, 318, 2211–2223. [Google Scholar] [CrossRef] [PubMed]
  6. Dai, L.; Wu, L.; Li, H.; Cai, C.; Wu, Q.; Kong, H.; Liu, R.; Wang, X.; Hou, X.; Liu, Y.; et al. A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 2021, 12, 3242. [Google Scholar] [CrossRef] [PubMed]
  7. Pragathi, P.; Rao, A.N. An effective integrated machine learning approach for detecting diabetic retinopathy. Open Comput. Sci. 2022, 12, 83–91. [Google Scholar] [CrossRef]
  8. Reddy, G.T.; Bhattacharya, S.; Ramakrishnan, S.S.; Chowdhary, C.L.; Hakak, S.; Kaluri, R.; Reddy, M.P.K. An ensemble based machine learning model for diabetic retinopathy classification. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–6. [Google Scholar]
  9. Alsaih, K.; Lemaitre, G.; Rastgoo, M.; Massich, J.; Sidibé, D.; Meriaudeau, F. Machine learning techniques for diabetic macular edema (DME) classification on SD-OCT images. Biomed. Eng. Online 2017, 16, 1–12. [Google Scholar] [CrossRef] [PubMed]
  10. Nguyen, Q.H.; Muthuraman, R.; Singh, L.; Sen, G.; Tran, A.C.; Nguyen, B.P.; Chua, M. Diabetic retinopathy detection using deep learning. In Proceedings of the 4th International Conference on Machine Learning and Soft Computing, New York, NY, USA, 17–19 January 2020; pp. 103–107. [Google Scholar]
  11. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  12. Zhao, S.W.; Li, C.D.; Zhang, W.; Wang, X.H.; Wang, L.M.; Li, H.Y. Deep learning-based approach for grading diabetic macular edema. J. Ophthalmol. 2018, 2018, 8415759. [Google Scholar]
  13. Xu, W.; Wang, Y.; Zhang, J.; Wang, W. An end-to-end deep learning approach for detecting and grading diabetic macular edema in digital fundus images. Med. Biol. Eng. Comput. 2020, 58, 2191–2199. [Google Scholar]
  14. Chen, W.; Lu, J.; Li, R.; Guo, S.; Lu, Z.; Liu, M.; Yang, X. Deep transfer learning-based detection of diabetic macular edema in digital fundus images. J. Med. Syst. 2019, 43, 418. [Google Scholar]
  15. Alyoubi, W.L.; Shalash, W.M.; Abulkhair, M.F. Diabetic retinopathy detection through deep learning techniques: A review. Inf. Med. Unlocked 2020, 20, 100377. [Google Scholar] [CrossRef]
  16. Kauppi, T.; Kalesnykiene, V.; Kamarainen, J.K.; Lensu, L.; Sorri, I.; Raninen, A.; Voutilainen, R.; Uusitalo, H.; Kalviainen, H.; Pietila, J. The DIARETDB1 diabetic retinopathy database and evaluation protocol. In Proceedings of the British Machine Vision Conference, Warwick, UK, 10–13 September 2007; Volume 2007, pp. 1–10. [Google Scholar]
  17. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
  18. Roboflow. Available online: https://roboflow.com/ (accessed on 1 July 2022).
  19. Messidor—ADCIS. Available online: http://messidor.crihan.fr (accessed on 1 June 2022).
  20. Hadid, A.; Pietikainen, M.; Martinkauppi, B. Color-based face detection using skin locus model and hierarchical filtering. In Proceedings of the 2002 International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002; pp. 196–200. [Google Scholar]
  21. Kaggle Diabetic Retinopathy Detection [Online]. Available online: https://kaggle.com/c/diabetic-retinopathy-detection (accessed on 1 June 2022).
  22. Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 1 July 2022).
  23. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  24. Zhang, H.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Cascade retinanet: Maintaining consistency for single-stage object detection. arXiv 2019, arXiv:1907.06881. [Google Scholar]
  25. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  26. COCO-Detection: Faster rcnn R50 FPN. Available online: https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml (accessed on 1 July 2022).
  27. COCO-InstanceSegmentation: Mask rcnn R50 FPN. Available online: https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml (accessed on 1 July 2022).
  28. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Adv. Neural Inf. Process. Syst. 2015, 28, 3–7. [Google Scholar] [CrossRef] [PubMed]
  29. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  30. Zheng, S.; Meng, Q.; Wang, T.; Chen, W.; Yu, N.; Ma, Z.M.; Liu, T.Y. Asynchronous Stochastic Gradient Descent with Delay Compensation. arXiv 2016, arXiv:1609.08326. [Google Scholar] [CrossRef]
  31. Detectron2.Data.Transforms. Available online: https://detectron2.readthedocs.io/en/latest/modules/data_transforms.html#detectron2.data.transforms.ResizeShortestEdge (accessed on 1 June 2022).
  32. Van Stralen, K.J.; Stel, V.S.; Reitsma, J.B.; Dekker, F.W.; Zoccali, C.; Jager, K.J. Diagnostic methods I: Sensitivity, specificity, and other measures of accuracy. Kidney Int. 2009, 75, 1257–1263. [Google Scholar] [CrossRef] [PubMed]
  33. Lalkhen, A.G.; McCluskey, A. Clinical tests: Sensitivity and specificity. Contin. Educ. Anaesth. Crit. Care Pain 2008, 8, 221–223. [Google Scholar] [CrossRef]
  34. Goutte, C.; Gaussier, E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Proceedings of the Advances in Information Retrieval: 27th European Conference on IR Research, ECIR 2005, Santiago de Compostela, Spain, 21–23 March 2005; Proceedings 27. Springer: Berlin/Heidelberg, Germany, 2005; pp. 345–359. [Google Scholar]
  35. COCO. Available online: https://cocodataset.org/#detection-eval (accessed on 1 September 2022).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
