The Use of Convolutional Neural Networks and Digital Camera Images in Cataract Detection

Cataract is one of the major causes of blindness in the world. Its early detection and treatment could greatly reduce the risk of deterioration and blindness. The instruments commonly used to detect cataracts, slit lamps and fundus cameras, are highly expensive and require domain knowledge. Thus, a shortage of professional ophthalmologists can delay cataract detection until medical treatment becomes unavoidable. Therefore, this study aimed to design a convolutional neural network (CNN) with digital camera images (CNNDCI) system to detect cataracts efficiently and effectively. The designed CNNDCI system can perform the cataract identification process accurately in a user-friendly manner, using smartphones to collect digital images. In addition, existing numerical results from the literature were used to benchmark the performance of the proposed CNNDCI system for cataract detection. Numerical results revealed that the designed CNNDCI system could identify cataracts effectively with satisfying accuracy. Thus, this study concluded that the presented CNNDCI architecture is a feasible and promising alternative for cataract detection.


Introduction
According to a World Health Organization report [1], at least 2.2 billion people worldwide were estimated to be visually impaired in 2021, and in more than 1 billion of these cases the impairment could have been prevented or has yet to be addressed. The leading cause of vision impairment or blindness is cataracts, which account for roughly 94 million cases. Early detection and cataract surgery can decrease the possibility of blindness, prevent deterioration, and improve patients' vision.
A cataract is a crystalline opacity clouding the clear lens of the human eye. The clouded lens blocks light from focusing on the retina, resulting in poor vision. Denaturation of the lens capsule or its proteins causes protein clumps, and these clumps, together with pigments deposited on the crystalline lens, are the main causes of cataracts. Many other factors also contribute, including genetics, aging, and smoking [2]. In the early stage, cataracts are generally painless, non-itching, and have almost no significant influence on sight. Thus, most patients are not aware of cataracts at the onset. Meanwhile, delays in detection and treatment can greatly increase the possibility of blindness. There are several ways to classify cataracts. The most common standard is based on the area of protein deposition, through which cataracts can be classified into three types: nuclear cataract, cortical cataract, and posterior subcapsular cataract. A nuclear cataract develops in the nucleus of the lens, which turns yellow and brown. A cortical cataract, mostly triggered by diabetes, occurs in the cortex of the lens. A posterior subcapsular cataract, which often affects those who have taken high doses of steroid medication, forms at the back of the lens [3]. Another classification is based on severity, distinguishing normal, mild, medium, and severe cataracts.
To this day, regular cataract screening has proven to be an effective way of preventing blindness and identifying patients who need surgery [4]. Cataract surgery is the most successful and safest way to restore vision [1]. When cataracts have deteriorated enough to impact vision, removing the clouded lens by cataract surgery is necessary. In the medical field, ophthalmologists use slit lamps to screen a patient's ocular tissues with a high-intensity light source [5] and manually grade them by diagnostic criteria such as the Lens Opacities Classification System III [6]. However, these manual judgments and instruments are time-consuming, costly, and not easy to carry. Li et al. [7] pointed out that manual diagnosis may be influenced by accumulated personal experience, owing to subjective and error-prone judgment. The scarcity of well-experienced ophthalmologists, poor eye care resources, and economic considerations leave many people without timely access to effective treatment, leading to blindness in most such patients [8]. Consequently, many studies in recent years have made great efforts to develop highly portable and automatic cataract detection systems to treat cataracts early and enhance diagnostic accuracy.
When machine learning methods are employed for cataract detection, feature extraction requires extensive engineering techniques and ophthalmic domain knowledge. Thus, the feature selection procedure is laborious and highly dependent on experience [9]. Simultaneously, predefined features might oversimplify the problem and omit important hidden patterns. This investigation exploits the strength of convolutional neural networks in feature extraction and uses digital images captured by smartphones to conduct cataract detection. The motivation, contribution, and innovation of this study is to develop a user-friendly, portable, and automatic cataract detection system for areas short of medical instruments and ophthalmologists. In this way, users can pre-screen for cataracts via mobile devices with digital images. The rest of this study is organized as follows: Section 2 examines related literature in recent years; Section 3 introduces convolutional neural networks; Section 4 presents the CNNDCI architecture for cataract detection and numerical results; Section 5 addresses the conclusions and directions for future study.

Related Work
Ophthalmic medical images have been widely employed in analyzing cataract severity and are expected to provide a better degree of accuracy in identifying and classifying cataract disease [10]. Generally, six types of ophthalmic images (i.e., slit-lamp images [11][12][13], retro illumination images [14], ultrasonic images [15,16], anterior segment optical coherence tomography images [17], fundus images [9,[18][19][20][21][22], and digital camera images [5,8,[23][24][25][26][27][28]) have been used for cataract detection. Fundus images and slit-lamp images are the two most frequently used in cataract detection. However, the instruments needed to capture fundus and slit-lamp images are not easy to access for people living in rural areas. Comparatively, digital camera images are a more readily available alternative for cataract detection [29]. Therefore, a cataract detection system based on digital camera images, being both more accessible and user-friendly, is highly desirable for early cataract detection.
Nayak [23] used image preprocessing techniques, such as big ring area and edge pixel count, to select features from the pupil area of images and conducted classification with support vector machines. Fuadah et al. [24] gathered features from digital camera images converted manually to grayscale. The gray level co-occurrence matrix was used to extract contrast, dissimilarity, and uniformity features, and the k-nearest neighbor algorithm classified the images as normal or cataract. Pathak and Kumar [5] proposed a texture features-based algorithm for detecting the occurrence of cataracts from true color images. Khan et al. [25] presented a computer-aided diagnostic system that aids cataract detection in resource-lacking areas. In preprocessing, that study utilized Daugman's operator to isolate the iris and pupil from optical images as the region of interest, from which six features were gathered. A support vector machine was employed to classify images into cataract and noncataract. Tawfik et al. [26] used two classifiers, support vector machines and artificial neural networks, to perform cataract detection. The discrete wavelet transform and Log Gabor transform were employed to select features from pupil areas. That study reported that support vector machines outperformed artificial neural networks in terms of classification accuracy. Agarwal et al. [27] designed an Android application system to detect the presence of a cataract in an individual's eye using a smartphone as a medium. Three classifiers were evaluated: support vector machines, Naïve Bayes, and the k-nearest neighbors algorithm. Numerical results showed that the k-nearest neighbors algorithm generated more accurate results than the other two classifiers. Sigit et al. [8] developed a cataract detection system using a single-layer perceptron with smartphones to reduce ophthalmologists' workload.
The proposed system was able to perform classification with satisfying accuracy. Yusuf et al. [28] presented a web-based cataract detection system employing a convolutional neural network with digital camera images. They pointed out that the classification accuracy was influenced by transfer learning trained on ImageNet.

Convolutional Neural Networks
Convolutional neural networks, with both feature extraction and classification capabilities [29], can cope with classification problems effectively and efficiently. In addition, weight sharing and feature extraction in the convolutional neural network greatly improve computational efficiency. Thus, convolutional neural networks can handle problems with large-scale data [18]. A convolutional neural network includes a convolutional layer, a max-pooling layer, a flatten layer, and a dense layer. The convolutional layer can effectively extract essential features. The feature map F is convolved with kernel maps, an operation represented by Equation (1).
$$F_j^l = f\left(\sum_{i=1}^{N^{l-1}} F_i^{l-1} \otimes K_{ij}^l + b_j^l\right) \quad (1)$$
where $N^{l-1}$ is the number of feature maps in the $(l-1)$th layer, $F_j^l$ is the $j$th feature map of the $l$th layer, $F_i^{l-1}$ is the $i$th feature map of the $(l-1)$th layer, $\otimes$ represents the convolution operation, $K_{ij}^l$ represents the kernel map connecting the $i$th feature map of the $(l-1)$th layer and the $j$th feature map of the $l$th layer, and $b_j^l$ is the bias. The activation function $f(\cdot)$, commonly the Sigmoid or ReLU shown in Equations (2) and (3), is used to learn complex patterns that linear models cannot capture.
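As an illustration of the convolution in Equation (1), the following dependency-free sketch performs a "valid" 2-D cross-correlation (the operation deep learning frameworks conventionally call convolution) of a single feature map with a single kernel; the function name and data are illustrative, not from the paper.

```python
def conv2d_valid(feature_map, kernel):
    """'Valid' 2-D cross-correlation of one feature map with one kernel,
    the per-channel operation summed over input channels in Equation (1)."""
    fh, fw = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(fh - kh + 1):
        row = []
        for c in range(fw - kw + 1):
            # Sum of element-wise products over the kernel window.
            s = sum(feature_map[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 map with a 2x2 kernel yields a 2x2 output (valid padding, stride 1).
result = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
```

Note that output size shrinks to (height − kernel + 1) per dimension, which is why the 64 × 64 inputs described later shrink after each convolutional layer.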
$$\text{Sigmoid: } f(x) = \frac{1}{1 + e^{-x}} \quad (2)$$
$$\text{ReLU: } f(x) = \max(0, x) \quad (3)$$
The max-pooling layer is used to reduce the size of the feature map, both to avoid overfitting and to decrease computation time. Its operation can be expressed by Equation (4).
$$F_j^l = \text{Maxpooling}\left(F_j^{l-1}\right) \quad (4)$$
where $\text{Maxpooling}(\cdot)$ represents the max-pooling operation. In the flatten layer, the 2-dimensional feature maps $F$ are converted into a 1-dimensional array, as illustrated in Equation (5).
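A non-overlapping 2 × 2 max-pooling, matching Equation (4) and the pool size used later in the proposed model, can be sketched as follows (illustrative code, assuming even spatial dimensions are truncated as in standard frameworks):

```python
def max_pool_2x2(feature_map):
    """Equation (4): non-overlapping 2x2 max-pooling, halving each
    spatial dimension (odd trailing rows/columns are dropped)."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c],     feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]
```

Each output element keeps only the strongest response in its 2 × 2 window, which reduces the feature map area by a factor of four.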
$$n = F_H \times F_W \times F_C \quad (5)$$
where $n$ represents the number of neurons in the flatten layer, and $F_H$, $F_W$, and $F_C$ represent the height, width, and channel size of the feature map, respectively. The flattened neurons serve as the input of the dense layer, whose operation is determined by Equations (6) and (7).
$$z_j^l = \sum_{i=1}^{N^{l-1}} w_{i,j}^l \, x_i^{l-1} + b_j^l \quad (6)$$
$$x_j^l = f\left(z_j^l\right) \quad (7)$$
where $x_j^l$ represents the output of the $j$th neuron of the $l$th layer, $N^{l-1}$ is the number of neurons in the $(l-1)$th layer, and $w_{i,j}^l$ represents the weight between the $i$th neuron of the $(l-1)$th layer and the $j$th neuron of the $l$th layer. The output layer is the last layer of the network and produces the final result.
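The dense-layer forward pass of Equations (6) and (7) is a weighted sum plus bias followed by an activation, per output neuron. A minimal sketch (illustrative names; weights indexed as weights[i][j] from input neuron i to output neuron j):

```python
def dense_forward(inputs, weights, biases, activation):
    """Equations (6)-(7): z_j = sum_i(w_ij * x_i) + b_j, then x_j = f(z_j)."""
    outputs = []
    for j in range(len(biases)):
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        outputs.append(activation(z))
    return outputs

# Two inputs feeding one ReLU neuron: z = 1.0*0.5 + 2.0*0.25 + 0.0 = 1.0
out = dense_forward([1.0, 2.0], [[0.5], [0.25]], [0.0], lambda z: max(0.0, z))
```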

The Proposed CNNDCI System
This study proposed a cataract detection system that employs a CNN with digital camera images to identify cataracts. Figure 1 illustrates the proposed CNNDCI architecture. Two datasets were collected from GitHub.com, published by krishnabojha [30] and piygot5 [31], and are denoted here as Dataset I and Dataset II, respectively. All images were photographed by digital camera. Both datasets contain two classes, namely cataract and noncataract. Dataset I contains 9668 images, including 4514 cataract images and 5154 noncataract images. Dataset II contains 89 images, including 43 cataract images and 46 noncataract images. The ImageDataGenerator [32], one of the Keras utilities, was employed to preprocess the image data. Dataset I was used to train the CNN with a fivefold cross-validation procedure, from which classification accuracies were reported. Dataset II was employed to examine the classification performance of the trained CNN models on unseen data. Table 1 indicates the number of samples in each partition of Dataset I under fivefold cross-validation, and Table 2 illustrates the corresponding training and testing data. Figure 2 illustrates the convolutional neural network used in this investigation to process digital camera eye images.
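The fivefold cross-validation partitioning of Dataset I can be sketched as follows; the fold sizes are derived purely from the stated dataset size (9668 images), and the function name is illustrative rather than taken from the paper.

```python
def fivefold_partitions(n_samples, k=5):
    """Split indices 0..n_samples-1 into k nearly equal folds; each fold
    serves as the test set exactly once, the rest forming the training set."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)   # distribute the remainder
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, test))
    return splits

# For Dataset I: five train/test splits, each test fold of size ~9668/5.
splits = fivefold_partitions(9668)
```

In practice the paper uses Keras's ImageDataGenerator for preprocessing; this sketch only shows how the five train/test partitions summarized in Tables 1 and 2 are formed.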
The proposed model comprised seven layers: two convolutional layers, two max-pooling layers, a flatten layer, and two dense layers. The input images contained three color channels (red, green, and blue) at a size of 64 × 64. Each of the two convolutional layers used 32 filters with a kernel size of 3 × 3 and a stride of 1, activated by the rectified linear unit (ReLU). The ReLU conveys only positive values; outputs are zero when inputs are negative. A pool size of 2 × 2 was employed in both max-pooling layers. The first dense layer used 128 neurons with the ReLU activation function. As a binary classifier, the last dense layer used one neuron with the Sigmoid activation function. A highly satisfying performance was obtained as the dropout rate approached zero; thus, the dropout rate was set to zero.
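Given the layer settings just described (64 × 64 × 3 input, two 3 × 3/stride-1 convolutions with 32 filters, two 2 × 2 max-pooling layers, a 128-neuron dense layer, and a 1-neuron output), the feature-map sizes and parameter counts can be traced with simple arithmetic. This sketch assumes "valid" padding, which the paper does not state explicitly; the function name is illustrative.

```python
def trace_cnn_shapes(size=64, channels=3, filters=32):
    """Trace spatial size and trainable parameter count per layer,
    assuming 'valid' padding and stride 1 for each 3x3 convolution."""
    layers = []
    size = size - 3 + 1                          # conv1: 64 -> 62
    layers.append(("conv1", size, (3 * 3 * channels + 1) * filters))
    size //= 2                                   # pool1: 62 -> 31
    layers.append(("pool1", size, 0))
    size = size - 3 + 1                          # conv2: 31 -> 29
    layers.append(("conv2", size, (3 * 3 * filters + 1) * filters))
    size //= 2                                   # pool2: 29 -> 14 (floor)
    layers.append(("pool2", size, 0))
    flat = size * size * filters                 # flatten, Equation (5)
    layers.append(("flatten", flat, 0))
    layers.append(("dense1", 128, (flat + 1) * 128))
    layers.append(("dense2", 1, 128 + 1))
    return layers
```

Under these assumptions the spatial sizes run 62 → 31 → 29 → 14, giving a 6272-unit flatten layer; nearly all trainable parameters sit in the first dense layer.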
The learning algorithm, using binary cross-entropy as the loss function, aimed to minimize the training error between predicted and actual values. The number of epochs was set to 1000 for the training procedure, and the Adam optimizer with a learning rate of 0.001 was utilized. The components of the proposed CNN model are presented in Table 3. After training, the CNN model was deployed on a server. Users photographed eyeball appearances with their mobile devices, and the digital camera images were uploaded to the server through a website. The detection results were then immediately returned to users on their mobile devices. To implement the proposed CNNDCI system, the hardware environment for the convolutional neural network comprised an NVIDIA GeForce GTX 1080 GPU, an Intel(R) Core(TM) i7-7700 CPU @ 3.60 GHz, 32 GB of RAM, and Windows 10 as the operating system. The Python deep learning library Keras, version 2.4.3, was used on top of TensorFlow.

Numerical Results
In this study, a confusion matrix, presented in Table 4, was used to measure the performance of the models. Three indices (accuracy, precision, and recall), expressed as Equations (8)-(10), respectively, were employed to measure the performance of the forecasting models [33].
$$\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \quad (8)$$
$$\text{Precision} = \frac{T_p}{T_p + F_p} \quad (9)$$
$$\text{Recall} = \frac{T_p}{T_p + F_n} \quad (10)$$
where $T_p$, $F_p$, $T_n$, and $F_n$ are the numbers of true positives, false positives, true negatives, and false negatives, respectively.
Table 5 shows the three accuracy measurements generated by fivefold cross-validation, and Figure 3 indicates the convergence curves of the training and testing accuracy for the five folds. In terms of classification accuracy, the average testing accuracies for Dataset I and Dataset II are 98.5% and 92%, respectively. Compared with the previous study on Dataset I, this study obtained a higher average testing accuracy with a fivefold cross-validation procedure than the previous study achieved without cross-validation. In addition, the recall values are 97.9% and 91% for Dataset I and Dataset II, respectively. The high recall values indicate that the portion of false negatives was small, implying that the probability of a cataract going undetected was very low. Table 6 lists the numerical results of related studies using digital camera images. The proposed CNNDCI with Dataset I outperformed the related studies in terms of average classification testing accuracy under fivefold cross-validation. For Dataset II, the proposed CNNDCI model obtained a satisfying average classification accuracy compared with the related studies.
Figure 4 illustrates the graphical user interface, which involves four steps: the start page, uploading eye images, cropping eye images, and viewing the detection results. First, users press the Start button and upload eye images for detection. After uploading, users crop the images to the proper position. The cropped images are then analyzed by the CNNDCI running on a server. Finally, the CNNDCI immediately delivers the detection results to users through their mobile devices.
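Equations (8)-(10) translate directly into code; the counts below are illustrative examples, not the paper's actual confusion-matrix values.

```python
def metrics_from_confusion(tp, fp, tn, fn):
    """Equations (8)-(10): accuracy, precision, and recall from
    confusion-matrix counts (true/false positives and negatives)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Illustrative counts: 90 TP, 10 FP, 80 TN, 20 FN out of 200 samples.
acc, prec, rec = metrics_from_confusion(90, 10, 80, 20)
```

A high recall, as reported for the CNNDCI system, corresponds to a small FN term in Equation (10): few cataract cases slip through undetected.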
Compared with the most recent state-of-the-art studies, the progress of this work lies in developing a user-friendly, convenient, and accurate cataract detection system.

Conclusions
Regular screening and early treatment can greatly decrease the probability of deterioration and blindness. Meanwhile, an inexpensive, robust, and convenient tool for screening cataracts is essential in rural or underdeveloped areas. In addition, with the rapid development of deep learning techniques, the convolutional neural network has become one of the most powerful classifiers. Thus, this study used digital camera images to train a convolutional neural network classifier to detect cataracts. The numerical results revealed that the proposed model is robust across two datasets. Furthermore, a user-friendly graphical user interface was provided to increase the ease of use and accessibility of the proposed CNNDCI system. For future studies, more detailed labels, such as severity levels or the locations of protein deposition, could be incorporated by extending the current CNNDCI system into a multiclass classification system. The severity levels of cataracts include normal, mild, medium, and severe, while the locations of protein deposition distinguish nuclear cataracts, cortical cataracts, and posterior subcapsular cataracts [34]. In particular, the convolutional neural network classifier can provide more detailed results with more detailed data and a multiclass classification function. Another possible direction for future work is improving training quality through image enhancement [35,36] and by continually collecting noise-free images [37]. It must be noted that this study was limited by the environmental conditions of users: the reflection of light and the positioning of the images were found to influence the performance of the CNNDCI. Therefore, a notification function reminding users about light reflections and image positioning could also be a direction for future work.