Accelerating Retinal Fundus Image Classification Using Artificial Neural Networks (ANNs) and Reconfigurable Hardware (FPGA)

Diabetic retinopathy (DR) and glaucoma are common eye diseases that affect the retina and are two of the leading causes of vision loss worldwide. Glaucoma is a condition in which the optic nerve that connects the eye to the brain becomes damaged, whereas DR is a complication of diabetes caused by high blood sugar levels damaging the back of the eye. In order to produce an accurate and early diagnosis, an extremely high number of retinal images need to be processed. Given the computational complexity of image processing algorithms and the need for high-performance architectures, this paper proposes and demonstrates the use of fully parallel field programmable gate arrays (FPGAs) to overcome the burden of real-time computing in conventional software architectures. The experimental results achieved through software implementation were validated on an FPGA device and showed a remarkable improvement in terms of computational speed and power consumption. This paper presents various preprocessing methods to analyse fundus images, which can serve as a diagnostic tool for the detection of glaucoma and diabetic retinopathy. In the proposed adaptive thresholding-based preprocessing method, features were selected by calculating the area of the segmented optic disk, which was then classified using a feedforward neural network (NN). The analysis was carried out using feature extraction through existing methodologies such as adaptive thresholding, the histogram and the wavelet transform. Results obtained through these methods were quantified to obtain optimum performance in terms of classification accuracy. The proposed hardware implementation outperforms existing methods and offers a significant improvement in terms of computational speed and power consumption.


Introduction
Healthcare engineering plays an important role in improving the quality of human life, and portable healthcare devices are widely used in diagnosing serious conditions. Glaucoma and diabetic retinopathy (DR) are two serious and widespread eye diseases. As reported by the World Health Organization (WHO), diabetes will become a major cause of death by 2030 [1]; it was also reported that by 2030, the number of people with diabetes will reach 82 million in developing countries and 48 million in developed countries. If untreated, DR, which occurs in diabetic patients, can lead to blindness, particularly in elderly people, all over the world, whereas glaucoma, a chronic disease that affects the optic nerve, is the second major cause of blindness worldwide, producing a progressive loss of the visual field that eventually results in permanent blindness [2,3]. There are serious risk factors associated with glaucoma, such as raised intraocular pressure, which damages the blood vessels and the optic nerve. The ophthalmologists who conduct the tests to detect eye diseases have to be highly skilled, given the time required to make a diagnosis. In the case of DR, there are no symptoms at the early stage, and changes in vision are not noticeable. As the disease progresses, symptoms such as floating spots in the vision, blurred vision and sudden loss of vision may appear. In DR, the blood vessels in the retina are eventually affected and vision is lost; an over-accumulation of glucose can permanently damage these tiny blood vessels. The main signs of DR are microaneurysms (MAs), hard exudates, haemorrhages, cotton-wool spots, macular oedema, venous loops and venous beading [4], as shown in Figure 1. It was estimated by the International Diabetes Federation (IDF) that the number of diabetic patients will rise to 592 million by 2035 [5]. As predicted by the WHO, glaucoma is the second foremost cause of blindness worldwide.
Numerous studies have shown that eye pressure is a major risk factor for optic nerve damage, which in turn results in glaucoma. At the front of the eye is a space called the anterior chamber [6], where a clear fluid flows continuously in and out, nourishing nearby tissues. The fluid leaves the chamber at the open angle where the cornea and iris meet; when the fluid reaches the angle, it flows through a spongy meshwork, which acts as a drain, and leaves the eye [3]. A normal fundus image in comparison to glaucoma is shown in Figure 2. In a recent study conducted by the National Eye Institute (NEI) [7], the statistical prediction of the number of people who will be affected by glaucoma is staggering in comparison to previous years. Hence, identifying these eye diseases at an early stage is essential; however, a large number of patients remain unaware of the illness until it has progressed to a much more advanced stage [3]. Considering the significant rise in patients suffering from glaucoma and DR, there is a pressing need for an efficient automated eye-disease detection system, in both software and hardware, that can detect glaucoma and DR at an early stage so that timely treatment can prevent permanent blindness.
The diagnosis of glaucoma and DR mainly involves advanced imaging techniques such as optical coherence tomography (OCT) and scanning laser polarimetry (SLP). However, these methods are expensive and require specialised skills. Fundus images have been widely used to diagnose glaucoma [8,9] and DR [7], where the damage to the optic nerve is detected using features such as the cup-to-disc ratio and the ratio of the distance between the optic disc centre and the optic nerve [8]. However, such methods require precision and may not yield good classification accuracy if due attention is not paid to aspects such as suitable feature selection. Other techniques, such as image segmentation, have also been used for glaucoma and DR detection [8]. However, image segmentation has a number of drawbacks as well, such as difficulties with thresholding and localisation, which may lead to inferior diagnostic results.
A number of studies have been reported in the literature, such as that of Ravishankar et al. [10] and others [11][12][13][14][15][16], which have shown that retinal blood vessels can be accurately detected in images using image processing algorithms. Sinthanayothin et al. [11] applied moat operators to the green channel of the enhanced image, after which red lesions were extracted by thresholding.
The principle behind these algorithms is to first detect the blood vessels and then approximate the location of the optic disk. However, segmentation algorithms have proven to be complex and time-consuming. A number of techniques have been reported in the literature, such as in [8], where the authors proposed a method in which the non-uniform intensity of the fundus image was estimated and shade correction was performed by enhancing the green channel of the image and subtracting the result from the original fundus image. Another image enhancement method was adopted in [17], where colours were normalised using a histogram-based method, principal component analysis (PCA) was used to reduce the feature size, and a support vector machine (SVM) was used for classification. In order to detect the bright DR areas in fundus images, Zhang and Chutatape [17] applied adaptive contrast enhancement using the local mean and standard deviation of intensities.
Furthermore, deep convolutional neural networks (DCNNs) have been applied successfully to image analysis [18]; however, despite their attractive qualities and the relative efficiency of their local architectures, DCNNs have a number of drawbacks, such as the high computational cost of processing high-resolution images (such as fundus images) and the need for large training datasets.
In the context of the wider literature and state of the art, this paper offers two major contributions: improved classification accuracy and hardware acceleration through field programmable gate arrays (FPGAs). Firstly, an in-depth analysis of various feature selection methods is provided, and the most suitable method is proposed for the detection of glaucoma and DR images by means of a rather simple and lightweight adaptive thresholding method, in which the main emphasis is placed on localising the optic disk and cup. This method is shown to be more effective, less compute-intensive and simpler. The second major contribution is the acceleration and implementation of the neural classifier on a hardware platform (FPGA). Hardware implementation is particularly important to demonstrate the viability of classifying fundus images in real time, as well as a route from a proof-of-concept design to a successful prototype. Given the amount of time required to simulate large datasets in software, it is essential that neural architectures be implemented on a fully parallel hardware platform, and this study demonstrates the viability of such a mapping from the software domain to fully parallel reconfigurable hardware (FPGA).
This paper is organised as follows: The Materials and Methods section describes the proposed methodology in both software and hardware. The Results and Discussion section provides an insight into the test design and experimental analysis. The Conclusion section is devoted to concluding remarks with future work followed by References.

Materials and Methods
In this section, various preprocessing techniques are analysed, and an adaptive thresholding technique is proposed to localise the optic disk and hard exudates for both glaucoma and DR images. Other techniques, such as the histogram and the wavelet transform, were also investigated, and their results are shown. In order to carry out the experiments and formulate the data sets for the classification of glaucoma, the optic disk was first segmented from the fundus image. The high-resolution fundus (HRF) image database, available from the public website 'https://www5.cs.fau.de/research/data/fundus-images/', was used for the preprocessing methods. This dataset consists of three sets of retinal fundus images: healthy, glaucoma and DR, with 15 images in each set. The healthy set contains images of healthy retinas; the DR set contains images of patients with haemorrhages, bright lesions and spots after laser treatment; and the glaucoma set contains images of patients with symptoms of focal and diffuse nerve fibre layer loss. Each image belongs to a different patient. The images were divided into training and test sets, and the neural classifier was tested with the unseen test images: 10 images each of glaucoma, healthy and DR were used for training, and 5 images each were used for testing. All images were acquired with a Canon EOS-20D digital camera with a 60-degree field of view (FOV) [19].
The images were independently classified and labelled by three ophthalmologists trained in retinal imaging. Once the fundus images were preprocessed, glaucoma and DR classification was performed using the feed-forward backend neural classifier. A corresponding ground truth file was included for each digital fundus image, and the total dataset was divided into training and test sets.
In the following subsections, different preprocessing methods are investigated for feature selection, and the results are reported.

Preprocessing Based on Histogram
Histograms are generally used to represent the statistical relationship between the grey levels of a digital image and their frequency of occurrence [13]. In order to assess its suitability, the histogram approach is used to localise the optic disk in the retinal fundus image. Each fundus image is represented by a collection of histograms, extracted with respect to the RGB (red, green and blue) and HSI (hue, saturation and intensity) colour models. As shown in Figure 3a (ii and iv), it is difficult to differentiate between the optic disk and optic cup in the blue and red channel images; however, the green channel, shown in Figure 3a (iii), provides much better contrast, and it is relatively easy to locate the boundaries of both the optic disk and the optic cup. In order to reduce the total number of histograms, only the green channel was used for preprocessing; the green channel has been shown to offer the best discriminatory power between the main retinal anatomy (blood vessels, fovea and optic disk) and the retinal background [20]. The RGB channels of a retinal fundus image are shown in Figure 3a, and the histogram of the selected green channel is shown in Figure 3b, where pixel intensity is plotted on the x-axis and pixel count on the y-axis. Once the histogram is analysed, the optic disk and optic cup are segmented by selecting a threshold value at the high-intensity end of the histogram; as can be seen from the fundus image, the optic disk and cup are the brightest regions of the image. Mathematically, the probability assigned to each grey level in a histogram is given by Equation (1):

p_r(r_k) = n_k / N,  k = 0, 1, ..., L - 1,  (1)

where r_k corresponds to the normalised intensity value, L to the number of grey levels in the image, n_k to the number of pixels with grey level r_k and N to the total number of pixels. The plot of p_r(r_k) against r_k is the histogram of the image.
Once the histogram of the image is obtained, the maximum index value is selected to localise the optic disk, since the optic disk is the brightest part of the fundus image; its location is marked with a red '+' sign in Figure 4c. Once the optic disk is localised, the region of interest is segmented from it to formulate the data set for backend classification. The optic disk segmentation process is represented in Figure 4c,d. As shown in Figure 4, the original RGB fundus image (a) was converted into the corresponding grey image of the green channel (Figure 4b). As shown in Figure 4c, the optic disk is located by finding the maximum-intensity pixel; since the optic disk is of high intensity, this pixel marks its location exactly. Once localised, the optic disk is segmented from the green-channel grey image, as shown in the zoomed view of Figure 4d.
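The histogram-based localisation described above can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB code, and the tiny synthetic "green channel" array is a stand-in for a real fundus image:

```python
# Illustrative sketch of histogram-based optic-disk localisation.

def histogram(channel, levels=256):
    """p_r(r_k) = n_k / N for each grey level r_k (Equation (1))."""
    n = [0] * levels
    total = 0
    for row in channel:
        for pixel in row:
            n[pixel] += 1
            total += 1
    return [count / total for count in n]

def locate_brightest(channel):
    """Return (row, col) of the maximum-intensity pixel -- the optic
    disk is the brightest region of the fundus image."""
    best = (0, 0)
    for r, row in enumerate(channel):
        for c, pixel in enumerate(row):
            if pixel > channel[best[0]][best[1]]:
                best = (r, c)
    return best

green = [
    [40,  42,  41, 45],
    [44, 230, 245, 47],   # bright patch stands in for the optic disk
    [43, 240, 235, 46],
    [41,  44,  43, 42],
]

p = histogram(green)                 # normalised grey-level histogram
row, col = locate_brightest(green)   # the red '+' location in Figure 4c
```

On a real image the threshold would be taken from the high-intensity end of `p` before segmenting a region of interest around `(row, col)`.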

Preprocessing Based on Adaptive Thresholding
Adaptive thresholding is one of the most robust and efficient methods for segmenting an image [21]. Since the region of interest (ROI) is the optic disk, adaptive thresholding can be used effectively to segment the optic disk from the fundus image. In adaptive thresholding, the distribution of pixel intensities in the fundus image is analysed [22], and a threshold value is selected. The optic disk can then be segmented using the expression given in Equation (2), where g(x,y) denotes the input image, I the threshold value and f(x,y) the output image:

f(x,y) = 1 if g(x,y) > I, and f(x,y) = 0 otherwise.  (2)

In this particular example, the pixel intensity distribution was analysed and the value of I was chosen as 150. Using this method, the results shown in Figure 5 were obtained for the segmentation of the optic disk, where Figure 5b shows a zoomed view. The same methodology was applied to localise the hard exudates in the fundus image as part of feature selection for the classification of DR; here, the pixel intensity distribution was analysed and the threshold band was chosen as 75 < I < 120. The results obtained for the localisation of hard exudates are shown in Figure 6, where Figure 6b shows a zoomed view of the image in Figure 6a.
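The two thresholding cases above (a single lower bound for the optic disk, a band for the exudates) can be sketched as follows. This is an illustrative Python sketch with a made-up 3x4 "green channel"; the thresholds 150 and 75–120 are the values quoted in the text:

```python
# Illustrative sketch of the thresholding in Equation (2):
# f(x, y) = 1 if g(x, y) > I, else 0, generalised to an intensity band.

def threshold(image, low, high=255):
    """Keep pixels whose intensity lies in (low, high]. For the optic
    disk a single lower bound I = 150 is used; hard exudates are
    isolated with the band 75 < I < 120 (high=119 for integer pixels)."""
    return [[1 if low < p <= high else 0 for p in row] for row in image]

green = [
    [40, 160, 170, 45],
    [44, 200, 210, 90],
    [43, 100, 110, 46],
]

optic_disk = threshold(green, 150)        # I = 150
exudates   = threshold(green, 75, 119)    # 75 < I < 120
```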

Preprocessing Based on Discrete Wavelet Transform (DWT)
In this method, the DWT was applied to the green channel of the fundus image. The image is decomposed using the Daubechies wavelet 'db2', which has two vanishing moments. 'db2' was selected for the decomposition of the green-component image because it allows the optic disk to be segmented while excluding other bright lesions in the retinal image. The use of higher-order wavelet filters increases the number of computations, and hence the computational complexity, without much improvement. Therefore, in order to reduce the complexity and the number of computations while retaining good efficiency, the Daubechies wavelet 'db2' with two vanishing moments was selected for the decomposition of the fundus image [23]. The general equations of the two-dimensional forward and inverse discrete wavelet transforms for an image f(x,y) are given in Equations (3)-(6).
The two-dimensional wavelet transform shown in the above equations decomposes the approximation coefficients at level j-1 into four components: the approximation at level j, i.e., A_j (LL), and the details in three orientations: horizontal H (LH), vertical V (HL) and diagonal D (HH). Once the wavelet transform is applied to the image to sort out the brighter segment, the optic disk can be segmented and the data sets formulated. The optic disk segmentation using the wavelet transform is represented in Figure 7.
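One level of the separable 2D 'db2' decomposition described above can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' MATLAB code; the 4x4 constant "image" is an assumption chosen so the result is easy to check (all energy lands in the LL band):

```python
import math

# db2 (Daubechies-4) analysis filters, two vanishing moments.
s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
LO = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]
HI = [LO[3], -LO[2], LO[1], -LO[0]]   # quadrature mirror of LO

def analyze1d(signal, filt):
    """Filter and downsample by 2, with periodic boundary extension."""
    n = len(signal)
    return [sum(filt[k] * signal[(2 * i + k) % n] for k in range(4))
            for i in range(n // 2)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dwt2(image):
    """One 2D DWT level: returns the (LL, LH, HL, HH) subbands, i.e.
    the approximation A_j and the H, V, D detail orientations."""
    lo_rows = [analyze1d(row, LO) for row in image]
    hi_rows = [analyze1d(row, HI) for row in image]
    ll = transpose([analyze1d(col, LO) for col in transpose(lo_rows)])
    lh = transpose([analyze1d(col, HI) for col in transpose(lo_rows)])
    hl = transpose([analyze1d(col, LO) for col in transpose(hi_rows)])
    hh = transpose([analyze1d(col, HI) for col in transpose(hi_rows)])
    return ll, lh, hl, hh

# A flat 4x4 "green channel": the details vanish and LL carries it all.
flat = [[100.0] * 4 for _ in range(4)]
LL, LH, HL, HH = dwt2(flat)
```

On a real fundus image the bright optic disk would dominate the LL band, which is then thresholded for segmentation.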

Backend Recognition Using a Neural Classifier
Once the data were preprocessed, the feed-forward neural classifier was used, first to classify glaucoma versus healthy images and then glaucoma versus DR images. In total, 15 images were used for testing: 5 each for glaucoma, DR and healthy [19,24]. Several studies have been reported in the literature on classifying retinal images with various preprocessing methods in combination with machine learning techniques; a comprehensive review and details are provided in [25][26][27][28]. As stated in [28], Niemeijer et al. used Gaussian filters to estimate the probability of candidate pixels, with shape and intensity properties used for classification.
In this paper, different neural network (NN) architectures were implemented for the classification of healthy, glaucoma and DR images. The best results were achieved by implementing the architecture as shown in Figure 8. The flowchart which illustrates the classification process is shown in Figure 9.
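The forward pass of such a feed-forward classifier can be sketched as follows. The exact topology is the one in Figure 8, which is not reproduced here; this illustrative Python sketch assumes a single input (the segmented optic-disk area), two hidden sigmoid neurons and one sigmoid output, with made-up weights, whereas the paper's network was trained in MATLAB:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(area, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: scalar feature (optic-disk area) ->
    hidden sigmoid layer -> single sigmoid output (>0.5 => glaucoma)."""
    hidden = [sigmoid(w * area + b) for w, b in zip(w_hidden, b_hidden)]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return out

# Made-up weights for illustration only.
w_hidden, b_hidden = [4.0, -4.0], [-2.0, 2.0]
w_out, b_out = [3.0, -3.0], 0.0

score = forward(1.5, w_hidden, b_hidden, w_out, b_out)
label = "glaucoma" if score > 0.5 else "healthy"
```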

Hardware Implementation
In order to improve the computational time and exploit the parallelism of neural networks, the hardware implementation was performed on an FPGA device (Nexys4 DDR) to classify the healthy, glaucoma and DR images. The neural architecture shown in Figure 8 was implemented using VHSIC Hardware Description Language (VHDL) structural coding on the Artix-7 chip [29]. Since the preprocessing step for the classification of glaucoma, DR and healthy images was performed in MATLAB, only the area of the segmented optic disk was fed into the FPGA device through a universal asynchronous receiver and transmitter (UART) as a 16-bit input. In order to realise the implementation on hardware, different blocks were designed and developed. A UART, which controls a computer's interface to an attached serial device, was configured. In this study, the input from MATLAB had to be fed into the FPGA; hence, a UART receiver was modelled. The data were sent bit-by-bit, with one bit at a time fed into an intermediate register. The data stream was organised such that a start bit was followed by the serial data bits. Once the data were fully transmitted and received, they were shifted into the intermediate register of the receiver, which provided the 16-bit data used as input to the neural network. The UART fundamentally provides an interface with RS-232C data terminal equipment (DTE).
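The bit-level framing described above can be modelled as follows. This is an illustrative Python model, not the VHDL receiver itself; note that a real 8N1 UART carries 8 data bits per frame, so packing all 16 bits into a single frame here is a simplifying assumption:

```python
def uart_frame(word, nbits=16):
    """Frame an nbits word as: start bit (0), data bits LSB first,
    stop bit (1). A single 16-bit frame is used purely for
    illustration; real UARTs usually send 8 data bits per frame."""
    bits = [0]                                   # start bit
    bits += [(word >> i) & 1 for i in range(nbits)]
    bits.append(1)                               # stop bit
    return bits

def uart_receive(bits, nbits=16):
    """Shift the data bits into an intermediate register, mirroring
    the receiver behaviour described in the text."""
    assert bits[0] == 0 and bits[-1] == 1        # start / stop bits
    register = 0
    for i, bit in enumerate(bits[1:1 + nbits]):
        register |= bit << i
    return register

area = 0xA5C3                                    # e.g. an optic-disk area
assert uart_receive(uart_frame(area)) == area    # round-trip check
```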
The block diagram in Figure 10 represents the proposed hardware design, in which the input from MATLAB is processed through the neural network architecture implemented on a Xilinx Artix-7 FPGA device. The neural network classifier contains one hidden layer and an output layer. The individual components, such as the multiplier, adder, UART and sigmoid activation function, were modelled separately, and the entire neural network structure was implemented using structural modelling in VHDL. Once the design was functionally verified, it was synthesised, and the bitstream was uploaded to the FPGA device. A schematic diagram of the hardware neural architecture is shown in Figure 11, and the synthesised on-chip design is shown in Figure 12. Power dissipation on the hardware platform depends mainly on the way the hardware components are designed and connected, whereas the computational speed depends on the routing of signals between components and the worst-case propagation delay of the circuit. In order to port the software-based neural classifier to hardware, the design must be carefully planned to determine both the most suitable hardware structure and the best mapping from the neural-network structure onto it. As neural networks are inherently parallel, one of the biggest advantages of an FPGA device is the ability to exploit this parallelism by fully mapping the software architecture [30,31]. The hardware implementation results include behavioural simulations, the elaborated RTL design and the proof of implementation on the physical FPGA; further results are given in the following section.

Software Implementation Results
The feature sets were formulated as described in the preceding sections (Sections 2.1-2.3), where three different preprocessing techniques were evaluated. Of the three techniques implemented for the classification of glaucoma, DR and healthy images, the datasets formulated using adaptive thresholding gave the best overall accuracy. The training procedure included 10 images each for glaucoma, DR and healthy, where the selected features were the area of the optic disk for glaucoma and the total area of the exudates for DR. The accuracy between glaucoma and healthy images is plotted in Figure 13a, and the accuracy between glaucoma and DR images is shown as a regression plot in Figure 13b. The training accuracy, in terms of convergence, is shown in Figure 14a,b for glaucoma versus healthy and glaucoma versus DR, respectively. The classification accuracy using the data set formulated by the adaptive thresholding-based preprocessing technique was recorded as 99.901% for glaucoma and healthy images, as shown in Figure 13a, and 99.623% for glaucoma and DR images, plotted in terms of regression in Figure 13b.
In total, five fundus image samples from each class were taken for testing, and the relevant confusion matrices are shown in Figure 15, where the correctly classified samples appear on the diagonal of each confusion matrix, marked in green squares. As binary classification was performed, the network achieved 100% classification accuracy. The ANN [32] classifier results showed overall accuracies of ~100% for the training and test data sets, and for all 10 samples tested, the experimental result matched the ground truth. The evaluation of the performance in terms of computational time is shown in Figure 16; the computational time for each sample varied from 1.2 to 2 s for both DR and glaucoma images. A comparative study of classification accuracy is shown in Table 1. As shown in Table 1, the authors of [22] used a total of 20 samples to test glaucoma and normal images. In [33], the authors used a total of 81 samples for testing, comprising 31 images of healthy retinas and 50 images of diseased retinas. In [31], the authors did not report the number of test images; however, the algorithm was evaluated on all 1200 images. In the proposed method, a total of 15 samples were used for testing healthy, glaucoma and DR images. As shown in Table 1, the proposed method offers better accuracy than previously published techniques. This is due to the adoption of a simple yet effective feature selection method (adaptive thresholding) combined with a backend neural architecture: the area of the segmented optic disk was calculated rather than local image features such as the mean and standard deviation of intensities.

Hardware Implementation Results
The hardware design was physically downloaded onto the FPGA device, and the output pin of the board was connected to an LED on a breadboard. The LED turns ON or OFF according to whether the image is classified as glaucoma or healthy. The hardware setup is shown in Figure 17: after the preprocessing step, a 16-bit input obtained from MATLAB is fed into the hardware on which the neural classifier was implemented; the LED remains OFF when the input image is classified as healthy and turns ON when it is classified as glaucoma.
The computation time is an important factor when large datasets are classified. As shown in Figure 16, the computation time to classify each image as either glaucoma or healthy ranges from 1.2 to 2 s in the software domain. Once the neural classifier was implemented on hardware, the total computation time was recorded through behavioural simulations, as shown in Figure 18, where two sample values were fed to the hardware device within a timeframe of 800 ns. Hence, it takes approximately 400 ns for one sample to be classified, which shows that the hardware implementation is 3 × 10^6 times faster than the software implementation. For this specific architecture, the logic utilisation in terms of lookup tables (LUTs) is shown in Table 2. In total, 3.23% of the logic was utilised, comprising the UART (0.38%), the sigmoid function (2.835%) and the adder (0.02%). The total power consumption (static/dynamic) is shown in Table 3, where the on-chip power was recorded as 13.776 W.
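The quoted speedup follows directly from the two measured times; as a quick sanity check (using the fastest software time of 1.2 s per sample):

```python
# Sanity check of the reported hardware speedup.
software_time = 1.2          # seconds per sample (fastest software case)
hardware_time = 400e-9       # seconds per sample on the FPGA (~400 ns)

speedup = software_time / hardware_time
# roughly 3e6, matching the reported ~3 x 10^6 figure
```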

Experimental Setup
An automated retinal fundus image classification system was implemented on both software and hardware platforms. The software implementation was performed in MATLAB on an Intel Core i5-6200 CPU, and the hardware implementation was realised on a Nexys4 DDR FPGA board (Artix-7). The hardware design was developed using the Xilinx Vivado Design Suite [34], and the power was measured using the Xilinx synthesis tools. Table 4 reports the performance of the software and hardware implementations in terms of execution time and power consumption. The software implementation in MATLAB took 1.2-2 s to compute each sample, with an average power consumption of 25.656 W, whereas the FPGA implementation took approximately 400 ns to compute one sample, with an average power of 9.434 W. The hardware implementation is therefore almost 3 × 10^6 times faster and roughly 2.7 times more power-efficient than the software. The proposed work offers a significant improvement in terms of classification accuracy and execution time when compared to the existing work (Table 5), and significantly improved performance in comparison to the work reported in [35][36][37].

Conclusions
Automatic retinal image classification is a challenging task, and devising a preprocessing methodology along with backend classification is crucial to accurately distinguish diseases such as glaucoma and DR from healthy images. The authors proposed an adaptive thresholding-based preprocessing technique, which proved to be one of the most robust methodologies: because it involves calculating the area of the segmented optic disk, it does not depend on factors such as intensity or contrast. This work demonstrated the viability of achieving optimum classification accuracy, where 100% accuracy was achieved both for glaucoma versus healthy images and for glaucoma versus DR images. In order to speed up the execution time, a hardware implementation was realised on an FPGA device. This study demonstrated that high accuracy and performance can be achieved without compromising precision, and it offers the unique perspective that FPGAs could be used for real-time diagnosis in healthcare, where large datasets are processed and output is required in real time. In future work, we will extend this work to develop a standalone, low-power, hand-held custom device that can perform retinal image classification in real time.