Retinal Blood Vessel Segmentation Using Hybrid Features and Multi-Layer Perceptron Neural Networks

Segmentation of retinal blood vessels is the first step in several computer-aided diagnosis (CAD) systems, not only for ocular diseases such as diabetic retinopathy (DR) but also for non-ocular diseases such as hypertension, stroke, and cardiovascular disease. In this paper, a supervised learning-based method, using a multi-layer perceptron neural network and a carefully selected vector of features, is proposed. In particular, for each pixel of a retinal fundus image, we construct a 24-D feature vector encoding information on the local intensity, morphological transformations, principal moments of phase congruency, Hessian components, and difference of Gaussian values. A post-processing technique based on mathematical morphological operators is used to optimise the segmentation. Moreover, the selected feature vector enables the classifier to provide the final blood vessel probability as a binary map image. The proposed method is tested on three well-known datasets: the Digital Retinal Images for Vessel Extraction (DRIVE), Structured Analysis of the Retina (STARE), and CHASE_DB1 datasets. The experimental results, both visual and quantitative, testify to the robustness of the proposed method, which achieved accuracy, sensitivity, and specificity of 0.9607, 0.7542, and 0.9843 on DRIVE; 0.9632, 0.7806, and 0.9825 on STARE; and 0.9577, 0.7585, and 0.9846 on CHASE_DB1. Furthermore, the results testify that the method is superior to seven similar state-of-the-art methods.


Introduction
In this paper, an automatic DR diagnosis system is proposed. According to the fact sheet of the World Health Organisation (WHO), over 2.2 billion people worldwide have a vision impairment or blindness. One billion of these cases can be prevented, or at least controlled, if they are detected at an early stage. Diabetic retinopathy (DR) is the fifth leading cause of vision impairment and is estimated at 4.5 million of the one billion cases [1][2][3][4]. Currently, the need for eye disease screening and check-ups has increased rapidly; despite that, there is a notable shortage of ophthalmologists, particularly in developing countries. Hence, automating the detection of DR with supervised procedures is beneficial and can significantly help address this issue [5,6].
In the last couple of decades, the clinical importance of the segmentation of retinal blood vessels from fundus images has attracted researchers to diagnose many diseases. Consequently, the segmentation of retinal blood vessels is an attractive field of research as it helps in retinal image analysis and computer-assisted diagnosis [7,8].
Retinal vessel investigation is noninvasive, as the vessels can be observed directly without any surgical intervention [9]. Additionally, segmentation is an indispensable element of any automatic diagnosis system; one family of methods extracts features from the input image using predefined filters. In addition to this, transfer learning, an advanced deep learning approach, has recently been used: rather than building and training a network to learn features from scratch, a model pre-trained on vast numbers of different images is reused, so the model has already learned a tremendous number of different features and only needs fine-tuning for the relevant application. Since the proposed work belongs to the former category, previous work using supervised learning is reviewed next.
A supervised ensemble method based on a feature-based AdaBoost classifier (FABC) uses a 41-D feature vector for segmentation, but a relatively high number of features is necessary to obtain accurate results [25]. A 7-D feature vector comprising grey-level and moment-invariant-based features has been used with an artificial neural network (ANN) as a classifier; however, the method is unable to achieve the desired results without sophisticated preprocessing steps [26]. An ensemble of bagged and boosted decision trees operating on a 9-D feature vector, constructed from the gradient vector field, morphological transformations, line strength measures, and Gabor filters, handles healthy and pathological colour retinal images; the effects of the weak learners are combined using bootstrap aggregation to increase segmentation accuracy [27]. The morphology of the retinal blood vessels has been exploited to identify successive stages of DR severity, using an ANN with one input layer, five hidden layers, and one output layer to automate DR screening. The inputs to the ANN are derived from Gabor filter and moment-invariant-based features, and the input image is preprocessed with mathematical morphology operators, mean filtering, and Gaussian filtering. This technique targets diseases such as hypertension and diabetes mellitus, for which a change in the retinal vasculature is a primary symptom; however, it is applied to only one dataset and fails on others [17]. Retinal vessels have also been extracted using lattice neural networks with dendritic processing, which achieved relatively better performance on vessels; nevertheless, the final results are low on the DRIVE and STARE datasets [28].
Extensive preprocessing steps, such as zero-phase whitening, global contrast normalisation, and gamma correction, have been applied to fundus image patches used as inputs to a CNN. Although predictions on DRIVE, STARE, and CHASE_DB1 showed reduced vessel misclassification and mitigated the central vessel reflex problem, the results rely mainly on the preprocessing steps [29]. A five-stage DNN with an autoencoder transforms selected RGB patches of the retinal image into a vessel map image; even though a prediction is made at every pixel in a patch, the final results depend on extensive cross-modality training on the datasets [24]. A supervised hierarchical retinal blood vessel segmentation method has been presented to diagnose several ailments, such as cardiovascular diseases, DR, and hypertension. It uses two layers: a convolutional neural network as a deep hierarchical feature extractor, and a random forest as a trainable classifier. This method is tested on only two datasets and, although it outperforms the aforementioned methods, its time cost is relatively high [30]. A supervised method based on a simple extreme learning machine uses a 39-D feature vector constructed from local features (29 features), morphological transformation operators (6 features), phase congruency (1 feature), Hessian components (2 features), and the divergence of a vector field (1 feature). It is evaluated on only one public dataset and one unknown private dataset, and the obtained results are relatively low compared with those on the other publicly available datasets [12]. Information obtained from a stationary wavelet transform combined with a multi-scale convolutional neural network can capture tiny vessel pixels; this method uses a rotation operator as the basis of a joint strategy for data augmentation and prediction [31]. Finally, a hybrid feature set with hierarchical classification has been used to segment retinal blood vessels from a colour retinal image.
Six groups of discriminative features were used to construct the feature vector, composed of local intensity variation features, local binary pattern features, histogram of gradient features, vector field divergence features, morphological transformations, and high-order local auto-correlation features [32]. Moreover, a random-forest-based technique was used for feature selection, and two hierarchical trainers, a support vector machine (SVM) and a random forest, served as weak classifiers, with the final results obtained from a Bayesian network. The final output thus depends entirely on the random forest feature selection together with the hierarchical classification; performance was much lower when each classifier was tested separately. A hybrid segmentation method with a self-generated mask schema was used to extract the retinal blood vessels from the fundus retinal image, but its performance was very low on the DRIVE dataset [4]. A transfer learning model, VGG16, used as a pre-trained model with multilevel/multi-scale deep supervision layers, has been incorporated to segment the retinal blood vessels. The VGG16 model was improved by convolving a vessel-specific Gaussian kernel at two pre-initialised scales; the final output is an activation map with well-learned vessel-specific features at multiple scales, progressively obtained from the symmetric feature maps. Despite using a pre-trained model, VGG16, trained on ImageNet (15 million high-resolution labelled images), the method required better post-processing steps [6]. Despite the increasing use of deep learning, it has several drawbacks: first, the need for copious numbers of images for training and testing, which do not exist in our case; second, the high amount of processing power required from a graphical processing unit (GPU); and third, complicated, deep architectures that demand considerable experience, training, and effort.
To fulfil retinal blood vessel segmentation, this paper uses hand-crafted features and a multi-layer perceptron (MLP) neural network, with the following contributions.

2. A special multi-layer perceptron neural network structure for full-gauge fundus retinal evaluation.
5. Morphological transformations are applied twice. First, the white top-hat and black bottom-hat are used as filters to extract features and to suppress noise during the extraction. Second, a closing operator is used to connect the isolated vessels and to fill the gap areas in the segmented images during the post-processing step.
6. A model for automatic feature extraction and classification of blood vessels in retinal fundus images.
7. Maximum and minimum moments of phase congruency are used for the first time.
8. In-depth visual analysis plus comparison with the state-of-the-art methods.
This paper is organised as follows. Section 2 outlines the datasets and the methodology, where feature extracting steps are discussed in detail. In Section 3, the results of the proposed method are presented. The discussion of the results is in Section 4. The conclusions of the paper are described in Section 5.

Materials and Methods
Three datasets are commonly used in vessel segmentation research: the Digital Retinal Images for Vessel Extraction (DRIVE) dataset [33], the Structured Analysis of the Retina (STARE) dataset [34], and the CHASE_DB1 dataset [35]. Because we perform some early processing on the STARE and CHASE_DB1 datasets to create the mask images for each of them, the differences between the three datasets are explained below.

DRIVE dataset
The DRIVE dataset consists of 40 images (7 of which show pathology), divided into 20 images for training and 20 for testing (3 of which show pathology). The images, of size 565 × 584 pixels, were captured by a Canon CR5 non-mydriatic 3CCD (charge-coupled device) camera at a 45° field of view (FOV); the FOV is approximately 540 pixels in diameter. The DRIVE dataset provides a mask for every image to facilitate the identification of the FOV area, as shown in Figure 1. Every image is accompanied by a manual segmentation label, manually created by three experts (observers) and validated by an experienced ophthalmologist (annotator). The training set is segmented once and the testing set twice, resulting in 577,649 pixels (12.7%) marked as vessel in the first set (first observer), against 556,532 pixels (12.3%) in the second set (second observer). In this paper, the first set (first observer) is used as the ground truth (label) for the performance evaluation, while the second set serves as a human observer reference for the performance comparison. The DRIVE dataset is depicted in Figures 2 and 3.

STARE dataset
The STARE (Structured Analysis of the Retina) dataset comprises 20 images (10 of which present pathology) of size 700 × 605 pixels, captured by a TopCon TRV-50 fundus camera at a 35° FOV. The dataset is manually segmented by two observers; the first segments 10.4% of the pixels as vessels and the second 14.9%. Since the blood vessels marked by the second observer are much thinner than those of the first, the first is considered the ground truth. Figure 4 illustrates a sample of the STARE dataset.
Unlike DRIVE, STARE does not offer masks to determine the FOV, so the method described in [11,34] is leveraged to create them.

The CHASE_DB1 Dataset
The CHASE_DB1 dataset includes 28 images of size 999 × 960 pixels, captured from both the left and right eyes of 14 school children by a hand-held Nidek NM-200-D fundus camera at a 30° FOV. The segmentation of the first observer is used as the ground truth. This dataset is characterised by nonuniform background illumination; thus, the contrast of the blood vessels is inferior, and central vessel reflexes are present. The CHASE_DB1 dataset is pictured in Figure 5. Like STARE, CHASE_DB1 does not offer mask images, so the method described in [11,34] is again leveraged to create them.

The proposed model to segment retinal blood vessels from a colour retinal image contains four stages: (i) green channel extraction, (ii) feature extraction, (iii) evaluation of the segmentation, and (iv) post-processing. In the first stage, the green channel is extracted from the RGB retinal fundus image. The green channel is chosen because the blood vessel tree is more vivid in this channel than in the others. In the second stage, a 24-D feature vector is constructed for pixel representation. The features of this vector encode information on the local intensity value (1 feature), morphological white top-hat values at six different scales (6 features), morphological black bottom-hat values at six different scales (6 features), the maximum moment of phase congruency (1 feature), the minimum moment of phase congruency (1 feature), Hessian component values (4 features), and difference of Gaussian values at five different scales (5 features). At the end of this stage, a feature matrix based on the feature vector and the corresponding manual label is constructed for each pixel of the green channel image. A training sample from this feature matrix is then randomly selected and used as input to the classifier to perform the segmentation. In the third stage, a supervised learning approach based on a multi-layer perceptron neural network is adopted.
The rule for blood vessel extraction is learned by the algorithm from the provided training samples. The segmented reference images are usually termed label or gold standard images. The last stage, post-processing, is used to enhance the classifier performance and acts to reduce two types of artefacts. The proposed method is tested on the three publicly available datasets, DRIVE, STARE, and CHASE_DB1, which contain digital colour retinal images with 8 bits per colour channel. A complete block diagram of the proposed model is shown in Figure 6. The remainder of this section explains the method in detail.

Feature Extraction
As mentioned above, in the second stage of the proposed method, a 24-D feature vector is constructed for pixel representation. The features composing the vector are as follows.

Local Intensity Feature
In a retinal fundus image, blood vessel pixels appear much darker than the background pixels. So, the intensity value for each pixel in the retinal green channel is considered as a local intensity feature. Figure 7 depicts an image after a local intensity feature is extracted from the colour image.
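As an illustration, extracting the local intensity feature amounts to reading off the green channel. The following is a minimal sketch, assuming the fundus image is an H × W × 3 NumPy array (the function name is ours, not the authors'):

```python
import numpy as np

def local_intensity_feature(rgb):
    """Return the green channel of an RGB fundus image as the per-pixel
    local intensity feature (vessels appear darkest in this channel)."""
    rgb = np.asarray(rgb)
    return rgb[:, :, 1].astype(np.float64)

# toy 2x2 RGB image
img = np.array([[[10, 200, 30], [40, 50, 60]],
                [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)
feat = local_intensity_feature(img)
```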

Morphological White Top-Hat Transformation
In general, mathematical morphology transformations in image processing combine (merge) two inputs: the image and a predefined linear structuring element. One such transformation, the white top-hat, is an operator used for light objects on a dark background. First, the opening operator removes objects smaller than the structuring element; then the difference between the original image and the output of the opening operator is computed. Specifically, the opening operator uses the predefined structuring element, oriented at a particular angle θ, to eradicate a vessel, or part of it, when the vessel's width is smaller than the length of the element and the two are orthogonal [15].
Let I be the green-channel image to be processed, and S_{c,θ} be the linear structuring element. Then the white top-hat transformation feature of I is given by

T_{c,θ}(I) = I − (I ∘ S_{c,θ}),

where ∘ is the opening operator, c ∈ {3, 7, 11, 15, 19, 23} is the length of the structuring element, and θ is the rotation angle of S, spanning [0 : π] in steps of π/12. The result of the white top-hat transformation in all directions is given by

T_c(I) = Σ_{θ∈A} T_{c,θ}(I),

where A = {x | 0 ≤ x ≤ π and x mod (π/12) = 0} is the set of orientation angles. The length of S is critical for extracting the vessel with the largest diameter; therefore, a set of different lengths is used to formulate six different features. When the opening ∘ is taken along such a class of S, the summation of the white top-hat transformation over all directions brightens the vessels regardless of their direction; over the entire retinal image, this enhances all vessels regardless of their orientation, including the small (tortuous) vessels in the bright zone. Figure 8 shows the image after applying the white top-hat operator. Following the white top-hat feature extractor proposed in [15], six different features are extracted. Not only do vessel-like structures become much lighter, but they also acquire intensity values relative to the local background; that is, the intensity values of the extracted blood vessels are relative to the local intensity of the neighbouring pixels in the original image [36]. The final response of the white top-hat transformation operator is a corrected, uniform image with enhanced edge information.
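The multi-orientation white top-hat described above can be sketched with SciPy's grey-scale morphology. This is an illustrative sketch, not the authors' code: `line_footprint` is a hypothetical helper that rasterises a linear structuring element of a given length and angle, and the test image is synthetic.

```python
import numpy as np
from scipy import ndimage

def line_footprint(length, theta):
    """Boolean linear structuring element of a given length, rotated by theta."""
    r = (length - 1) / 2.0
    t = np.linspace(-r, r, length)
    ys = np.round(t * np.sin(theta)).astype(int)
    xs = np.round(t * np.cos(theta)).astype(int)
    fp = np.zeros((ys.max() - ys.min() + 1, xs.max() - xs.min() + 1), dtype=bool)
    fp[ys - ys.min(), xs - xs.min()] = True
    return fp

def white_tophat_sum(image, length, n_angles=12):
    """Sum of white top-hat responses over all orientations (steps of pi/12)."""
    out = np.zeros_like(image, dtype=np.float64)
    for k in range(n_angles):
        fp = line_footprint(length, k * np.pi / n_angles)
        out += ndimage.white_tophat(image, footprint=fp)
    return out

# bright horizontal line on a dark background: the summed top-hat
# responds on the line (orientations orthogonal to it remove it)
img = np.zeros((21, 21))
img[10, :] = 1.0
resp = white_tophat_sum(img, length=7)
```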

Morphological Black Bottom-Hat Transformation
The counterpart of the morphological white top-hat is the morphological black bottom-hat transformation. While the former is used for light objects on a dark background, the latter is used for the opposite. The black bottom-hat transformation is defined as the residue of the closing operator and the original image; in other words, it is the closing of the image minus the image [12]. Let I be the green-channel image to be processed, S_{c,θ} the structuring element, • the closing operator, and θ an orientation angle at a given scale c. Then the black bottom-hat transformation feature on image I is given by

B_{c,θ}(I) = (I • S_{c,θ}) − I,

and

B_c(I) = Σ_{θ∈A} B_{c,θ}(I),

where B_c is the summation of the black bottom-hat transformation over all directions of S at the different angles θ. Both c and θ are defined as in the white top-hat transformation. The image is simplified by both of the above morphological filters, so the cross-curvature computation is easy and the vessel segments are linearly coherent. Figure 9 shows the image after applying the black bottom-hat operator.
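The black bottom-hat (closing minus image) can likewise be sketched with SciPy; here a single horizontal element is applied to a synthetic dark vertical line, which the closing fills in, so the residue highlights the line:

```python
import numpy as np
from scipy import ndimage

# dark vertical vessel (0.1) on a bright background (0.8)
img = np.full((15, 15), 0.8)
img[:, 7] = 0.1

# black bottom-hat with a 1x5 horizontal line element:
# closing fills the dark line, so the residue is positive on it
fp = np.ones((1, 5), dtype=bool)
bh = ndimage.black_tophat(img, footprint=fp)
```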

The Principal Moments of Phase Congruency
Instead of extracting features with intensity-gradient-based methods in the spatial domain, the phase congruency method is based on Fourier components. Its importance stems from its ability to detect edges and corner points in an image effectively and invariantly to changes in contrast and illumination conditions. Furthermore, phase congruency has a significant advantage over other feature detectors in that it correctly marks discriminative features such as edges (blood vessels). It is based on the local energy model, reported by Morrone et al. [37], which suggests that features such as lines and edges are perceived at points where the Fourier components are maximally in phase. The model is defined as the ratio of the local energy to the overall path length taken by the Fourier components from head to tail.
Let x be some location in a signal, whose nth Fourier component at x has amplitude A_n, and let |E(x)| be the magnitude at that point of the vector from head to tail (considered as the local energy). Then the phase congruency (PC) of the signal is given by

PC(x) = |E(x)| / Σ_n A_n(x).

If all the Fourier components are in phase, all the complex vectors are aligned and PC equals 1; otherwise, it falls towards 0. Since the model is not based on intensity gradients, a dimensionless measure PC can be constructed at any point in the image. The most notable drawbacks of this model are its poor localisation and its sensitivity to noise. A modified model was proposed by Kovesi [38,39] to compensate for the image noise. It provides proper localisation, since PC is represented through the phase difference between the weighted mean local phase angle φ̄(x) of the local energy and the local phase φ_n(x) of the nth Fourier component, both evaluated at point x. As such, the modified model can be expressed as follows.
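The ratio |E(x)| / Σ_n A_n can be verified directly on toy complex phasors (an illustrative sketch with made-up amplitudes and phases, not a real fundus computation):

```python
import numpy as np

def phase_congruency_ratio(amplitudes, phases):
    """PC = |sum of complex phasors| / sum of amplitudes:
    1 when all components are in phase, smaller otherwise."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    phases = np.asarray(phases, dtype=float)
    energy = np.abs(np.sum(amplitudes * np.exp(1j * phases)))
    return energy / np.sum(amplitudes)

aligned = phase_congruency_ratio([1.0, 0.5, 0.25], [0.3, 0.3, 0.3])  # in phase
scattered = phase_congruency_ratio([1.0, 1.0], [0.0, np.pi])         # opposed
```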
PC(x, y) = Σ_i Σ_k w_i(x, y) ⌊A_{ik}(x, y) cos(φ_{ik}(x, y) − φ̄_i(x, y)) − T⌋ / (Σ_i Σ_k A_{ik}(x, y) + η),   (6)

where (x, y) is the (green channel) pixel coordinate, I is the total number of orientations, K is the total number of scales, ⌊·⌋ denotes that the enclosed quantity is kept when positive and set to zero otherwise, w_i(x, y) is a weighting factor for the frequency spread in orientation i (because PC over many frequencies is significantly more reliable than PC over a few frequencies), T is the estimated noise influence (recommended value 3), and η is a small constant incorporated to avoid division by zero if the local energy vanishes. In Equation (6), the cosine of the phase difference between φ̄(x) and φ(x) at a particular point is used instead of a sine because, when the frequency components are aligned (in phase) with the local energy, the phase difference is φ_n − φ̄_n = 0 and the cosine is 1, resulting in a maximum PC. The weighting function w, used to weight the frequency ranges, is defined as a sigmoid function

w(x, y) = 1 / (1 + exp(ρ (c − s(x, y)))),

where c is the filter-response cut-off value, below which PC values are ignored, ρ is the gain factor that controls the sharpness of the cut-off (its value is set to 10), and the fractional spread s(x, y) of the frequencies is calculated by taking the sum of the filter-response amplitudes and dividing by the maximum amplitude A_max(x, y) at each pixel (x, y) of the image:

s(x, y) = (1/K) · (Σ_k A_k(x, y)) / (A_max(x, y) + η).
Herein, Equation (6) represents PC over all orientations in [0 : π] and over all K scales in a 2D image. Unlike the method proposed in [12], the modified model in [39,40] is used: PC is calculated independently for each orientation using Equation (6), and then the moments of the PC covariance are computed. Hence, the variation of the moment with orientation is taken into account. In this case, the principal axis corresponds to the axis about which the magnitude of the moment is minimum (the minimum moment), which provides a good indicator of the orientation of the feature. On the other hand, the magnitude of the maximum moment, i.e., the moment about the axis perpendicular to the principal axis, is an indicator of the significance of the feature. Accordingly, the maximum moment of PC is used directly to establish whether a point in the retinal image carries a significant edge feature (blood vessel), where the magnitude has a large value. Meanwhile, if the minimum moment of PC is also large, this is a strong indicator that the feature point has a strong 2D component and can be considered a corner.
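Assuming per-orientation PC maps are already available, the maximum and minimum moments can be computed with Kovesi's classical moment analysis [39,40]. The toy maps below are not a real PC computation; they only exercise the formulas:

```python
import numpy as np

def pc_moments(pc_maps, angles):
    """Maximum (M) and minimum (m) moments of the phase congruency
    covariance (Kovesi): M indicates edge strength, m corner strength."""
    a = b = c = 0.0
    for pc, th in zip(pc_maps, angles):
        pc = np.asarray(pc, dtype=float)
        a = a + (pc * np.cos(th)) ** 2
        b = b + 2.0 * (pc * np.cos(th)) * (pc * np.sin(th))
        c = c + (pc * np.sin(th)) ** 2
    root = np.sqrt(b ** 2 + (a - c) ** 2)
    return 0.5 * (c + a + root), 0.5 * (c + a - root)

angles = [k * np.pi / 6 for k in range(6)]
# toy per-orientation PC maps: full response in one orientation only
pc_maps = [np.full((4, 4), 1.0 if k == 0 else 0.0) for k in range(6)]
M, m = pc_moments(pc_maps, angles)
```

With a response in a single orientation, the maximum moment is large (an edge) while the minimum moment vanishes (no corner), as expected.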
The calculation of the maximum and minimum moments of PC is presented in [39,40], based on classical moment analysis. First, the PC information is combined, over all orientations, into the PC covariance matrix (Cm)

Cm = [ a  b/2 ; b/2  c ],

with

a = Σ_o (PC_o cos θ_o)²,  b = 2 Σ_o (PC_o cos θ_o)(PC_o sin θ_o),  c = Σ_o (PC_o sin θ_o)²,

where PC_o is the amplitude of PC determined at orientation o, and the sum is performed over all six orientations. Figure 10 vividly depicts the phase congruency filter response at the six different orientations. According to the above illustration of the PC model, if all Fourier components are in phase, all the complex vectors are aligned, indicating that the feature point has a robust 2D component and can be considered an edge-like feature. A singular value decomposition of the PC covariance matrix is performed, and the maximum moment, M, and minimum moment, m, are obtained as

M = ½ (c + a + √(b² + (a − c)²)),
m = ½ (c + a − √(b² + (a − c)²)).

The Hessian Matrix

The Hessian matrix H(x, y) at a point I(x_i, y_j) of the 2D retinal image is used for vessel detection and is given by

H(x, y) = [ f_xx  f_xy ; f_yx  f_yy ],

where f_xx, f_xy, f_yx, and f_yy are the second partial derivatives over the neighbourhood of the point (x, y). The Hessian matrix is exploited to describe the second-order structure of the intensity variation around a pixel of the retinal image; the information obtained from H describes the local curvature of the data in a small neighbourhood surrounding each pixel. The approach often employed for retinal images is inspired by the theory of linear scale-space [36]. Accordingly, a multi-scale algorithm is designed to identify vascular structure by investigating the signs and magnitudes of the eigenvalues of H, which reflect specific shapes of structures in the retinal image, as well as their brightness.
Let I(x, y) be the intensity of the 2D green-channel retinal image at a point (x, y), and let G_2D(x, y, σ) be a 2D Gaussian kernel. The components of the 2 × 2 Hessian matrix H(x, y, σ), formed from the second-order derivatives of G_2D(x, y, σ) convolved with I(x, y) at scale σ, are defined as

H_uv(x, y, σ) = I(x, y) * ∂²G_2D(x, y, σ)/∂u∂v,  u, v ∈ {x, y},

where the 2D Gaussian kernel is given by

G_2D(x, y, σ) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²),

with σ the standard deviation of the Gaussian distribution, referring to the scale of the filter, and * the convolution operator. Different values of σ were tested experimentally, and the optimal values were found to be 1/2, 1, 2, and 2√2. The response of the second-order derivative of a Gaussian at scale σ represents a probe kernel that measures the contrast between the regions inside and outside the range (−σ, σ) in the derivative direction in the retinal image [41,42]. In this respect, blood vessel pixels gain the highest response at a suitable scale σ, while background pixels get a low response. It is noteworthy that the second-order-derivative nature of the Hessian matrix is invariant to linear grey-level variation of the retinal image pixels, because of the smooth nature of the Gaussian kernel; only the local contrast is assessed, rather than the actual greyscale values.
As soon as the elements of the Hessian matrix are obtained, the corresponding eigenvalues and eigenvectors are determined. The eigenvalues λ_i play an essential role in discriminating the local orientation pattern: they represent not only the magnitude of the maximal local contrast change but also the change of magnitude of the local intensity in the perpendicular direction. Consequently, the eigenvalue decomposition of the Hessian can be exploited to distinguish between edge-like features in a fundus retinal image and the background. Concerning the blood vessel skeleton, dark features are considered, so we are interested only in the case where one eigenvalue is positive and of high magnitude. Table 1 illustrates all possible variants of a structure's shape and brightness that can be identified through the analysis of Hessian eigenvalues in any 2D medical image; we are interested only in the fundus retinal image modality (in our case, the blob-like structures are dark).

Table 1. The relations between the eigenvalues for different features in any 2D image, adopted from Frangi's schema, where H = high value; L = low value; N = noise (usually small); +/− = positive or negative eigenvalue; Dark = dark feature on a bright background (a blood vessel in our case) [41]. The structure we are interested in is indicated with a *.

Geometric information for the blob-like structure is obtained through a dissimilarity measure based on the second-order derivatives, called the blobness ratio, R_b, given (in 2D) by

R_b = |λ_1| / |λ_2|,  with |λ_1| ≤ |λ_2|.

This ratio is maximal for a blob-like structure and invariant to the grey level. Since the intensity of the blood vessel pixels is lower (darker) than that of the background pixels, the magnitude of the second derivatives (eigenvalues) will be high at blob-like structures and small in the background, so the ratio will be higher at blood vessel pixels and lower at background pixels.
The magnitude of the second-order derivatives in the background is small, distinguishing the background structure from the blood vessels. The norm of the Hessian is used to quantify this property in terms of the structureness ratio, S, given by

S = ||H||_F = √(λ_1² + λ_2²),

where ||·||_F is the Frobenius norm of the Hessian matrix components. Intuitively, the structureness ratio S is small for background pixels, mostly because of their lack of structure, and higher for blood vessel pixels. Finally, a vesselness function V combines the blobness ratio, R_b, and the structureness ratio, S, to map these features into a probability-like estimate of being a vessel:

V(σ) = exp(−R_b² / 2β²) (1 − exp(−S² / 2c²)),

where β is equal to 0.5 and c is half the maximum Frobenius norm S. Both V and S are computed at each scale σ; herein, R_b accounts for the eccentricity of the second-order ellipse. The maximum values of V and S over the scales are then taken as Hessian features for each pixel:

V_Max = max_σ V(σ),  S_Max = max_σ S(σ).

V_Max, S_Max, λ_1 and λ_2 are used as features for every pixel of the green-channel image. The visualisation of the four features of the Hessian matrix components is delineated in Figure 12.
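The Hessian features above can be sketched at a single scale with SciPy's Gaussian-derivative filtering. This is an illustrative sketch, not the authors' implementation: the dark-vessel sign convention (keeping λ_2 > 0) and the choices β = 0.5 and c = ½ max S follow the text, while the eigenvalue closed form is the standard one for a symmetric 2 × 2 matrix.

```python
import numpy as np
from scipy import ndimage

def hessian_features(img, sigma):
    """Per-pixel Hessian eigenvalues, blobness Rb, structureness S,
    and a Frangi-style vesselness V at one Gaussian scale sigma."""
    fxx = ndimage.gaussian_filter(img, sigma, order=(0, 2))  # d2/dx2
    fyy = ndimage.gaussian_filter(img, sigma, order=(2, 0))  # d2/dy2
    fxy = ndimage.gaussian_filter(img, sigma, order=(1, 1))  # d2/dxdy
    # eigenvalues of the symmetric 2x2 Hessian, ordered so |l1| <= |l2|
    tmp = np.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    e1 = 0.5 * (fxx + fyy + tmp)
    e2 = 0.5 * (fxx + fyy - tmp)
    swap = np.abs(e1) > np.abs(e2)
    l1 = np.where(swap, e2, e1)
    l2 = np.where(swap, e1, e2)
    eps = 1e-12
    Rb = np.abs(l1) / (np.abs(l2) + eps)            # blobness ratio
    S = np.sqrt(l1 ** 2 + l2 ** 2)                  # structureness (Frobenius norm)
    beta, c = 0.5, 0.5 * S.max() + eps
    V = np.exp(-Rb ** 2 / (2 * beta ** 2)) * (1.0 - np.exp(-S ** 2 / (2 * c ** 2)))
    V = np.where(l2 > 0, V, 0.0)                    # keep dark (vessel-like) structures
    return l1, l2, Rb, S, V

img = np.ones((31, 31))
img[:, 15] = 0.0          # dark vertical vessel on a bright background
l1, l2, Rb, S, V = hessian_features(img, sigma=2.0)
```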

Difference of Gaussian
Difference of Gaussian (DoG) kernels are exploited as an edge detector [16]. A DoG kernel closely approximates the second-order derivative of the Gaussian function, which is described in detail in Section 2.1.5. To characterise a pixel's features with regard to its neighbours, a DoG filter is applied at five different smoothing scales, σ ∈ {√2/2, 1, √2, 2, 2√2}, with the base scale set to σ = 0.5. Using the DoG as a band-pass filter not only serves to remove noise in an image but also vividly depicts its homogeneous areas. The visualisation of the five DoG features is depicted in Figure 13.

In summary, a 24-D feature vector is constructed for each pixel of a retinal image and is used to differentiate vessel pixels, lesions, and background, using five groups of features. The RGB image is reduced to its green channel, because the contrast between blood vessel pixels and background is more precise and prominent there than in the other channels; the intensity value of each pixel in the green channel image is used as a feature. The moments of phase congruency emphasise the edges of the retinal blood vessels and are used here for the first time as features of the fundus retinal image. The Hessian matrix components emphasise the zero crossings. The white top-hat transformation, using multi-scale and multi-orientation structuring elements, is efficient in producing a corrected, uniform image, while the black bottom-hat transformation with the same elements is used as a filter to suppress noise. Finally, the difference of Gaussian is used, and the response of the filter is a homogeneous image.
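A minimal sketch of the five DoG responses follows. The exact pairing of scales is an assumption: here each feature is taken as the base-scale (σ = 0.5) smoothing minus the smoothing at one of the five larger scales.

```python
import numpy as np
from scipy import ndimage

def dog_features(img, base_sigma=0.5,
                 sigmas=(np.sqrt(2) / 2, 1.0, np.sqrt(2), 2.0, 2 * np.sqrt(2))):
    """Difference-of-Gaussian responses: base-scale smoothing minus the
    smoothing at each of the five larger scales (a band-pass per scale)."""
    base = ndimage.gaussian_filter(img, base_sigma)
    return [base - ndimage.gaussian_filter(img, s) for s in sigmas]

img = np.zeros((21, 21))
img[10, 10] = 1.0          # bright impulse: DoG responds positively at its centre
feats = dog_features(img)
```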

Segmentation
The procedure consists of three steps: first, scaling the features by normalisation; second, designing the neural network, in which the configuration is decided and the network is trained; third, testing, to identify each pixel as vessel or non-vessel and obtain the final binary image.

Normalisation
At the end of the feature extraction process, each pixel in the green channel image is characterised by a vector V(x, y) in the 24-D feature space, V(x, y) = [v_1(x, y), ..., v_24(x, y)], where the features v_j have distinct ranges and values. Normalisation, by narrowing the range differences between features, is necessary and leads to an easier classification procedure of assigning the candidate pixel to the suitable class: class K_1 (blood vessel pixel) or class K_2 (non-vessel pixel).
One can obtain a normalised version v̄_j of feature v_j by v̄_j = (v_j − μ_j)/σ_j, where μ_j is the mean value of the jth feature and σ_j its standard deviation. As a result, each feature is normalised to zero mean and unit variance. The resulting features are used in the classification procedure. According to [26], using linear classifiers to separate the two classes of vessel segmentation gives poor results, because the distribution of the training dataset in the feature space is nonlinear. Consequently, segmentation can be made more accurate and efficient using a nonlinear classifier, such as a support vector machine (SVM) [32], a neural network [24], a multi-layer feed-forward neural network [17], a decision tree [43], a random forest [32], an extreme learning machine [12], or a convolutional neural network (CNN) [44].
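The normalisation step amounts to a per-feature z-score over the pixel matrix; a minimal sketch (the helper name and the guard for constant columns are ours):

```python
import numpy as np

def zscore_normalise(features):
    """Normalise each of the 24 feature columns to zero mean, unit variance.
    features: (n_pixels, 24) matrix of raw feature values."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # guard against constant columns
    return (features - mu) / sigma
```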

Design of the Neural Network
In the present work, we use a modified MLP neural network with one input layer (of 24 input units), three hidden layers, and one output layer as a binary classifier. To determine the optimal topology of the hidden layers, several topologies with different numbers of neurons were tested. The tests showed that excellent results are obtained when each of the three hidden layers contains 27 units. A rectified linear unit (ReLU) activation function is selected for the hidden layers [45]. Further, for classification, cross-entropy (CE) was found to be a better loss function than the minimum mean square error (MSE), because of the unbalanced distribution of vessel and non-vessel pixels [45]; the CE measures the closeness of two probability distributions.
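The described topology maps directly onto scikit-learn's MLPClassifier, which for classification minimises the cross-entropy (log-loss), matching the paper's choice of CE over MSE. The values of max_iter and random_state below are illustrative assumptions, not the authors' settings.

```python
from sklearn.neural_network import MLPClassifier

# 24 normalised inputs -> three hidden layers of 27 ReLU units -> 2 classes.
clf = MLPClassifier(hidden_layer_sizes=(27, 27, 27),
                    activation='relu',
                    max_iter=500,       # illustrative assumption
                    random_state=0)     # illustrative assumption
```

Training is then a single call, clf.fit(X_train, y_train), followed by clf.predict on the pixels of an unseen image.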

Training
To train the classifier, a set S_Train of M candidate pixels is used, for which both the feature vector V (Equation (22)) and the classification result, K_1 (vessel pixel) or K_2 (non-vessel pixel), are known. The samples that form S_Train are collected randomly from the manually labelled vessel and non-vessel pixels of the training images, as proposed in [26,43], for each dataset. Once the classifier has completed the training step, it is ready to classify images for which the results are not known a priori.
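The random collection of labelled pixels into S_Train can be sketched as a simple draw without replacement; the helper name and the fixed seed are illustrative assumptions.

```python
import numpy as np

def sample_training_set(features, labels, n_samples, seed=0):
    """Draw a random subset of labelled pixels (without replacement)
    to form the training set S_Train.
    features: (n_pixels, 24); labels: (n_pixels,), 1 = vessel, 0 = non-vessel."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(labels), size=n_samples, replace=False)
    return features[idx], labels[idx]
```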

Application of Neural Network (Testing)
The trained neural network is used to test "unseen" dataset images. Specifically, a binary image is generated in which blood vessel pixels are distinguished from background pixels. The feature descriptions of the pixels are individually input to the MLP; in particular, the input layer units receive the vector V of normalised features, as in Equations (22) and (23). The ReLU function has excellent out-of-sample prediction ability due to its simplicity, and it also alleviates the vanishing gradient problem during the learning phase of the ANN [46]. Finally, we obtain a binary image that contains only the extracted blood vessel and background pixels.
In this paper, we use a multi-layer perceptron (MLP) neural network as a binary classifier, since there are only two classes: blood vessel and non-vessel. The MLP is an extension of the artificial neural network (ANN) that provides a more strongly nonlinear mapping from a set of input variables to a set of output variables [47]. It also makes no assumptions about the probabilistic structure of the patterns under consideration.

Post-Processing
In the post-processing stage, two steps are used to reduce two types of classification artefacts. The first artefact consists of discontinuous vessel pixels that are surrounded by vessel neighbours but misclassified as background pixels (FN). The second consists of non-vessel pixels that appear as small isolated regions and are misclassified as vessels (FP). To overcome both artefacts, mathematical morphological operators are used. For the first artefact, a morphological closing operator is applied to fill any holes in the vessel map smaller than a predefined structuring element (a disk of radius 9 in our experiments). For the second artefact, the isolated pixel regions are labelled as 8-connected areas, and any area smaller than 100 pixels is removed, as proposed in [9]. Once both artefacts are corrected, the affected pixels are reclassified correctly: as vessel pixels for the first artefact and as background pixels for the second.
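Both post-processing steps can be sketched with scipy.ndimage. The disk radius of 9 and the 100-pixel area threshold follow the paper; the implementation via binary_closing and connected-component labelling is our choice.

```python
import numpy as np
from scipy import ndimage

def postprocess(vessel_map, disk_radius=9, min_area=100):
    """Fill small holes via morphological closing with a disk, then
    drop 8-connected components smaller than min_area pixels."""
    r = disk_radius
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    disk = (xx**2 + yy**2) <= r**2                 # disk structuring element
    closed = ndimage.binary_closing(vessel_map.astype(bool), structure=disk)
    # Label with 8-connectivity and keep only sufficiently large components.
    labels, n = ndimage.label(closed, structure=np.ones((3, 3)))
    sizes = ndimage.sum(closed, labels, index=np.arange(1, n + 1))
    keep = np.isin(labels, np.flatnonzero(sizes >= min_area) + 1)
    return keep
```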
The experimental results in Section 3 show that the combination of the five groups of features, i.e., the 24-D feature vector encoded to the modified MLP, yields strong performance. Our experiments were run on a 2.4 GHz Intel i7-7700HQ CPU with 16 GB of memory, using the Scikit-learn open-source machine learning library [48].

Results
The segmentation performance of the proposed method on the fundus image can be assessed by comparing the segmentation test results with the gold standard (GS) label image. To this end, we coded our method and carried out numerous experiments on the three publicly available datasets DRIVE, STARE, and CHASE_DB1, as mentioned at the beginning of this section.

Performance Measures
The classification result of every pixel can be one of the four types shown in the confusion matrix of Table 2, where the entries are as follows. True positive (TP) indicates a vessel pixel classified correctly as a vessel pixel. False positive (FP) indicates a non-vessel pixel classified wrongly as a vessel pixel. True negative (TN) indicates a non-vessel pixel classified correctly as a non-vessel pixel. False negative (FN) indicates a vessel pixel classified wrongly as a non-vessel pixel.

We started by training our classifier using a set of images selected randomly from the three datasets DRIVE, STARE, and CHASE_DB1; from each dataset, around 30% of the S_Train candidates are used as the training sample set. Firstly, the RGB image is reduced to its green channel, since the green channel shows the best contrast between vessel and background [11,[18][19][20]25]. Then, for every pixel in the image, a collection of hybrid features is extracted from the green band image, including the local intensity value, the morphology top-hat and bottom-hat transformations, the minimum/maximum moments of phase congruency, the Hessian matrix components, and the difference of Gaussian features, generating a 24-D feature vector, i.e., 24 different features for each pixel.
The design of the feature vector characterises every pixel in terms of some quantifiable measurements that can be easily used to differentiate effectively between a vessel and non-vessel pixel in the classification step, as we described in detail in Section 2. The extracted features are collected in a feature matrix, which represents all features from training images for each dataset. The (S Train ) are randomly selected from the feature matrix to teach the classifier how to differentiate between vessel pixels and background. After the MLP is trained, it is used to identify and differentiate between the two different patterns in the test images, blood vessel pixel and the background pixel.
To quantify the classification results, we use the following standard metrics [49]: sensitivity Se = TP/(TP + FN), specificity Sp = TN/(TN + FP), positive predictive value Ppv = TP/(TP + FP), negative predictive value Npv = TN/(TN + FN), and accuracy Acc = (TP + TN)/(TP + FP + TN + FN). It should be noted that the values of the variables in these equations are cumulative, i.e., they denote total numbers of pixels classified; for example, TP denotes the total number of vessel pixels correctly classified as vessel pixels.
Additionally, the F1 metric is recommended for evaluating classification performance when the dataset is unbalanced, as is the case here, and is given by F1 = 2TP/(2TP + FP + FN). All the above metrics lie in [0, 1], with 0 being the worst and 1 the best (though 1 is generally not attainable in practice). In our tests, each metric is evaluated per image, and the average over all tested images from each dataset is calculated. In our experiments, we tested images from all three datasets: DRIVE, STARE, and CHASE_DB1. For each image tested, the feature vectors were calculated for all pixels and applied consecutively to the classifier. Based on the classifier's decision for each pixel, and on our knowledge of the ground truth, we calculated the total numbers of TP, FP, TN, and FN cases for the image as a whole. These totals were then substituted into the above equations to obtain the corresponding metrics.
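The metric definitions above can be collected into one helper operating on the cumulative per-image counts; the function name is ours.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard pixel-level metrics from cumulative confusion-matrix counts."""
    se = tp / (tp + fn)                     # sensitivity (recall)
    sp = tn / (tn + fp)                     # specificity
    ppv = tp / (tp + fp)                    # positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)        # F1 score
    return {'Se': se, 'Sp': sp, 'Ppv': ppv, 'Npv': npv, 'Acc': acc, 'F1': f1}
```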

The Proposed Method Evaluation
The final results are obtained from the probability map (binary) image and the five performance metrics on the DRIVE, STARE, and CHASE_DB1 datasets, respectively. The averages of the selected measures are tabulated in Tables 3-5. We start with the DRIVE dataset, from which we randomly selected 20 images for testing; the performance metrics are given in Table 3. Tables 4 and 5 show, respectively, the same metrics for the STARE and CHASE_DB1 datasets, for which 20 and 14 images were tested, with the averages of the metrics placed at the bottom of each table. The best and worst segmentation results, identified from the obtained metrics, are presented in Table 4 for the 20 images of the STARE dataset and in Table 5 for the CHASE_DB1 dataset. Table 6 summarises the best and worst results over all datasets under study using the proposed method.

Comparison with State-of-the-Art Methods
The proposed method, with its hand-crafted features and modified MLP trained with CE, is compared with seven state-of-the-art methods of the same category. The comparison is carried out on the three datasets mentioned above. Tables 7-9 show the comparison results in terms of sensitivity (Se), specificity (Sp), accuracy (Acc), and the F1 metric for the three datasets; in all three tables, the results for the proposed method appear in the last row. From the tables, it is evident that the proposed method is better on all metrics than all the other methods, except for the F1 metric in Table 9. Moreover, visual evidence of the results, including examples of the best and worst cases, is provided in Figures 14-19. It is worth mentioning that using different FOVs may give significantly different results; so, to ensure a fair comparison, segmentation metrics are computed on pixels inside the FOV only for all methods, and FOV masks were created for the STARE and CHASE_DB1 datasets as mentioned earlier. Table 7 shows the quantitative results of the proposed method in comparison with the relevant state-of-the-art methods on the DRIVE dataset. Overall, Tables 7-9 show that the proposed method outperforms the seven state-of-the-art methods in terms of average accuracy (Acc), sensitivity (Se), specificity (Sp), and F1 score.
Samples of the best and worst visual results for the three datasets, in two different images each, are shown in Figures 14 and 15 for DRIVE, Figures 16 and 17 for STARE, and Figures 18 and 19 for CHASE_DB1. The method performs less well on CHASE_DB1 than on DRIVE and STARE: a small decrease in performance (see Table 9) is noticed across all results, and for F1 in particular. This may be due to the high dimensions of this dataset's images, their nonuniform background illumination, the poor contrast between vessels and background, and the presence of the central vessel reflex. Table 8 shows the quantitative results of the proposed method in comparison with the relevant state-of-the-art methods on the STARE dataset, with the corresponding best and worst visual results shown in Figures 16 and 17. Table 9 shows the corresponding comparison on the CHASE_DB1 dataset, with the best and worst visual results shown in Figures 18 and 19. The evaluation of the proposed method is computed on pixels inside the FOV only; as in most previous work on the DRIVE dataset, the provided FOV mask is used, while the FOVs for STARE and CHASE_DB1 were created by the same method as that used in [11,15,26,30].
It is worth mentioning that using different FOVs may give significantly different results. Our results are compared to those of state-of-the-art retinal vessel segmentation methods on the DRIVE, STARE, and CHASE_DB1 datasets, respectively. The comparison results show the robustness of the proposed method in improving the accuracy, sensitivity, and specificity measures on all datasets used in this study.

The Performance Model
While experimenting, we observed that the proposed method performs better than conventional supervised models and faster than models with automatic feature extraction. We proposed an automatic system to extract and differentiate between blood vessel and background pixels in the fundus image. A modified MLP neural network was employed to act as a semi-deep network, strengthened by the ReLU activation (used mainly in convolutional neural networks, CNNs) and the CE loss function. CE is the most appropriate loss function here because the distribution of vessel and background pixels is not Gaussian, although it is independent and identically distributed (I.I.D.). The modified MLP learns from the 24 hand-crafted features how to detect and differentiate between blood vessel and background pixels. The proposed method depends profoundly on the variety of the selected feature types and on the design of the network itself. In other words, we took care to create some symmetry among the features themselves, so as to form a feature vector that represents each pixel adequately. Although preprocessing is usually the first stage in most state-of-the-art vessel segmentation models, we preferred to offset it by using five groups of features that homogenise and make uniform the image intensity and handle its noise. For example, a blood vessel pixel has a blob-like edge, which is detected effectively using the maximum and minimum moment of phase congruency features. The second-derivative Hessian matrix features identify the zero crossings in the final images, the five Gaussian-scale features render a more homogeneous image, the white top-hat features correct the uniformity of the image, and the black bottom-hat features suppress the noise in the image. Because the image dimensions differ between datasets, each dataset was evaluated and tested independently.
We used only one classifier, the MLP, and the final results depend on it alone, without any additional feature selection procedure, ensemble, or hierarchical classifier. In the introduction, we described some gaps in existing work, mainly the complicated structure of the systems and the high time cost of segmenting the vessels. The datasets also pose many challenges, such as the scarcity and small number of available images and the variety of image illumination and dimensions. Hence, for a more generalised DR diagnosis, more retinal fundus images with good illumination are necessary.

The Medical Implications of the Model
The modified MLP model acts as a lightweight and rapid automated system that provides the ophthalmologist with primary information and knowledge about the blood vessels (e.g., size, tortuosity, crossings, lesion structure, hard and soft exudates, cotton-wool spots) and quick assistance. This system cannot replace ophthalmologists, but it can help in understanding diagnosis results, support early diagnosis, and increase result accuracy as well. Despite the good results of the proposed model, there is still room for further improvement.

Conclusions
In this paper, we have proposed an integrated method to segment blood vessels in retinal images. The experimental results show that the method succeeds both in absolute terms and in comparison with seven other state-of-the-art methods on three well-known, publicly available datasets. The proposed method encompasses several elements that jointly contribute to its success. The first element is the feature choice, where 24 carefully selected features are employed. The second is the choice and design of the classifier, namely an MLP neural network. The third is the way the classifier is trained, where randomly chosen images from all three datasets participate in the training. The fourth is post-processing, where classification artefacts are mitigated. Classification results are provided both visually and quantitatively; in the latter case, five standard classification metrics are employed. Both the visual and quantitative results vividly show the good performance of the proposed method, despite the wide variation of vessel sizes: the method detects both wide and tiny blood vessels with the same efficiency. We avoid preprocessing in order to preserve the blood vessel structure and avoid losing any information. Furthermore, our results rest only on the careful choice of features and the correction of outlier values by normalisation. Concentrating on handling outlier values through feature engineering may therefore reduce the reliance on feature selection techniques or hierarchical classification as ways to increase system performance. Our experimental results contradict the previous assumption that performance can only be improved by feature selection and/or hierarchical classification techniques; on the contrary, they suggest that carefully engineered features can give promising results.
In the retinal disease diagnosis community, relevant datasets are scarce. In other words, investigating and researching with real images from patients who suffer from an eye ailment is a hard task, particularly in developing countries, so the only available choice is to work on the most widely used publicly available datasets. The average accuracy, sensitivity, and specificity values achieved by the proposed method are 0.9607, 0.7542, and 0.9843 on DRIVE, 0.9634, 0.7806, and 0.9825 on STARE, and 0.9577, 0.7585, and 0.9846 on CHASE_DB1, which compare favourably with current blood vessel segmentation techniques. Although a 24-D feature vector is constructed for every pixel in the image, the ensuing space and time complexity is offset by the simplicity of the method compared with cutting-edge methods such as deep learning. The fast implementation and simplicity of the proposed method make it a handy tool for early prediction of DR.