Intelligent Image Processing System for Detection and Segmentation of Regions of Interest in Retinal Images

The automatic detection, segmentation, localization, and evaluation of the optic disc, macula, exudates, and hemorrhages are very important for diagnosing retinal diseases. One of the difficulties in detecting such regions of interest (RoIs) with computer vision is the symmetries between them, e.g., between the optic disc and exudates and also between exudates and hemorrhages. This paper proposes an original, intelligent, and high-performing image processing system for the simultaneous detection and segmentation of retinal RoIs. The basic principles of the method are image decomposition into small boxes and local texture analysis. The processing flow contains three phases: preprocessing, learning, and operating. As a first novelty, we propose proper feature selection based on statistical analysis of confusion matrices for different feature types (extracted from a co-occurrence matrix, fractal-type features, and local binary patterns). Mainly, the selected features are chosen to differentiate between similar RoIs. The second novelty consists of local classifier fusion. To this end, the local classifiers associated with features are grouped into global classifiers corresponding to the RoIs. The local classifiers are based on minimum distances to the representatives of classes, and the global classifiers are based on confidence intervals, weights, and a voting scheme. A deep convolutional neural network for blood vessel segmentation, based on supervised learning, is proposed in order to improve the RoI detection performance. Finally, experimental results on real images from different databases demonstrate the validity of the proposed methodologies and algorithms.


Introduction
Eye disorders can lead to partial or even total loss of vision, significantly affecting quality of life [1]. In order to ensure timely medical intervention and treatment, their early detection relies on the analysis of retinal components such as the optic disc (OD), macula (MA), blood vessels (BVs), exudates (EXs), and hemorrhages (HEs). Only intelligent processing of a retinal image can provide real measures of these specific ophthalmological zones. By adding complex image processing to dedicated neural networks and efficient descriptors and classifiers, a system for determining the RoIs in retinal images can be considered an effective support for decision-making in early disease diagnosis. The system's input is the retinal image and its output is the degree of retinal RoI damage. Automatic detection based on specific imaging devices and real-time image processing can be used for the early-stage decision of an ophthalmologist. Computer-aided diagnosis in an intelligent system also has the advantages of computational power, speed, and continuous development. Many different approaches for retinal RoI identification and evaluation are found in the literature. However, accurate and simultaneous detection, localization, and evaluation of OD, MA, BV, EX, and HE is a difficult task for such a system. The general difficulty for simultaneous detection of retinal RoIs is the symmetries between them, such as between OD and EX (color and, sometimes, segmented shapes) and between EX and HE (segmented shapes). Such a system must have intelligent algorithms for image processing and interpretation and sufficient computational power. Usually, it must exclude the similarities (symmetries) and point out the differences.
The first RoI, the OD, is the easiest to detect, due to its shape, colors, size, and structure. Therefore, many papers are related to its detection and localization [2][3][4][5][6][7][8][9][10][11][12][13][14]. For example, the maximum number of decisions from a set of detectors can be used to detect the center of the OD [7]. By selecting some textural features and decomposing the image into overlapping boxes, OD detection becomes simpler and more efficient [4]. Taking into account the density of large blood vessels in the OD region, the authors in [6,8] proposed fast and accurate methods for OD detection. Considering that the OD has the highest radial symmetry, a methodology based on the radial symmetry detector and separation of vascular information is used in [14] for OD localization. Sometimes, the OD and EXs are similar in color, intensity, and shape. In pathological images, a large EX can be similar to the OD. To overcome this inconvenience, an automatic OD detection method is proposed in [13], based on the symmetry axis of the blood vessel distribution in this region. Some authors remove the OD before the detection and localization of EXs. To this end, the authors in [15] proposed a methodology to differentiate EXs from ODs by considering the features on different color components. The optic disc is identified as a region where the majority of pixels have a great difference between the R and G channels. Next, a thresholding method is applied in order to extract the exudates from the obtained binary image. Some features like mean intensity, entropy, uniformity, standard deviation, and smoothness are computed to improve accuracy. The mentioned characteristics have different values for EXs and ODs, so EXs can be identified by removing the areas that do not have matching values of these characteristics. For optic disc detection, in [16], an algorithm based on the 2D Gaussian filter and blood vessel segmentation was implemented. Also for OD detection, Pereira et al. [17] applied a modern algorithm based on ant colony optimization and anisotropic diffusion. More recently, the authors in [18] proposed a supervised model for OD abnormality detection on different datasets, taking into account a deep learning approach on global retinal images. The results show good accuracy and indicate potential applications. Automatic image analysis methods aiming to detect and segment the MA and fovea are currently based on support vector machine classification [19], hierarchical image decomposition [20,21], statistical segmentation methods [22], deep learning [23], and pixel intensity characteristics [24]. The authors in [7] presented a method which combines different OD and MA classifiers to benefit from their strong points (high accuracy and low computation time). With the purpose of detecting macular degeneration, Veras et al. [25] proposed some algorithms to localize and segment the MA. Sekhar et al. [26] addressed blood vessel localization, segmentation, and their spatial relations in order to detect the optic disc and macula. Based on low pixel intensity, the authors in [27] also proposed an MA detection technique. They also proposed an algorithm for OD diameter estimation. Akram et al. [28] presented a system based on artificial intelligence for the evaluation of macular edema to support ophthalmologists in early detection of the disease. They used a comprehensive characteristics group and a Gaussian-mixture-model-based classifier for accurate detection of the MA. Chaudhry et al. [29] detect the macula from an autofluorescence imaging device in connection with the OD position. This automatic positioning is based on a top-down approach using vascular tree segmentation and skeletonization. In [30] the authors proposed a method to detect and localize the macula center in two steps: first, the context-based information is aggregated with a statistical-type model in order to obtain the approximate localization of the macula; second, the aggregation of seeded mode tracking is used for final localization of the macula center.
Over time, many methods and approaches have been used for the detection of EXs (e.g., thresholding methods considering the green channel [31][32][33]). Because illumination conditions differ between images, preprocessing algorithms for correction are necessary. Kaur and Mittal [34] proposed an algorithm for EX detection which first identifies and rejects the healthy areas based on their characteristics and then segments the EXs from the remaining regions using a region growing method. Bu et al. [35] presented a hierarchical procedure for EX detection in eye fundus images to solve some difficulties such as the recognition of similar-colored objects (e.g., cotton wool spots, the optic disc, and the normal macular reflection). Using an artificial intelligence method, automated identification of EXs is obtained in [36]. After an initial primary processing phase, image segmentation based on fuzzy clustering is applied and a dataset of regions of interest is obtained. For each area, characteristics such as size, color, edge, and texture are extracted. The authors proposed a deep neural network, based on the extracted characteristics, for the detection of areas containing EXs. Furthermore, the method in [37] localizes and evaluates the exudates, taking into account different image processing techniques, morphological filters, and image features.
The commonly used approaches for HE identification can also be grouped into different categories such as thresholding, segmentation, mathematical morphology, classification, clustering, neural networks, etc. For example, Seoud et al. [38] described and validated a method for the identification of micro-aneurysms and HEs in retinal images, which can be used in the diagnosis and grading of diabetic retinopathy. Their contribution is a set of dynamic shape features which capture the evolution of shapes in a processed image and allow differentiation between lesions and blood vessels. The method was tested against six image datasets, both public and private. A splat-based feature classification algorithm with applications to large, irregular HE detection in retinal images was proposed in [39].
For detection and segmentation of retinal RoIs, different classifiers have been used: model-based (template matching), minimum distance between feature vectors, voting schemes, and, recently, neural networks (NNs). Garcia et al. [40] proposed an algorithm for computer-based identification of the lesions due to diabetic retinopathy. The algorithm is integrated into a system intended to support the ophthalmologist's decision in the detection and monitoring of the disease. The authors analyzed the performance of three different NN classifiers in the automated detection of lesions or exudates: support vector machine, radial basis function, and multilayer perceptron. By applying different criteria and testing on 67 images, they concluded that the multilayer NN classifier based on the image criterion has the best results. This research was extended in [41], where new criteria and classifiers (e.g., logistic regression) were tested against those in the previous paper. There, the relationship between the extracted set of characteristics and the class of objects is highlighted. Again, the multilayer perceptron performed the best out of the selected methods, but only by a small margin in comparison with logistic regression. Moreover, in [42] a new supervised neural network approach for blood vessel detection was proposed. The algorithm uses an NN architecture for pixel classification and creates a vector of moment-invariant-based and gray-level features for pixel representation. The performance of the algorithm was tested on the STARE [43] and DRIVE [44] datasets, with better performance than other existing solutions. The authors in [45] used CNNs (convolutional NNs) to detect hemorrhages. They simplified and sped up the training phase by heuristically classifying the training samples.
The authors in [46] presented a methodology and implementation based on a voting procedure with the aim of raising decision accuracy. They proposed a majority voting technique and a probability approach to evaluate the correctness of decisions. The important goal of the work consisted of solving detection problems in image processing algorithms which are geometrically constrained. For example, the algorithm's performance was tested with good accuracy for OD identification on the MESSIDOR dataset [47]. In order to increase classification performance, voting schemes and NN approaches were combined in [48]. In this case, two connected NNs were considered: one for feature selection and the other for classification, based on local detectors. Finally, a voting scheme with different weights for local classifiers was used, taking into account the results from the associated confusion matrices.
In order to increase the efficiency of the classifiers, primary processing of retinal images is necessary. Therefore, Nayak et al. [49,50] presented algorithms for identifying and monitoring glaucoma and diabetic retinopathy, using primary processing (histogram equalization, filtering, and morphological operations). For a particular disease, specific characteristics are established. Based on these features, an artificial NN classifier identifies the presence of a particular disease and its evolution. The authors in [51] introduced an algorithm based on morphological operations for the identification of micro-aneurysms and HEs in retinal images. The image is first primarily processed; then, important regions of interest such as the OD, blood vessels, and fovea are successively removed from the retinal image. Only the HEs and micro-aneurysms remain. The obtained results are similar to those of other papers in terms of sensitivity and specificity. The method for computer-aided detection of diabetic retinopathy developed in [52] consists of the following phases: image enhancement by local histogram equalization, Gabor filters, an SVM (support vector machine) classifier, HE detection, fovea localization, and diabetic retinopathy identification. The algorithms were implemented and tested on the free HRF database [53] with acceptable sensitivity and specificity. The authors in [54] presented a method and algorithms for the successive detection of important RoIs in retinal images using an artificial neural network (ANN) and k-means clustering. The ANN was simulated on a common PC, which created time-consuming difficulties for real implementation. The application was developed only for the MESSIDOR database.
As a technical novelty, this paper proposes an Intelligent System for diagnosis and evaluation of regions of interest from Retinal Images (ISRI), with multiple functions: F1, OD detection; F2, MA detection; F3, EX detection and size evaluation; and F4, HE detection and size evaluation. All four functions are addressed. Two main contributions of the authors can be mentioned: (a) proper feature selection based on statistical analysis of confusion matrices for different feature types, and (b) the weight-based fusion of the local classifiers (associated with features) into global classifiers (corresponding to the RoIs). The paper is organized as follows: In Section 2, the methodology and algorithms for primary processing of retinal images with the goal of RoI detection, segmentation, and size estimation are described and implemented. In Section 3, the experimental results obtained using images from two databases (STARE and MESSIDOR) are reported and analyzed. Finally, the discussion and conclusions are presented in Sections 4 and 5, respectively.

ISRI Architecture
We propose an algorithm and software implementation for the identification and evaluation of four RoIs from retinal images: OD, MA, EX, and HE. It can be considered an intelligent image processing system and a diagnosis support for eye diseases. The basic methodology for determining the mentioned RoIs is texture analysis in small patches (boxes). ISRI has three inputs (retinal image, I; box dimension setting, BDS; and manual box selection, MBS, the last two only for the learning phase) and three outputs (RoI type, RoI; RoI localization, L(i,j); and size estimation of the RoI, S [%]). The ISRI concept is shown in Figure 1. As can be seen, it has two phases: a Learning Phase (LP), considered a calibration phase, and an Operating Phase (OP), considered a measuring phase. Also, ISRI has an Input Module (IM) for image preprocessing which is active in both phases.
The IM contains five blocks configured in a pipeline structure: Image Buffer (IB), Decomposition on Color Components (DCC), Primary Processing (PP), Box Division (BD), and Blood Vessel Segmentation (BVS). The retinal image may come from an Image Acquisition Device (IAD) or from an Image Database (DB) and is stored in the IB to then be processed. First, it is decomposed into the color components of the RGB and HSV (Hue-Saturation-Value) spaces by the DCC. This image is processed by the PP block to improve the representation using the CLAHE algorithm (Contrast-Limited Adaptive Histogram Equalization [38]). Images to be analyzed are further decomposed into boxes by the BD block with the aid of two algorithms: the sliding box algorithm (Figure 2a) and the nonoverlapping box algorithm (Figure 2b). The box dimension is manually selected (Box Dimension Selection, BDS) depending on the image resolution, the division algorithm, and the type of RoI. The last operation in this initial image processing phase is blood vessel segmentation by the BVS block. The LP contains two successive steps: Step 1, calculating the values of a set of features for the regions of interest (Feature Calculation module, FC); and Step 2, selecting effective features for class representation (Feature Selection module, FS). For the LP, manually selected boxes are used (Manual Box Selection, MBS). There are also two successive steps in the OP: Step 1, calculating the values of the selected features corresponding to different RoIs for the boxes extracted from the input image (Computing Selected Features module, CSF); and Step 2, assigning the class label to the RoI (Classifier module, C). RoI_C selects the box type (OD, MA, EX, HE) as a consequence of the RoI classifiers' decisions. The size evaluation of the detected RoI is done by segmentation and counting of "1"s in the RoI_E block. RoI localization is done by the RoI_L block as the position of the upper corner (i,j) of the investigated box.
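The first stages of this pipeline can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names `dcc`, `pp`, and `run_input_module` are hypothetical stand-ins for the IB/DCC/PP blocks, only the channel decomposition is actually implemented, and the `pp` stub merely marks where CLAHE would run.

```python
import colorsys
import numpy as np

def dcc(rgb):
    """Decomposition on Color Components: R, G, B channels plus the HSV hue.
    Expects a float RGB image with values in [0, 1]."""
    hue = np.apply_along_axis(lambda p: colorsys.rgb_to_hsv(*p)[0], -1, rgb)
    return {"R": rgb[..., 0], "G": rgb[..., 1], "B": rgb[..., 2], "H": hue}

def pp(channels):
    """Primary Processing stub: contrast enhancement (the paper uses CLAHE)
    would be applied per channel here."""
    return channels

def run_input_module(rgb):
    """IB -> DCC -> PP; box division (BD) and vessel segmentation (BVS)
    would follow downstream."""
    return pp(dcc(np.clip(rgb, 0.0, 1.0)))
```

Each downstream module then consumes the channel dictionary rather than the raw image, mirroring the pipeline structure of the IM.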
In order to calibrate and test the system, two well-known public databases were used: STARE [43] and MESSIDOR [47]. They contain both healthy subjects and pathological cases with abnormalities. The STARE database consists of 400 retinal images captured by a TopCon TRV-50 fundus camera. It also contains two sets of manual segmentations obtained by two ophthalmologists, used as references. The MESSIDOR database contains three sets of eye fundus images (1200 color images captured with a color video 3CCD camera on a TopCon TRC NW6 nonmydriatic retinograph with a 45 degree field of view) with 8 bits per color plane at 1440 × 960, 2240 × 1488, or 2304 × 1536 pixels. Each image is in TIFF format and has an associated medical annotation.

Input Module
The input module has four main functions: image decomposition into color channels, primary processing of each component, box division on each color channel, and blood vessel segmentation in each box. Because retinal images have zones with different brightness and also contain noise with different small patterns in the background, primary processing for image enhancement is needed. Therefore, the first operation in primary processing is noise rejection. Because of their efficiency and edge preservation, a median filter (3 × 3 kernel) and a morphological filter (erosion and dilation) were used for noise reduction in the PP for each component of interest of the color spaces (R, G, B, and H). Because eye fundus images usually have low local contrast, histogram equalization is performed. Another primary processing operation, binarization, is used in the blood vessel segmentation (IM) and also in the final RoI segmentation (OP). The above operations are commonly used and will not be detailed in this work. Due to the primary processing methods (noise elimination), EXs or HEs smaller than 3 × 3 pixels cannot be detected.
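The noise-rejection chain described above can be sketched with plain NumPy. This is an illustrative version under stated assumptions: the exact operator order and CLAHE tile parameters of the paper's PP block are not given, so a simple 3 × 3 median filter plus a morphological opening is shown, and global (not contrast-limited) histogram equalization stands in for CLAHE.

```python
import numpy as np

def _neighborhood(channel, k=3):
    """Stack the k x k neighborhood of every pixel (edge-replicated padding)."""
    r = k // 2
    p = np.pad(channel, r, mode="edge")
    h, w = channel.shape
    return np.stack([p[i:i + h, j:j + w] for i in range(k) for j in range(k)])

def median_filter(channel):
    return np.median(_neighborhood(channel), axis=0)

def erode(channel):
    return np.min(_neighborhood(channel), axis=0)

def dilate(channel):
    return np.max(_neighborhood(channel), axis=0)

def primary_processing(channel):
    """Noise rejection: 3 x 3 median filter followed by a morphological
    opening (erosion, then dilation), as described for the PP block."""
    return dilate(erode(median_filter(channel)))

def equalize_histogram(channel, levels=256):
    """Global histogram equalization for a uint8 channel. The paper's PP
    block uses CLAHE, which additionally limits contrast in local tiles."""
    hist = np.bincount(channel.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (levels - 1) * cdf / cdf[-1]
    return cdf[channel].astype(np.uint8)
```

Note that the 3 × 3 median filter is what removes isolated specks, which is consistent with the remark that lesions smaller than 3 × 3 pixels cannot be detected.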
As we mentioned above, two types of box division algorithms are used.For OD and MA localization we used a sliding box algorithm as it is necessary that a box contains the entire RoI to be analyzed.We have chosen the sliding window approach for OD and MA localization because it is more possible to get a fine matching of these regions of interest in a sliding window rather than in a traditional approach (nonoverlapping-fixed-windows).For EX and HE, a nonoverlapping algorithm is used because it is simpler in terms of computing effort and it is necessary for the global evaluation of diseases (the spread on the entire image).The algorithm for creating the sliding boxes starts with the box from the upper left corner of the image and a specified step sliding (Figure 2a).The analyzed box slides until it touches the right margin of the image.Then, the sliding box returns to the beginning and performs a vertical down slide, moving on to the next row.The algorithm runs until the lower right corner is touched.The nonoverlapping box algorithm for EX and HE is presented in Figure 2b.In this case, the box division has a smaller range than OD or MA (¼ of the sliding box dimension).For high-precision localization, an optimum box size must be chosen depending on the database and a good framing of the RoI (the minimum dimension framing the RoI).This dimension is experimentally chosen (input BDS).At each step of the sliding box algorithm, a new box is considered.Thus, a matrix of boxes (grid of boxes) is created.The box has a unique ID consisting of the row number and column number in the grid of boxes.The size of the sliding step depends of the resolution of the input images (like the patch dimension); for example, in the case of the STARE database it was experimentally chosen to be 10 pixels, while in the case of the MESSIDOR database it was 15 pixels.
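The two box-division strategies above can be sketched as coordinate generators; a minimal version, assuming row-major traversal from the upper-left corner as described:

```python
def sliding_boxes(height, width, box, step):
    """Overlapping (sliding) box grid used for OD/MA localization: the box
    starts at the upper-left corner and slides by `step` pixels, row by row."""
    for i in range(0, height - box + 1, step):
        for j in range(0, width - box + 1, step):
            yield (i, j, box)  # (row, col) of the upper corner, plus size

def nonoverlapping_boxes(height, width, box):
    """Disjoint box grid used for EX/HE: simpler, and it tiles the whole
    image for the global evaluation of disease spread."""
    for i in range(0, height - box + 1, box):
        for j in range(0, width - box + 1, box):
            yield (i, j, box)
```

The (row, col) pair of each yielded box is exactly the grid ID described in the text, and the upper corner (i, j) is what the RoI_L block reports for localization.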
Blood vessel detection is necessary only for the F1 and F4 functions (OD and HE detection). The blood vessel segmentation (BVS block) is necessary in two situations: for more accurate detection of the OD in function F1 (in this case blood vessels need to be highlighted) and for more accurate detection of the HEs in function F4 (in this case blood vessels need to be removed). In other cases, blood vessel detection is not considered. The input to the BVS is a box obtained from the box division algorithms on the green channel (for better contrast of blood vessels). This box is divided into smaller boxes of dimension 27 × 27 pixels using a sliding box algorithm with a sliding step size of 1 pixel. The new boxes represent the input to the main module of the BVS, namely the CNN. The CNN has three convolutional layers (one of which is fully connected), two pooling layers (average pooling), one dropout layer, and a Softmax layer. The CNN-based architecture of the BVS is presented in Figure 3 [55]. The input data for the proposed neural network (the input layer) are boxes of 27 × 27 pixels on the green channel (STARE DB case). For images with higher resolution, like those from MESSIDOR, larger boxes (55 × 55 pixels) can be considered, but in this case an adjustment of the CNN is necessary. (Note that the learning phase of the blood vessel segmentation does not have to be repeated for different resolutions of the input images. However, it is possible, on demand, with consideration of the training time.) These boxes are passed through the entire network to be classified into two classes: vessel (V) and nonvessel (NV). The second layer, with 20 neurons (local 3 × 3 filter functions), is of convolutional type with both stride and padding equal to 1. The third layer is an average pooling one, which subsamples the image based on the mean in a 3 × 3 sliding box.
The new image (box) has a resolution of 9 × 9 (stride equal to 3). The next layer, with 50 neurons (filters), is also of convolutional type, with both stride and padding equal to 1 and, consequently, an output of 9 × 9 pixels. The fifth layer is another pooling layer with stride equal to 3 (padding 0), which reduces the box dimension to 3 × 3. It is followed by a dropout layer for network regularization (rate equal to 0.5). The penultimate layer is a convolutional one of the fully connected type with 2 neurons (for classification), stride 1, and padding 0. The final layer of the CNN, Softmax, is based on a loss function measure and has two outputs: V and NV. Blood vessel segmentation is performed by pixelwise classification. The training of the CNN used 50 epochs and 4000 small boxes of 27 × 27 pixels (2000 with vessels and 2000 without vessels) on the green channel. The CNN for blood vessel segmentation was implemented using MatConvNet.
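The box sizes quoted layer by layer follow the standard convolution/pooling output-size rule. A small sketch verifying them, using only the kernel sizes, strides, and paddings stated in the text (the 3 × 3 kernel for the final fully connected convolution is inferred from its 1 × 1 output and is an assumption):

```python
def conv_out(n, kernel, stride, padding):
    """Standard output-size rule for convolution and pooling layers."""
    return (n + 2 * padding - kernel) // stride + 1

def bvs_cnn_shapes(n=27):
    """Trace the spatial size of a 27 x 27 green-channel patch through the
    layer stack described for the BVS CNN."""
    shapes = [n]
    n = conv_out(n, 3, 1, 1); shapes.append(n)  # conv, 20 filters -> 27
    n = conv_out(n, 3, 3, 0); shapes.append(n)  # average pool     -> 9
    n = conv_out(n, 3, 1, 1); shapes.append(n)  # conv, 50 filters -> 9
    n = conv_out(n, 3, 3, 0); shapes.append(n)  # average pool     -> 3
    n = conv_out(n, 3, 1, 0); shapes.append(n)  # fully connected conv -> 1
    return shapes
```

The trace reproduces the 27 → 9 → 3 → 1 reduction described in the text; the final 1 × 1 × 2 output feeds the Softmax layer's V/NV decision.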

Learning Phase
The learning phase is necessary because this system can be used for different DBs and different fundus cameras. It can be mentioned that the parameters set in the LP can be stored and are not required for each use of the system. The LP is usually necessary when the imaging device for the retina is changed, another retinal database is used, or other parameters are required. Therefore, in the LP, the system calibration is practically done. We used a supervised method for system training because, in this case, it performs better and requires less computational effort than an unsupervised one. To this end, a set of candidate features (CF, Table 1) was calculated for each box from a learning set (LS). Five classes are considered: OD, MA, EX, HE (as RoIs), and the class of the rest, NR (Non-RoI). The LP block diagram is presented in Figure 4. In order to select the features used in the segmentation process, the LP is divided into two tasks: (1) the establishment of the representative value of the features for each class, and (2) feature selection and establishment of classifiers for the segmentation process of each class. Correspondingly, the LS is also divided into two disjoint sets: LS1 (learning set for Task 1) and LS2 (learning set for Task 2).
Each class has 10 representative boxes in LS1 and 10 representative boxes in LS2 (experimentally and arbitrarily chosen). Thus, the number of boxes is 50 for LS1 (5 × 10 = 50 boxes) and likewise 50 for LS2.
Task 1. For each feature F from CF, the representative of each class is established by averaging the corresponding feature values over the representative class boxes in LS1: F_OD, F_MA, F_EX, F_HE, F_NR. The exemplification of the representative feature calculation for the OD class is given in the next section (Table 2).
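Task 1 is a plain per-class average; a minimal sketch (the function name is illustrative), which also covers vector-valued features such as LBP histograms by averaging element-wise:

```python
import numpy as np

def class_representative(feature_values):
    """Task 1: the representative of a class for a feature F is the mean of
    F over that class's LS1 boxes (10 boxes per class in the paper).
    Works for scalar features and for histogram (vector) features alike."""
    return np.mean(np.asarray(feature_values, dtype=float), axis=0)
```

For example, averaging the contrast values of the 10 OD boxes yields F_OD for the contrast feature; averaging 10 LBP histograms yields the OD's representative histogram.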
Task 2. This task consists of feature selection and weight assignment for the classifiers. Because the classification process takes into account heterogeneous features, it is convenient to consider image classification based on weighted local classifiers. To this end, another set (LS2) of 50 boxes (10 for each class) is used. Taking into account the representatives of the classes established in Task 1, each box from the learning set LS2 is classified based on the minimum distance to the representatives of the classes for each feature. The considered distances are the absolute differences in the case of scalar features and the Minkowski distance of order 1 in the case of the Local Binary Pattern (LBP) histogram (Table 1).
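The per-feature minimum-distance rule of Task 2 can be sketched as follows (function names are illustrative). Note that the absolute difference for scalars and the Minkowski distance of order 1 for histograms are both sums of absolute differences, so one distance function covers both cases:

```python
import numpy as np

def feature_distance(value, representative):
    """Absolute difference for scalar features; Minkowski distance of
    order 1 (sum of absolute differences) for LBP histograms."""
    v = np.atleast_1d(np.asarray(value, dtype=float))
    r = np.atleast_1d(np.asarray(representative, dtype=float))
    return float(np.sum(np.abs(v - r)))

def classify_box(value, representatives):
    """Assign the box to the class whose Task 1 representative is nearest
    for this feature. `representatives` maps class name -> representative."""
    return min(representatives, key=lambda c: feature_distance(value, representatives[c]))
```

In the full system, one such local classifier exists per selected feature, and their decisions are later fused by the weighted voting scheme.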

Feature Evaluation for Class Representatives
In order to calibrate the system for different features and color channels (corresponding to RoIs) the following set of texture features are investigated [56,57]: statistical features of order 1 (Im, mean intensity), features extracted from co-occurrence matrices (Con, contrast; En, energy; Ent, entropy;

Learning Phase
The learning phase is necessary because this system can be used for different DBs and different fundus cameras.It can be mentioned that the set parameters in the LP can be stored and are not required for each use of the system.LP is usually necessary when the imaging device for the retina is changed, another retinal database is used, or other parameters are required.Therefore, in the LP, the system calibration is practically done.We used a supervised method for system training because, in this case, it performs better and requires less computational effort than an unsupervised one.To this end, a set of candidate features (CF, Table 1) for each box from a learning set (LS) was calculated.Five classes are considered: OD, MA, EX, HE (as RoIs), and the class of the rest, NR (Non-RoI).The LP block diagram is presented in Figure 4.In order to select the features used in the segmentation process, the LP is divided into two tasks: (1) the establishment of the representative value of the features for each class, and (2) feature selection and establishment of classifiers for the segmentation process of each class.Correspondingly, LS is also divided into two disjoint sets: LS1 (learning set for Task 1) and LS2 (learning set for Task 2).
Each class has 10 representative boxes in LS1 and 10 representative boxes in LS2 (experimentally and arbitrarily chosen). Thus, LS1 contains 5 × 10 = 50 boxes, and so does LS2.
Task 1. For each feature F from CF, the representative of each class is established by averaging the corresponding feature values over the representative class boxes in LS1: F_OD, F_MA, F_EX, F_HE, F_NR. The calculation of the representative features is exemplified for the OD class in the next section (Table 2).
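Task 1 can be sketched as a simple per-class average of feature vectors. This is a minimal illustration, not the authors' implementation; the function name, the toy data, and the use of three scalar features are assumptions.

```python
import numpy as np

def class_representatives(features_by_class):
    """Average each feature over the learning boxes of a class (Task 1).

    features_by_class: dict mapping class name ('OD', 'MA', 'EX', 'HE', 'NR')
    to an array of shape (n_boxes, n_features), one row per LS1 box.
    Returns a dict mapping class name to the mean feature vector.
    """
    return {cls: np.mean(vals, axis=0) for cls, vals in features_by_class.items()}

# Hypothetical toy data: 10 LS1 boxes per class, 3 scalar features each.
rng = np.random.default_rng(0)
ls1 = {cls: rng.random((10, 3)) for cls in ("OD", "MA", "EX", "HE", "NR")}
reps = class_representatives(ls1)
```

For the LBP histogram feature, the same averaging applies bin-wise, yielding the representative histograms reported in Table 3.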
Task 2. This task consists of feature selection and weight assignation for the classifiers. Because the classification process takes heterogeneous features into account, it is convenient to base the image classification on weighted local classifiers. To this end, another set (LS2) of 50 boxes (10 for each class) is used. Taking into account the representatives of the classes established in Task 1, each box from the learning set LS2 is classified based on the minimum distance to the representatives of the classes for each feature. The considered distances are the absolute differences in the case of scalar features and the Minkowski distance of order 1 in the case of the Local Binary Pattern (LBP) histogram (Table 1).

In order to calibrate the system for different features and color channels (corresponding to the RoIs), the following set of texture features is investigated [56,57]: statistical features of order 1 (Im, mean intensity); features extracted from co-occurrence matrices (Con, contrast; En, energy; Ent, entropy; Hom, homogeneity; Cor, correlation; Var, variance); fractal-type features (Dm, mass fractal dimension; L, lacunarity); and LBP-type features (LBP, histogram of Local Binary Patterns) (Table 1). The mean intensity Im is a statistical characteristic, calculated as the average of the pixel intensities I(i,j) on the different color channels for each box, where M × M represents the box dimension. The co-occurrence matrix C_d is computed as the average of eight co-occurrence matrices of dimension K × K, where K represents the number of levels in the monochrome image, with a displacement d and directions at 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°.

For the calculation of the differential fractal dimension (DFD), a 3D configuration is created in order to be covered with boxes of dimension r × r × s, where r is the division factor and s is the box height, s = r·I_max/M [58]. Here, I_max represents the maximum value of the intensity level and M × M is the image dimension. Each pixel (i,j) from the square S_r(u,v) of dimension r × r has an intensity value I(i,j) which belongs to a box from the stack of boxes of height s. The maximum value of I(i,j), for (i,j) ∈ S_r(u,v), is denoted by p(u,v) and the minimum value by q(u,v) (Table 1). The difference n_r(u,v) between p(u,v) and q(u,v) covers the 3D relief of the monochrome image created by S_r(u,v), so that Σ_u Σ_v n_r(u,v) covers the entire configuration created on the monochrome image I. From here on, the DFD algorithm proceeds like the standard box-counting algorithm.

The lacunarity, which characterizes homogeneity and gaps, is a powerful tool in medical image classification and is also a fractal-type feature [59]. The lacunarity can be considered a complementary feature because it can differentiate between two textures with the same fractal dimension. A high lacunarity value indicates that the texture has many holes and great heterogeneity.
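The DFD and lacunarity computations above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the DFD follows standard differential box counting in the Sarkar-Chaudhuri style (the p minus q relief is normalized by the box height s, which the n_r(u,v) definition above resembles), and the lacunarity uses the common gliding-box variance/mean formulation. Function names, the scale list, and the assumption of a square image are illustrative.

```python
import numpy as np

def dfd(img, sizes=(2, 4, 8, 16, 32)):
    """Differential box-counting fractal dimension (a sketch).

    img: 2-D uint8 array of side M (assumed square, M divisible by each r).
    For each grid cell of side r, count the boxes of height s = r*Imax/M
    spanned between the min and max intensity, sum over the grid, and fit
    the slope of log N_r against log(1/r).
    """
    M = img.shape[0]
    Imax = 255
    logs_r, logs_N = [], []
    for r in sizes:
        s = r * Imax / M                          # box height at this scale
        N = 0
        for u in range(0, M, r):
            for v in range(0, M, r):
                cell = img[u:u + r, v:v + r]      # the square S_r(u, v)
                p, q = cell.max(), cell.min()
                N += int(p // s) - int(q // s) + 1  # boxes covering the relief
        logs_r.append(np.log(1.0 / r))
        logs_N.append(np.log(N))
    return np.polyfit(logs_r, logs_N, 1)[0]       # slope = fractal dimension

def lacunarity(img, r):
    """Gliding-box lacunarity: var/mean^2 + 1 of the box masses at scale r."""
    M = img.shape[0]
    masses = [img[u:u + r, v:v + r].sum()
              for u in range(M - r + 1) for v in range(M - r + 1)]
    masses = np.asarray(masses, dtype=float)
    return masses.var() / masses.mean() ** 2 + 1.0
```

As a sanity check, a perfectly flat image yields a fractal dimension of 2 (a plane) and a lacunarity of 1 (no gaps), consistent with the interpretation given in the text.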

Another feature successfully used in texture classification is the LBP histogram. For each pixel, a neighborhood of dimension (2h + 1) × (2h + 1) is considered. The contour of the neighborhood consists of 8h = n pixels, and the texture defined in the neighborhood of the central pixel is described as a vector of elements where v_C represents the value of the central pixel p_C and v_1, v_2, ..., v_n are the values of its neighbors p_1, p_2, ..., p_n (p_1 is the pixel in the upper-left corner of the neighborhood). The neighborhood pixels are tested clockwise. Based on the sign (negative or positive) of the difference between the central pixel value v_C and each neighbor (Equation (3)), defined in V_T, the LBP code [59] can be computed as a sequence of binary digits (by concatenating the result of each comparison), as described in Equation (4).
After computing the binary format, the LBP code is converted to decimal format (Equation (5)).
The LBP histogram vector H = [H_0, H_1, ..., H_{n-1}] is calculated as the histogram of the LBP codes of all box pixels.
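For the basic case h = 1 (a 3 × 3 neighborhood, n = 8), Equations (3) to (5) can be sketched as below. This is an illustrative implementation only; the sign convention (neighbor greater than or equal to center gives bit 1) and the exact bit ordering are assumptions, since the paper specifies only that the neighbors are tested clockwise from the upper-left corner.

```python
import numpy as np

def lbp_histogram(img):
    """Classic 3x3 LBP: threshold the 8 neighbors against the center pixel,
    concatenate the sign bits clockwise from the upper-left neighbor, and
    histogram the resulting 0..255 codes over the box."""
    # Clockwise neighbor offsets starting at the upper-left corner (p_1).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1].astype(int)
    for bit, (di, dj) in enumerate(offsets):
        neigh = img[1 + di:h - 1 + di, 1 + dj:w - 1 + dj].astype(int)
        codes |= ((neigh >= center).astype(np.uint8) << (7 - bit))
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist
```

Two such histograms are then compared with the Minkowski distance of order 1, i.e., the sum of absolute bin-wise differences, as stated in Task 2.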
At the end of this task, the representatives of the classes are the averages of the feature values over the boxes considered in LS1 (Tables 2-4).

Feature Selection and Establishment of Classifiers

1. Feature Selection
Feature selection is the most difficult problem due to the similarities, or symmetries, between the retinal RoIs: OD and EX (color and segmented shapes) or EX and HE (segmented shapes). It is done in the learning phase by means of confusion matrices (CMs), calculated for each feature from the previous set and for each color component. The CMs (Figure 5) are computed by considering sets of 10 manually selected boxes for each class: OD, MA, EX, HE, and NR. In a CM, ODa represents the total number of actual boxes which contain OD, MAa represents the total number of actual boxes which contain MA, and so on. Similarly, ODp represents the total number of predicted boxes which contain OD, MAp represents the total number of predicted boxes which contain MA, and so on. OD/OD represents the total number of boxes which contain the OD and were correctly predicted. Similarly, MA/MA represents the total number of boxes which contain the MA and were correctly predicted, OD/MA represents the number of boxes containing OD that were classified as MA, and so on. tr is the total number of boxes correctly classified, i.e., the trace of the CM. In order to decide whether a feature is efficient for the classification process, the trace of the CM is computed (Equation (6)):

tr(CM) = OD/OD + MA/MA + EX/EX + HE/HE + NR/NR (6)

Using the concept of the confusion matrix (Figure 5), three selection criteria are used:

• If tr(CM) ≥ 0.75 × Total, where Total (Equation (7)) represents the number of boxes considered, then the tested feature is a relevant one and it is kept; generally, such a feature can be used for more RoIs. In our case, Total = 50.

• To include a feature in the set of selected features for the class C_i, the corresponding value C_i/C_i from its CM must be higher than 0.75 × C_ia, where C_ia represents the total number of boxes containing C_i used in the learning phase, Task 2 (LS2); in our case, C_ia = 10. These criteria act as the performance evaluation, in order to select the most efficient features from a given set of features.

• If the distance (Minkowski of order 1 [60]) between two CMs (corresponding to two features) is low, then one of the two features can be rejected as redundant.
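The three criteria above can be sketched directly on a 5 × 5 confusion matrix. This is an illustrative sketch, not the authors' code; the redundancy tolerance in the third function is a hypothetical value, since the paper does not state a numeric threshold for the L1 distance between CMs.

```python
import numpy as np

def feature_is_relevant(cm, total=50, frac=0.75):
    """Criterion 1: keep a feature if the trace of its 5x5 confusion matrix
    (number of correctly classified boxes) reaches 75% of Total (Eq. (6))."""
    return np.trace(cm) >= frac * total

def classes_accepting_feature(cm, per_class=10, frac=0.75):
    """Criterion 2: class C_i accepts the feature if C_i/C_i > 0.75 * C_ia."""
    names = ["OD", "MA", "EX", "HE", "NR"]
    return [names[i] for i in range(len(names)) if cm[i, i] > frac * per_class]

def cms_are_redundant(cm_a, cm_b, tol=5):
    """Criterion 3: if the L1 (Minkowski order-1) distance between two CMs is
    low, one of the two features can be dropped as redundant. The tolerance
    tol is a hypothetical choice for illustration."""
    return np.abs(np.asarray(cm_a) - np.asarray(cm_b)).sum() <= tol
```

For example, a feature whose CM has diagonal (9, 8, 9, 9, 9) has trace 44 ≥ 37.5, so it is relevant, and every class accepts it under criterion 2.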
For simplicity, we considered the following notation:

Calculation of Feature Weights and Establishment of Classifiers

In the Feature Selection task, a set of efficient features was selected for each class C_i, i = 1, 2, 3, 4 (i.e., excluding NR). Then, for each class C_i and selected feature F_j, a weight w_ij is calculated from the corresponding CM using Equation (9), and a local classifier D_ij (Equation (11)) is considered and associated with w_ij.

A local classifier D_ij for the class C_i and the feature F_j is implemented in the following way. First, a confidence interval CI_ij is created (Equation (10)), where F_ij,min and F_ij,max represent, respectively, the minimum and the maximum of the values of F_j for the class C_i obtained from the LS1 set. If F_j = LBPH, then a confidence band for the histogram is considered. The classifier D_ij is given in Equation (11), where B is the box to be classified.
Based on the local classifiers D_ij, a global classifier D_i is developed for each class C_i (Equation (12)). Thus, by considering the local classifiers D_ij, the intelligent classification network predicts that the patch B belongs to class C_i, i = 1, 2, 3, 4, if the associated weighted vote D_i(B) (Equation (12)) is maximum and higher than 0.7 × Σ_{j=1..s} w_ij, where s is the number of features associated with the class C_i. If not, then B belongs to the class C_5 = NR. The threshold 0.7 was chosen experimentally. This can be considered a classification process based on a voting scheme with a threshold.
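The local confidence-interval classifiers and the thresholded weighted vote can be sketched as follows. This is a minimal sketch under assumed data shapes, not the authors' implementation; the toy intervals and weights below are hypothetical, and the LBPH confidence-band case is omitted for brevity.

```python
def local_classifier(value, ci):
    """D_ij: 1 if the feature value of box B falls inside the confidence
    interval [F_min, F_max] learned for class C_i on LS1, else 0."""
    lo, hi = ci
    return 1 if lo <= value <= hi else 0

def classify_box(feature_values, intervals, weights, threshold=0.7):
    """Global voting classifier: predict the class with the maximum weighted
    vote D_i(B), provided it exceeds 0.7 times the sum of that class's
    weights; otherwise return 'NR' (the non-RoI class).

    feature_values: dict feature_name -> value computed on box B.
    intervals[c][f], weights[c][f]: per-class, per-feature CI and weight.
    """
    best_cls, best_vote = "NR", -1.0
    for cls, feats in intervals.items():
        vote = sum(weights[cls][f] * local_classifier(feature_values[f], ci)
                   for f, ci in feats.items())
        if vote > threshold * sum(weights[cls].values()) and vote > best_vote:
            best_cls, best_vote = cls, vote
    return best_cls

# Hypothetical toy setup with two classes and two features.
intervals = {"OD": {"ImG": (0.6, 0.9), "ConG": (0.1, 0.3)},
             "EX": {"ImG": (0.7, 1.0), "ConG": (0.4, 0.6)}}
weights = {"OD": {"ImG": 0.9, "ConG": 0.8},
           "EX": {"ImG": 0.7, "ConG": 0.9}}
print(classify_box({"ImG": 0.8, "ConG": 0.2}, intervals, weights))  # prints OD
```

In this toy case the OD vote (1.7) exceeds 0.7 × 1.7, while the EX vote (0.7) falls below its own threshold, so the box is assigned to OD.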
Taking into account the selection criteria, in the learning phase, Task 2, a dedicated classifier is built for each RoI: D_1 for the OD class, D_2 for the MA class, D_3 for the EX class, and D_4 for the HE class. These classifiers are visible in the operating phase (Figure 6). Each classifier has its specific features (Table 5): D_1 (ImG, ConG, FDG, and LBPH); D_2 (ConG, EnG, LG, and LB); D_3 (ImG, ConG, EnG, HomG, and FDG); D_4 (ImR, FDR, LB, and LR).

Operating Phase
The OP (Figure 6) classifies the boxes obtained by dividing the retinal image under investigation and also determines parameters of the detected RoIs such as position and size. The block diagram in Figure 6 shows the multilevel structure of this phase. The first level performs the calculation of the features established in the learning phase. All features are computed in parallel: contrast on green (ConG), energy on green (EnG), homogeneity on green (HomG), mean intensity on green (ImG), mean intensity on red (ImR), differential fractal dimension on green (FDG), differential fractal dimension on red (FDR), lacunarity on blue (LB), lacunarity on green (LG), lacunarity on red (LR), and local binary pattern on H (LBPH). The second layer is the classifier layer for OD, MA, EX, and HE. For each feature involved, the corresponding weight (from the learning phase) is assigned. The classifiers, based on a voting scheme, use the representatives of the classes, also established in the learning phase. A classifier output equals 1 if the box contains the corresponding RoI and 0 if not. The last layer of the structure is the output layer (module) of ISRI; it establishes the type of RoI (OD, MA, EX, HE, or NR, by the RoI detection module), the RoI position, and the evaluation of the RoI size (as a percentage of the box area). To this end, a binarization procedure is needed. The algorithms for RoI detection and evaluation are presented below (Algorithms 1-4). Each takes as inputs the images to be analyzed, the features selected for the respective RoI, the corresponding weights w_ij, and the associated intervals; for example, Algorithm 4 (HE) ends by calculating the total percent occupation of HE in the whole image I and returning BHE, the BHE position, and the percent occupation of HE.
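The box generation that feeds this first level can be sketched as a sliding-box decomposition. This is an illustrative sketch only, using the STARE settings stated in the experimental section (128 × 128 boxes, step 10 pixels) and assuming a 2-D grayscale array; the function name and the dummy image size are assumptions.

```python
import numpy as np

def sliding_boxes(img, box=128, step=10):
    """Decompose an image into overlapping boxes (sliding-box algorithm).
    Yields the top-left corner and the box, so each box can be fed to the
    feature layer and then to the classifier layer of the operating phase."""
    h, w = img.shape[:2]
    for i in range(0, h - box + 1, step):
        for j in range(0, w - box + 1, step):
            yield (i, j), img[i:i + box, j:j + box]

# Example: a STARE-sized (605 x 700) dummy image, 128 x 128 boxes, step 10.
img = np.zeros((605, 700), dtype=np.uint8)
n = sum(1 for _ in sliding_boxes(img))
```

The nonoverlapping variant mentioned in the conclusions corresponds to setting step equal to the box size.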

Experimental Results

For the testing and validation of the proposed system we used 100 boxes from 60 images (Figure 7) for the learning phase and 200 images for the operating phase. As mentioned in Section 2, 50 boxes (10 for OD, 10 for MA, 10 for EX, 10 for HE, and 10 for NR; Figure 8) were used in feature selection and another 50 boxes were used in classifier implementation. A box dimension of 128 × 128 pixels was used for STARE and a box dimension of 256 × 256 pixels was used for MESSIDOR. The sliding step sizes (BDS) were 10 pixels for the STARE DB and 15 pixels for the MESSIDOR DB. In BVS, the sliding step size was 1 pixel for both. The same features were selected in both the STARE case and the MESSIDOR case. Examples of feature calculations for the OD class representatives are presented in Table 2 (36 features) and Table 3 (LBP feature). The last column contains the average values of the row. The results concerning the representatives of the classes OD, MA, EX, and HE are presented in Table 4. The features from CF (Table 1) were tested on the R, G, B, and H color components and, in concordance with the selection criterion, the selected features and the corresponding color channels are marked in green in Table 4.

The confusion matrices are calculated using the set LS2 for all features and all classes. Some examples of CMs are presented in Figure 9. As can be observed, the values greater than 8 (i.e., greater than 0.75 × C_ia; see the feature selection criteria in Section 2.3.2) are highlighted in green. For example, according to Figure 9a, ImG is a feature selected for OD and EX detection. The results of the box classification process are shown in Figure 11 (boxes labeled with SB). The results of the size evaluation of EXs and HEs are presented in Tables 7 and 8, respectively. Then, by summing the partial areas of EXs (HEs) from boxes belonging to the same image, a global evaluation in pixels and percent can be obtained (Table 9).

Discussion
For the testing and evaluation of the proposed system, we used a set of 200 images from the STARE and MESSIDOR databases (S1: 50 normal, without EX or HE; S2: 50 with EX only; S3: 50 with HE only; S4: 50 with both EX and HE). OD and MA were detected in S1, S2, S3, and S4; EX was evaluated in S2 and S4; and HE was evaluated in S3 and S4. The flexibility of the system allows the use of different box dimensions for different databases. The box classification performances are presented in Table 10, where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. The obtained performances in terms of accuracy are presented and compared with other methods in Table 11. It can be observed that only our system performs the detection of all RoIs, and the results are similar to or better than those of other methods.
In this particular experiment, blood vessel analysis was not considered for OD detection, as it was detailed in a previous work by the authors [6]. By considering the pixel density of the blood vessels in the candidate boxes for OD, it was experimentally shown that the detection accuracy increased. Conversely, by eliminating the blood vessels from the retinal image, HE detection and segmentation gave better results. As a direction for future research, we propose to extend the system functions to other parameters like the OD diameter, the cup/OD diameter ratio, and blood vessel segmentation of an entire image.
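The box-level performance figures in Table 10 follow from the TP, TN, FP, and FN counts in the usual way. As a reference, the standard formulas can be sketched as below; the counts used in the example are hypothetical, not values from Table 10.

```python
def box_metrics(tp, tn, fp, fn):
    """Standard metrics from box-level counts: accuracy, sensitivity
    (recall), and specificity."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts for illustration only.
acc, sens, spec = box_metrics(tp=90, tn=95, fp=5, fn=10)
```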

Conclusions
The proposed method for the detection and evaluation of retinal RoIs takes into consideration information about pixel distribution (second-order statistics), color, and fractal textures. An intelligent system is thus implemented which contains three main modules: preprocessing, learning, and operating. The retinal images are decomposed into boxes (of variable size) for the different RoIs, with the aid of a sliding-box algorithm or a nonoverlapping-box algorithm. Furthermore, a CNN structure is proposed for blood vessel segmentation. In the learning phase, effective feature selection is made based on the primary statistical information provided by confusion matrices and minimum-distance classifiers. Two important cues are considered for the operational mode: the weights given to the selected features and the confidence intervals associated with the selected features for each class (representatives of classes). Finally, statistical analysis and comparison with similar works validated the implemented system.


Figure 1 .
Figure 1. Intelligent System for diagnosis and evaluation of regions of interest from Retinal Images concept (RoI: Region of Interest).

Figure 2 .
Figure 2. Exemplification of box division of the analyzed image: (a) sliding box algorithm; (b) nonoverlapping box algorithm.

Feature Evaluation for Class Representatives

The mean intensity Im is a statistical characteristic, calculated as the average of the pixel intensities I(i,j) on the different color channels for each box; M × M represents the box dimension. The co-occurrence matrix C_d is computed as the average of eight co-occurrence matrices of dimension K × K, where K represents the number of levels in the monochrome image, with a displacement d and directions at 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°. For Con, Ent, En, Hom, Cor, and Var, the parameter d (distance in pixels) is considered as a subscript.
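The averaged co-occurrence matrix and two of its derived features (contrast and energy) can be sketched as below. This is an illustrative sketch, not the authors' code: the quantization to K = 8 gray levels is an assumed value, and only the eight unit-displacement directions stated above are used.

```python
import numpy as np

def cooccurrence(img, levels=8, d=1):
    """Average co-occurrence matrix C_d over the 8 directions (0..315 degrees)
    at displacement d, then derive contrast and energy from it. 'levels' is
    the number of gray levels K after quantization (an assumed value)."""
    q = (img.astype(int) * levels) // 256          # quantize to K levels
    offs = [(0, d), (-d, d), (-d, 0), (-d, -d),
            (0, -d), (d, -d), (d, 0), (d, d)]      # the 8 directions
    C = np.zeros((levels, levels))
    h, w = q.shape
    for di, dj in offs:
        for i in range(max(0, -di), min(h, h - di)):
            for j in range(max(0, -dj), min(w, w - dj)):
                C[q[i, j], q[i + di, j + dj]] += 1
    C /= C.sum()                                   # normalize to probabilities
    idx = np.arange(levels)
    ii, jj = np.meshgrid(idx, idx, indexing="ij")
    contrast = ((ii - jj) ** 2 * C).sum()          # Con
    energy = (C ** 2).sum()                        # En
    return C, contrast, energy
```

A perfectly uniform box gives zero contrast and unit energy, which matches the intuition that contrast measures local intensity variation while energy measures textural uniformity.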

Figure 4 .
Figure 4. Block diagram for the learning phase.


Figure 5 .
Figure 5. Confusion matrix (CM) for one feature F.

Figure 6 .
Figure 6. Block diagram for the operating phase (OP) and output module (included).
(a) OD Detection and Localization (Algorithm 1: Operating Phase, OD)

Figure 9b determines the classifiers which use ConG, Figure 9c determines the classifiers which use EnG, Figure 9d determines the classifiers which use HomG, Figure 9f determines the classifiers which use LBPH, Figure 9g determines the classifiers which use FDG, and Figure 9h determines the classifiers which use LR. It can be observed in Figure 9e that EntG is not a proper feature for any class.

Figure 7 .
Figure 7. Examples of images from MESSIDOR (top) and STARE (bottom) databases for the learning phase.

Figure 8 .
Figure 8. Boxes of RoIs for feature calculation.


Figure 10.
Figure 10. Example of images from MESSIDOR (top) and STARE (bottom) databases for the operating phase.


Figure 11 .
Figure 11. Results of box classification.

Table 1 .
Feature set (candidate features, CF) for learning phase.
Algorithm 1: Operating Phase, OD. Inputs: images to be analyzed; selected features F_j for OD detection: ImG, ConG, FDG, and LBPH; the weights for the selected features, w_1j; and the corresponding intervals. Outputs: box containing OD (BOD) and BOD position.
1. For each image I:
2. Image decomposition in G and H components; image primary processing;
3. Image decomposition by the sliding box algorithm (128 × 128 pixels, for the STARE database);
4. For each box B_k:
5. Use the CNN for blood vessel detection;
6. Segmentation of blood vessels and assignment of the value 1 to blood vessel pixels;
7. Calculate the selected features: ImG, ConG, FDG, and LBPH;
Algorithm 2: Operating Phase, MA. Inputs: images to be analyzed; selected features F_j for MA detection: ConG, EnG, LG, and LB; the weights for the selected features, w_2j.
Algorithm 3: Operating Phase, EX. Inputs: images to be analyzed; BOD position; selected features F_j for EX detection: ImG, ConG, EnG, HomG, and FDG; the weights for the selected features, w_3j.
Algorithm 4: Operating Phase, HE. Inputs: images to be analyzed; selected features F_j for HE detection: ImR, FDR, FDG, and LR; the weights for the selected features, w_4j.

Table 2 .
Exemplification of feature calculation for the OD class (marked with orange are the features selected for OD).


Table 3 .
LBP on H for the set of 10 boxes for the establishment of the OD class.

Table 4 .
Results for class representatives.Marked with green are the features selected for different RoIs.

Table 5 .
Results of test for feature selection and their weight establishment.Marked with gray are the classes which accept the corresponding feature for the classifier and the associated weights.

Table 6 .
Results for class representatives.

Table 7 .
Area and percentage of EXs for different segmented boxes.

Table 8 .
Area and percentage of HEs for different segmented boxes.

Table 9 .
Segmented boxes with EX from the image Im0001 (STARE DB).Partial and total size and corresponding percentage.

Table 10 .
The confusion matrices and the performances for box classification.
