Classification of Liver Diseases Based on Ultrasound Image Texture Features

Abstract: This paper discusses the use of computer-aided diagnosis (CAD) to distinguish between hepatocellular carcinoma (HCC), the most common type of primary liver malignancy and a leading cause of death in people with cirrhosis worldwide, and liver abscess, based on ultrasound image texture features and a support vector machine (SVM) classifier. From 79 cases of liver disease, comprising 44 cases of liver cancer and 35 cases of liver abscess, this research extracts 96 features, including 52 gray-level co-occurrence matrix (GLCM) features and 44 gray-level run-length matrix (GLRLM) features, from the regions of interest (ROIs) in ultrasound images. Three feature selection models, (i) sequential forward selection (SFS), (ii) sequential backward selection (SBS), and (iii) F-score, are adopted to distinguish the two liver diseases. Finally, the developed system can classify liver cancer and liver abscess by SVM with an accuracy of 88.875%. The proposed CAD methods can provide diagnostic assistance in distinguishing these two types of liver lesions.


Introduction
Liver diseases are among the most life-threatening diseases worldwide. Among the different kinds of liver lesions, liver cancer has a high incidence rate and a high mortality rate in East Asia, Southeast Asia, sub-Saharan Africa, and Melanesia [1]. Liver abscess is less common than liver cancer. However, if it is not detected in time and treated properly, it may cause many serious infectious complications, even death. Liver biopsy is often used to evaluate liver diseases. It permits doctors to examine the liver directly and provides helpful information for making high-accuracy predictions. Along with these undeniable benefits, however, it may cause pain, infection, or other injuries that hinder later treatment.
In this paper, three feature selection models, (i) sequential forward selection (SFS) [52,53], (ii) sequential backward selection (SBS) [53,54], and (iii) F-score [55], are adopted to distinguish the two liver diseases. Marill and Green introduced a feature selection technique using the divergence distance as the criterion function and the SBS method as the search algorithm [52,53]. Whitney discussed its 'bottom-up' counterpart, known as SFS [53,54]. In this research, a large number of features are included, comprising 96 features from each sample. If all of them are used to train a classifier, training not only takes too much time but also cannot easily achieve high accuracy. To reduce the processing time and improve the accuracy, it is necessary to search for the important features in the feature set. The crucial features of the samples are then used to train and test the SVM. We took several steps to achieve this goal. First, in the ultrasound images, the liver lesions are marked by experienced physicians and the regions of interest (ROIs) are circled inside a red boundary, as illustrated in Figure 1, meaning that part of the liver lesion is located inside the red boundary of the whole-liver image. Second, all features are extracted from the collected ROIs. Third, several feature selection processes are carried out to optimize the feature set. Finally, the optimal feature sets are used to train and test the SVM.
In this paper, we compared the results from the popular features of the GLCM and the gray-level run-length matrix (GLRLM).


Feature Extraction
Feature extraction is one of the most important stages in pattern recognition. It collects the input data for a classifier and thus can directly affect the performance of a CAD system. For example, with the same number of features, a better feature set could more exactly describe the special characteristics of each kind of liver disease such that it can improve the diagnostic result. As mentioned in Section 1, textural analysis of US images is a very useful tool for liver diagnosis, and two of the most effective methods are GLCM and GLRLM. In this research, we extract 96 features including 52 features of GLCM and 44 features of GLRLM, for analysis.

Materials
In medical imaging, the protocol of Digital Imaging and Communications in Medicine (DICOM) [12] is the standard for the communication and management of medical imaging information; therefore, DICOM files are typically used. In this research, for the convenience of image analysis, the original US images, supported by the Medical University Hospital in Taipei, were stored and then converted into 256-grayscale BMP files by MATLAB for more convenient processing. The images were from 79 cases of liver diseases including 44 cases of HCC and 35 cases of liver abscess. First, the original images were marked by experienced clinicians and verified in clinical reality. Then, the 32 × 32-pixel ROIs were selected inside the marked boundaries, as presented in Figure 1. In Figure 2, the 32 × 32-pixel ROIs were sampled from the marked image. All samples were collected from the liver disease images for later procedures, as shown in Figure 3. In this research, we sampled 400 ROIs of each kind of disease for training and testing.

GLCM and Haralick Features
In this step, ROIs are analyzed by GLCM, the most popular second-order statistical feature, proposed by Haralick [42] in 1973. Haralick feature extraction is completed in two steps. In the first step, the co-occurrence matrix is calculated and in the second, the texture features, which are very useful in a variety of imaging applications, particularly in biomedical imaging, are computed based on the co-occurrence matrix.


GLCM
The GLCM is a matrix that shows how often the different combinations of gray levels occur in an image. It is widely used to extract features, especially in research on liver diseases (e.g., see [46]). In other words, it is a way of presenting the relationship between two neighboring pixels. The whole procedure to extract the Haralick features is presented in Figure 4. The co-occurrence matrix can be calculated as in Equation (1):

C_{d,\theta}(i,j) = \sum_{x}\sum_{y} \begin{cases} 1, & \text{if } I(x, y) = i \text{ and } I(x + \Delta x, y + \Delta y) = j \\ 0, & \text{otherwise} \end{cases} \quad (1)

where C_{d,\theta}(i, j) is the number of occurrences of the pair of gray levels i and j; d and \theta are the distance and angle, respectively, which determine the offset (\Delta x, \Delta y); and I(x, y) is the intensity of the pixel in the x-th row and y-th column of the image. With different pairs (d, \theta), the image can be explored in different directions and at different distances. In this research, we choose the four directions \theta = 0°, 45°, 90°, 135° with d = 1 (illustrated in Figure 5).
Thus, the pair (d = 1, \theta = 0°) corresponds to the nearest horizontal pixel. Moreover, there are also co-occurrence matrices for the vertical (\theta = 90°) and diagonal (\theta = 45°, 135°) axes. For example, in Figure 6, we calculate the gray-level co-occurrence matrix of a 4 × 4-pixel image. In this case, gray levels 3 and 1 are adjacent to each other 2 times in the image; therefore, the positions (3, 1) and (1, 3) are filled in with 2. Applying the same principle to all other pixel pairs, we obtain the GLCM of the image.
The ROI that we need to process is 32 × 32 pixels with 256 gray levels. Therefore, we will have a 256 × 256 matrix with a total of 65,536 cells; however, many cells are filled with zeros (because these combinations do not exist). This situation of many zero cells can lead to a poor approximation. The solution is to reduce the number of gray levels, which decreases the number of zero cells and improves the validity considerably. In this research, the ROIs are scaled to 16 gray levels before computing the GLCM. After that, Equation (1) is normalized and converted into a probabilistic form by Equation (2). The procedure is shown in Figure 7.
P_{d,\theta}(i,j) = \frac{C_{d,\theta}(i,j)}{\sum_{i}\sum_{j} C_{d,\theta}(i,j)} \quad (2)

Therefore, we obtain the normalized GLCM, P = [P_{d,\theta}(i, j)], as in Equation (3). From the co-occurrence matrix, the textural features proposed by Haralick are calculated.
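The construction and normalization of the co-occurrence matrix described above can be sketched in a few lines. The following is a minimal pure-Python illustration, not the paper's MATLAB implementation; the helper names `glcm` and `normalize` are ours, and symmetric counting is assumed, matching the Figure 6 example in which (3, 1) and (1, 3) are both filled:

```python
def glcm(image, dx, dy, levels):
    """Co-occurrence counts C[i][j] for the offset (dx, dy), as in Equation (1)."""
    rows, cols = len(image), len(image[0])
    C = [[0] * levels for _ in range(levels)]
    for x in range(rows):
        for y in range(cols):
            nx, ny = x + dx, y + dy
            if 0 <= nx < rows and 0 <= ny < cols:
                a, b = image[x][y], image[nx][ny]
                C[a][b] += 1
                C[b][a] += 1  # symmetric counting, as in the Figure 6 example
    return C

def normalize(C):
    """Probabilistic form of Equation (2): divide each cell by the total count."""
    total = sum(sum(row) for row in C)
    return [[c / total for c in row] for row in C]

# d = 1, theta = 0 degrees -> nearest horizontal neighbor (dx = 0, dy = 1)
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
C = glcm(img, 0, 1, 4)
P = normalize(C)
```

In practice, the image would first be quantized to 16 gray levels, as described above, before `glcm` is applied.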


Haralick Features
Thirteen features can be extracted from the GLCM of an image. These features are presented as follows:
• Energy feature: This is also called angular second moment (ASM) or uniformity. It describes the uniformity of an image. When the gray levels of the pixels are similar, the energy value will be large.
• Entropy feature: This concept comes from thermodynamics, which is a field of physics concerned with heat, temperature, and their relationship with energy and work. In our case, it could be considered a chaotic or disordered quantity.
• Contrast feature: This measures the intensity variations between pixels with a fixed direction and distance (d, θ). With the same gray level, the contrast value will be equal to 0. If |i − j| = 1, there is little contrast, so the weight is just 1. If |i − j| = 2, the contrast of the gray levels is higher; therefore, the weight is larger at 4. This means that the weight increases quadratically with the gray-level difference.
• Correlation feature: This feature describes the linear dependency of the gray levels in the co-occurrence matrix. It shows how a center pixel relates to others.
where \mu_x, \mu_y and \sigma_x, \sigma_y are the means and standard deviations of the row and column marginal distributions of the GLCM. In a symmetrical gray-level co-occurrence matrix, \mu_x = \mu_y and \sigma_x = \sigma_y.
• Homogeneity feature: This feature is also known as the inverse difference moment (IDM). It describes the local similarity of an image.
The weight of the IDM is the inverse of the weight of contrast; therefore, it is lower at positions farther away from the diagonal of the GLCM. This means that positions nearer to the GLCM diagonal will have larger weights. The remaining features are the sum average, sum entropy, sum variance, difference average, difference variance, difference entropy, and the two information measures of correlation. Figure 8 shows an example of a 4 × 4 grayscale image to illustrate the above equations. After we obtain the co-occurrence matrix of the 4 × 4 gray-level image, the 13 textural features can be calculated. The results of the 13 textural features are shown in Table 1.
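As an illustration, four of the thirteen features (energy, entropy, contrast, and homogeneity) can be computed directly from a normalized GLCM P using their standard definitions. This is a sketch under that assumption, not the authors' code; the function name `haralick_subset` is ours:

```python
import math

def haralick_subset(P):
    """Energy, entropy, contrast, and homogeneity of a normalized GLCM P."""
    n = len(P)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    # Energy (angular second moment): sum of squared probabilities
    energy = sum(P[i][j] ** 2 for i, j in pairs)
    # Entropy: disorder measure; zero-probability cells are skipped
    entropy = -sum(P[i][j] * math.log2(P[i][j]) for i, j in pairs if P[i][j] > 0)
    # Contrast: quadratic weight (i - j)^2 on gray-level differences
    contrast = sum((i - j) ** 2 * P[i][j] for i, j in pairs)
    # Homogeneity (IDM): inverse weight, large near the diagonal
    homogeneity = sum(P[i][j] / (1 + (i - j) ** 2) for i, j in pairs)
    return energy, entropy, contrast, homogeneity

# A perfectly uniform 2 x 2 GLCM: minimal energy, maximal entropy.
P = [[0.25, 0.25], [0.25, 0.25]]
energy, entropy, contrast, homogeneity = haralick_subset(P)
```

Note how the contrast and homogeneity weights are exact inverses of each other, as described in the text.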

GLRLM and Textural Features
The other method we use in this paper to analyze the ROIs is the GLRLM. It was first proposed by Galloway in 1975 with five features [47]. In 1990, Chu et al. [48] suggested two new features to extract gray-level information in the matrix, before Dasarathy and Holder [49] offered another four features following the idea of a joint statistical measure of gray level and run length. Tang [50] provided a good summary of some features achieved with the GLRLM.

GLRLM
Run-length statistics extract the coarseness of a texture in the different directions. A run is defined as a string of consecutive pixels that have the same gray level intensity along a specific linear orientation. Fine textures contain more short runs with similar gray levels, while coarse textures have more long runs with significantly different gray levels.
For a given image, the entry (i, j) of a run-length matrix Q(i, j) is defined as the number of runs of gray level i and run length j, as described in Figure 9. Hence, the run-length matrix measures how many times there are runs of j consecutive pixels with the same value, with j going from 1 to the length of the longest run in a fixed orientation. Even though many GLRLMs can be defined for a given image, normally four matrices are computed, for the horizontal, vertical, and diagonal directions. The matrix Q has size (M × N), where M is equal to the maximum gray level and N is the maximum possible run length in the corresponding image. The typical directions are 0°, 45°, 90°, and 135°, and calculating the run-length encoding for each orientation produces a run-length matrix.
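The horizontal run-length matrix can be built with a single pass over each row. The following pure-Python sketch is illustrative only (the paper's own processing is done in MATLAB); the helper name `glrlm_horizontal` is ours, and `Q[i][j-1]` stores the number of runs of gray level i with length j:

```python
def glrlm_horizontal(image, levels):
    """Run-length matrix for theta = 0 degrees: Q[i][j-1] counts runs
    of gray level i with run length j along each row."""
    max_run = len(image[0])
    Q = [[0] * max_run for _ in range(levels)]
    for row in image:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1          # extend the current run
            else:
                Q[run_val][run_len - 1] += 1  # close the finished run
                run_val, run_len = v, 1
        Q[run_val][run_len - 1] += 1  # close the last run in the row
    return Q

# 4 x 4 example with 4 gray levels
img = [[0, 0, 1, 1],
       [0, 0, 0, 2],
       [2, 2, 2, 2],
       [3, 3, 2, 1]]
Q = glrlm_horizontal(img, 4)
```

The other three orientations (45°, 90°, 135°) follow the same logic applied along columns and diagonals.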

GLRLM Features
After a run-length matrix is calculated along a given direction, several texture descriptors are calculated to obtain the texture properties and differentiate among different textures. These descriptors can be used either with respect to each direction or by combining them if a global view of the texture information is required. Eleven features are typically extracted from the run-length matrices: short run emphasis (SRE), long run emphasis (LRE), high-gray-level run emphasis (HGRE), low-gray-level run emphasis (LGRE), pairwise combinations of the length and gray level emphasis (SRLGE, SRHGE, LRLGE, LRHGE), run-length nonuniformity (RLN), gray-level nonuniformity (GLN), and run percentage (RPC). These features describe specific characteristics in the image. For example, SRE measures the distribution of short runs in an image, while RPC measures both the homogeneity and the distribution of runs of an image in a specific direction. The formulas for calculating the features and their explanation are as follows:

1. Short Run Emphasis (SRE): This describes the distribution of short runs. This value indicates how much of the texture is composed of short runs in a given direction.
where n_r denotes the total number of runs.

2. Long Run Emphasis (LRE): Similar to the SRE, this describes the distribution of long runs. This value indicates how much of the texture is composed of long runs in a given direction. These two features give more in-depth information about the coarseness of an image.
3. Low-Gray-Level Run Emphasis (LGRE): This describes the distribution of low gray-level values. The more low gray-level values there are in an image, the larger this value is.
4. High-Gray-Level Run Emphasis (HGRE): In contrast with the low-gray-level run emphasis, this describes the distribution of high gray-level values. The more high gray-level values there are in an image, the larger this value is.

5. Short-Run Low-Gray-Level Emphasis (SRLGE): This describes the relative distribution of short runs and low gray-level values. The SRLGE value is large for an image with many short runs and low gray-level values.
6. Short-Run High-Gray-Level Emphasis (SRHGE): This describes the relative distribution of short runs and high gray-level values. The SRHGE value will be large for an image with many short runs and high gray-level values.

7. Long-Run Low-Gray-Level Emphasis (LRLGE): This describes the relative distribution of long runs and low gray-level values. The LRLGE value will be large for an image with many long runs and low gray-level values.
8. Long-Run High-Gray-Level Emphasis (LRHGE): This describes the relative distribution of long runs and high gray-level values. The LRHGE value will be large for an image with many long runs and high gray-level values.
9. Gray-Level Nonuniformity (GLN): This describes the similarity of gray-level values throughout the image in a given direction. It is expected to be small if the gray-level values are similar throughout the image.
10. Run-Length Nonuniformity (RLN): This describes the similarity of the lengths of runs throughout the image in a given direction. It is expected to be small if the run lengths are similar throughout the image.
11. Run Percentage (RPC): This feature is not a percentage in spite of its name. It presents the homogeneity and the distribution of runs of an image in a given direction. The RPC is largest when the length of the runs is 1 for all gray levels in a given direction:

RPC = n_r / n_p (40)

where n_p is the number of pixels. Figure 10 shows an example of calculating the GLRLM of a 4 × 4 grayscale image in the horizontal direction. The 11 features are computed following the above formulas; the results are shown in Table 2.
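Given a run-length matrix, the simplest of these descriptors follow directly from the standard (Galloway) formulas. The sketch below computes the SRE, LRE, and RPC (Equation (40)); it is an illustration of those definitions, not the authors' implementation, and assumes `Q[i][j-1]` holds the count of runs of length j:

```python
def glrlm_features(Q, n_pixels):
    """SRE, LRE, and RPC from a run-length matrix Q, where Q[i][j-1]
    is the number of runs of gray level i with run length j."""
    n_r = sum(sum(row) for row in Q)  # total number of runs
    # Short Run Emphasis: short runs weighted by 1/j^2
    sre = sum(Q[i][j] / (j + 1) ** 2
              for i in range(len(Q)) for j in range(len(Q[0]))) / n_r
    # Long Run Emphasis: long runs weighted by j^2
    lre = sum(Q[i][j] * (j + 1) ** 2
              for i in range(len(Q)) for j in range(len(Q[0]))) / n_r
    rpc = n_r / n_pixels  # Equation (40)
    return sre, lre, rpc

# Degenerate case: every run has length 1 (8 runs over 8 pixels),
# so SRE and RPC both reach their maximum of 1.
Q = [[4, 0], [4, 0]]
sre, lre, rpc = glrlm_features(Q, 8)
```

This degenerate case confirms the statement above that the RPC is largest (equal to 1) when all runs have length 1.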


Feature Selection
Feature selection has been an interesting research field in machine learning, pattern recognition, data mining, and statistics. The main idea of feature selection is to eliminate redundant features that contain little or no predictive information while keeping the useful ones. To find optimal features for classification, researchers have proposed several methods to analyze the feature set. In fact, the effectiveness of features on classification is highly problem-dependent. Extracted features could perform very well for one problem but may give poor performance for others. Hence, we must pick proper features for the given problem at hand. Ultimately, from various feature extraction methods, we need to find a set of features that is optimal for the problem. In this research, we use sequential forward selection (SFS) [52,53], sequential backward selection (SBS) [53,54], and F-score [55] to find the optimal feature subset.

Sequential Forward Selection (SFS)
SFS [52,53] begins by evaluating all feature subsets that consist of only one input attribute. In other words, we start from the empty set and sequentially add the feature x_i that results in the highest value of the objective function J(Y_k + x_i). The algorithm can be broken down into the following steps:

• Step 1: Start with the empty set Y_0 = ∅.
• Step 2: Select the next best feature, x* = arg max_{x ∉ Y_k} J(Y_k + x), and update Y_{k+1} = Y_k + x*.
In our case, the objective function is based on the classification rate from a cross-validation test. The mean value of a 10-fold cross-validation test is used to evaluate the feature subset.
This procedure continues until a predefined number of features is selected. According to the above process, the search space can be drawn as an ellipse to emphasize the fact that there are fewer states towards the full or empty sets. To find the overall optimum input feature set, the simplest means is an exhaustive search. However, this is very expensive. Compared with an exhaustive search, forward selection is much cheaper. SFS works best when the optimal subset has a small number of features, and the main disadvantage of SFS is that it is unable to remove features that become obsolete after the addition of other features.
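The two steps above amount to a greedy loop, which can be sketched as follows. In this illustration, `score` is a placeholder for the paper's objective J, i.e., the mean accuracy of a 10-fold cross-validation test; the toy scoring function at the bottom exists only to make the sketch runnable and does not train a real classifier:

```python
def sfs(features, score, k):
    """Greedy sequential forward selection of k features.

    `score(subset)` stands in for the objective J(Y + x): in the paper,
    the mean 10-fold cross-validation accuracy of an SVM on that subset.
    """
    selected, remaining = [], list(features)
    while len(selected) < k and remaining:
        # Step 2: pick the feature whose addition maximizes the objective
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy objective, purely illustrative: it rewards membership in a known
# "useful" subset instead of running cross-validation.
useful = {"contrast", "entropy", "sre"}
score = lambda subset: len(useful & set(subset))
picked = sfs(["energy", "contrast", "entropy", "sre", "rpc"], score, 3)
```

The greedy nature of the loop is exactly the weakness noted above: once a feature is added, it is never reconsidered.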

Sequential Backward Selection (SBS)
Contrary to SFS, SBS [53,54] works in the opposite direction: it starts from the full set of features and sequentially eliminates the worst feature x*, i.e., the feature whose removal results in the highest value of the objective function J(Y_k − x). The algorithm can be broken down into the following steps:

• Step 1: Start with the full set Y_0 = X.
• Step 2: Eliminate the worst feature, x* = arg max_{x ∈ Y_k} J(Y_k − x), and update Y_{k+1} = Y_k − x*.
The objective function is based on the classification rate from a cross-validation test. The mean value of the 10-fold cross-validation test is used to evaluate the feature subset.
This procedure continues until a predefined number of features are left. SBS usually works best when the optimal feature subset has a large number of features since SBS spends most of its time visiting large subsets.

F-Score
F-score [55] is a technique that measures discrimination. Given training vectors x_k, k = 1, ..., m, if the numbers of positive and negative instances are n_+ and n_-, respectively, then the F-score of the i-th feature is calculated as in Equation (41):

F(i) = \frac{\left(\bar{x}_i^{(+)} - \bar{x}_i\right)^2 + \left(\bar{x}_i^{(-)} - \bar{x}_i\right)^2}{\frac{1}{n_+ - 1}\sum_{k=1}^{n_+}\left(x_{k,i}^{(+)} - \bar{x}_i^{(+)}\right)^2 + \frac{1}{n_- - 1}\sum_{k=1}^{n_-}\left(x_{k,i}^{(-)} - \bar{x}_i^{(-)}\right)^2} \quad (41)

where \bar{x}_i, \bar{x}_i^{(+)}, and \bar{x}_i^{(-)} are the averages of the i-th feature over the whole, positive, and negative data sets, respectively, and x_{k,i}^{(+)} and x_{k,i}^{(-)} are the i-th features of the k-th positive and negative samples, respectively. The F-score indicates the discrimination between the positive and negative sets; therefore, the larger the F-score, the more likely this feature is to be discriminative. Thus, we could consider this score as a criterion for feature selection. A disadvantage of this method is that it cannot reveal shared information between features [55]. In the example, both features have low F-score values; however, the set of them classifies the two groups precisely.
In spite of this drawback, F-score is simple and generally quite effective. We order all features based on F-score and then use a classifier to train/test the set that includes the feature with the highest F-score. Then, we add the second highest F-score feature to the feature set before training and testing all of the dataset again. The procedure is repeated until all features are added to the feature set.
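The F-score of a single feature can be computed directly from Equation (41). The following plain-Python sketch, with an illustrative helper name `f_score` of our own, shows the computation for one feature given its values over the positive and negative samples:

```python
def f_score(pos, neg):
    """F-score of one feature (Equation (41)), given its values over
    the positive samples `pos` and negative samples `neg`."""
    mean = lambda xs: sum(xs) / len(xs)
    m_all, m_pos, m_neg = mean(pos + neg), mean(pos), mean(neg)
    # Numerator: between-class separation of the class means
    num = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    # Denominator: within-class sample variances
    den = (sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
           + sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1))
    return num / den

# A well-separated feature scores much higher than an overlapping one.
well = f_score([1.0, 1.2, 0.8, 1.0], [-1.0, -1.2, -0.8, -1.0])
poor = f_score([1.0, -1.0, 0.5, -0.5], [0.9, -0.9, 0.4, -0.4])
```

Ranking all 96 features by this score and adding them one at a time reproduces the selection procedure described above.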

SVM Classification
Support vector machine (SVM), which was proposed by Vapnik et al. [37], is a powerful machine learning method based on statistical learning theory. Its theory is based on the idea of finding an optimal hyperplane to separate two classes. This produces a classifier that performs well on unseen patterns. SVM has been widely applied in many fields, such as regression estimation, environment illumination learning, object recognition, and bioinformatics analysis. In each case, there are usually many possible hyperplanes to separate the groups; however, there is only one hyperplane that has a maximal margin.
In this research, two liver diseases need to be distinguished. Hence, LIBSVM, a popular machine learning tool for classification, regression, and other machine learning tasks, was used to implement the classification task. LIBSVM, which was proposed by Lin et al. [38], is a library for support vector machines. It is an integrated library for support vector classification (C-SVC, nu-SVC) and distribution estimation (one-class SVM). A typical use of LIBSVM includes two steps: the first is training on a data set to obtain a model, and the second is using the model to predict the labels of a testing data set.

Performance Evaluation
To reduce the variability of the prediction performance, a cross-validation test is usually used to evaluate the performance of the proposed system. It is one of the most popular methods to evaluate a model's prediction performance. If a model were trained and tested on the same data, the result would easily be overoptimistic. Therefore, a better approach, the holdout method, is to split the data into disjoint training and testing subsets.
However, as the holdout is a single train-and-test split, the estimated error rate may result from an 'unfortunate' split. Moreover, when samples are scarce, we cannot afford the luxury of setting aside part of the data for testing. The drawbacks of the holdout method can be overcome with a family of resampling methods called cross-validation. Two well-known kinds of cross-validation are leave-one-out cross-validation (LOOCV) and k-fold cross-validation.
In k-fold cross-validation, the total samples are randomly partitioned into k groups of the same size. Of the k groups, one group is used for testing the model, while the remaining (k − 1) groups are used as training data. This process is repeated k times (the folds) until all groups have been tested. Then, the results from the k experiments are averaged to produce a single estimate; that is, the true accuracy is estimated as the average accuracy rate, acc = (1/k) \sum_{i=1}^{k} acc_i. The advantage of this method is that all samples are used for both training and validation, and each observation is used for validation exactly once. Although 10-fold and 5-fold cross-validation are commonly used, in general, k remains an unfixed parameter.
Leave-one-out cross-validation can be considered a degenerate case of k-fold cross-validation in which k equals the total number of samples. Consequently, for a data set with N samples, LOOCV performs N experiments; in each experiment, one sample is used for testing, while the remaining N − 1 samples are used for training.
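The mechanics described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it pairs the k-fold splitting with an assumed toy classifier (a nearest-centroid rule) on synthetic data.

```python
# Minimal sketch of k-fold cross-validation: each of the k groups is held
# out once for testing while the remaining k-1 groups train the model.
# The "model" here is an assumed toy nearest-centroid rule.
import numpy as np

def kfold_accuracy(X, y, k=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)          # k groups of (almost) equal size
    accs = []
    for i in range(k):
        test = folds[i]                     # one group tests the model
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        c0 = X[train][y[train] == 0].mean(axis=0)   # class-0 centroid
        c1 = X[train][y[train] == 1].mean(axis=0)   # class-1 centroid
        d0 = np.linalg.norm(X[test] - c0, axis=1)
        d1 = np.linalg.norm(X[test] - c1, axis=1)
        accs.append(((d1 < d0).astype(int) == y[test]).mean())
    return np.mean(accs)                    # average of the k estimates

# Toy data: class 1 is shifted away from class 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (40, 5)), rng.normal(2, 0.5, (40, 5))])
y = np.array([0] * 40 + [1] * 40)
print(kfold_accuracy(X, y, k=10))
```

Setting k equal to the number of samples, `kfold_accuracy(X, y, k=len(X))`, reproduces LOOCV as the degenerate case described above.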
In this research, we use 10-fold cross-validation for performance evaluation. Classification yields four kinds of outcome: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) (shown in Table 3), where 'true' and 'false' indicate whether the prediction is correct, and 'positive' and 'negative' indicate whether the tumor is HCC or liver abscess, respectively. From these counts, we calculate the accuracy as in Equation (43), which describes the performance of the classifiers.
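Equation (43) is not reproduced in this excerpt; assuming it is the standard accuracy definition, the computation from the four outcome counts is simply the following (the counts below are illustrative, not the paper's results):

```python
# Standard accuracy from the four confusion-matrix counts; assumed to
# match Equation (43). The example counts are purely illustrative.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=40, tn=31, fp=4, fn=4))   # 71 correct of 79 ≈ 0.8987
```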

Result and Discussion
Various experiments were conducted. Two kinds of feature matrices (GLCM and GLRLM) were calculated and reduced by the feature selection methods (sequential forward selection, sequential backward selection, and F-score) before being classified by the support vector machine. The following sections present the results of the different combinations in order to identify the optimal methods for a CAD system.

Classification by All Features
As mentioned in the previous parts, we extracted 96 features from each region of interest (ROI), covering many characteristics of each kind of liver disease. In this experiment, SVM was used to discriminate the diseases using each kind of feature (GLCM or GLRLM) separately and both kinds (GLCM and GLRLM) together. Table 4 shows the classification results in this case.

Table 4. Accuracy rate without applying a selection method.

GLCM (52 features): 75.75%
GLRLM (44 features): 54.53%
GLCM + GLRLM (96 features): 61.00%

As seen from Table 4, the classification results of SVM without any feature selection are not accurate enough, except for the model combining GLCM and SVM, whose accuracy rate is the highest at 75.75%. The other combinations obtain detection rates of 54.53% and 61%.
This result shows that both GLCM and GLRLM features can be applied to the classification of HCC and liver abscess; however, GLRLM appears to contain more noise than GLCM, and although the classifier can tolerate the redundant features in both sets to some degree, the SVM's performance is significantly degraded by them. In the next section, the classification is conducted after applying sequential forward selection.
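As background for these feature sets, the co-occurrence statistics involved can be sketched briefly. This is a minimal NumPy illustration of a GLCM and two derived features (contrast, energy); the paper's full 52-feature GLCM set and its GLRLM counterpart are not reproduced here.

```python
# Minimal sketch: gray-level co-occurrence matrix (GLCM) for one pixel
# offset, plus two of the classic texture features derived from it.
import numpy as np

def glcm(img, dr, dc, levels):
    """Normalized co-occurrence matrix for offset (dr, dc), dr, dc >= 0."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for r in range(h - dr):
        for c in range(w - dc):
            P[img[r, c], img[r + dr, c + dc]] += 1   # count neighbor pair
    return P / P.sum()

rng = np.random.default_rng(0)
roi = rng.integers(0, 8, size=(32, 32))      # stand-in ROI, 8 gray levels

P = glcm(roi, dr=0, dc=1, levels=8)          # horizontal neighbor pairs
i, j = np.indices(P.shape)
contrast = np.sum((i - j) ** 2 * P)          # large when neighbors differ
energy = np.sum(P ** 2)                      # large for uniform texture
print(contrast, energy)
```

A real pipeline would compute such matrices for several distances and angles over each physician-marked ROI and concatenate the resulting statistics into the feature vector.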

Classification by Using Sequential Forward Selection
After applying sequential forward selection to the different feature sets, different numbers of features were selected; the results are shown in Table 5. For SVM classification, we imposed one additional condition on the SFS process: a lower bound of four selected features, because if too few features are selected, the classification result is poor. (Training and testing the SVM on all samples with a single feature, for example, takes only approximately 0.3 s.) With SVM, SFS gives a slight improvement from 75.75% to 78% for GLCM, a significant improvement in the recognition rate from 54.53% to 88.13% for GLRLM, and an improvement from 61% to 89.25% for the combination of GLCM and GLRLM.
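The SFS wrapper search can be sketched as follows. This is an illustrative stand-in for the paper's own implementation, using scikit-learn's SequentialFeatureSelector around an SVM on synthetic data with a few informative features among noise.

```python
# Sketch of sequential forward selection wrapped around an SVM: the
# subset grows one feature at a time, keeping the addition that most
# improves cross-validated accuracy. Data and counts are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)
n = 80
informative = rng.normal(0, 1, (n, 4))       # 4 useful features
noise = rng.normal(0, 1, (n, 12))            # 12 noise features
X = np.hstack([informative, noise])
y = (informative.sum(axis=1) > 0).astype(int)

sfs = SequentialFeatureSelector(SVC(kernel="rbf"),
                                n_features_to_select=4,   # lower bound of 4
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support(indices=True))         # indices of kept features
```

Changing `direction="forward"` to `"backward"` yields the SBS counterpart used in the next section, which starts from the full set and discards the least useful feature at each step.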

Classification by Using Sequential Backward Selection
After conducting sequential backward selection, we obtain the results shown in Table 6. SBS changes the accuracy of the models to different degrees. In the case of GLCM features, the accuracy slightly decreases by 0.25% with SVM. For GLRLM and the combination of GLCM and GLRLM, the highest result is achieved by SVM with the GLCM-GLRLM combination and SBS (88.87%), followed by 88.25% for the model using GLRLM and SVM.

Classification by Using F-Score
Using Equation (41), the F-score of each feature can be computed (shown in Tables 7 and 8). As shown in Figure 11, when we add GLCM features in descending order of F-score, the accuracy increases, reaches a peak of 77.25% at the 36th feature, and then slightly decreases toward the end. Meanwhile, for the GLRLM features, SVM achieves 87.125% accuracy at the 87th feature (a set of 34 features) before dropping rapidly from the 63rd feature to the end of the GLRLM features, as shown in Figure 12; in this case, we can also obtain 86.625% accuracy with only 16 features at the 61st feature of GLRLM. When all features are used, the trend of the accuracy is quite similar to that of the GLRLM features: SVM reaches a peak accuracy of 88.875% before dropping at the end, as shown in Figure 13.

Figure 11. Results of GLCM features and SVM.

The other way to apply the F-score is to select several thresholds where the gap between low F-scores and high F-scores is considerable. We selected four thresholds for each kind of feature and six thresholds for all features; the results are shown in Table 9.

Table 9. Comparison between searching all features and using a threshold.
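The per-feature F-score criterion can be sketched as follows. This uses the F-score definition commonly paired with SVM feature ranking (ratio of between-class to within-class scatter for one feature); whether it matches the paper's Equation (41) exactly is an assumption, and the data are synthetic.

```python
# Sketch of the F-score criterion for ranking a single feature's ability
# to discriminate two classes. Assumed to correspond to Equation (41).
import numpy as np

def f_score(x, y):
    """F-score of one feature vector x given binary labels y in {0, 1}."""
    xp, xn = x[y == 1], x[y == 0]
    # Between-class scatter of the two class means around the overall mean...
    num = (xp.mean() - x.mean()) ** 2 + (xn.mean() - x.mean()) ** 2
    # ...divided by the within-class scatter (sample variances).
    den = xp.var(ddof=1) + xn.var(ddof=1)
    return num / den

rng = np.random.default_rng(0)
y = np.array([0] * 40 + [1] * 40)
good = np.concatenate([rng.normal(0, 1, 40), rng.normal(2, 1, 40)])
bad = rng.normal(0, 1, 80)                   # uninformative feature
print(f_score(good, y) > f_score(bad, y))    # discriminative feature ranks higher
```

Ranking all features by this score and sweeping a cut-off point is what produces accuracy curves like those in Figures 11-13; the threshold variant simply fixes the cut-off where the F-scores drop sharply.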

Performance Analysis
The performance of all methods is summarized in Table 10. The models using SVM with GLRLM or with all features give the highest accuracy, approximately 88%, under any selection method. For example, trying all of the F-score-ranked feature sets with SVM took approximately 40 s. The highest accuracy overall, 89.25%, is obtained by the model combining SVM with all features selected by SFS. Nevertheless, when the processing load must be reduced, the F-score threshold approach for SVM is worth considering because it also performs well: it took only 1.8 s to achieve a classification rate of 87.375%, compared with 40 s for SFS. Such differences become significant when processing large data sets.

Conclusions
Hepatocellular carcinoma (HCC) and liver abscess are among the most dangerous liver diseases worldwide. Given the need to diagnose liver disease based on ultrasound images, a computer-aided diagnosis (CAD) system can greatly assist inexperienced physicians in their decision making. Therefore, this research proposes a system to reduce the erroneous classification of HCC and liver abscess. First, 96 textural features, including 52 features of the gray-level co-occurrence matrix (GLCM) and 44 features of the gray-level run-length matrix (GLRLM), were extracted from the regions of interest (ROIs), which were verified by radiologists and confirmed by biopsy. To obtain the important features, we applied three feature selection schemes: (i) sequential forward selection (SFS), (ii) sequential backward selection (SBS), and (iii) F-score. We then determined the most discriminative feature set. Finally, the support vector machine (SVM) classifier was trained on the features of the training set and evaluated by 10-fold cross-validation to obtain a reliable result. The final results show that the proposed methods for a CAD system can provide diagnostic assistance in distinguishing HCC from liver abscess with high identification accuracy (up to 89.25%).