Coronary Artery Disease Detection by Machine Learning with Coronary Bifurcation Features

Background: Early and accurate detection of coronary artery disease (CAD) is one of the most important areas of medical research. Researchers are motivated to use machine learning techniques for quick and accurate detection of CAD. Methods: To obtain high-quality features for machine learning, we extracted coronary bifurcation features from coronary computed tomography angiography (CCTA) images using a morphometric method. Machine learning classifier algorithms, namely logistic regression (LR), decision tree (DT), linear discriminant analysis (LDA), k-nearest neighbors (k-NN), artificial neural network (ANN), and support vector machine (SVM), were applied to the measured features and their performance was estimated. Results: Compared with the other machine learning methods, the polynomial-SVM with grid search optimization had the best performance for the detection of CAD and yielded a classification accuracy of 100.00%. Among the six examined coronary bifurcation features, the exponent of vessel diameter (n) and the area expansion ratio (AER) were the two key features for the detection of CAD. Conclusions: This study could help clinicians detect CAD accurately and may provide an alternative method for non-invasive diagnosis in clinical practice.


Introduction
Coronary artery disease (CAD) is one of the leading causes of death in the world [1,2]. Early detection of CAD could save patients' lives and reduce the cost of healthcare, which is of great clinical significance. Many tools have been developed for CAD diagnosis [3][4][5], among which cardiac catheterization is the most direct and reliable approach [3]. However, cardiac catheterization is costly and time-consuming, and it is also an invasive and risky surgical operation. Hence, few CAD patients choose cardiac catheterization as their diagnostic method, and a reliable, non-invasive method for early detection of CAD is very desirable. Although some non-invasive diagnostic approaches, such as coronary computed tomography angiography (coronary CTA, CCTA), echocardiography, and nuclear magnetic resonance imaging (MRI), can accurately detect CAD, they lack the ability to detect CAD at an early stage. In addition, these methods usually require assistance from medical experts. Thus, they have not been effectively used for the early detection of CAD. Our previous study found that the changes in the morphological features of coronary arterial trees were highly correlated to the degrees of CAD lesion

Materials and Methods
We here proposed a morphometric methodology to collect the morphological features dataset for the detection of CAD. Then, we evaluated these features by different commonly used machine learning algorithms to find the best-fitted classifier for the detection of CAD. Finally, we applied the best algorithm to evaluate all the morphological features to seek the most important features for the detection of CAD.

Morphometric Features Data Collection and Selection
With the development of angiographic hardware and software, CCTA imaging has been successfully applied to the visualization of coronary arteries in recent decades [17,18]. Many features can be obtained from CCTA images, such as geometric features, the size of calcified plaque, and the diameter stenosis rate. Selecting the most important features has a significant impact on the medical diagnostic process because it helps to reach an accurate and quick diagnosis. In this study, we selected morphometric features for building the machine learning classifiers. These features were measured from the CCTA images of Southern Chinese populations. In total, we collected morphometric feature datasets with 1163 variables (features data), among which 571 variables were from patients with CAD lesion (CAD subjects) and 592 variables were from individuals without CAD lesion (non-CAD subjects). This study was approved by the Ethics Committee of the College of Basic Medicine, Southern Medical University and the Ethics Committee of the Guangdong General Hospital, Guangdong Academy of Medical Sciences, and was performed in accordance with the Declaration of Helsinki.
The morphological features used were obtained from the coronary bifurcations, and the method of data collection was shown in Figure 1. Morphometric data of coronary arterial trees were extracted from CCTA images with MIMICS software (Materialise). In the MIMICS software, centerlines were formed by a series of center points located at the center of the cross-sectional plane of the 3D coronary artery. Subsequently, the best-fit diameter was calculated as twice the average radius from the points of the centerlines to the contour of the 3D coronary artery. The original morphometric data, namely mother vessel diameters ($D_m$), larger daughter vessel diameters ($D_l$), smaller daughter vessel diameters ($D_s$), and bifurcation angles (α), were determined at all bifurcations of the arterial trees. Table 1 showed the definitions of the six morphological features (including α, n, and AER). The features n and AER represented the exponent of vessel diameter and the area expansion ratio of the coronary bifurcations, respectively. They were calculated from the mother vessel diameter and the two daughter vessel diameters. These six morphological features were selected because they were highly correlated with atherosclerotic lesion in previous research [19,20] or were of potential clinical importance as indicated by medical experts.
Appl. Sci. 2020, 10, x FOR PEER REVIEW

Figure 1. A schematic representation of the coronary bifurcation geometry and the method for morphological data measurement. The detailed information of $D_m$, $D_l$, $D_s$ and α was shown in Table 1.

Table 1.
Features    Definitions
α    Bifurcation angle between larger and smaller vessel axes.
n    The exponent of vessel diameter: $D_m^n = D_l^n + D_s^n$.
$D_s/D_m$    Ratio of diameters between smaller daughter and mother vessels.
$D_l/D_m$    Ratio of diameters between larger daughter and mother vessels.
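
As a worked illustration of the two derived features, the sketch below solves $D_m^n = D_l^n + D_s^n$ for n by bisection and computes an area expansion ratio. It is a minimal sketch rather than the measurement code used in this study: the function names, the bisection bracket, and the tolerance are assumptions, and the AER formula $(D_l^2 + D_s^2)/D_m^2$ is the commonly used definition, assumed here because its formula is not restated in this section. The diameters in the example are hypothetical.

```python
def diameter_exponent(d_m, d_l, d_s, lo=0.1, hi=50.0, tol=1e-10):
    """Solve D_m**n = D_l**n + D_s**n for n by bisection.

    The bracket [lo, hi] and the tolerance are illustrative assumptions.
    """
    if not (0 < d_s <= d_l < d_m):
        raise ValueError("expected 0 < d_s <= d_l < d_m")

    # Dividing by D_m**n gives f(n) = (D_l/D_m)**n + (D_s/D_m)**n - 1,
    # which is strictly decreasing in n because both ratios are below 1.
    def f(n):
        return (d_l / d_m) ** n + (d_s / d_m) ** n - 1.0

    if f(lo) < 0 or f(hi) > 0:
        raise ValueError("no root inside the assumed bracket")

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # the root lies at a larger exponent
        else:
            hi = mid
    return 0.5 * (lo + hi)


def area_expansion_ratio(d_m, d_l, d_s):
    """Assumed definition: total daughter area divided by mother area."""
    return (d_l ** 2 + d_s ** 2) / d_m ** 2


# Hypothetical symmetric bifurcation that obeys a cube law (n = 3).
d_m, d_l, d_s = 2 ** (1 / 3), 1.0, 1.0
n = diameter_exponent(d_m, d_l, d_s)
aer = area_expansion_ratio(d_m, d_l, d_s)
print(round(n, 6), round(aer, 4))
```

Bisection is used here only because the equation has no closed-form solution for n in general; any one-dimensional root finder would serve the same purpose.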

Machine Learning Modeling Processes and Algorithms Evaluation
The aim of this study was to obtain the best machine learning classifier and to find the most important morphological features for the detection of CAD. To evaluate the quality of the selected morphological features for the detection of CAD, several commonly used classifier algorithms were applied for classification, namely LR, DT, LDA, k-NN, ANN, and SVM. The classification performances of the SVM model with three different kernel functions (linear, polynomial, and radial basis function (RBF)) were first studied to find the best kernel function for the detection of CAD. We named these three sub-algorithms linear-SVM, polynomial-SVM, and RBF-SVM, respectively. The accuracies of the best SVM sub-algorithm and the other machine learning algorithms were then compared to assess the capability of all the algorithms we used, with the aim of selecting the best machine learning algorithm for the follow-up analyses. To find the most important features for the detection of CAD, we then applied the selected best algorithm to evaluate the classification performance of specific features or their combinations.
The process flow diagram of our proposed machine learning approach was shown in Figure 2. The main steps of the proposed approach were summarized below.
Step 1. Collecting the original morphological features. In this study, we collected six original morphological features in total (including α, n, and AER; see Table 1). These morphological features were selected because they had been shown to correlate highly with CAD.
Step 2. Dividing all the features into four equal groups. The morphological feature datasets for both CAD subjects and non-CAD subjects were randomly divided into four equal-sized subsets. The subsets of the non-CAD subjects were designated as A1, A2, A3, and A4, while the subsets of the CAD subjects were designated as B1, B2, B3, and B4 (Figure 2). This step prepared for studying the effect of data sampling on the classification performance of the machine learning model.
Step 3. Preprocessing of the original morphological features. To meet the data format requirements of the classifier algorithms, the original morphometric data were preprocessed. The preprocessing formula was as follows:

$$x' = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1$$

where $x$ was the original value, $x'$ was the normalized value, and $x_{\max}$ and $x_{\min}$ were the maximum and minimum values of each morphological feature, respectively. After data preprocessing, the values of the original morphometric data were normalized to values ranging from −1 to 1.

Figure 2. The flow diagram of the machine learning modeling. The morphological data for both non-coronary artery disease (non-CAD) subjects and CAD subjects were divided into four equal-sized subsets. The subsets of the non-CAD subjects were designated as A1, A2, A3 and A4, and the subsets of the CAD subjects were designated as B1, B2, B3 and B4. non-CAD: non-coronary artery disease; CAD: coronary artery disease; PSO: particle swarm optimization; SVM: support vector machine.
Step 4. Choosing a machine learning model, training the selected model, and tuning its parameters. Different classification algorithms (LR, DT, LDA, k-NN, ANN, linear-SVM, polynomial-SVM, and RBF-SVM) were trained on the same morphological features (75% of the data for training and the remainder for testing). The cross-validation was set as 10-fold (in subsets with proportional data quantities for both classes). For the SVM classifiers, two key parameters, C and g, needed to be optimized in advance to establish the best SVM model. There were two common algorithms for parameter optimization, namely grid search and particle swarm optimization (PSO) [21]. In this study, the search range of the grid search algorithm was set from $2^{-8}$ to $2^{8}$, and the maximum iteration number of the PSO algorithm was set to 200 to find the near-optimal parameters.
Step 5. Evaluating the models and making the predictions. The statistical metrics on the results of the models were defined as follows [22]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}$$

Accuracy showed the ratio of correctly classified samples to the total number of tested samples. TP, FN, TN, and FP represented true positives, false negatives, true negatives, and false positives, respectively.
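
Steps 3 and 5 can be sketched in a few lines. The example below rescales one feature column linearly to the range −1 to 1 and computes accuracy, sensitivity, and specificity from TP, TN, FP, and FN. The function names and the toy values are hypothetical, and sensitivity and specificity are written in their standard forms, TP/(TP + FN) and TN/(TN + FP), which is an assumption about the exact definitions taken from [22].

```python
def minmax_scale(values):
    """Linearly rescale one feature column to the range [-1, 1]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("feature is constant; scaling is undefined")
    return [2.0 * (x - x_min) / (x_max - x_min) - 1.0 for x in values]


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical values, for illustration only (not data from this study).
angles = [40.0, 55.0, 70.0, 100.0]
scaled = minmax_scale(angles)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = CAD, 0 = non-CAD
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(scaled)
print(metrics)
```

In a train/test setting, the minimum and maximum would normally be taken from the training data and reused for the testing data; the source does not state which convention was followed.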

Features Evaluation
In this study, the effects of the data sampling, the volume of the training dataset, and the dimension of input features, on the classification performances of machine learning were further analyzed.

Effects of Data Sampling
To examine the impact of data sampling on the classification performances, we randomly selected one subset from CAD subjects and one subset from non-CAD subjects as the testing datasets, respectively. All the remaining subsets were used as the training datasets. This process was repeated 16 times for each algorithm. In this part, we only studied the classification performances of the polynomial-SVM algorithm as it showed good performances (see Results below).
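
The sampling scheme can be made concrete by enumerating the possible testing pairs. The sketch below assumes that the 16 repetitions correspond to the 4 × 4 pairings of one non-CAD subset (A1 to A4) with one CAD subset (B1 to B4); the text states only that the selection was random and repeated 16 times, so this exhaustive reading is an assumption, as is the helper name `sampling_cases`.

```python
from itertools import product

NON_CAD = ["A1", "A2", "A3", "A4"]
CAD = ["B1", "B2", "B3", "B4"]


def sampling_cases():
    """Yield (test subsets, training subsets) for every A_i/B_j test pair."""
    for a_test, b_test in product(NON_CAD, CAD):
        test = [a_test, b_test]
        train = [s for s in NON_CAD + CAD if s not in test]
        yield test, train


cases = list(sampling_cases())
for test, train in cases[:2]:
    print("test:", test, "train:", train)
print(len(cases), "sampling cases")
```

Each case holds out one quarter of each class for testing, which matches the 75%/25% split used elsewhere in the study.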

Effects of the Volume of Training Dataset
There were three cases in this section. In Case 0, 75% of the data (444 from non-CAD subjects and 429 from CAD subjects) were used for training; in Case 1, 50% of the data (296 from non-CAD subjects and 286 from CAD subjects) were used for training; and in Case 2, 25% of the data (148 from non-CAD subjects and 143 from CAD subjects) were used for training. Given that different data sampling did not affect the classification performances (see Results below), and to simplify the subsequent description, we further defined the subsets of non-CAD subjects as C1 = A1 + A2 and C2 = A3 + A4, and the subsets of CAD subjects as D1 = B1 + B2 and D2 = B3 + B4. Case 0: we randomly selected three subsets from (A1, A2, A3, A4) and three from (B1, B2, B3, B4) as the training datasets, and the remaining morphometric data were used as the testing datasets. Case 1: we randomly selected one subset from (C1, C2) and one from (D1, D2) as the training datasets, and the remaining subsets were used as the testing datasets. Case 2: similarly, we randomly selected one subset from (A1, A2) and one from (B1, B2) as the training datasets, and the remaining morphometric data (including (A3, A4) and (B3, B4)) were used as the testing datasets. This part was performed using the best classification algorithm among the algorithms given above.
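
The three cases can be expressed as a small selection routine over the subset labels. This is a sketch under stated assumptions: the function name `choose_training_subsets`, the use of Python's `random` module, and the fixed seed are illustrative choices and are not taken from the study.

```python
import random

NON_CAD = ["A1", "A2", "A3", "A4"]
CAD = ["B1", "B2", "B3", "B4"]


def choose_training_subsets(case, rng):
    """Return (train, test) subset labels for Case 0, 1 or 2."""
    if case == 0:      # 75%: three of the four subsets per class
        train = rng.sample(NON_CAD, 3) + rng.sample(CAD, 3)
    elif case == 1:    # 50%: one of C1/C2 and one of D1/D2
        c = rng.choice([NON_CAD[:2], NON_CAD[2:]])
        d = rng.choice([CAD[:2], CAD[2:]])
        train = c + d
    elif case == 2:    # 25%: one of (A1, A2) and one of (B1, B2)
        train = [rng.choice(NON_CAD[:2]), rng.choice(CAD[:2])]
    else:
        raise ValueError("case must be 0, 1 or 2")
    test = [s for s in NON_CAD + CAD if s not in train]
    return sorted(train), sorted(test)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
splits = {case: choose_training_subsets(case, rng) for case in (0, 1, 2)}
for case, (train, test) in splits.items():
    print(f"Case {case}: train={train} test={test}")
```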

Effects of the Dimension of Input Features
Six different morphological features were available to generate the classifier. There were $C_6^k$ ($k \in [1, 6]$) possible combinations of input features, where k was the number of morphological features selected from the six. This part was performed using the best classification algorithm with 75% of the morphometric data used for training.
The best classification algorithm was further applied to train each morphological feature to find the most important features for the detection of CAD.
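
The number of feature combinations evaluated at each dimension k follows directly from the binomial coefficient. The sketch below enumerates them; the labels `f1` to `f6` are placeholders for the six morphological features, since only the counts matter here.

```python
from itertools import combinations
from math import comb

# Placeholder labels for the six features; only the count matters here.
FEATURES = ["f1", "f2", "f3", "f4", "f5", "f6"]

# Number of combinations for each dimension k = 1..6.
counts = {k: comb(len(FEATURES), k) for k in range(1, len(FEATURES) + 1)}

# Every non-empty subset of features, grouped by increasing dimension.
subsets = [c for k in counts for c in combinations(FEATURES, k)]

print(counts)
print(len(subsets), "feature combinations in total")
```

Summing over k gives $2^6 - 1 = 63$ candidate classifiers, which is small enough to evaluate exhaustively.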

Models Running Approaches
The proposed machine learning methods were run on a PC with a 2.60 GHz Intel Core i7 CPU, 16 GB RAM, and the Windows 7 operating system. All the machine learning models were run in MATLAB (MathWorks, Natick, MA, USA) using the Classification Learner codes, and each model could be completed within ~30 min. Moreover, libSVM [23], an open-source library, was applied to build the SVM models.

Results
The results of the machine learning classification were analyzed in the following subsections.

Polynomial-SVM Model with Grid Search Optimization Showed the Best Performance in the Detection of CAD
In this study, the classification performances of linear-SVM, polynomial-SVM, and RBF-SVM were first studied to choose the best SVM algorithm for the following investigation. Previous studies suggested that the SVM model could not achieve the best classification outcomes if the kernel functions and parameters were not selected properly [21]. To find the best SVM algorithm for the detection of CAD, we compared the classification performances of the linear-SVM, polynomial-SVM, and RBF-SVM algorithms under three parameter setting methods: default parameters, grid search, and PSO (Table 2). The results showed that although all SVM algorithms worked well with the default parameters, the classification performances of the SVM models were improved remarkably through parameter optimization with both grid search and PSO. Moreover, for all the SVM models, the performances optimized by grid search were better than those optimized by PSO (Table 2). In addition, the classification performances of polynomial-SVM were much better than those of linear-SVM and RBF-SVM (Table 2). This suggested that polynomial-SVM with grid search parameter optimization achieved the best performance among the three SVM models.
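
The grid search used for parameter optimization can be sketched as an exhaustive scan over powers of two between $2^{-8}$ and $2^{8}$. This is a minimal illustration rather than the libSVM workflow used in this study: the step of one in the exponent, the `grid_search` helper, and the toy scoring function are assumptions. In practice `score` would return the cross-validated accuracy of an SVM trained with the given C and g.

```python
from itertools import product


def grid_search(score, exponents=range(-8, 9)):
    """Return (best_score, C, g) over the grid C, g in {2**e}."""
    best = None
    for c_exp, g_exp in product(exponents, repeat=2):
        c, g = 2.0 ** c_exp, 2.0 ** g_exp
        s = score(c, g)
        # Keep a candidate only if it is strictly better than the current best.
        if best is None or s > best[0]:
            best = (s, c, g)
    return best


def toy_score(c, g):
    """Stand-in for cross-validated accuracy; peaks at C = 4, g = 0.5."""
    return 1.0 / (1.0 + (c - 4.0) ** 2 + (g - 0.5) ** 2)


best_score, best_c, best_g = grid_search(toy_score)
print(best_score, best_c, best_g)
```

With 17 candidate values per parameter the scan evaluates 289 pairs, which is why grid search is practical for two parameters but scales poorly with more.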
To compare the best SVM algorithm (polynomial-SVM) with the other machine learning algorithms, the accuracies of the polynomial-SVM algorithm and the other algorithms (LR, DT, LDA, k-NN, and ANN) were further studied (Table 3). The aim of this part was to assess the classification capability of all the algorithms used. The results indicated that the polynomial-SVM model achieved the best performance in the detection of CAD, followed by ANN. The classification accuracies of LR, DT, LDA, k-NN, ANN, and polynomial-SVM were 96.30%, 97.00%, 92.30%, 95.70%, 98.40% and 100.00%, respectively. This suggested that polynomial-SVM was the best algorithm for the detection of CAD among all these algorithms. Hence, the following analyses were mainly based on the polynomial-SVM model (with grid search parameter optimization).
Note: Here, LR, DT, LDA, k-NN, and ANN represented logistic regression, decision tree, linear discriminant analysis, k-nearest neighbors and artificial neural network classifiers, respectively. The parameters of the polynomial-SVM were optimized by grid search.

To examine whether the classification performances of the polynomial-SVM model depended on data sampling, 75% of the morphometric data from both non-CAD and CAD subjects were randomly selected as the training datasets, and the remaining morphometric data were used as the testing datasets. The results in Table 4 indicated that the classification performances (training accuracy, testing accuracy, testing sensitivity, and testing specificity) of the polynomial-SVM model were similar in all sampling cases. Moreover, the fluctuation of the classification performances of the polynomial-SVM model was less than 1%. This suggested that the classification performances were not affected by the data sampling. Furthermore, all classification performances (accuracy, sensitivity, and specificity) of the polynomial-SVM model were close to 100%, which suggested that the polynomial-SVM model was very effective and stable in detecting CAD from the combined morphological features. Table A2 also showed the comparison of the classification performances among linear-SVM, polynomial-SVM, and RBF-SVM under different sampling methods (see Appendix A). The fluctuations of the classification performances of all the SVM models were within 5%. This further suggested that the classification performances of all SVM algorithms were largely unaffected by the data sampling, especially for the polynomial-SVM algorithm.

Adequate Training Data Volume Was Necessary and Sufficient to Obtain High Detecting Performance
Although the polynomial-SVM algorithm exhibited high classification performances, this model had a speed limitation when training on larger datasets [24]. However, excessively reducing the volume of training data might affect the classification performance of the polynomial-SVM model. A reasonable training data volume should not only guarantee the running efficiency but also retain the high classification performance. To study the effect of the training data volume on the classification performances, 75% (Case 0), 50% (Case 1) and 25% (Case 2) of the morphometric data were selected as training datasets (see "Methods" section; please note that the results of Case 0 were presented in Table 4). As shown in Table 5, although the training accuracy was not affected by the training data volume, the testing accuracy, testing sensitivity, and testing specificity were sensitive to it. In addition, when the volume of training data was sufficient (≥50%), all the performances of the polynomial-SVM model reached their best (close to 100%) and were not affected by the training data volume or the sampling method (Tables 4 and 5 (Case 1)). These results suggested that an adequate training data volume (≥50%) was necessary and sufficient to obtain high detection performance.

Note: Case 1 and Case 2 indicated the 50% and 25% volume of morphometric data for training, respectively.

The Effect of the Dimension and Combination of Morphological Features on the Classification Performances
To examine the effect of the input feature dimension (the number of features in a combination) on the classification performances, all the combinations of the six morphological features were examined with the polynomial-SVM model. When the dimension was k, there were $C_6^k$ possible combinations that could be used to build the model (see "Methods" section). Each scattered point in Figure 3a-d represented a detection result for a specific combination. The results showed that the training accuracy, testing accuracy, testing sensitivity, and testing specificity exhibited a similar trend. The classification performances of the model improved as the dimension of input features increased (Figure 3). When the dimension was ≥4, all the classification performances reached a high level (over 95%) and were not susceptible to the choice of combination. However, when the dimension was <4, the classification performances increased rapidly with the dimension and exhibited large dispersion among different feature combinations of the same dimension. This suggested that the choice of feature combination was also a factor affecting the classification performances of the machine learning model when the dimension was <4.

Bifurcation Diameter Exponent (n) and Area Expansion Ratio (AER) Were Two Key Features for the CAD Detection
To find out which morphological feature was the most important for the detection of CAD, we compared the classification performances of the polynomial-SVM model for each single morphological feature (Table 6). The results showed that the morphological feature n exhibited the best performance among the six morphological features, followed by AER. Moreover, it was worth noting that, among the combinations of two input features, there were two points whose classification performances were significantly higher than those of the other points (see Figure 3). This suggested that these two detection results of the polynomial-SVM model were much better than the other results for two-dimensional combinations. Hence, to find out which features were selected in these two detections, we further compared the classification performances of the polynomial-SVM model for the combinations of two features (Table 7). The results showed that the two best combinations, (n & AER) and a second combination that also included n, achieved testing accuracies of 89.31% and 87.93%, which were significantly higher than that of the third-ranked combination (64.83%) (Table 7). These results suggested that the polynomial-SVM model with the use of features n and AER may have great clinical application prospects in the early detection of CAD.

Discussion
Diversity is one of the properties of medical feature datasets and greatly increases the difficulty of medical data mining. In this study, we used commonly used machine learning models (LR, DT, LDA, k-NN, ANN, linear-SVM, polynomial-SVM, and RBF-SVM) for the detection of CAD from image-based morphometric features. Six measured morphological features were applied to generate the machine learning models. These morphological features were representative of the detailed quantitative topological information of the coronary bifurcations. The results showed that the polynomial-SVM algorithm with grid search optimization had the best performance for the detection of CAD and yielded a classification accuracy of 100.00%. It was worth noting that adequate training data volume and input feature dimension were necessary for obtaining high classification performances. Moreover, when the volume of the training dataset was large enough, the classification performances were not susceptible to the data sampling method. In addition, we found that the exponent of vessel diameter (n) and the area expansion ratio (AER) were the two most important features for the detection of CAD, especially in combination.
Remarkable progress has been made in applying different machine learning algorithms to medical features for detecting diseases such as various types of cancer and cardiovascular diseases [12][13][14][25][26][27]. Abdar et al. proposed a nested ensemble nu-support vector classification (NE-nu-SVC) model for the diagnosis of CAD [26]. The proposed model provided accuracies of 94.66% and 98.60% for two different datasets (the Z-Alizadeh Sani and Cleveland CAD datasets), respectively. Singh et al. [27] applied an extreme learning machine (ELM) to detect CAD with 100% accuracy when constructing the classifier with 31 input features. However, the detection accuracy of ELM decreased drastically to 68.48% when the number of input features was reduced to 24. Nasarian et al. summarized the existing machine learning methods for the detection of CAD with the use of various medical databases [12]. Their summary showed that the detection accuracy of different algorithms varied with the medical databases. This suggested that both the machine learning algorithm and the feature selection method play significant roles in the medical diagnostic process. Table A1 (see Appendix A) summarized the comparisons of machine learning performances for the detection of CAD with different classifier algorithms and medical features, which also supported the finding of Nasarian et al. that both features and machine learning algorithms could significantly affect the performances of machine learning. In this study, we hence comprehensively compared the prediction performance of existing commonly used algorithms to select the best algorithm for the evaluation of the features.
The SVM model is one of the most well-known supervised machine learning techniques and has been widely used in pattern recognition and binary classification problems [28]. Since the SVM algorithm was invented, a large number of studies have addressed its optimization [21,29,30]. Syarif et al. showed that SVM parameter optimization using grid search was a powerful way to improve classification accuracy [21]. Our findings showed that although SVM worked well with the default parameters, its performances could be significantly improved by parameter optimization with both grid search and PSO. In applying support vector machine modeling to predict diabetes, Liu et al. found that the RBF-SVM achieved the best accuracy in the classification of fasting plasma glucose level ≥ 126 mg/dL vs. < 126 mg/dL, and that the linear-SVM performed the best in the classification of fasting plasma glucose level ≥ 100 mg/dL vs. < 100 mg/dL [29]. The study by Patle et al. revealed that the RBF-SVM was better than the polynomial-SVM for the classification task when the dataset was very large [30]. In this paper, we found that the polynomial-SVM with grid search optimization performed the best in the detection of CAD when six measured morphological features were used to build the SVM classifiers (Table 2).
Previous studies showed that features used for building the machine learning classifier can be extracted from the diagnostic images (such as MRI, CTA and ultrasound) by automatic algorithms [31]. However, the clinical significance of these features was unknown and difficult to be interpreted. Moreover, for a given image, automatic algorithms can generate a large number of features. It was suggested that reducing the number of features in machine learning to speed up the training was of great importance especially when dealing with large datasets [32]. However, this process may degrade the classification results. A reasonable method for feature selection was of great significance for building a high-performance classification model. There are various automatic algorithms for the feature extraction or selection, such as F-score and SVM-RFE [33,34]. These methods were helpful to efficiently select the features for the machine learning. However, the F-score algorithm did not provide any mutual information among the features [33], while the SVM-RFE algorithm can only be used to linear kernel SVM [34]. In the present study, the six morphometric features were selected because they had exhibited great clinical significance in terms of atherosclerosis [20] or indicated by medical experts. The results showed that our features achieved high classification performances with using only six morphological features for building the machine learning models, especially for polynomial-SVM (close to 100.00%) (Tables 2-4 and Figure 3 and Table A2). This suggested that reasonable and targeted selection of features for building machine learning models could greatly improve the classification performances. To the best of our knowledge, this study was the first report that the machine learning models successfully detected CAD by using the measured morphological features from the reconstructed 3D coronary bifurcations. Trivedi et al. 
found that the SVM model achieved an accuracy of 98.5% with 1358 dimensions of feature while the accuracy below 85.2% with 158 dimensions of feature, by studying the effect of feature dimension on the ability to detect email spams [35]. Our present modeling strategy showed a similar phenomenon that the average testing accuracy with the lowest (one) and highest (six) dimensions of the feature were 51.83% and 99.98%, respectively ( Figure 3). Moreover, we found that the morphological feature n achieves the best classification performances among the six morphological features, followed by AER, while the  Table 6). The results also indicated that two combinations in the two-dimensional section achieved the testing accuracy as high as 90% (Figure 3). This classification accuracy was remarkably higher than that of the other combinations with two input features. Further studies suggested that these two combinations were (n &  Table 7). These results were consistent with our recent study which predicts that the n−3 3 and AER were the best two indicators for CAD prediction (data not shown). Our results further indicated that the volume of training data was another factor that can impact the classification performance (Table 5). A study by Rudd et al. indicated that the whole data were split into 80% (1652# of 2066#) for training and the remaining 20% for testing achieved high accuracy of 97.5% [31]. Our present strategy showed similar high performances when no less than 50% of the morphometric data were used for training. In addition, the performance did not improve significantly as the volume of training datasets further increases from 50% to 75% (Tables 4 and 5 (Case 1)). Specifically, 50% and 75% of morphometric data were selected for training resulting in a mean testing accuracy of 99.96% and 99.98%, respectively. These results were obtained by the polynomial-SVM model. 
These results suggest that the polynomial-SVM model is a stable and powerful approach for the detection of coronary artery disease.
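The dimension-wise comparison discussed above (one to six input features) amounts to evaluating every non-empty subset of the six features. The sketch below is only an illustration of that enumeration, not the procedure used in this study. The names `n` and `AER` come from the text; the remaining four labels and the scorer `score_fn` are hypothetical placeholders. In practice, `score_fn` would train a classifier on the chosen features and return its mean testing accuracy.

```python
from itertools import combinations

# "n" and "AER" are named in the text; the other four labels are placeholders.
FEATURES = ["n", "AER", "feature_3", "feature_4", "feature_5", "feature_6"]


def feature_subsets(features):
    """Yield (dimension, subset) for every non-empty subset of the features."""
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            yield k, subset


def best_subset_per_dimension(features, score_fn):
    """Return {dimension: (best_subset, best_score)} for a caller-supplied scorer.

    score_fn is hypothetical: it stands in for training a classifier on the
    chosen features and reporting its mean testing accuracy.
    """
    best = {}
    for k, subset in feature_subsets(features):
        score = score_fn(subset)
        if k not in best or score > best[k][1]:
            best[k] = (subset, score)
    return best


# Number of candidate models at each dimension, i.e. C(6, k).
subset_counts = {}
for k, _ in feature_subsets(FEATURES):
    subset_counts[k] = subset_counts.get(k, 0) + 1
```

With six features there are 63 non-empty subsets in total, 15 of which are two-feature combinations, so an exhaustive comparison is still affordable at this scale.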
Although the polynomial-SVM model performed the best in this study, the algorithm has inherent advantages and disadvantages. Compared with the other models used in this study, its main advantage is that it gives the best performance when combined with the measured morphological features. However, it also has drawbacks. For instance, parameter tuning is necessary for the polynomial-SVM model to reach its best classification performance. In addition, the polynomial-SVM model becomes time-consuming when dealing with large datasets [24]. A survey by Leo et al. indicated that different types of feature data may suit different machine learning models [36]. Our present study demonstrated that, compared with the other models considered, the polynomial-SVM model is the best for detecting changes in the morphological features; however, whether it is also the best for other types of features needs further exploration.
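To make the tuning cost mentioned above concrete, the following is a minimal sketch of an exhaustive grid search. It is a generic illustration under stated assumptions, not the configuration used in this study: the grid values are invented for the example, and `score_fn` is a hypothetical callback that would, in a real setting, train and validate a polynomial-kernel SVM for one parameter setting.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid.

    Returns (best_params, best_score, n_evaluated). score_fn is a hypothetical
    callback standing in for "train and validate one model with these params".
    """
    names = sorted(param_grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


# Illustrative grid for a polynomial kernel (gamma * <x, y> + coef0) ** degree
# with penalty C. These values are assumptions, not the settings of the study.
PARAM_GRID = {
    "C": [0.1, 1.0, 10.0, 100.0],
    "degree": [2, 3, 4],
    "gamma": [0.01, 0.1, 1.0],
    "coef0": [0.0, 1.0],
}
```

This small grid already contains 4 × 3 × 3 × 2 = 72 settings, each requiring at least one model fit, which is why tuning cost grows quickly with both grid size and dataset size.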

Conclusions
In this work, the detection of CAD based on machine learning models with measured morphological features was proposed. The experimental results demonstrated that, among all the machine learning models considered, the polynomial-SVM model performed the best. Moreover, the exponent of vessel diameter (n) and the area expansion ratio (AER) were the two most important features for the detection of CAD. In addition, the combinations of (n & significantly decrease the dimension of input features without losing much detection accuracy. This study proposed a new methodology that combines machine learning techniques with imaging-based morphological measurement for the detection of CAD and obtained high detection accuracy, which could help clinicians detect CAD accurately and non-invasively at early stages. The proposed methodology may also be applied to the earlier detection of other diseases related to morphological change, such as carotid artery disease. Moreover, as changes in the morphological features of nervous tissue usually occur before the onset of disease [37], it could potentially be used for the early detection of neurological diseases, such as Parkinson's disease. In future work, we will develop an automatic method for measuring morphological features to speed up the original data acquisition process. Moreover, the proposed method will be applied to the detection of other vascular diseases to verify the feasibility of our strategy.

Table A2 lists the classification performances of linear-SVM, polynomial-SVM, and RBF-SVM obtained with grid-search parameter optimization. The results indicated that the values of the classification performances (training accuracy, testing accuracy, testing sensitivity, and testing specificity) for each SVM model were similar in all sampling cases. 
Moreover, the fluctuations of the classification performances of all the SVM models were within 5%, and for the polynomial-SVM model the fluctuation was even less than 1%. This suggested that the classification performances were not significantly affected by the data sampling, regardless of which SVM algorithm was used. Moreover, the accuracy, sensitivity, and specificity of all the SVM models were over 95%, and the values of polynomial-SVM were the best (almost all were 100%). This suggested that SVM models, especially polynomial-SVM, were very effective and stable in detecting CAD from the measured morphological features.

Table A2. The classification performances of all SVM models (linear-SVM, polynomial-SVM, and RBF-SVM) with the parameter optimization method of grid search (75% of morphometric datasets for training).

A(1,2,3) A(1,2,4) A(1,3,4) A(2,3,4) Mean Results SVM Kernel Functions CAD non-CAD A (1,2,3) A(1,2,4) A(1,3,4) A(2,3,4