Early Prediction in Classification of Cardiovascular Diseases with Machine Learning, Neuro-Fuzzy and Statistical Methods

Simple Summary Timely and accurate detection of cardiovascular diseases is critical to reduce the risk of myocardial infarction. This article proposes a methodology using machine learning, neuro-fuzzy and statistical methods to predict cardiovascular diseases. Our results show that the proposed methodology outperformed well known approaches, reaching a high prediction accuracy greater than 90%. Our methodology helps medical doctors to enhance diagnosis, quality of healthcare and efficacious prescriptions, decreasing the time for exams and minimizing expenses in clinical practice. Abstract Timely and accurate detection of cardiovascular diseases (CVDs) is critically important to minimize the risk of a myocardial infarction. Relations between factors of CVDs are complex, ill-defined and nonlinear, justifying the use of artificial intelligence tools. These tools aid in predicting and classifying CVDs. In this article, we propose a methodology using machine learning (ML) approaches to predict, classify and improve the diagnostic accuracy of CVDs, including support vector regression (SVR), multivariate adaptive regression splines, the M5Tree model and neural networks for the training process. Moreover, adaptive neuro-fuzzy and statistical approaches, nearest neighbor/naive Bayes classifiers and adaptive neuro-fuzzy inference system (ANFIS) are used to predict seventeen CVD risk factors. Mixed-data transformation and classification methods are employed for categorical and continuous variables predicting CVD risk. We compare our hybrid models and existing ML techniques on a CVD real dataset collected from a hospital. A sensitivity analysis is performed to determine the influence and exhibit the essential variables with regard to CVDs, such as the patient’s age, cholesterol level and glucose level. Our results report that the proposed methodology outperformed well known statistical and ML approaches, showing their versatility and utility in CVD classification. Our investigation indicates that the prediction accuracy of ANFIS for the training process is 96.56%, followed by SVR with 91.95% prediction accuracy. Our study includes a comprehensive comparison of results obtained for the mentioned methods.


Introduction and Bibliographical Review
Cardiovascular diseases (CVDs) are related to arrhythmia, blood vessel problems, heart failure, myocardial infarction, strokes and other cardiac issues, these being some of the leading causes of death in the world [1]. In 2019, as reported by some organizations [2], over 17 million persons died from these diseases, which is more than 30% of all deaths worldwide during the same year. Thus, in healthcare, particularly for CVDs, timely and accurate detection of diseases and determining the vital risk factors are critically important.
Fuzzy logic allows us to represent the common knowledge, mainly of the qualitative linguistic type, in a mathematical language through the theory of fuzzy sets and membership functions associated with them [22]. Fuzzy (or non-crisp) logic is a multivalued paraconsistent logic in which the true values of linguistic variables can be transformed into any real numbers between zero and one through probabilities. Therefore, it is employed to handle the concept of partial truth, where the truth value may range between entirely true and completely false. By contrast, in the Boolean (traditional or crisp) logic, the truth values of variables may only be the integer values zero or one. Fuzzy logic is based on the fact that people make decisions using imprecise and non-numerical information. This logic allows decisions based on intermediate degrees of compliance with a premise. Such logic is better suited to our real world, where our opinions are relative. Fuzzy logic is one of the best AI methods for coronary heart disease diagnosis. It is employed to predict disease occurrence with the help of linguistic variables and a membership function. Fuzzy logic uses linguistic variables, which usually enable us to measure by crisp numbers. Fuzzification is the next step for the identification of the variables. The rules are established with linguistic variables and their term sets which are the backbone of a fuzzy model called fuzzy inference systems.
A fuzzy model was proposed using membership functions (MFs) to find out the number of MFs affecting the outcome optimality and accuracy of a fuzzy model [23][24][25]. The performance of an adaptive neuro-fuzzy inference system (ANFIS) model is determined by how well the system parameters are chosen, the complexity they have and the type of training offered by ANNs [26]. An ANFIS method to classify the CVD degree was developed using seven factors along with the k-fold cross-validation method and the patient's heart disease degree was successfully estimated with a 92.3% accuracy rate [27]. ANFIS controllers were utilized to compare the feedback from the output of electrocardiogram (ECG) signals, determining a control scheme for people who suffer from CVDs [28]. A medical diagnostic system based on ANFIS and principal component analysis was investigated to forecast the CVD risk with a classification accuracy of 93.2% [29].
The gap in the investigation of CVDs is focused on enhancing the prediction accuracy using numerous factors with traditional classification methods [18][19][20]. Nevertheless, mainly the causes of CVDs are not known precisely. Age, BMI, cholesterol level, diabetes, eating habits, family history of heart problems, gender, high blood pressure, smoking, as well as an unhealthy and stressful lifestyle are the major factors affecting CVDs. In recent studies, the ANFIS and ML approaches were employed for predicting CVDs using factors such as age, BMI, cholesterol level (LDL/HDL), family history, F-glucose, gender, glucose level, high pressure, lifestyle, nationality, past medical history (PMH), red blood cell (RBC), smoking and stress level. As mentioned, some CVD factors are measurable, some are categorical and the response variable is also categorical. It is necessary to use the Gifi transformation method to balance the data and not oversimplify the complexity of the problem. This transformation includes categorical and measurable risk factors with non-linear interactions and converts the data into a measurable form. The prediction is based on previous learning and performs its duties best if the training data are not extrapolated [30]. It is possible to predict the patients who suffer the CVD. As a branch of AI, ML is increasingly utilized for predicting CVDs.

Contributions and Plan of the Article
Based on the complete bibliographical review presented above, we identified a gap that allows us to propose a methodology to improve predictive accuracy when detecting CVDs using numerous risk factors. We employ AI techniques based on ML, adaptive neuro-fuzzy and statistical approaches for the early prediction and classification of CVDs.
Consequently, the primary motivation for the present investigation is to provide timely medical treatment and diagnosis using an intelligent system based on current digital technologies. This system must besides effect efficient patient monitoring. Therefore, the main objective of the present investigation is to design and put into practice a system to improve predictive accuracy when detecting CVDs employing several risk factors.
Our contributions to the area can be summarized as follows: • ML, ANFIS and statistical classification tools supported by the Gifi method are utilized to predict CVDs early in a more precise way. • The effects of seventeen parameters on CVDs are investigated in depth using response surface methodology (RSM). • The obtained findings are matched with the state-of-the-art studies comprehensively. • Sensitivity analyses are carried out for ANFIS and SVR to determine the influence of significant factors such as age, BMI, glucose, cholesterol, RBC and HDL/LDL cholesterol levels on CVD. • The results of statistical approaches with the Gifi method are given using statistical classification tools and linear discriminant analysis. • The Nash-Sutcliffe model efficiency (NSE) coefficient is used to quantitatively describe and assess the model output's predictive accuracy. • We compare the capability of an adaptive elastic net logistic regression (AENLR) [31] and Gifi transformation with ML techniques (SVR, MARS, M5Tree and ANNs).
The plan of the present article is as follows. Section 2 points out the methodology used in the present investigation; Section 3 gives the results and findings of the present study. In Section 4, we discuss the performance of the applied approaches as well as some limitations of our study. In Section 5, conclusions related to the present investigation are provided.

Dataset and Framework of the Study and Patients
The dataset was collected retrospectively from the medical record system of family medicine and cardiology clinics at a university hospital in Saudi Arabia, including 159 patients over the age of 16 who visited the cardiology clinic and complained of heart disease symptoms for over four months between 6 June 2020 and 10 October 2020. Each patient was tested for biometric measurements, ECG and bold lab works (such as F-glucose, HBA1c, cholesterol levels and RBC). In addition, every individual was asked about all other historical diseases that they had. Then, the diagnosis of the presence or absence of cardiovascular diseases for a patient was determined according to the expert opinion of the medical doctor based on the hospital records for each patient.
For this retrospective observational study, the data were collected with no names or identification (ID) numbers to preserve the confidential records after obtaining administrative permission from the university hospital. Thus, this research depends on clinical and laboratory data collection; no experimental interventions were needed or applied. Typically, most of the measurable data could be collected online. Nevertheless, we used retrospective data to avoid wasting time and costs. We decided on our full criteria set from the early beginning of our work, considering the demographic constraints in Saudi Arabia. We did not exclude or include any other criteria during the work. These criteria were determined by an expert medical consultant, one of the authors of the present article. As a statistical analysis, CVD prediction and classification were made with ML approaches and elastic net modeling. The MATLAB software was used for all computations.
The dataset is collected to (i) classify and determine the best predictors (covariates) during analysis; (ii) build different ML classifiers and employ them for achieving an adequate model; and (iii) provide an appropriate analysis regarding the transparency of classifiers and the reasoning process to improve medical physicians' prediction accuracy. Table 1 describes the CVD dataset considering seventeen input variables corresponding to CVD risk factors, which play an essential role in CVDs and are specifically chosen by the experts in CVD and family medicine. Table 2 reports all the risk factors with their sources.

Gifi System for Data Transformation
The purpose of the Gifi method in this study is to convert categorical data into measurable data. The labels of the research's ordinal or nominal factors contain some metric properties. To transform data with the Gifi method, scaling and linear combination methods are used together. We assign the ideal scale values to each factor class depending on the procedure optimizing criterion. Qualitative variables are converted to measured variables in the optimal scaling method. In contrast, the linear combination method converts multidimensional categorical data into one-dimensional continuous space by linearly combining their classes.
When the categorical variables have a high dimension, the linear combination method is more helpful [49]. Let (s 1 , . . . , s m ) be an m × 1 vector holding the number of classes for each factor and p signify the dimensionality of the analysis required. We code each variable δ l , for l ∈ {1, . . . m}, into the n × s l matrix H l .
Let X be the object score, represented by an n × p matrix (often p ≤ m). If Y l is the quantifying of the variable classes δ l , H l y l indicates a modification or quantification for each of the n elements of the variable δ l . Objects in the same class acquire the same quantization without further requirements on Y l .
In a homogeneity analysis, the quantization for each variable is gathered in the s l × p matrices Y l . As a result, for the variable δ l , H l y l produces quantifications of the elements. For instance, we have that where H l y l denotes a single transformation generated by variable j on n objects. A homogeneity analysis minimizes a loss function given by σ X;Y 1 ,...,Y m = (1/m) ∑ m l=1 SS(X − H l y l ), where the sum of squares (SS) is the matrix elements. Under normalization, the loss function is reduced concurrently over object scores X and Y l using an iterative approach known as the alternating least squares algorithm; see [50][51][52][53][54] for more information on the Gifi transformation. Here, the categorical variables that need to be transferred to the Gifi systems are activity, gender, nationality, PMH, smoking and symptoms.

The Support Vector Machines Method
SVM is a powerful nonparametric ML approach that can predict and classify complex problems [55]. The method is effective for problems that have nonlinear relations between inputs and output variables. The input vector (X with observed values denoted by x) can be mapped into the output response using an N set of input (X i ) variables in SVR [56]. The nonlinear relation in SVR is defined using the expression Y( where ω 1 − ω N are the weights used to link the input and output data and b is the bias. Here, L(x, x j ) denotes the kernel function that transfers the input data from real space into Ndimensional feature space. The kernel is determined commonly by utilizing the Gaussian radial basis function to define the nonlinear relations stated as L(x, x j ) = exp(−0.5 x − x j 2 /σ 2 ), where σ is the kernel parameter. Here, γ is the value in the SVR procedure [57]. An optimization procedure determines the regression model to verify the unknown parameter weights using two slack variables, ζ and * namely, as Min where γ is the residual used to control the predicted value Y(x) and the observed value denoted by O, when |Y(x) − O| is less than γ and then the error is identified as zero. The Karush-Kuhn-Tucker conditions are applied and the optimum values of are determined by employing the Lagrange relation to maximize the regression function, where ϑ * j , ϑ j denote the corresponding multipliers. The SVR is approximated by The three main parameters of the SVR approach (D, γ, σ) were presented in (1) and they must be defined in the modeling process.

Fuzzy Rules and Membership Functions
MFs can be conveniently defined and expressed by mathematical equations. Parameters (D, σ) are used to identify Gaussian MFs, where D and σ are the MF center and width. Additionally, there are operators called hedges, such as (very, quite, more or less) and connectives, to change their meaning in fuzzy terms. We consider that µ(x 5 ) = µ normal = 0; for x < 1 and x > 5.5; where x j is the specific crisp input variable and µ(x j ) is its membership degree. The membership degrees can be used to specify the fuzzy variables identified by the linguistic terms numerically. A Sugeno fuzzy rule-based ANFIS model assumes that IF X 1 is A, X 2 is B, X 3 is C and so on, THEN Y j = f j (x 1 , . . . , x k ) = ax 1 + bx 2 + · · · + kx k + r j , where A, B and C are fuzzy terms in the premises part of the fuzzy rules, while Y j = f j (x 1 , . . . , x n ) are crisp outcomes in the consequent part of the fuzzy rules illustrating the output of the fuzzy model in this work. Such rules are utilized in a loop (inner) of the model to establish the ANFIS and obtain crisp outcomes of CVD cases.

The ANFIS Approach
For the inference procedure, fuzzy reasoning is used to obtain crisp outcomes from the 'IF-THEN' rule. The fuzzy stage is the preliminary step of the inference system using fuzzified inputs in a specified universe. The firing strength of the rules is important and not all rules need to be triggered (fired) to achieve the requested outputs. In this study, the ANFIS models under consideration have only one output: the CVDs. The mean absolute error (MAE), mean bias error (MBE), the root of the mean square error (RMSE), desirability function (DF) and NSE are employed in the training/testing process to determine the error rate of the model. The MSE is calculated as where y t and y t show the true output and forecasted value of the CVDs. The MSE shown in (4) produces a moderate error that may be preferable to one that usually has small errors and so the method can penalize large forecasting errors. The DF approach transforms the outcome values to a scale-free value, such as desirability, with values of 3, 4 and 5.

Response Surface Method for Factor Assessment and Sensitivity Analysis
The RSM is a mathematical and statistical optimization tool used to model and analyze problems in which several input factors influence the output response. The RSM solves the problems where the relation between input factors and the output response is unknown.
From the result analysis, the two modeling approaches (ANFIS and SVR) provide the robust capability for predicting the CVDs with the highest capability and lower error (high accuracy) among other modeling approaches. These two approaches are employed for the sensitivity analysis when effectively checking the primary influence of some input variables, such as age, BMI, glucose, cholesterol, RBC and LDL, on CVD prediction. The sensitivity analysis is computed using the differential predicted results as the marginal effect by increasing the input variables with ∆x. The increasing input by ∆x, as (x + ∆x), is given in the models for data predictions. The mean of differences between the old prediction of input data with no increase and the new prediction obtained by input data being increased by (x + ∆x) is compared for differential probability of CVDs stated as Considering the marginal effect of input variables using the differential method, the DP for several input variables, that is, age, BMI, glucose, cholesterol, RBC and LDL, are presented in Figures 1 and 2 for the SVR and ANFIS methods. Based on the results presented, we define the sensitivity factor (SF) through the relation given by The SF represents the negative or positive effects of CVD inputs, showing the influence of inputs on the CVD. The highest SF indicates the influence of inputs that is highly sensitive.

Statistical Approaches for CVD Classification
Here, we not only present a wide range of soft computational approaches for CVD classification but also aim to compare them with statistical classification methods. For this purpose, some statistical classification tools such as linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), k-nearest neighbor (kNN), naive Bayes (NB) and decision trees (DT) classifiers are used.
Different from classical classification algorithms, it has been proposed for the first time in the literature to determine the variables affecting the classification using AENLR analysis integrated with the the Gifi system data transformation method.
The AENLR model [31] is given by By allocating small weights to large coefficients and big weights to small coefficients, adaptive weights are intended to assure regularization. The penalty term stated in (5) is formulated as If we assume λ 2 > 0, then the expression established in (6) is strictly convex. Moreover, this penalty term can be written as where α ∈ (0, 1) is an elastic net tuning parameter that controls the mixing between the l 1 -norm and l 2 -norm terms in the penalty. It is commonly recommended to use a relatively large value of α and then use it in 10-fold cross-validation to choose λ defined in (7). After determining the variables contributing to the classification using AENLR, the results are compared with these variables using the five different classification procedures mentioned above. Figure 3 shows a scheme indicating that nine different AI approaches have been trained and tested for the dataset obtained from CVD patients. The performance of models is compared using the MAE, RMSE and MBE metrics.

Tables 3 and 4 and Figures 4-8 provide descriptive statistics of each variable under study.
Note that the gender ratio in the dataset is 61.01% (female) and 38.99% (male), with n = 159 patients. The distributions of the continuous variables are mostly asymmetrical, with high variability, and present some outliers.

ANFIS for CVD Prediction
The ANFIS modeling approach is built based on the different learning capabilities of ANN algorithms. In this study, a hybrid learning algorithm is employed to derive the Sugeno ANFIS framework using the learning capability of the BPNN algorithm. As mentioned, the dataset was collected from a university hospital and the ANFIS approach predicts CVDs with crisp numerical outcomes. The ANFIS modeling approach includes the input variables, the fuzzy rules set, the MFs, the designed inference system and the defuzzification procedure for predicting CVDs. Different combinations of input and output relationships are trained/tested to reach the most suitable model for predicting the patient suffering the CVD.       Similarly, the 2D relations of some inputs, such as the cholesterol level, glucose level, BMI and smoking, versus the response (CVD), are presented in Figure 10a-d, respectively. As seen in Figures 9 and 10, the relations of the input-output factors for CVDs are complex, ill-defined, unknown and remarkably nonlinear, which justifies the use of AI techniques. The 3D plots exhibit the full surface of the CVD output and the related input span. Hence, developing a mathematical model to solve this complex problem is difficult for decisionmaking. ML and ANFIS approaches can usually predict such complex problems. Hence, fuzzy methods and other intelligent modeling approaches, such as ANNs or hybrid intelligent systems, can be efficiently used with linguistic statements to solve imprecise and uncertain information [57] for predicting CVDs. Fuzzy and/or neuro-fuzzy modeling approaches can tell us more about the dynamics of CVDs by a set of linguistic associations with the help of input and output parameters. These associations use 'IF-THEN' rules to show the relationships of factors using variables related to linguistics and the corresponding terms. This 'IF-THEN' is the mapping of factors constituted from linguistic variables and terms, usually having two parts called antecedent and conclusion. The rule set is the backbone of an ANFIS. Gaussian memberships are utilized to detect the parameters and fuzzification process of the CVDs. The rule is utilized in a loop of the ANFIS model operating and obtaining crisp outputs for the classification of CVDs.
Our findings show that employing a small number of clusters (demonstrated by the rules) results in obtaining so many rules. In contrast, large cluster numbers generally produce fewer rules. Both are undesirable and must be avoided as they cause huge deviations in the prediction performance of the ANFIS model for CVD cases. Then, additional MFs do not increase the effectiveness of a fuzzy model [58]. Figure 11 shows the MFs which are fine-tuned for the input predictors: smoking (X 6 ), BMI (X 8 ), LDL cholesterol level (X 15 ) and cholesterol level (X 13 ). Different terms can be employed to identify fuzzy linguistic variables. For instance, the terms low, normal, high, very high; nonsmoker, average, highly smoking; low density, average and high HDL are the fuzzy linguistic terms used in this study.

The ANFIS Approach for CVD Prediction
As an inference procedure, fuzzy reasoning obtains crisp responses from the fuzzy 'IF-THEN' rules. In this study, the dataset is mixed, containing both data from categorical and continuous variables. To model the ANFIS approach and compare the results, the categorical data are transformed into continuous data using the Gifi system.
The input data was fuzzified to develop a fuzzy inference system in the specified universe. The second step was the MF formulation and the establishment of fuzzy rules. As seen in Figure 11a-d, in this work, we employ the Gaussian MFs, formulated based on the dataset obtained for the factors affecting the CVDs. Such rules are utilized in a loop (inner) of the model to establish the ANFIS and obtain crisp outcomes of CVD cases. Seven fuzzy rules were established based on the data available. Our model revealed that a small number of clusters (defined by rules) obtains too many rules. In contrast, many clusters generally caused a small number of rules. The RMSEs obtained from ANFIS models clearly show deviations in Table 5. Hence, the best ANFIS model producing the lowest RMSE is obtained when the number of MFs is seven. The fine-tuned MFs for the input variables: smoking (X 6 ), BMI (X 8 ), LDL cholesterol level (X 15 ) and cholesterol presence (X 13 ) are presented in Figure 11a-d. A fuzzy linguistic term set was established as 'rarely smoking, regularly smoking, smoking, heavily smoking, not smoking at all' and 'very low, low, normal, slightly high, high' for the fuzzy linguistic variables affecting the CVDs.
Gaussian MFs are utilized to identify the fuzzy linguistic variable 'LDL cholesterol level (X 15 )'. Its MFs are indicated by the fuzzy terms 'extremely low'; 'low'; 'normal'; 'high'; 'very high'; and enormously high'. The corresponding linguistic term 'normal' is mathematically stated in (2) and (3). The firing strength of each rule is necessary and it should be noted that not all the rules need to be fired to obtain the desired output. In this study, the ANFIS models under consideration have only one output: the CVDs. As seen in Table 6, eight ANFIS structures are developed and tested with several rules that were used to specify the CVD cases and minimize the prediction error. Initially, the error tolerance was set at 0.001 for the training process and 1000 iterations of the back-propagation multi-layer (BPML) algorithm were targeted. The MAE, RMSE and MBE approaches were used to assess ANFIS prediction performance. Additionally, the DF and NSE of the training and testing process were determined for an ANFIS structure; see Table 6.
The MAE, MBE, RMSE, DF and NSE are employed in training/testing processes to determine the error rate of the model and the results are given in Table 5. The ANFIS structure has average MSE, RMSE and MBE of 0.0165, 0.0679 and 0.0028, respectively, for the training process. As given in Table 5, the DF of the ANFIS model for the training process is the highest, with a value of 0.9829. The MAE, RMSE and MBE results show that the ANFIS approach reaches 0.0165, 0.0697 and 0.0028 error rates for the training process, respectively. For the testing process, the ANFIS approach obtains error rates of 0.2085, 0.3292 and 0.0062, respectively. The DF and NSE of the ANFIS approach are 0.9829 and 0.9656 for the training process, which are the highest rates among the other approaches.
The NSE coefficient is 0.9656, which quantitatively describes the predictive accuracy of the model output. The NSE for the training is high enough for the trained and tested ANFIS model. Similarly, the NSE coefficient is determined and used to describe and assess the predictive accuracy of model output quantitatively. The NSE is equivalent to the coefficient of determination (R 2 ), so its range is between zero and one. An NSE coefficient close to one indicates a model with more predictive capability. Figure 12 shows the distribution of true data points versus the training/testing outcomes of an ANFIS structure for CVDs. This structure provides the lowest mean error (ME) and standard deviation (SD) among the other approaches, whose values are 0.0034 and 0.1603, respectively, showing its superior prediction capability with low uncertainty and robust approximation.

Elastic Net Modeling for CVD Prediction
We compare the capability of the AENLR and Gifi transformation with the ML techniques (SVR, MARS, M5Tree and ANNs such as ANN-BR, ANN-SCG, ANN-BFG, ANN-LM, RBFNN) employed for predicting the CVDs. The accuracy of these methods was investigated and is presented in Table 6. SVR, M5Tree and MARS are called probit models in statistics. They are the regression models where the dependent variable (Y) takes only two values: with '1' showing the cardiac disease and '0' indicating no cardio disease in our study. These modeling approaches predict the probability that, if a patient carries some specific characteristic, she/he may fall into one of these two specific classes.
Eight different ML methods were trained and tested for the dataset of CVDs and the performances of the models were compared using MAE, RMSE and MBE. The results can be seen in Table 5. For checking the optimization of the responses, we employed DF. As seen in this table, the desirability of the SVR approach was found to be 0.9585 for the training. Similarly, the DFs of ANN-SCG and ANN-BFG are 0.9066 and 0.8987, respectively. In addition, the DF of the ANN-BR approach for the testing process is 0.7779, which is the highest among the approaches. Table 5 also shows that the other methods have slightly lower but closer DF values for the training and testing processes. For the ANN-SCG and ANN-BFG methods, the NSE coefficients are 0.8237 and 0.8079 for the training and testing processes, respectively. However, the other ML approaches have lower NSE coefficients for training and testing procedures. For instance, the ANN-BR has the highest testing coefficient of 0.6215, but the M5Tree and MARS have values of the NSE coefficient equal to 0.5730 and 0.5032, respectively.
As seen in Table 5, the SVR method gave 0.0387, 0.0389 and 0.0046 rates of error for the training process of CVD prediction using the MAE, RMSE and MBE methods, respectively. The desirability rate and NSE were found to be 0.9583 and 0.9195 for the training process. The testing process of the SVR method gave 0.2165, 0.2965 and −0.0041 error rates using MAE, RMSE and MBE, respectively. The DF and NSE are 0.7163 and 0.5382 for the testing process, respectively. In addition, ANN-SCG presents the best outcomes with 0.0847, 0.1232 and 0.0008 error rates of the training process using MAE, RMSE and MBE, respectively. The testing errors of the ANN-SCG approach are 0.2720, 0.3842 and −0.0344, considering MAE, RMSE and MBE, respectively.
As mentioned, the SVR is a powerful nonparametric approach of the ML method that is utilized for predicting the response of ill-defined problems with nonlinear relations between input and output factors. In the SVR approach, three main parameters (D, γ, σ) are identified in this work. Employing a method of trial-and-error, the three levels of parameters D ∈ {10, 100, 1000} and γ and σ are identified. Table 7 shows that the smallest RMSE values (a better response among other models) were obtained when D = 10, γ = 0.05 and σ = 0.75. However, increasing D and σ, the RSME did not change significantly. Here, γ in the objective function of the SVR model is an effective parameter for predicting CVDs. The best outcome of the RMSE (0.0389) was obtained when γ was equal to 0.05. Figure 13a shows the data distribution (blue color points and lines) of CVDs and the predicted data (red color points and lines) of CVDs obtained with training/testing processes of the SVR model. This model produces very close prediction outcomes of CVDs. Figure 13b depicts the distribution of true data points (blue color points and lines) for training/testing processes of the MARS method versus the red color points and lines showing the distribution of the predicted CVD cases. In this figure, '1' depicts the patients with cardiac problems and '0' illustrates the patients who do not have cardiac problems.
The blue color points and lines in Figure 13c illustrate the distribution of true data points versus red color points and lines showing the distribution of the predicted CVD outcomes by the M5Tree method for the training and testing process. Figure 14a displays the frequency of errors for the predicted CVD cases, with mean = 0.0028 and SD = 0.1375 when the SVR approach is employed. The error ranges of the SVR approach are smaller, whereas the shape of its histogram is leptokurtic, showing that the findings are better and the approach is superior to the MARS (b) and M5Tree (c) models. For an SVR approach, Table 7 presents ME = 0.0028 and SD = 0.1375. Figure 14b illustrates the distribution of the errors for the predicted CVD cases with mean = −0.0187 and SD = 0.3297 (Table 8). Note that the M5Tree method is a piecewise regression model used for binary decisions. The linear regression functions are developed as the terminal nodes (leaves) to provide the relation between predictors for the causes of CVD risk in the M5Tree model. Figure 14c shows the distribution of the errors for the predicted CVD cases with mean = 0.0165 and SD = 0.3315; see Table 8. Error ranges for prediction of the M5Tree method are high and the distribution is widespread.
As a statistical approach, the MARS method is used for predicting CVDs. In this method, the nonlinear regression is employed using the piecewise linear splines as a basic function, where a stepwise process is applied to explore the basic functions.

ANNs and Pattern Recognition
Multilayer ANNs are well known ML tools with the layers output, input and hidden. The ANNs provide a nonlinear mapping between the responses to the input parameters. We employ 1000 total iterations (epochs) for the training phase of the BPML algorithm with M-nodes (selected between 5 and 15) hidden to optimize the approach.
We utilize a method of trial-and-error to give the best results when predicting messagepassing neural network models. Moreover, we examine different optimization approaches for training the ANN models: Powell Beale conjugate gradient, BFG-BP and LM. The RMSEs for the training dataset using four BPML algorithms with various hidden nodes are presented in Figure 15.
The RBFNN is a fast-training algorithm that can be formulated efficiently to predict complex and ill-defined problems. An RBFNN model for predicting CVDs corresponding to various hidden nodes with different RBF parameters was compared in Figure 15 using RMSE. RBFNN is investigated for σ ∈ {0.25, 0.5, 1, 2, 5} and the number of hidden layers equal to 10,20,30,40,50,60 and 70, with M + 1 being unknown coefficients and M hidden nodes. Then, considering M as one of the main parameters in the RBFNN model, the M-centre of RBF is determined using the K-mean clustering approach. The best CVD prediction is calibrated with RBFNN when the hidden layers are 60 nodes and σ = 0.5 by comparing the RMSE values. The lowest RMSE value is 0.9176, as depicted graphically in Figure 15. The BR method provides stable results for different hidden nodes, but it is not a very accurate training approach compared to the others. The BFG is superior and provides accurate results for the training of ANNs compared to other optimization methods. The number of hidden nodes for the ANN model was selected as M > 10. In the present study, it is set to M = 11.
The ANN pattern recognition process is employed to find the regularities and similarities in a dataset using ML approaches. The similarities are investigated based on a statistical analysis of true historical data and the outcomes of algorithms. The best performance is selected from the iteration with a minimal validation error. After several training iterations, the error generally decreases. However, it may increase on the validation dataset as the network overfits the training data.
The outcomes of the ANN-BR algorithm are presented in Figure 16a. The figure shows the distribution of the true data points (blue points) versus red color points and lines showing the distribution of the predicted CVD outcomes with the ANN-BR approach for the training and testing phases. This algorithm employs the Jacobian matrix for minimizing the combination of weights and squared errors when determining the performance of responses, which produces a network that can oversimplify the optimization process. The ANN-BR network is trained for the inputs and outputs, whereas the best training performance is obtained at the 244 iterations.
From Figure 16b, note that the ANN-CG algorithm is one of the more successful methods for predicting CVDs.
The ANN-BFG is a deep learning algorithm. This is an alternative approach to the ANN-BFG methods for fast optimization. From Figure 16c, the algorithm produced successful outcomes with minimum error and SD for predicting CVDs in the training and testing phases. This figure illustrates the distribution of data points of the ANN-BFG approach. The blue color points and lines show the true data and the red color points and lines show the distribution of the CVDs predicted with this method.
The ANN-LM algorithm is also utilized for training/testing the CVD observation fitting problem and a two-layer feed-forward network is employed. The learning level of this algorithm is high and it is the fastest training algorithm. Moreover, the error rate is lower. Figure 16d shows the distribution of the true data points versus the training and testing outcomes of the ANN-LM algorithm depicted with red color points and lines of CVDs. Figure 17a shows the distributions of errors for the predicted CVDs with ME of −0.0156 and SD of 0.3386; see Table 9. The MSE and RMSE for the true and predicted CVD data are 0.1149 and 0.3389 for the ANN-BR algorithm, respectively. The ANN-CG algorithms search along the conjugate directions, which usually creates faster convergence than the steepest descent directions. The error level of this approach is found to be higher compared to the other algorithms. Figure 17b shows the frequencies of errors for the predicted data of CVDs using the ANN-CG approach. Based on this algorithm, as reported in Table 10, the ME and the SD are −0.0063 and 0.2044, respectively. Therefore, the error level of this approach seems reasonable compared to the other soft comparing methods.   Table 9. The correlation and SD of the true and predicted data of ML models; see abbreviations in Nomenclature.   Figure 17c depicts the distribution of the predicted error. Hence, we obtain ME = −0.0097 and SD = 0.2359; see Table 9. In addition, as presented in Table 11, the MSE and RMSE for the predicted CVDs are 0.0397 and 0.1994 for the ANN-BFG algorithm, respectively. This is one of the more successful algorithms for predicting CVDs. Figure 17d shows the distribution of the predicted error of the ANN-LM approach, whereas the ME and SD are presented in Table 8. This optimization approach reveals ME = −0.0296 with SD = 0.2359. The correlation and SD of ML models' true and predicted data are presented in Table 9, which shows the difference between the targets and the output values of observation. Figure 18 shows the true data points (blue) versus the training and testing outcomes (red color points and lines) of RBFNN approaches.

Response Surface Method for Factor Assessment
When a suitable approximation between the functional relationship of an output response (CVDs) and the nonlinear independent factors are determined, as we did for CVDs, an RSM-based polynomial approach might be a good approximation for a relatively small region problem. Figure 19 shows the plots of the main seventeen factors' effect on the CVDs, constructed based on the RSM. The interactional relationship of measurable and categorical factors indicating the CVD risk is shown in this figure. From Figure 19, note that the measurable factors related to pressure diastolic, age, F-glucose, HbA1c, HDL and the presence of cholesterol at a high level directly increase the risk of CVD. However, BMI, systolic blood pressure and RBC have a less significant effect on CVDs. In addition, an apparent CVD effect on gender is observed. It seems men suffer CVDs more than women. Shortness of breath slightly indicates a CVD problem. PMH is a strong indicator of CVD. Patients who have diabetes mellitus and hypertension can have a CVD. Even patients who have no abnormality detected may also have a CVD.
Smoking also affects CVD negatively. Physical activity seems to have a direct positive effect on CVD, which reduces the risk drastically. ECG seems to be an essential categorical variable that identifies the patients who suffer from CVD. Our study covers patients from different nationalities, indicating that Jordanians suffer the least CVDs and Yemenis suffers the most. Consequently, gender, nationality, PMH, BMI, smoking, lifestyle, average glucose, LDL/HD, family history, high pressure and stress increase CVD risk. As a result, the predicted outcomes obtained with such formulations are distinct and can be matched using the dual and triple effect of parameters on CVD using the RSM.

Sensitivity Analysis
Our findings of SF show, as shown in Figure 1a, that age increase of one year increases the CVD probability by 0.467% and 0.424% according to SVR and ANFIS approaches, respectively. Decreasing the BMI by 0.5 units decreases the CVD risk by about 0.152% and 0.132%; see Figure 1b. Nonetheless, increasing the glucose also increases CVD probability by about 6.183% and 6.763 according to SVR and ANFIS approaches, respectively; see Figure 1c. Similarly, the increase in cholesterol increases the CVD probability by 4.392% and 4.531% according to SVR and ANFIS approaches, respectively; see Figure 2a. In contrast, every 0.1 unit decrease in RBC decreases the CVD probability by about 4.562% and 4.623%; see Figure 2b. However, every 0.1 unit increase in LDL increases the CVD probability by about 4.353% and 3.214 according to the SVR and ANFIS approaches, respectively; see Figure 2c.

CVD Prediction
As a function of the regularization parameter λ, with a green circle and a dashed line, Figure 20a emphasizes the minimum-deviance location. The blue-circled point has the smallest variance plus one SD. We use the parameter α = 0.9 to encourage keeping groupings of strongly linked predictors rather than deleting all but one of them. The dotted line and green circle indicate the location of the least amount of the error employing a cross-validation method. Then, the location with the smallest error using a cross-validation method (plus one SD) is marked with a blue circle and a dotted line. The trace plot indicates non-zero model coefficients as a function of the regularization value. There are 17 curves in Figure 20b because there are 17 predictors in the linear model.
We indicate the point with a minimum error of a cross-validation method with the dotted line and blue circle (plus one SD) shown in Figure 20a. A trace plot with 17 curves is given in Figure 20b. As λ increases to the left, coefficients equal to zero are removed. We summarize the results in Table 11, which shows that the standard normal quartiles and the variables X 1 (gender), X 7 (activity) and X 17 (ECG), are the best choices with non-zero coefficients. According to the AELNR results, the factors affecting whether a person has cardiovascular disease or not were determined to be gender, activity and ECG. Therefore, using these coefficients for any statistical classification approach is recommended. The classification results obtained from the statistical methods are presented in Table 12.

Discussion
The Taylor diagram is a mathematical scheme designed to graphically indicate the representations of patterns to match the models' performance statistics simultaneously, that is, the correlation coefficient, SD and RMSE. These statistics can be plotted on a 2D graph to summarize the multiple aspects of models' performance related to one another; see Figure 21. The findings clearly show that the SVR provides the highest correlation coefficient and SD of 0.4370 from the observed data compared to the other ML models. Similarly, the ANFIS approach depicted correlation coefficient and SD outcomes of 0.9471 and 0.4951, respectively, which is the closest value to the observed data. ANN-BFG and ANN-CG followed these two approaches. The Taylor diagram established for different ML models presented in Figure 21 provides better compression of ML methods when depicting the accuracy and SDs. Therefore, the model with the lightest observation point shows the highest trend. We use the SD of the observed and predicted values of the models as a measure of variability. We measure the variability as the radial distance utilizing the origin of the plot. The SD in Figure 21 represents the modeling uncertainty for the ML approaches. A model providing the smallest SD difference from the observed value among others exhibits a better prediction and low tension, providing a robust approximation.
The overall predictive ability and limitations of ML-based algorithms in CVDs were summarized in [11]. A comprehensive search study was designed for the prediction of these diseases [59]. Hence, the limitations can be organized as follows: firstly, the data are arbitrary because there are no standard guidelines for utilization. Hospitals have different data repository systems. In addition, clinical data are heterogeneous and usually imbalanced. Secondly, technical parameter-related data are usually not disclosed to the public, leading to high statistical heterogeneity. Some parameters are measurable and some are categorical. Third, criteria selection methods and procedures are arbitrary and heterogeneous. Fourth, we could easily classify the ML algorithms based on their performance. Fifth, several studies have reported different evaluation matrices. Visits to the hospital during this investigation were restricted, especially since the time of compilation of the data coincided with the COVID-19 pandemic period. This is the reason for the relatively small number of samples used, which could be a limitation of this study.
We have held a sensitivity analysis for age, BMI, glucose, cholesterol, RBC and LDL. Though some ML algorithms and ANFIS approaches are robust, several studies have not reported a complete evaluation system of measurement. Then, some studies reported only the technical aspects without clinical aspects, likely due to a lack of clinician supervision. However, we determined our criteria and the data collection process under the supervision of an expert medical consultant. Table 13 presents the comparison of our findings with state-of-the-art methods. The ANN-LM algorithm showed the highest accuracy rate of 96.2%. The accuracy rate of the ANFIS approach is also high and is 94.7%. Table 13. Comparison of our findings with state-of-the-art methods.

Conclusions
CVDs correspond to the most common causes around the world related to mortality, affecting not only the heart and blood arteries but also heart failure, blood vessel diseases, stroke, arrhythmia, provoking a myocardial infarction. Determining the vital risk factors is crucial to intervening with the patient on time.
Relations between factors of CVDs are complex, ill-defined and nonlinear, justifying the use of artificial intelligence tools. These tools aid in predicting and classifying CVDs. In addition, mathematical/statistical models, such as those based on RSM, can solve complex problems in predicting CVDS when identifying 3D relations, which is not an easy task when making decisions.
ML can usually predict such complex problems and ANFIS approaches. Moreover, fuzzy logic and other intelligent models, such as ANNs or hybrid intelligent systems, can be used with linguistic statements to solve imprecise and uncertain information for predicting CVDs [57]. Moreover, fuzzy and/or neuro-fuzzy modeling approaches can tell us more about the dynamics of CVDs by a set of linguistic associations. In the present study, a comprehensive literature review was carried out using conventional ML and naive regression methods to detect well known risk factors of CVDs. To classify a patient as healthy or unhealthy, in our investigation, we used seventeen factors for predicting CVDs based on the M5Tree, SVR, MARS, feed-forward back-propagation, neural fitting, BR, SCG and ANFIS models.
We considered categorical and continuous variables, such as gender, age, nationality, PMH, BMI, smoking, lifestyle, F-glucose, cholesterol, average glucose, LDL/HDL, RBC, family history, blood pressure and stress levels. The Gifi system was used to convert the categorical data into continuous data. RSM was employed to determine the impacts of risk factors in 3D relations and very interesting conclusions were achieved.
MAE, RMSE and MBE were used to judge the performance of tools and approaches to simultaneously check the optimization of the outputs considering the desirability function. Moreover, the NSE coefficient was used to quantitatively describe and assess model prediction quality for the range between zero and one. In addition, a sensitivity analysis was performed to consider the marginal effects of factors such as age, BMI, glucose, cholesterol, RBC and LDL on CVDs, for two approaches. As detailed in the results and findings section, the age, BMI and glucose level were highly related to CVDs.
The ANFIS and SVR modeling approaches provided the highest prediction accuracy and tendency with the lowest error rates. The ANFIS prediction accuracy coefficient for the training process was 96.56%, followed by the SVR method with an NSE coefficient of 91.95%. The prediction accuracies of the other approaches were found as follows: ANN-CG 82.37% and ANN-BFG 80.79% for the training phase. The different ML approaches gave lower prediction accuracies for the NSE coefficient during the training. Furthermore, for the testing phase, the highest coefficient was found for the ANN-BR method at 62.15%, with the M5Tree, ANFIS and MARS models reaching values of 57.3%, 55.51% and 50.32%.
Based on linear discriminant analysis, the classification procedure obtained after the Gifi system application achieved a high classification performance, such as the successful ANN approaches. According to the results obtained from the prediction and classification using the linear discriminant analysis, gender, activity and ECG are the most significant variables that affect CVDs. As mentioned, our findings showed that employing a small number of clusters (demonstrated by the rules) resulted in obtaining so many rules. In contrast, large cluster numbers generally produce fewer rules. Both are undesirable and must be avoided as they cause huge deviations in the prediction performance of the ANFIS model for CVD cases. Then, additional membership functions do not increase the effectiveness of a fuzzy model [58].
This research covered a gap in CVDs as the researchers used numerous factors with traditional classification methods, which are imprecise indicators. For example, CVDs were classified [63] with traditional ML algorithms using 14 variables such as f-glucose, ECG and cholesterol from the healthcare dataset containing 76 attributes. The maximum accuracy rate of 91% was obtained by logistic regression. CVDs were estimated [64] with ML tools using 12 variables such as age, systolic and diastolic blood pressure. After making a variable selection using the correlation coefficient, the maximum accuracy rate they can reach with SVM was 78.84%. Using the UCI dataset containing 76 attributes, CVDs were classified [65] with many ML algorithms. After carrying out feature selection with the correlation map, the following accuracy rates were obtained for the traditional methods: 85% for LR, 81% for DT, 90% for SVM, 83% for RF, 90% for KNN and 84% for QDA. All these results showed that ANFIS combined with the Gifi data balance method has superior classification performance. Moreover, in terms of statistical methods, a high accuracy rate with only three variables after feature selection with AENLR is another outstanding result.
As the causes of CVDs are unknown, this uncertainty can be handled by employing fuzzy sets and systems to relate the factors and CVDs. The ML approaches help medical doctors enhance their diagnostic capability and accuracy, affecting patients' prediction, quality of healthcare and efficacious medication prescriptions. Moreover, using such techniques has significance for healthcare centers, decreasing the time for medical exams, minimizing expenses in the clinical practice and enhancing practitioners' efficiency [66].
Future research may improve statistical analysis, such as evaluating the computational complexity and ranking analysis of the models using statistical significance testing of post hoc methods. Moreover, applying other unsupervised ML methods, such as hierarchal clustering and anomaly detection with more data on nationalities, can significantly improve CVD prediction and classification. The methods derived in the present investigation are universal and based on artificial intelligence techniques. Therefore, they can be applied to practically all the areas where structured and unstructured data are available. Another future step might be developing an expert system to diagnose CVD patients.

Acknowledgments:
The authors would also like to thank the Editors and three reviewers for their constructive comments, which led to improvement in the presentation of this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
The following abbreviations are used in this manuscript: