A Powerful Paradigm for Cardiovascular Risk Stratification Using Multiclass, Multi-Label, and Ensemble-Based Machine Learning Paradigms: A Narrative Review

Abstract Background and Motivation: Cardiovascular disease (CVD) causes the highest mortality globally. With escalating healthcare costs, early non-invasive CVD risk assessment is vital. Conventional methods have shown poor performance compared to more recent and fast-evolving Artificial Intelligence (AI) methods. This study reviews the three most recent paradigms for CVD risk assessment, namely multiclass, multi-label, and ensemble-based methods, in (i) office-based and (ii) stress-test laboratories. Methods: A total of 265 CVD-based studies were selected using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) model. Because of the popularity and recent development of machine learning (ML), the study analyzed the above three paradigms using ML frameworks. We comprehensively review these three methods using attributes such as architecture, applications, pros and cons, scientific validation, clinical evaluation, and AI risk-of-bias (RoB) in the CVD framework. These ML techniques were then extended to mobile and cloud-based infrastructure. Findings: The most popular biomarkers were office-based, laboratory-based, image-based phenotypes, and medication usage. Surrogate carotid scanning for coronary artery risk prediction has shown promising results. Ground truth (GT) selection for AI-based training, along with scientific and clinical validation, is very important for CVD stratification to avoid RoB. The most popular classification paradigm was multiclass, followed by ensemble and multi-label. The use of deep learning techniques in CVD risk stratification is at a very early stage of development. Mobile and cloud-based AI technologies are more likely to be the future. Conclusions: AI-based methods for CVD risk assessment are most promising and successful. The choice of GT is most vital in AI-based models to prevent RoB. The amalgamation of image-based strategies with conventional risk factors provides the highest stability when using the three CVD paradigms in non-cloud and cloud-based frameworks.

The PRISMA model was used for searching and selecting the final studies for the review. The search was carried out in Science Direct, Google Scholar, IEEE Xplore, and PubMed using the following keywords: "multiclass classification for CVD risk", "multilabel classification for CVD risk", "ensemble classification for CVD risk", "CVD risk using Machine Learning/Artificial Intelligence for multiclass", "CVD risk using Machine Learning/Artificial Intelligence for multi-label", "CVD risk using Machine Learning/Artificial Intelligence for ensemble", "CVD risk assessment in ML/AI framework", and "Bias in ML/AI". The total number of ML/AI-based CVD studies is shown in Figure 1. An exhaustive search resulted in a total of 19,454 studies. The three exclusion criteria were (a) non-relevant studies, (b) articles removed after search and screening of the studies, and (c) records rejected due to insufficient data. Applying these criteria excluded 19,084, 88, and 17 studies, shown as E1, E2, and E3 (Figure 1), which leaves the 265 studies selected for the review. The important scientific knowledge from these final studies was extracted and the statistical classification was drawn. Further, a comprehensive comparison of the three techniques was carried out across the studies, together with the determination of AI bias.
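The selection arithmetic above can be checked with a short sketch. The counts are those reported in the text; the function and variable names are illustrative only and do not come from the review.

```python
# Study counts reported in the text (Figure 1).
identified = 19_454
excluded = {"E1": 19_084, "E2": 88, "E3": 17}


def remaining_after_exclusion(total, exclusions):
    """Return the number of studies left after subtracting every exclusion count."""
    return total - sum(exclusions.values())


included = remaining_after_exclusion(identified, excluded)
print(included)
```

The result agrees with the 265 studies stated in the abstract.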

Statistical Distribution
The statistical distributions derived from the selected studies are shown in Figure 2.
The following attributes were used for the statistical distribution: (a) types of CVD paradigms, (b) types of risk classes in multiclass CVD, (c) ML-based CVD systems without/with feature extraction, (d) number of GTs in multi-label-based CVD, (e) feature selection techniques, and (f) ML-based CVD publications.

Because the number of gold standards is important in the multi-label paradigm, the pie chart shows the statistical distribution of the studies by the number of gold standards used (Figure 2d).
The number of studies (given in parentheses) that used each feature selection technique was as follows: 2D convolutional neural network (CNN) (6) [71,79,81,89,101,111], continuous wavelet transform (1) [72], principal component analysis (PCA) (9) [76,79,84,98,102,112,114,119,121], Mel frequency cepstral coefficient (1) [77], amplitude magnitude (1) [78], gain ratio (1) [80], Matlab (1) [86], association technique (2) [87], SHAP (1) [90], extreme gradient boost (XGBoost) (1), genetic algorithm (5) [91,103,104,122,123], Tunicate Swarm (1) [116], chi-square (2) [117], and least absolute shrinkage and selection operator (LASSO) (1) [99] (Figure 2e). The increasing trend of CVD publications from 2009 to 2021 is shown in Figure 2f.

Biological Link between Atherosclerosis and Cardiovascular Disease
The fundamental cause of CVD is atherosclerosis [124]. The process of plaque formation is known as atherogenesis, as shown in Figure 3a(A-I) [125]. In this process, plaques develop in arteries where the endothelial shear stress is low [126]. The shear stress depends on flow characteristics such as the type of flow, its direction, and its velocity. Leukocytes attack the endothelium in this region (Figure 3(bA)) [126]. Monocytes mainly migrate into the sub-endothelial layer, where, in the presence of oxidized low-density lipoprotein (LDL) cholesterol, they turn into macrophages (Figure 3(bB)) [127,128]. Eventually, these macrophages become large foam cells laden with oxidized LDL cholesterol, leading to the formation of a necrotic core (Figure 3(bC)). Microscopic calcium granules expand in the necrotic cells and form lumps of calcium deposits. This necrotic core is separated from the blood vessel by a fibrous cap [129]. Blood flow remains uninterrupted when the plaque is small, as the arteries remodel themselves [130]. However, when the plaques grow, the lipid-core volume decreases, leading to structural stabilization of the plaque (Figure 3a) [131]. Progressive deposition of lipids thins the fibrous cap, which leads to plaque rupture [132]. Rupture of the cap triggers healing by platelets in the bloodstream, which forms a blood clot, or thrombus, that blocks the artery and is followed by arterial stiffness [133]. As a result, the tissues are deprived of blood supply, causing cell death. If the coronary artery gets blocked, this causes a myocardial infarction or CVD (Figure 3(bD)) [3,7].

Three Paradigms for Cardiovascular Risk Stratification
The core aim of this review is to understand the three kinds of paradigms for CVD risk stratification. This allows understanding the (a) types of gold standards used for different kinds of applications, (b) types of fundamental architectures used, and (c) finally the comparison between the three different types of paradigms.

Multiclass-Based Cardiovascular Disease Risk Stratification System
The most fundamental type of CVD risk stratification is the multiclass framework [134]. The multiclass framework has three main characteristics: (i) it divides the outcome into more than two granular risk classes, (ii) drug prescription for CVD treatment is better controlled based on the class in which the disease stage or risk lies, and (iii) the risk of CVD is better understood when divided into several stages such as low, mild, low-of-moderate, high-of-moderate, low-of-high, and high-of-high.

Comparison between CVD Application and Non-CVD Application
The comparison between CVD and non-CVD applications [136] is shown in Table 2. Seven attributes were used for this comparison. The image modalities used in the CVD-based systems were US, CT, MRI, and ECG (Table 2: row 4, CVD column). The architectures applied were ML and DL. DL provided better results due to its automated feature selection process (Table 2: row 6, CVD column). The number of classes was in the range of 3-9 (Table 2: row 5, CVD column) [69][70][71][72][73][74][75][76][77][78][79][80][81][82]. The multiclass approach to classification has also been applied to non-CVD applications such as Alzheimer's prediction or different cancer types. In a non-CVD system, the classes can be interpreted as different stages of the disease. For example, Alzheimer's disease (AD) can be categorized into different stages of memory loss with age, and cancer into different stages or grades. Our observations show that the gold standard types in non-CVD systems are very different from those in CVD systems. For example, for the early detection of AD/Mild Cognitive Impairment (MCI), the classification is done between (1) AD vs. normal control (NC), (2) MCI vs. NC, (3) AD vs. MCI, and (4) progressive MCI (PMCI) vs. Significant Memory Concern (SMCI). In the case of breast cancer, the GTs can be proliferation and non-proliferation cancer types.

Multiclass CVD Architecture for Office-Based CVD Risk Stratification
The architectures adopted for multiclass prediction of CVD risk have three basic components: (a) data collection, (b) a training system, and (c) a testing system. The training system trains the ML model on different covariates (or risk factors) [143,144], supported by different ground truths, using the training-based classifiers. The system can be trained to identify granular risk classes ranging from no, low, and medium to high risk. Feature selection is also performed during training [145,146]. For prediction, the trained model is applied to transform the testing features in either the Seen AI framework or the Unseen AI framework [147]. Two types of architectures are described in this section in terms of the above-mentioned factors. A typical online system for multiclass CVD risk stratification is shown in Appendix A, Appendix A.1.
Table 2 (fragment): BHI [139], OBBM [137,138,141,142], LBBM [137,138,141,142]; row 3, Disease Type: CVD [69][70][71][72][73][74][75][76][77][78][79][80][81][82]; Diabetes [142], Cancer (Breast, Lung, Brain) [138,139], Alzheimer's [138,141].
A generalized ML system is applied to office-based CVD or stress-test-based CVD systems, as shown in Figure 4. In the office-based CVD system, the covariates were collected from OBBM, LBBM, CUSIP, and MedUSE [76], while for the CVD-based stress-test system, ECG was the input. The rest of the configuration remains the same and consists of four parts. Part A is the preprocessing of the input data (covariates) and augmentation for balancing the classes in a multiclass scenario. Part B is the training system, with two subparts: (i) selection of the best features given the set of covariates and (ii) model generation using (a) the classifier, (b) the selected features, and (c) the gold standard. Part C is the prediction system, which applies the trained model to the selected best features of the test data set, transforming the test features to compute the predicted labels. Part D is the performance evaluation system (Appendix E), in which the predicted labels are compared against the gold standard labels. The two ingredients of the training system are the classifier bank and the gold standard. The classifier bank can contain classifiers such as SVM, XGBoost, KNN, and NB, while the gold standard is a coronary artery disease measure, such as coronary artery disease stages comprising four risk stages, or coronary risk scores derived from coronary angiography. Because the system uses K-fold cross-validation (any of the protocols K2, K3, K4, K5, or K10 can be used), every patient is in the test pool once, and at the end of all the folds the complete set can be used for performance evaluation. The CVD example in Figure 4 uses four sets of covariates, which can be replaced by ECG signals [148][149][150] when using the stress test-based system for CVD risk assessment. The longitudinal ultrasound model is typically used to collect the CUSIP risk factors, such as cIMT (max., min., and ave.), intima-media thickness variability (cIMTV), maximum plaque height (MPH), and total plaque area (TPA).
Diagnostics 2022, 12, x FOR PEER REVIEW 10 of 48
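The K-fold protocol described above, in which every patient enters the test pool once and the pooled predictions are evaluated at the end, can be sketched as follows. This is a minimal illustration and not the system of Figure 4: a nearest-centroid rule stands in for the classifier bank, feature selection and class balancing are omitted, and the toy cohort, its two covariates, and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(features, labels):
    """Part B stand-in: store the mean feature vector of every risk class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Part C stand-in: assign the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def cross_validate(features, labels, k):
    """Run K-fold CV and pool the predictions so every patient is tested once."""
    pooled = [None] * len(features)
    for test_idx in k_fold_indices(len(features), k):
        held_out = set(test_idx)
        train_x = [features[i] for i in range(len(features)) if i not in held_out]
        train_y = [labels[i] for i in range(len(features)) if i not in held_out]
        model = train_centroids(train_x, train_y)
        for i in test_idx:
            pooled[i] = predict(model, features[i])
    return pooled


# Hypothetical toy cohort: two covariates per patient, three risk classes.
X = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0], [0.2, 0.2],
     [1.0, 1.1], [1.1, 0.9], [0.9, 1.0], [1.2, 1.0], [1.0, 0.8],
     [2.0, 2.1], [2.1, 1.9], [1.9, 2.0], [2.2, 2.0], [2.0, 1.8]]
y = ["low"] * 5 + ["moderate"] * 5 + ["high"] * 5

predicted = cross_validate(X, y, k=5)
# Part D stand-in: compare pooled predictions with the gold standard labels.
accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)
```

Pooling the per-fold predictions before scoring is what allows the complete set of patients to be used for performance evaluation after the last fold.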

Multiclass CVD Architecture for Cardiac Stress Laboratories
Another architecture for multiclass CVD risk prediction was used by Hussein et al. [75] (Figure 5). The ECG signals [151][152][153] are obtained from the stress test laboratory for the analysis of CVD risk. The model uses a multiclass SVM classifier that takes the ECG signals as risk factors or covariates, and the ground truth used for the training system is myocardial infarction (MI). The multiclass outcomes were normal, low MI, and high MI. The ST feature (the interval between ventricular depolarization and repolarization) and the PR feature (the flat line that runs from the end of the P-wave to the start of the QRS complex) were extracted from the time-frequency (TF) power spectrum. The trained model was the input to the prediction system along with the test data, and the final classifications were normal, low MI, and high MI. The general algorithm for multiclass CVD risk stratification is given in the form of pseudo-code, with a detailed explanation, in Appendix A, Appendix A.2.

Multi-Label-Based Cardiovascular Disease Classification
The second technique used for CVD risk stratification is multi-label-based [154][155][156]. The ground truth is very important for the proper classification of CVD risk [157][158][159]. A CVD risk prediction system is said to be multi-label-based depending on the number of ground truths (GTs) used in the system [160][161][162]: the paradigm is considered multi-label-based classification if more than one GT is used for CVD risk detection [90,163,164,165,166,167]. The GTs, risk factors, and architectures used are discussed in the next sub-sections. The pseudo-code for the multi-label-based risk stratification process is given in Appendix B.
The risk factors used were OBBM, LBBM, CUSIP, MRI, and CT image phenotypes (input covariates column, Table 3). The algorithms used for multi-label classification were binary relevance (BR), label powerset (LP), multi-label adaptive resonance associative map (MLARAM), random k-labelset (RAkEL), classifier chain (CC), multi-label k-nearest neighbor (MLkNN), seismocardiography (SCG-Z), gyrocardiography (GCG-Z), principal component analysis (PCA), DCT, and a consensus-based risk model. Other characteristics of this classification technique are described in Table 3.

Multi-Label-Based Architectures for CVD Risk Stratification
The architecture design for the multi-label paradigm plays an important role in the outcome of the system. The basic components of the architecture for the CVD prediction system are training and testing. The proper choice of GT leads to non-biased results in the risk prediction of CVD. The architecture used by Jamthikar et al. [84] is shown in Figure 6. Three ground truths were considered for this system, namely (a) coronary artery disease, (b) acute coronary syndrome, and (c) a composite CVE, and the covariates used were OBBM, LBBM, and the CUSIP phenotype. Six classification techniques were used for multi-label CVE prediction: (i) four problem transformation methods (PTM) and (ii) two algorithm adaptation methods (AAM). The four PTM techniques were binary relevance (BR), label powerset (LP), classifier chain (CC), and random k-labelset (RAkEL). The two AAM techniques were multi-label k-nearest neighbor (MLkNN) and multi-label adaptive resonance associative map (MLARAM). The details can be seen in Appendix B. Evaluation was performed by calculating the accuracy, sensitivity, specificity, F1-score, and AUC for all the classification techniques. The BR classification was found to be the best performer, with accuracy, sensitivity, specificity, F1-score, and AUC of 81.2%, 76.5%, 83.8%, 75.37, and 0.89 (p < 0.0001), respectively.
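To make the problem transformation idea concrete, the sketch below shows how two of the PTM techniques named above reshape a multi-label target before any classifier is trained: binary relevance produces one binary target per label, and label powerset maps every distinct label combination to a single class. It illustrates the general techniques only, not the implementation in [84]; the four patient rows are hypothetical, and the label names follow the three ground truths listed above.

```python
def binary_relevance_targets(label_rows, label_names):
    """BR: split the multi-label target into one binary target per label."""
    return {name: [row[j] for row in label_rows]
            for j, name in enumerate(label_names)}


def label_powerset_targets(label_rows):
    """LP: map every distinct label combination to a single class id."""
    class_of = {}
    targets = []
    for row in label_rows:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        targets.append(class_of[key])
    return targets, class_of


label_names = ["CAD", "ACS", "composite CVE"]
# Hypothetical patients: 1 means the ground truth event is present.
Y = [(1, 0, 1), (0, 0, 0), (1, 1, 1), (1, 0, 1)]

br_targets = binary_relevance_targets(Y, label_names)
lp_targets, lp_classes = label_powerset_targets(Y)
```

Under BR, three independent binary classifiers would be trained, one per ground truth; under LP, a single multiclass classifier would be trained on the combined class ids, so label correlations are kept at the cost of more classes.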
Another architecture [86] used for multi-label CVD classification is shown in Figure 7. The system used mechanocardiography (MCG) data. Four kinds of ground truth were used, namely AFib, non-AFib, ADHF, and non-ADHF. The covariates given to the training and testing system were gender, age, height, weight, and BMI. The ML classification algorithms used were random forest (RF), extreme gradient boosting (XGB), and logistic regression (LR); RF gave the best performance of the three. The system was validated by nested cross-validation. Feature extraction was also performed using a feature vector, and hierarchical classification was also adopted. Another paradigm that can use multiple classifiers at the same time is the ensemble framework, presented in the next section.

Ensemble-Based Cardiovascular Disease Classification
The ensemble-based technique is the third type of technique considered for CVD risk classification [169][170][171]. It is characterized by the fusion of different types of ML or DL classifiers (Table 4) and can be used with multiclass and multi-label classification [172][173][174]. Figure 8 shows the concept of the ensemble paradigm. There are two strategies, namely homogeneous ensemble and heterogeneous ensemble (separated by the dotted line in the figure). In the homogeneous ensemble, conventional classifiers are combined using a homogeneous ensemble algorithm to yield a homogeneous ensemble classifier (classifier A). The training system then uses classifier A and the gold standard to yield trained model A. The same protocol can be adapted for the heterogeneous ensemble paradigm, yielding trained model B. These trained models are used by the prediction system on the test features to produce predicted labels. Finally, the performance is evaluated by comparing the predicted labels with the gold-standard labels, yielding the performance parameters. The key benefit of using an ensemble classifier is its superior performance compared to either multiclass or multi-label strategies. The pseudo-code for the ensemble-based risk stratification process can be seen in Appendix C. The ensemble technique can be applied to the CVD field, as well as to other fields, such as education and Alzheimer's.
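One common way to fuse several base classifiers is majority voting, sketched below for a heterogeneous ensemble. The review does not specify the fusion rule used in Figure 8, so voting is an assumption made here for illustration. The three rule-based classifiers, their thresholds, and the covariates age and systolic blood pressure (sbp) are hypothetical stand-ins for trained models; only total plaque area (tpa) is a covariate named in the text.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, patient):
    """Collect one prediction per base classifier and fuse them by voting."""
    return majority_vote([classify(patient) for classify in classifiers])


# Hypothetical base classifiers with illustrative thresholds only.
def rule_age(patient):
    return "high" if patient["age"] >= 60 else "low"


def rule_sbp(patient):
    return "high" if patient["sbp"] >= 140 else "low"


def rule_tpa(patient):
    return "high" if patient["tpa"] >= 40 else "low"


base_classifiers = [rule_age, rule_sbp, rule_tpa]

patient_a = {"age": 65, "sbp": 130, "tpa": 55}
patient_b = {"age": 50, "sbp": 150, "tpa": 10}

label_a = ensemble_predict(base_classifiers, patient_a)
label_b = ensemble_predict(base_classifiers, patient_b)
```

In a homogeneous ensemble, in the usual sense of the term, the base classifiers would all be of the same type and differ only in their training data or parameters, while the fusion step stays the same.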

Comparison between the Three Types of CVD Risk Assessment Systems
The architectures can be combined to achieve the functionality of all three models, namely multiclass, multi-label [13], and ensemble. Both the multiclass and the multi-label paradigms can be combined with the ensemble to achieve better accuracy in the prediction of CVD risk. The comparison between the three is shown in Appendix D, Table A1.

Performance Evaluation Metrics for Multiclass, Multi-Label, and Ensemble Techniques
Performance evaluation (PE) strategies are vital for understanding the reliability of ML-based CVD risk stratification systems. The main metrics used in multiclass, multi-label, and ensemble-based CVD risk assessment systems are sensitivity, specificity, accuracy, precision, F1-score, positive predictive value (PPV), negative predictive value (NPV), false-positive rate (FPR), false-negative rate (FNR), p-value, Hamming loss, and C-index. The formulae for these parameters are given in Appendix E. These PE strategies were analyzed across the different techniques. PE for multi-label-based CVD was found to differ from that for multiclass and ensemble. There are two types of PE for multi-label, namely label-based and instance-based PE; label-based PE uses micro- and macro-averaging. Details of these techniques can be seen in Appendix E. Figure 9 (top) shows the label-based and instance-based performance evaluation. The most frequently used PE parameter was accuracy (46 studies), followed by sensitivity (32), precision (27), F1-score (27), specificity (26), p-value (10), PPV (8), NPV (6), FPR (6), FNR (5), C-index (4), and Hamming loss (1). Hamming loss was adopted only in the ensemble-based CVD risk stratification [181][182][183][184]. The PE metrics used in the stress test-based (ECG) [185][186][187] techniques are area-under-the-curve (AUC), sensitivity, specificity, PPV, and NPV [188][189][190][191][192].
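Several of the metrics listed above follow directly from the counts of a confusion matrix. The sketch below uses the standard textbook definitions, which are assumed here to match the formulae of Appendix E (not reproduced in this section). It also shows Hamming loss and the difference between macro- and micro-averaging for label-based multi-label evaluation. All counts and label rows are hypothetical, and every denominator is assumed to be non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; assumes no denominator is zero."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # PPV is the same quantity as precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "F1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def hamming_loss(true_rows, pred_rows):
    """Fraction of individual label assignments that are wrong."""
    wrong = sum(t != p
                for t_row, p_row in zip(true_rows, pred_rows)
                for t, p in zip(t_row, p_row))
    total = sum(len(t_row) for t_row in true_rows)
    return wrong / total


def macro_micro_precision(per_label_counts):
    """per_label_counts holds one (tp, fp) pair per label."""
    # Macro: average the per-label precisions, so every label weighs the same.
    macro = sum(tp / (tp + fp) for tp, fp in per_label_counts) / len(per_label_counts)
    # Micro: pool the counts first, so frequent labels dominate.
    pooled_tp = sum(tp for tp, _ in per_label_counts)
    pooled_all = sum(tp + fp for tp, fp in per_label_counts)
    return macro, pooled_tp / pooled_all


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
loss = hamming_loss([(1, 0, 1), (0, 1, 0)], [(1, 1, 1), (0, 1, 1)])
macro_p, micro_p = macro_micro_precision([(8, 2), (1, 1)])
```

Macro-averaging treats a rare ground truth the same as a common one, while micro-averaging weights each prediction equally; this is one reason multi-label PE differs from multiclass PE.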
As seen from the above discussion, the most important characteristic of the multiclass paradigm is the selection of gold standards having greater than two classes. The highest flexibility in the multiclass framework is the amalgamation of different sources of covariates, namely OBBM, LBBM, CUSIP, and MedUSE. We could take characteristics of plaque in the carotid ultrasound such as information about plaque symptomatology. The same principle holds in the stress test-based CVD paradigm or non-CVD framework. The ML systems sometimes overestimate the accuracies in prediction and underestimate the scientific validation, which results in bias in the prediction systems that we discuss in Section 5.

Bias Distribution in the ML System for Multiclass, Multi-Label, and Ensemble
The ML-based systems for CVD risk classification are subject to bias for various reasons [193][194][195]. Thus, it is important to understand the risk of bias (RoB) in these ML-based systems. As the ML systems were grouped into three clusters, namely multiclass, multi-label, and ensemble, the bias was compared within each of the three clusters independently, and finally with all three combined. For the RoB analysis, the ML systems were ranked on the basis of the mean score along with the cumulative mean values (Table 5). The mean and cumulative scores were generated by scoring the ML attributes of each study. There were 52 ML studies (14 in the multiclass, 8 in the multi-label, and 30 in the ensemble cluster) with 41 attributes each. Each AI attribute was scored between 0 and 5 using a grading scheme [196], in which a high score was assigned if the attribute was adopted (used) in a particular study (publication). For example, the attribute "data size" received a high score if the study had more than 1000 patients, and a low score otherwise. Similarly, the attribute "feature extraction" received a score of 5 if it was implemented in a study, and 0 otherwise. The ML-based studies were then clustered into low-bias, moderate-bias, and high-bias groups on the basis of two cutoff values. The low-moderate (LM) and moderate-high (MH) cutoffs for each cluster of ML studies were determined from the mean values along with the cumulative mean values.
The cutoff values obtained for the multiclass cluster are 1.8 (LM) and 1.35 (MH) (Figure 10a), with 4, 5, and 5 studies in the low-bias, moderate-bias, and high-bias bins, respectively. Similarly, the cutoffs for the multi-label cluster are LM: 1.9 and MH: 1.4 (Figure 10b), with 3 studies in the low-bias group, 3 in the moderate-bias group, and 2 in the high-bias group. For the ensemble cluster, the LM cutoff is 1.8 and the MH cutoff is 1.6, with 8 studies in the low-bias bin, 16 in the moderate-bias bin, and 6 in the high-bias bin (Figure 10c). Alternatively, as all the studies concern CVD risk prediction, the LM and MH cutoffs were also determined by combining all 52 studies; the LM and MH cutoffs for the combined approach are 1.9 and 1.7, respectively (Figure 10d). Thus, we see that the ensemble-based ML CVD risk estimation systems are low-biased among all the selected studies, followed by the multiclass-based systems (moderate-biased), while the multi-label-based systems were found to be low-biased. The AI-based CVD risk stratification systems can be further improved by incorporating mobile, cloud, and e-health infrastructure, as discussed in Section 6.
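The binning of studies by mean attribute score can be sketched as follows. The two cutoffs are the multiclass values reported above; everything else is an assumption made for illustration: the study names and their five-attribute score lists are hypothetical (the review scores 41 attributes per study), and the reading that a higher mean score means lower bias, with cutoffs treated as inclusive lower bounds, is inferred from the grading scheme rather than stated in the text.

```python
def mean_score(attribute_scores):
    """Average the 0-5 scores given to a study's AI attributes."""
    return sum(attribute_scores) / len(attribute_scores)


def bias_bin(score, lm_cutoff, mh_cutoff):
    """Assumed reading: more adopted attributes give a higher score and lower bias."""
    if score >= lm_cutoff:
        return "low-bias"
    if score >= mh_cutoff:
        return "moderate-bias"
    return "high-bias"


# Multiclass cutoffs reported in the text (Figure 10a).
LM_CUTOFF, MH_CUTOFF = 1.8, 1.35

# Hypothetical studies with shortened attribute lists.
studies = {
    "study A": [5, 5, 0, 0, 0],
    "study B": [5, 0, 0, 3, 0],
    "study C": [5, 0, 0, 0, 0],
}

bins = {name: bias_bin(mean_score(scores), LM_CUTOFF, MH_CUTOFF)
        for name, scores in studies.items()}
```

With these inputs the three studies fall into the low-bias, moderate-bias, and high-bias bins respectively, mirroring the three-way split shown in Figure 10.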
Scientific validation (Column C12) was also performed for a high number of mobile and cloud-based CVD studies. Only one cloud-based CVD risk prediction system has been FDA approved (Column C6) [208]. All the characteristics are described in detail in Table A4. It can be noticed that AI-based systems have gained accuracy and reliability with the addition of mobile and cloud-based infrastructure. This infrastructure also supports remote prediction, which is very important in the COVID-19 framework. As CVD prediction systems have evolved during the COVID-19 period, we discuss this in the upcoming section.

Principal Findings
The main scope of this review was to compare comprehensively three kinds of machine learning (ML) techniques, namely multiclass, multi-label, and ensemble, in office-based settings. The study also included a limited discussion of (a) CVD risk prediction using ECG signal-based settings and (b) deep learning (DL) techniques for CVD risk prediction. The principal findings from this review were: (i) there are three types of CVD risk stratification techniques, namely (a) multiclass, (b) multi-label, and (c) ensemble; (ii) the types of covariates used were OBBM, LBBM, MedUSE, and CUSIP. OBBM, LBBM, and MedUSE were used more widely than image-based phenotypes (CUSIP), which are now evolving more rapidly since they serve as a surrogate marker for coronary artery disease; (iii) the ground truth is a vital factor in avoiding the risk of bias (RoB) in ML-based CVD risk prediction; (iv) the popularity of the classification techniques used in the field of CVD followed the order multiclass-based, ensemble-based, multi-label-based; (v) clinical and scientific validation is another set of AI attributes that must accompany any ML-based CVD risk prediction system to prevent AI bias in such systems; (vi) the performance evaluation (PE) metrics used for the three techniques were analyzed, and the most commonly used PE parameter was accuracy. Cloud-based AI techniques comprising all three classification techniques are more likely to be the future of CVD risk prediction. In the future, advanced computer-aided diagnosis techniques based on image processing can be applied [210]. Edge devices with mobile and cloud-based AI infrastructure are now emerging rapidly in the medical industry, as they provide remote access and much faster prediction, the most necessary features in the COVID-19 era.
Table 6 shows the benchmarking table with a comparison between eighteen review studies that focused on multiclass, multi-label, and ensemble techniques for CVD risk prediction. This table shows thirteen attributes (columns C1 to C13) for each of the eighteen studies [35,211–226], corresponding to rows R1 to R18. The thirteen attributes are the author (C1), year of the study (C2), name of the journal (C3), data size (C4), whether the study belongs to CVD or not (C5), the domain of the study (C6), machine learning (C7), classifier type (C8), cross-validation protocol (C9), whether the study is multiclass (C10), multi-label (C11), or ensemble (C12), and finally the summary of the study (C13). The data size for each study is shown in column C4 and ranges from 8 to 86,155, whereas our study (row R18) has used 94 studies. Column C5 describes whether the study is of CVD type or not. The studies in rows R2, R3, R5, R9, R10, R11, R12, R16, and R17, along with our study (row R18), are in the field of CVD, while the rest are not. Column C6 describes the domains of the studies (rows R1, R4, R6, R7, R8, R13, R14, and R15) that do not belong to CVD. These domains are EEG, blood pressure, education, statistics, software, chronic fatigue, and sickle cells. The technical approach of the studies, i.e., whether machine learning (ML) or not, is shown in column C7. Most of the studies, including our proposed study, are ML (rows R1, R3, R4, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, and R18). The multiclass studies (rows R1, R3, R6, R7, R9, R11, R12, and R17) are shown in column C10 along with our study (row R18). Column C11 shows the multi-label studies (rows R8, R13, R14, R15, and R18); likewise, column C12 shows the ensemble studies (rows R4, R6, R10, and R18). The last column C13 describes the keyword objectives of each study. The objectives of the studies were classification and CVD risk prediction or stratification.

A Special Note on Non-Linear CVD Risk Stratification
Conventional CVD risk assessment systems assume a linear relationship between the covariates and the gold standard. These linear systems typically use covariates such as OBBM and LBBM or ECG signals [228][229][230]. With the addition of CUSIP and MedUSE, the requirements on CVD calculators become more stringent. It has recently been observed that COVID-19 can play the role of a new covariate or risk factor due to its relationship with CVD [231,232]. The risk of CVD is accelerated in individuals with COVID-19 [233,234]. Including this covariate can result in a more non-linear classification paradigm for CVD risk prediction [235], which can improve the reliability and the accuracy of the prediction results [236]. AI/ML approaches help in understanding the non-linear relationship between the covariates and the ground truth. Hence, there is a need for the development of non-linear classifiers in the ML/DL domain. Techniques used in this context include non-linear SVM classifiers [237], PCA, XGBoost [235], RF [233], generalized discriminant analysis (GDA), ELM, and LDA [238]. Different non-linear methods applied in the CVD field are the Poincaré plot (PP), approximate entropy (ApEn) [235], quasi period density-prototype distance (QPD-PD) [239], fuzzy entropy [238], recurrence period density prototype distance (RPD-PD) [237], and non-linear ensemble classifiers [233]. These are all outside the scope of the current study. Other applications of non-linear classifiers are in the fields of stroke [240] and sleep apnea [241]. In the future, non-linearity can also be handled by using DL approaches along with multiclass, multi-label, and ensemble-based techniques for CVD risk prediction.

A Special Note on Time-to-Event for Cardiovascular Risk Prediction
Time-to-event prediction is one of the greatest assets of a machine learning system. The most important ingredient for accomplishing it is a follow-up gold standard for the clinical data. This means one must have the gold standard (events) at time points such as the 1st, 3rd, 5th, and 10th year. Further, the risk factors (the so-called covariates or variables) must be available for the development of the training model. Given the pair of covariates and gold-standard events for a given time point, one can develop the machine learning model for that time point (1st, 3rd, 5th, or 10th year). Predicting for the 1st, 3rd, 5th, and 10th years therefore requires four machine learning models, since each time point has to have its own model. The atherosclerotic disease, which evolves over the years and leads to the event, needs to be captured in the development of the training model. The only challenge with this setup is the length of time it takes to collect the event data. It is both expensive and tedious, since the patients have to be followed over the 10-year period. Recently, Kakadiaris et al. [62] pursued this strategy using the machine learning paradigm. The ML paradigm has the same fundamental concept of training and testing as shown in Figure 4. The left half is the training model, where the gold standard changes with the time point (1st, 3rd, 5th, and 10th year), while the prediction is applied to the patient for the corresponding time points. Completing this validation is slow, since it is costly and requires a large cohort.
To overcome this limitation, another way to predict the CVD risk is to use carotid artery disease as a surrogate marker. Since the formation of atherosclerotic disease in the coronary artery has the same genetic make-up as carotid artery disease, the surrogate artery can be used for the prediction of CVD or coronary artery disease risk. Further, note that over time (1st, 3rd, 5th, and 10th year), the plaque formation changes and so do the image phenotypes, such as intima-media thickness, plaque burden, or plaque area/volume. Thus, one can compute time-dependent image phenotypes from the ingredients that drive the atherosclerotic disease. These include the rate of change of cIMT over time (age), the obesity index over time (age), and the change in cholesterol over time (age). One can use this paradigm to predict the age-based plaque burden in the carotid artery, which is sometimes called the vascular age of the patient. This has been shown by Khanna et al. [34]. Later, this was commercialized as AtheroEdge™ 2.0 (AtheroPoint™, Roseville, CA, USA) [36]. The CVD risk can be computed based on the intensity of the risk factors. This is called a non-ML method (also known as the statistical solution) for the prediction of the 10-year CVD risk.
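The one-model-per-time-point strategy described in this section can be sketched as follows. This is a minimal illustration, not the method of any cited study. It assumes that every patient was followed for the full 10 years, so that "no event" is known at each time point. The nearest-centroid classifier is only a stand-in for any ML model, and the cohort, the covariates (age and systolic blood pressure), and all function names are hypothetical.

```python
HORIZONS = (1, 3, 5, 10)  # prediction time points in years

def label_for_horizon(event_year, horizon):
    """Gold standard for one time point: 1 if the event occurred within
    `horizon` years, else 0. `event_year` is None when no event was observed."""
    return int(event_year is not None and event_year <= horizon)

def fit_centroids(X, y):
    """Toy stand-in for an ML classifier: the per-class mean of the covariates."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Hypothetical cohort: ([age, systolic BP], year of event or None).
cohort = [
    ([70, 165], 1), ([66, 158], 3), ([61, 150], 5),
    ([55, 140], 9), ([45, 120], None), ([40, 115], None),
]
X = [covariates for covariates, _ in cohort]

# One model per time point: same covariates, different gold standard.
models = {}
for h in HORIZONS:
    y_h = [label_for_horizon(event_year, h) for _, event_year in cohort]
    models[h] = fit_centroids(X, y_h)

new_patient = [57, 142]
risk_by_horizon = {h: predict(models[h], new_patient) for h in HORIZONS}
```

The covariates stay the same across the four models; only the gold standard changes with the time point, which is why the event data must be collected for the whole follow-up period before the 10th-year model can be trained.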

A Special Note on the Advantages of Machine Learning-Based Cardiovascular Risk Stratification
The machine learning paradigm for CVD risk prediction provides a way to obtain more accurate, earlier, and faster results. ML systems offer the following advantages over previously published approaches: (i) the ability to handle the non-linear relationship between the covariates and the ground truth (GT) [31]; (ii) the ability to predict the CVD risk in granular classes, such as six different risk classes (no-risk, low-risk, mild-risk, moderate-risk, high-risk, and very-high-risk) [34,35]; (iii) the ability to augment the training data using popular augmentation paradigms such as adaptive synthetic sampling (ADASYN) and the synthetic minority over-sampling technique (SMOTE) [227]; (iv) the incorporation of the cohort's knowledge during training and prediction of the CVD risk; (v) the flexibility to amalgamate different types of covariates, such as OBBM, LBBM, CUSIP, and MedUSE, during the design of the model training; (vi) the ability to interface with different types of classification techniques, such as multiclass, multi-label, and ensemble, for improving the overall performance of the system; and (vii) the ability to extend the risk factors (or covariates) with, for example, genetic factors and comorbidities such as cancer. Together, these factors make the ML-based system a very strong paradigm for CVD risk stratification compared with conventional statistical models.

A Special Note on Deep Learning-Based Cardiovascular Risk Stratification
The deep learning (DL) paradigm has started to emerge in the field of CVD risk prediction. The DL approach can be applied in both (a) office-based [242,243] and (b) stress-based test settings [244][245][246][247][248]. DL approaches have been applied for CVD risk stratification using multiclass [249], multi-label [250], and ensemble-based paradigms [116]. Even though CVD risk stratification techniques are evolving in the DL framework, this review does not explore them in depth, since they are not its main focus. As a result, we have not analyzed publications related to the DL paradigm. Note that the main advantages of DL techniques are (i) an automated feature selection process from the input covariates (such as OBBM, LBBM, CUSIP, and ECG signal phenotypes) and (ii) the prediction of more accurate and reliable results due to the large number of layers in the DL network. Advanced stochastic imaging methods can be applied [251] to improve the loss function during training. This evolving DL paradigm will flourish in the very near future in office-based imaging and stress-based test settings.

The Future of Cardiovascular Disease Risk Stratification
CVD risk estimation at an early stage is very important for reducing the mortality rate due to CVD [252,253]. It was observed that not only ML but also the extreme learning machine (ELM) can be applied and further developed for CVD risk stratification [254]. Moreover, COVID-19 accelerates atherosclerosis, so fast detection of CVD in COVID-19 patients is needed [255,256]. These circumstances are leading to an evolution in CVD risk stratification techniques. In the near future, cloud-based AI modalities will be widely used for CVD risk detection. They promote remote and fast prediction of the risk of CVD and also help in reducing prediction errors. Other non-invasive imaging techniques, such as carotid, femoral, and arterial imaging, can be used as an indirect measure of plaque build-up in these arteries. Deep learning technologies will evolve in the field of CVD risk estimation [257]. This will also include pruning of weights using evolutionary techniques such as genetic algorithms in the deep learning framework [147]. Devices equipped with cutting-edge technologies such as mobile-based AI, cloud-based AI, and multiclass, multi-label, and ensemble-based systems for CVD risk prediction will emerge in the medical imaging industry market.

Conclusions
This was the first review study of its kind to present three different kinds of AI-based CVD risk stratification, namely multiclass, multi-label, and ensemble, of which multiclass was the most popular and multi-label the least; this was our first key contribution. The second contribution was an exhaustive analysis, selecting the best 265 studies using the PRISMA model, for understanding the three kinds of machine learning-based systems for prediction of the CVD risk. This was based on our hypothesis that there exists a biological link between atherosclerotic disease formation and the CVD risk. The third contribution was the identification of the top four covariates, namely OBBM, LBBM, CUSIP, and MedUSE, for designing the training model using a machine learning framework. The fourth contribution concerned the choice of the gold standard for an unbiased AI system design for CVD risk prediction, which leads to a robust and reliable CVD prediction system. The fifth finding and contribution was that the ML system must undergo clinical and scientific validation for reliability, stability, and robustness of the system design. Lastly, we observed that with the advancement of telecommunication systems, mobile and cloud-based strategies are rapidly penetrating CVD risk stratification system designs. Low-powered edge devices like Raspberry Pi and Jetson Nano are likely to be adopted in the future.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Pseudo-Code for Multiclass Classification
Appendix A.1. Typical Online System for CVD Risk Stratification for Multiclass
This system shows the amalgamation of online covariates, which are then transformed by the ML-based training model using multiclass-based models. The output yields the multiclass risk marked in color (low, mild, moderate, and high risk).
Figure A1. Typical online system for multiclass CVD risk stratification.

Appendix A.2. Pseudo-Code for Multiclass
The pseudo-code describes the process used by the multiclass algorithm for CVD risk stratification into granular risk classes. It uses a "for" loop for training and prediction on each fold of the data, which are divided into K folds. The training model is applied to the test data, and the PE is computed and stored in the form of accuracy (ACC), ROC, sensitivity (Sen), specificity (Spec), F1-score, the area-under-the-curve (AUC), and precision.
Figure A2. Pseudo-code for multiclass technique.
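The K-fold loop described above can be sketched in Python as follows. This is our own illustration and not a transcription of the pseudo-code in Figure A2. It assumes a nearest-centroid classifier as a stand-in for any multiclass model, reports only accuracy out of the listed PE metrics, and uses hypothetical, well-separated synthetic data with four risk classes.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train(X, y):
    """Stand-in multiclass model: one centroid (mean covariate vector) per class."""
    model = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        model[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, x):
    """Assign the risk class whose centroid is nearest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def cross_validate(X, y, k=5):
    """For each fold: train on the other folds, test on this one, store accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies

# Hypothetical data: ten patients per risk class, two covariates each.
RISK_CLASSES = ["low", "mild", "moderate", "high"]
X, y = [], []
for level, name in enumerate(RISK_CLASSES):
    for j in range(10):
        X.append([10.0 * level + 0.1 * j, 10.0 * level - 0.1 * j])
        y.append(name)

fold_acc = cross_validate(X, y, k=5)
mean_acc = sum(fold_acc) / len(fold_acc)
```

The remaining PE metrics (sensitivity, specificity, F1-score, AUC, and precision) would be computed inside the same loop from the per-fold predictions.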

Appendix B.1. Problem Transformation Methods for Multi-Label Prediction
The problem transformation method (PTM) converts the multi-label classification problem into one or more single-label classification tasks. Four PTMs, namely BR, CC, LP, and RAkEL, were used, as discussed below.
Binary Relevance: In the BR technique, the problem is divided into one or more single-label classification problems, each of which resembles a binary prediction. For example, if M is a set of "q" labels with M = {m1, m2, . . . , mq}, the BR technique builds "q" single-label binary classifiers, one for each label. The multi-label training set is converted into "q" binary datasets Elj, j = 1 . . . q, where Elj has all samples of the original dataset but with a single positive or negative value. With a base classifier C, the set of classifiers Cj (E), j = 1 . . . q, is obtained by training on Elj. The BR classification algorithm does not consider label dependency. Thus, it has lower computational complexity than other multi-label techniques. The process is shown in Figure A3 [258], where a multi-label dataset of four examples and a label set M with four labels (m1, m2, m3, and m4) is split into four independent single-label problems.
Classifier Chain: This algorithm also works through single-label classification. The technique uses a chain of classifiers in which the first classifier is trained on the input dataset, and each subsequent classifier is trained on the whole feature space augmented with the labels handled by the earlier base classifiers in the chain. Each base classifier thus uses the earlier label information for training and testing, so the CC algorithm accounts for correlation between labels. Figure A4 describes the functioning of CC [259].
Label Powerset: This technique converts the problem into a single-label multiclass prediction. Every distinct combination of labels is given its own unique class. For instance, with three labels, eight different combinations can arise, so the LP technique has eight classes that are trained for prediction. This technique deals with a large number of classes, each associated with few instances, and it takes label correlation into account. The transformation is shown in Figure A5 [260], where the first table shows the original dataset and the second table shows the transformed dataset.
Random k-label set: This is a type of combination technique used for multi-label prediction. Each member of the combination is trained on a small, randomly selected subset of labels by a single-label-based classifier. If the dataset (E) has L labels, the RAkEL classifier turns the data into all the possible k-label sets (Lk), and each label set is then trained for prediction. Finally, the prediction is assigned a positive (1) or negative (0) value according to the threshold (0.5). Further implementation details can be seen in [261].
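The three transformations above can be illustrated on a toy label matrix. The sketch below shows only how the targets are rewritten before any single-label classifier is trained; it is not the implementation used in the cited works. The function names and the four example label sets are hypothetical, and the label names m1 to m4 follow the notation used above.

```python
def binary_relevance(Y, labels):
    """BR: one independent binary target vector per label."""
    return {m: [int(m in label_set) for label_set in Y] for m in labels}

def chain_inputs(x, Y_row, labels, position):
    """CC: features for the classifier at `position` in the chain, i.e., the
    original features plus the 0/1 values of all earlier labels in the chain."""
    return list(x) + [int(m in Y_row) for m in labels[:position]]

def label_powerset(Y):
    """LP: every distinct label combination becomes one class of a
    single-label multiclass problem."""
    class_of = {}
    targets = []
    for label_set in Y:
        key = frozenset(label_set)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        targets.append(class_of[key])
    return targets, class_of

labels = ["m1", "m2", "m3", "m4"]
# Hypothetical multi-label dataset with four examples.
Y = [{"m1", "m4"}, {"m3", "m4"}, {"m1"}, {"m2", "m3", "m4"}]

br_targets = binary_relevance(Y, labels)
cc_features = chain_inputs([0.7, 1.2], Y[0], labels, position=2)
lp_targets, lp_classes = label_powerset(Y + [{"m4", "m1"}])
```

With q labels, LP can produce up to 2^q classes, which is why only the combinations that actually occur in the data are assigned a class id here.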

Appendix B.2. Algorithm Adaptation Methods for Multi-Label Prediction
Multi-label KNN: This algorithm is an adaptation of the KNN algorithm to multi-label datasets. For each unseen instance, its nearest neighbors are first identified in the training set. Next, the label set of the unseen instance is determined by applying the maximum a posteriori (MAP) principle to the labels of these neighbors. The full algorithm can be seen in [262].
Multi-label ARAM: This algorithm is a neural network model based on adaptive resonance theory. Its advantage is its fast learning ability. The detailed algorithm can be seen in [263].

Appendix B.3. Pseudo-Code for Multi-Label Classification Technique
The multi-label pseudo-code describes the multi-label algorithm, in which more than one endpoint is considered. For each multi-label endpoint, the risk class is defined. The pseudo-code uses two "for" loops, one over the multi-label endpoints and the next for multiclass prediction. Finally, the PE is determined as accuracy, sensitivity (Sen), specificity (Spec), area-under-the-curve (AUC), and sample-based and label-based metrics.

Pseudo-Code for Ensemble-Based Technique
Ensemble-based CVD risk prediction uses combinations of multiple classifiers. The pseudo-code shows that the data are divided into training and testing sets with K folds. The prediction is made using each type of classifier for multiclass and multi-label prediction. The classifiers are then combined into an ensemble classifier and the final prediction is made.
Figure A7. Pseudo-code for ensemble-based technique.
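One simple way to combine several classifiers is hard (majority) voting, sketched below. The text does not state which combination rule the pseudo-code uses, so voting is our assumption; stacking or weighted averaging would be equally valid choices. The three rule-based base classifiers and their thresholds are hypothetical placeholders for trained models and are not clinical guidance. The K-fold split is omitted for brevity.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent class among the base-classifier predictions.
    Ties are broken in favor of the earliest classifier in the list."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

# Hypothetical base classifiers; each stands in for a trained model.
def by_age(patient):
    return "high" if patient["age"] >= 60 else "low"

def by_sbp(patient):
    return "high" if patient["sbp"] >= 140 else "low"

def by_cimt(patient):
    return "high" if patient["cimt_mm"] >= 0.9 else "low"

BASE_CLASSIFIERS = (by_age, by_sbp, by_cimt)

def ensemble_predict(patient):
    """Collect one prediction per base classifier and combine them by voting."""
    return majority_vote([clf(patient) for clf in BASE_CLASSIFIERS])

patient_1 = {"age": 65, "sbp": 130, "cimt_mm": 1.0}   # votes: high, low, high
patient_2 = {"age": 50, "sbp": 150, "cimt_mm": 0.6}   # votes: low, high, low
ensemble_1 = ensemble_predict(patient_1)
ensemble_2 = ensemble_predict(patient_2)
```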

Appendix D. Comparison between 3 Paradigms
Comparison of ML-Based Multiclass, Multi-Label, and Ensemble CVD Classification

Performance Evaluation Metrics Descriptions
The PE for the multiclass and ensemble paradigms consists of accuracy (ACC), sensitivity (Sen), specificity (Spec), AUC, and F1-score, which are calculated from the numbers of true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs). The formulae are given in Table A2. The performance evaluation for multi-label-based CVD differs from that of multiclass and ensemble: it uses label-based and instance-based performance evaluations.
In the label-based techniques, the PE parameters are computed for each label from its own values of TPs, FPs, FNs, and TNs. They are then combined by one of two averaging methods: (i) macro-averaging and (ii) micro-averaging [181]. If a performance metric β is calculated from the values of TPs, FPs, FNs, and TNs, the macro-average (β macro) over all labels (L) is obtained by averaging β over each label "p", as shown in Equation (A1).
In the same manner, for the micro-averaging technique, the TPs, FPs, FNs, and TNs are first summed over all labels, and the micro-average (β micro) is then computed from these pooled counts using Equation (A2).
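In the standard formulation of these two averages, which we assume is what Equations (A1) and (A2) express, the macro-average applies the metric β to each label and then averages the results, while the micro-average pools the counts over the labels and applies β once. Here |L| denotes the number of labels and TP_p, FP_p, TN_p, and FN_p are the counts for label p.

```latex
\begin{equation}
\beta_{\mathrm{macro}} = \frac{1}{|L|} \sum_{p=1}^{|L|}
  \beta\left(TP_p,\, FP_p,\, TN_p,\, FN_p\right) \tag{A1}
\end{equation}

\begin{equation}
\beta_{\mathrm{micro}} = \beta\left(\sum_{p=1}^{|L|} TP_p,\;
  \sum_{p=1}^{|L|} FP_p,\; \sum_{p=1}^{|L|} TN_p,\;
  \sum_{p=1}^{|L|} FN_p\right) \tag{A2}
\end{equation}
```

Macro-averaging weights every label equally, whereas micro-averaging weights every instance-label decision equally and is therefore dominated by the frequent labels.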
For instance-based performance evaluation, the parameters are calculated for each individual instance and then averaged over all instances to give the final performance metric. The metrics include Hamming loss, precision, recall, F1-score, Jaccard similarity coefficient score, and accuracy.
Suppose the multi-label dataset E contains |E| multi-label examples (pi, Qi), i = 1 . . . |E|, with Qi ⊆ L, where L is the set of all labels. Let C be a multi-label classifier and Mi = C (pi) the set of labels predicted by C. |E| denotes the number of examples in the set E, while |Qi ∩ Mi| denotes the number of labels common to the true and predicted label sets. |Mi| denotes the number of predicted labels, and |Qi| the number of true labels.
Hamming loss is the fraction of instance-label pairs that are misclassified; a lower Hamming loss indicates better performance of the multi-label classifier. The Jaccard score is the ratio of the size of the intersection of the predicted and ground truth label sets to the size of their union. Precision is the proportion of predicted labels that are correct. Likewise, recall is the ratio of correctly predicted labels to the actual labels. F1-score is the harmonic mean of precision and recall (Table A3).
Table A2. Performance evaluation metrics used in CVD risk assessment.
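The instance-based metrics can be computed directly from the true label sets Qi and the predicted label sets Mi. The sketch below follows the usual set-based definitions and is offered as an illustration rather than as the exact formulae of Table A3. The conventions for empty label sets, the function names, and the four-example dataset are our assumptions.

```python
def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of (instance, label) pairs predicted wrongly; the symmetric
    difference Qi ^ Mi holds the labels that are wrong for instance i."""
    wrong = sum(len(q ^ m) for q, m in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(labels))

def instance_based_scores(true_sets, pred_sets):
    """Per-instance Jaccard, precision, recall, and F1, averaged over instances.
    Assumption: an instance with empty true and predicted sets scores 1 for
    Jaccard and F1, and an empty denominator scores 0 for precision and recall."""
    totals = {"jaccard": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for q, m in zip(true_sets, pred_sets):
        inter = len(q & m)
        totals["jaccard"] += inter / len(q | m) if (q or m) else 1.0
        totals["precision"] += inter / len(m) if m else 0.0
        totals["recall"] += inter / len(q) if q else 0.0
        totals["f1"] += 2 * inter / (len(q) + len(m)) if (q or m) else 1.0
    n = len(true_sets)
    return {name: value / n for name, value in totals.items()}

labels = ["m1", "m2", "m3", "m4"]
# Hypothetical true (Q) and predicted (M) label sets for four instances.
Q = [{"m1", "m2"}, {"m3"}, {"m1", "m4"}, {"m2"}]
M = [{"m1"}, {"m3", "m4"}, {"m1", "m4"}, {"m3"}]

h_loss = hamming_loss(Q, M, labels)      # 4 wrong pairs out of 16
scores = instance_based_scores(Q, M)
```

In this example, one label is missed in the first instance, one extra label is predicted in the second, the third is exact, and the fourth is entirely wrong, which gives a Hamming loss of 0.25 and a Jaccard score of 0.5.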

Power Analysis for Multi-Label and Ensemble-Based CVD Risk Stratification
Power analysis can be done for multi-label and ensemble-based CVD systems. Its objective is to determine the smallest data or sample size (s) needed to perform the multi-label or ensemble-based CVD risk classification. The parameters required for the power analysis are the confidence interval, a margin of error (e) of ±5%, a sample proportion (q), and the z-score (z*), taken from the standard z-table. The formula used is shown in Equation (A3) [264,265].
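With these parameters, the calculation can be sketched as follows. We assume that Equation (A3) is the standard sample-size formula for a proportion, s = (z*)² q (1 − q) / e². The values z* = 1.96 (95% confidence) and q = 0.5 (the most conservative proportion) are our illustrative choices, and the function name is hypothetical; only the ±5% margin of error comes from the text.

```python
import math

def minimum_sample_size(z_star, q, e):
    """Smallest sample size s = z*^2 * q * (1 - q) / e^2, rounded up
    to a whole number of participants."""
    if not 0 < q < 1:
        raise ValueError("sample proportion q must lie strictly between 0 and 1")
    if e <= 0:
        raise ValueError("margin of error e must be positive")
    return math.ceil(z_star ** 2 * q * (1 - q) / e ** 2)

# 95% confidence (z* = 1.96), q = 0.5, margin of error of 5%.
s = minimum_sample_size(z_star=1.96, q=0.5, e=0.05)   # 384.16 rounds up to 385
```

Since q(1 − q) is largest at q = 0.5, any other sample proportion yields a smaller required sample size for the same confidence level and margin of error.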