Entropy
  • Article
  • Open Access

13 December 2020

Identification of the Framingham Risk Score by an Entropy-Based Rule Model for Cardiovascular Disease

1 Department of Information Management, Hwa Hsia University of Technology, New Taipei City 235, Taiwan
2 Department of Information Management, National Yunlin University of Science and Technology, Douliou, Yunlin 64002, Taiwan
3 National Museum of Marine Science & Technology, Keelung City 202010, Taiwan
* Authors to whom correspondence should be addressed.
This article belongs to the Section Entropy and Biology

Abstract

Since 2001, cardiovascular disease (CVD) has had the second-highest mortality rate, about 15,700 people per year, in Taiwan. It has thus imposed a substantial burden on medical resources. This study was triggered by the following three factors. First, the CVD problem reflects an urgent issue. A high priority has been placed on long-term therapy and prevention to reduce the wastage of medical resources, particularly in developed countries. Second, from the perspective of preventive medicine, popular data-mining methods have been well learned and studied, with excellent performance in medical fields. Thus, identification of the risk factors of CVD using these popular techniques is a prime concern. Third, the Framingham risk score is a core indicator that can be used to establish an effective prediction model to accurately diagnose CVD. Thus, this study proposes an integrated predictive model to organize five notable classifiers: the rough set (RS), decision tree (DT), random forest (RF), multilayer perceptron (MLP), and support vector machine (SVM), with a novel use of the Framingham risk score for attribute selection (i.e., F-attributes first identified in this study) to determine the key features for identifying CVD. Verification experiments were conducted with three evaluation criteria—accuracy, sensitivity, and specificity—based on 1190 instances of a CVD dataset available from a Taiwan teaching hospital and 2019 examples from a public Framingham dataset. Given the empirical results, the SVM showed the best performance in terms of accuracy (99.67%), sensitivity (99.93%), and specificity (99.71%) in all F-attributes in the CVD dataset compared to the other listed classifiers. The RS showed the highest performance in terms of accuracy (85.11%), sensitivity (86.06%), and specificity (85.19%) in most of the F-attributes in the Framingham dataset. 
These results provide evidence that no single classifier or model suits all practical datasets in medical applications. Thus, identifying an appropriate classifier for specific medical data is important. Notably, this study is novel in calculating key Framingham risk attribute scores and integrating them with the DT technique to produce entropy-based decision rules as knowledge sets, which has not been undertaken in previous research. The study yielded meaningful entropy-based rules in tree structures and differentiated the classifiers on the two datasets, leading to three useful research findings and three helpful management implications for subsequent medical research. In particular, these rules provide reasonable solutions to simplify processes of preventive medicine by standardizing the formats and codes used in medical data to address CVD problems. The specificity of these rules is thus significant compared to those of past research.

1. Introduction

This section explores the research background and problem in the relevant medical domains, the research gaps and motivations, and the study goals and research objectives.

1.1. Research Background and Research Problem

Cardiovascular disease (CVD) is one of the main causes of death [] in most countries and involves disorders of the blood vessels or the heart, such as cerebrovascular disease (i.e., stroke), congenital heart disease, coronary heart disease (CHD) [], heart failure, peripheral artery disease, raised blood pressure (i.e., hypertension), and rheumatic heart disease. An estimated 25 million new cases of heart disease are diagnosed each year. Thus, CVD has been a leading cause of death from serious illness in recent years [,]. Effective identification of the risk factors of CVD is therefore highly important in clinical research on long-term therapy and prevention to reduce the wastage of medical resources. Based on past studies, the main risk factors of CVD include an unhealthy diet, harmful use of alcohol and tobacco, and physical inactivity. According to a report from the World Health Organization [], in 2015, there were 17.7 million deaths related to CVD, representing 31% of global deaths and a higher mortality risk than that of general diseases. Among these deaths, 6.7 million were due to stroke, and 7.4 million were due to CHD. In 2015, 82% of premature deaths (under age 70) caused by non-communicable diseases (NCDs) occurred in low- and middle-income countries (LMICs), 37% of which resulted from CVD. In recent decades, the mortality rate of CVD has declined in high-income countries (HICs), whereas it has increased surprisingly rapidly in LMICs. However, efforts to improve healthcare interventions against the risk factors of CVD may significantly reduce deaths, medical resource use, and socioeconomic burdens. Thus, identification of the CVD problem is first emphasized in this study. This problem poses significant challenges and an opportunity in advanced preventive medicine for medical application research.
Given the above, identifying CVD is clearly an urgent issue. The CVD problem is therefore a key research concern and one of the core research goals of this study. The study contributes an exploration of the risk factors of fatal CVD and of effective methods for identifying them, which is novel in the context of past studies.

1.2. Research Gaps and Motivations

Table 1 lists the top five causes of death in 2015 in Taiwan, as reported by the Ministry of Health and Welfare []. It is clear that CVD has been the second most common cause of death since 2001. CVD not only results in higher medical resource expenses but also greater long-term expenditure for countries and individual families. To tackle this severe problem, developing countries have dedicated funds to CVD prevention and treatment through national education and training resources, thereby reducing incidence and mortality rates []. However, due to changes in lifestyle and eating habits in recent decades, incidence rates of CVD among younger people have experienced a growing trend. Thus, early detection and prevention of CVD and closing this research gap related to younger people have become critical issues. From the perspective of preventive medicine, the design of an effective predictive (classification) model to help doctors in the early and accurate diagnosis of CVD is required, and the effectiveness and quality of prevention and treatment need to be improved. Thus, this study is motivated by the following. First, the classification function of machine learning techniques for mining meaningful CVD rules (knowledge) from large amounts of data is a valuable approach. Notably, classification techniques have been successfully applied in medical fields, such as breast cancer [,] and heart diseases [,], by studying past research. Furthermore, Boursalie et al. [] efficiently utilized a support vector machine (SVM) classifier to monitor CVD via effective features and characteristics. Second, it is clear from a limited literature review that using effective risk factors (e.g., Framingham risk attributes) to mine CVD data is efficient and effective []. 
Although past studies have used features of Framingham risk to help prevent CVD [], they have rarely used a hybrid model to integrate the Framingham risk score and classification techniques to address the issue of CVD for doctors and patients. Finally, the Framingham risk score is an effective instrument that can be used to build a hybrid prediction model to accurately diagnose CVD. Thus, this study is beneficial both theoretically and practically.
Table 1. Top five causes of death in 2015 in Taiwan.
Based on the above, serious diseases such as CVD demand more healthcare resources and the identification of preventive measures or approaches at both the national and individual family levels. The challenge of CVD has thus attracted significant attention from practitioners and academics. Therefore, this study examines the important issue of effectively identifying the risk factors of CVD.

1.3. Study Goals and Research Objective

The CVD problem is an urgent issue. A high priority has been placed on long-term therapy and prevention to reduce the wastage of medical resources through greater use of preventive medicine, particularly in developed countries. Furthermore, popular data-mining methods have been widely studied and have performed well in medical fields. Thus, identifying the risk factors of CVD using popular data-mining techniques is a major goal of this study. In addition to addressing the issue of CVD, the study also aims to reduce the wastage of medical resources and financial expenditure.
In data-mining fields, various emerging machine learning models have been found to be superior in their application to healthcare issues []. Thus, in this study, a hybrid predictive model was first built to identify CVD using five well-performing classifiers in the medical domain, namely, decision tree (DT), multilayer perceptron (MLP), random forest (RF), rough set theory (RST), and SVM, and simultaneously measure the performance of the Framingham risk score with full attributes. In practice, the study used two real datasets collected from a Taiwan teaching hospital with CVD cases and a public Framingham dataset from the Internet for the further benefit of preventive treatment. To summarize, this study has three research purposes: (1) construction of a hybrid predictive model of CVD based on diverse classifiers and the Framingham risk score; (2) identification of the key determinants of CVD by attribute selection methods and extraction of comprehensible entropy-based rule sets based on DT; and (3) provision of analytical results and research findings with management implications to healthcare providers, physicians, and patients as a useful medical reference. The entropy-based method is beneficial for information gain [].
The remainder of this paper is organized as follows: A literature review of the study issues, including CVD, the Framingham risk score, and the five classification techniques, is provided in Section 2. The research concepts justifying the procedure of the proposed method are presented in Section 3. The analysis results and the core research findings from the experiments are provided in Section 4. Finally, Section 5 concludes and suggests subsequent research.

3. Proposed Method

To achieve the study goal, a key function of a classification technique is to assign data objects of unknown category to a known category for prediction purposes. The most challenging task is selecting a suitable technique to improve accuracy in the medical field. Previous studies of Framingham risk attributes during the past 10 years used statistical methods run in the SPSS software, together with the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, sensitivity, and specificity, to show the prediction power for addressing CVD problems []. However, they did not apply the Framingham risk score to machine learning classification techniques for identifying CVD and, in particular, for the extraction of knowledgeable rules. Bridging these gaps with well-defined techniques from medical research applications is the focus of this study. This study proposes using the Framingham risk score to filter attributes and then employing five classification techniques to build a hybrid model for identifying CVD. Two CVD datasets were compiled to verify the proposed model, and the performances of the Framingham risk score, the Framingham attributes, and the full attributes were compared across the five classifiers.
The proposed method has five steps: compile the dataset → preprocess the data → select the attributes → build the model → evaluate the results. Figure 2 shows a flow diagram of the algorithm proposed in this method, and detailed information about the proposed algorithms is provided in the following.
Figure 2. Research procedure of the proposed method.
  • Step 1: Collect dataset. An adequate dataset is key in the identification of CVD; thus, this study compiled two datasets with CVD data. The first was compiled from a real prevention and treatment plan of CVD in Taiwan’s regional teaching hospital and had 20 attributes with 1190 records (including 551 men and 639 women). It is referred to as the CVD dataset in this study. The second dataset is a public Framingham dataset with 18 original attributes, obtained online from the University of Washington (http://courses.washington.edu/b513/datasets/datasets.php?class=513). It is referred to as the Framingham dataset.
  • Step 2: Preprocess the data. In general, real data may contain inconsistencies, gaps, errors, inaccuracies, impossible data combinations, and noise. Such data are not suitable for machine learning to discover hidden information/knowledge from large datasets. Because this step cannot simply remove such values, missing values are filled with the average value of the corresponding attribute (mean interpolation), and the dataset is then reformatted appropriately.
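The mean-filling idea in Step 2 can be illustrated with a minimal sketch. It assumes that missing entries are represented as `None` and that each record is a dictionary; the function name `impute_with_mean`, the column names, and the sample values are hypothetical and are not taken from the study's datasets or software.

```python
from statistics import mean


def impute_with_mean(rows, columns):
    """Return copies of the rows with None replaced by the column mean.

    The mean is computed only from the observed (non-missing) values.
    """
    filled = [dict(row) for row in rows]  # do not modify the input records
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            continue  # nothing to average; leave the column untouched
        col_mean = mean(observed)
        for row in filled:
            if row[col] is None:
                row[col] = col_mean
    return filled


# Hypothetical records with gaps in systolic blood pressure and glucose.
records = [
    {"sbp": 120, "glucose": 95},
    {"sbp": None, "glucose": 110},
    {"sbp": 140, "glucose": None},
]
completed = impute_with_mean(records, ["sbp", "glucose"])
```

Copying the records before filling keeps the raw data available for comparison, which may help when checking how much of a column was imputed.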
The first CVD dataset is employed to illustrate the proposed method. Based on a literature review, there are eight types of related CVDs, such as arrhythmia, cardiogenic shock, and diabetes mellitus. Twenty original attributes are reduced to 13 according to the expert opinion of physicians. Consequently, the 13 attributes include five physical exam attributes, seven blood test attributes, and one decision-attribute of the class with the eight CVD names noted above. There are two categories in the class, that is, Y (Yes): Have at least one CVD and N (No): None. These are listed in Table 5 below. Next, the second dataset has all of the attributes, including age, sex, serum cholesterol, DBP, SBP, Metropolitan relative weight, smoke, and CVDs (class), which is based on the expert recommendation of doctors. For further understanding of these datasets, their properties are shown in Table 6.
Table 5. The coding information for cardiovascular disease (CVD) dataset attributes.
Table 6. Properties of the two medical datasets used.
  • Step 3: Select attributes. First, the eight attributes of the Framingham risk score are age, gender, DBP, SBP, LDL-C, HDL-C, fasting glucose, and smoking, which are named the Framingham attributes. Next, these factors are selected to calculate and transform their score values based on the conversion method of the Framingham risk score discussed in Section 2.2 [] and shown in Table 2, Table 3 and Table 4 from the CVD dataset. Finally, all 13 attributes of the CVD dataset are selected and called the full attributes, which additionally include body mass index, waistline, red blood cells, and white blood cells.
  • Step 4: Build a model. This step designs a hybrid model of five classifiers (i.e., RST, DT, RF, MLP, and SVM) with various attribute components, which include the Framingham score attributes, the Framingham attributes, and the full attributes, to highlight and differentiate the performance of the proposed method based on a commonly used 67–33% training–testing ratio. This step can be divided into five sub-steps and is executed using different software packages. The procedure of the five sub-steps is as follows: First, the selected attributes are used as input variables. Second, for the percentage-split data method, the two training–testing sub-datasets are formed using the common 2:1 ratio to achieve a good and reasonable result in practice. Thus, 67% of the data is used as a training sub-dataset, and the remaining 33% of the data is used as the testing sub-dataset. Third, all of the default parameters are defined to implement each of the above classifiers. Fourth, RST is applied using the rough set exploration system (RSES) [], and DT, RF, MLP, and SVM are applied separately. Fifth, comprehensive knowledge-based rule sets are created using DT. For further details, the pseudo-code of the construction of the hybrid model is shown in Algorithm 1.
Algorithm 1: Pseudo Code of Building Hybrid Model
Input: SA = selected attributes list, CD = collected data, RSES = rough set exploration system
 Use attributes from SA;
 Create 67% of data for training from CD;
 Create the other 33% of data for testing from CD;
 Define all the default parameters;
 Apply RST classifier by RSES;
 Apply DT, RF, MLP, and SVM classifiers, respectively;
 Create tree-based rules sets by DT;
Output: knowledge(tree)-based rules sets
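The two data-splitting lines of Algorithm 1 can be sketched as follows. This is only an illustration of a 67–33% percentage split: the shuffling, the fixed seed, and the function name `percentage_split` are assumptions, since the text does not state whether the split was randomized or stratified, and the classifier steps themselves are not reproduced here.

```python
import random


def percentage_split(records, train_fraction=0.67, seed=0):
    """Shuffle the records and cut them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Record identifiers standing in for the 1190 instances of the CVD dataset.
train, test = percentage_split(range(1190))
```

With 1190 records, this rounding yields 797 training and 393 testing instances.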
  • Step 5: Evaluate the results. To evaluate machine learning models, most researchers use three common metrics: accuracy, sensitivity, and specificity. These are defined from the confusion matrix in Table 7 and by Equations (7)–(9). In Table 7, a true positive (TP) means the truth is positive and the prediction is positive; a false positive (FP) means the truth is negative but the prediction is positive; a false negative (FN) means the truth is positive but the prediction is negative; and a true negative (TN) means the truth is negative and the prediction is negative.
    Table 7. Confusion matrix for classification.
$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{7}$$
$$\mathrm{sensitivity} = \frac{TP}{TP + FN}, \tag{8}$$
$$\mathrm{specificity} = \frac{TN}{TN + FP}. \tag{9}$$
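As a worked illustration of the three metrics, the sketch below counts TP, FP, FN, and TN from a list of true and predicted class labels and then applies the definitions above. The labels "Y" and "N" follow the class coding described in Step 2; the function names and the sample predictions are hypothetical, and the sketch assumes that both classes occur in the true labels so that no denominator is zero.

```python
def confusion_counts(actual, predicted, positive="Y"):
    """Count true/false positives and negatives for a binary class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive and guess == positive:
            tp += 1
        elif truth != positive and guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive="Y"):
    """Compute accuracy, sensitivity, and specificity from the counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical test labels: four CVD cases (Y) and six non-cases (N).
actual = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"]
predicted = ["Y", "Y", "Y", "N", "N", "N", "N", "N", "N", "Y"]
scores = evaluate(actual, predicted)
```

Here TP = 3, FN = 1, TN = 5, and FP = 1, so accuracy is 8/10, sensitivity is 3/4, and specificity is 5/6.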

4. Experiment Result and Comparisons

This section verifies the proposed method and its algorithms with two medical datasets and compares the listed models to further evaluate the classification performance.

4.1. Introduction and Preprocessing of the Two Medical Datasets

To further explore the experimental CVD dataset, Table 8 shows the descriptive statistics for the first CVD dataset using the chi-squared test to summarize and compare the baseline characteristics of the continuous attributes and the categorical attributes.
Table 8. Descriptive statistics of the CVD dataset.

4.2. Experimental Results

The extracted entropy-based decision rules, core attributes, accuracy rate, sensitivity, and specificity were calculated from the two medical datasets after the experiments.
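To make the term "entropy-based" concrete, the following sketch computes Shannon entropy and the information gain of a single threshold split, which is the quantity an entropy-based decision tree typically maximizes when choosing a test such as "score > threshold". It is an assumption that the DT implementation used in the study selects splits in exactly this way; the attribute values, class labels, and thresholds below are hypothetical and are not taken from Figure 3 or Figure 4.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on value <= threshold vs. > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


# Hypothetical age scores and CVD classes (Y = at least one CVD, N = none).
age_scores = [2, 4, 6, 7, 8, 9, 10, 11]
classes = ["N", "N", "N", "N", "Y", "Y", "Y", "Y"]
gain_at_7 = information_gain(age_scores, classes, threshold=7)
gain_at_4 = information_gain(age_scores, classes, threshold=4)
```

In this toy data, the threshold 7 separates the two classes perfectly and therefore has a larger gain than the threshold 4; a tree builder would prefer it as the splitting rule.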

4.2.1. The CVD Dataset

After data preprocessing and attribute selection, the Framingham attributes and full attributes were identified, and the five different classifiers were used and compared with various evaluation standards for the overall accuracy rate using a training–testing ratio of 67–33% to measure the performance of the proposed method.
Consequently, the visualized tree structure of if–then–else control statements in the DT model was used to identify future CVD issues. Figure 3 lists the entropy-based rule results of the CVD dataset in a visualized tree structure. As shown in Figure 3, one case is exemplified and highlighted in red and green, and one key result of the core attributes is defined.
Figure 3. Entropy-based rule results of the cardiovascular disease (CVD) dataset in a visualized tree structure.
(1)
First, it is indicated that, if the score measurement of glu_cla (i.e., diabetes) is higher than 116, glu_cla is also higher than 155, else if glu_cla is less than 116, there is no CVD case; if the score measurement of hdl_1-F (i.e., Framingham high-density cholesterol) is higher than −2, then there is a CVD case; else if Age-F (score of Framingham age) is higher than 7, there is a CVD case, but if it is less than or equal to 7, there is no CVD case.
(2)
Second, the core attributes were obtained by dynamic reduction through discrete tables, and seven core attributes for Framingham risks were identified, including sex (i.e., gender), age_F (i.e., age), cho_1-F, hdl_1-F, BP_F, glu_cla, and smoke (i.e., smoking).
Accordingly, Table 9 shows the comparative results of various evaluation standards on the Framingham score attributes, the Framingham attributes, and the full attributes of the CVD dataset with five classifiers. In Table 9, it is clear that SVM outperforms the other four classifiers in terms of accuracy (99.67%), sensitivity (99.93%), and specificity (99.71%) for the three aspects of the Framingham score attributes, the Framingham attributes, and the full attributes. This implies that SVM is more suitable for identifying and addressing CVD than the other methods examined in this study.
Table 9. The comparative results of various evaluation standards in the CVD dataset.

4.2.2. The Framingham Dataset

The second Framingham dataset with 18 original attributes was also considered. Similarly, after the preprocessing and identification of the Framingham attributes and the full attributes, this dataset was used with the five different classifiers to assess the evaluation performance of the proposed method in terms of average accuracy rate for 10 repetitions, also using a training–testing ratio of 67–33%.
DT was also used to visualize the tree structure of if–then–else control statements for identifying CVD. Figure 4 shows the entropy-based rule results of the Framingham dataset in a visualized tree structure. In Figure 4, two key points can be determined. First, a case from the tree-like structure is highlighted in red and green. Next, key attributes are accordingly identified and determined.
Figure 4. Entropy-based rule results of the Framingham dataset in a visualized tree structure.
(1)
In Figure 4, it is indicated that, if the score measurement of the Framingham attribute of age (age_f) is higher than 1, else if age is less than 1, there is a CVD case; if the score measurement of the Framingham attribute of blood pressure (bp_f) is higher than 2, and the sex (gender) score is higher than 1, then the age (age_f) score is higher than 6, and there is a CVD case; else if the score of Framingham age is less than or equal to 6, then there is no CVD case.
(2)
Five key attributes were also obtained by dynamic reduction through discrete tables: sex, age_f (i.e., age), cho_f, bp_f, and smoke (i.e., smoking).
Table 10 lists the analytical results for the Framingham attributes and the full attributes in the Framingham dataset after the experiments. In Table 10, the five classifiers are compared with the three evaluation standards in terms of accuracy, sensitivity, and specificity. It is clear that the RS method has the highest performance in terms of accuracy (85.11%), sensitivity (86.06%), and specificity (85.19%) in all Framingham score attributes, the Framingham attributes, and the full attributes for the Framingham dataset. This information implies that the rough set theory is more suitable for identifying CVD in the Framingham dataset than the other four classifiers examined in this study.
Table 10. The analytical results of the Framingham attributes and the full attributes for the Framingham dataset.

4.3. Findings

Three key findings, items (1)–(3), and a set of management implications, item (4), follow from the experimental results:
(1)
The advantage of Framingham attributes with classification techniques: Although previous studies have used Framingham attributes and statistical methods to process CVD, they did not use a hybrid model to integrate Framingham score attributes and the five noted classification techniques for the identification of CVD or differentiation of various classifier performances. This study closes these gaps, and the proposed method not only aids understanding of comprehensive CVD tree-like entropy-based rules but also helps to prevent or even solve CVD problems. In addition, the score use of Framingham attributes can be an effective clinical reference for doctors and health care workers to improve the identification of CVD. Thus, a key specificity and novelty in this study are that the Framingham risk attribute scores can be calculated and used to produce entropy-based decision rules. This has not been undertaken in previous research based on our limited literature review.
(2)
The identification of key attributes for CVD: The key attributes of the original Framingham risk data were identified by Dr. Rupert Payne, University of Edinburgh, and were uniformly compared with those of the first and second medical datasets in the experiments of this study. Table 11 shows the comparative results from the following three perspectives for the CVD issue: (a) The Framingham risk, the CVD dataset, and the Framingham dataset have 9, 7, and 5 key attributes, respectively, for identifying CVD. The number of Framingham key attributes was found to be positively associated with the overall classification accuracy, as supported by the accuracy rates in Table 9 and Table 10. (b) Both datasets lack some key attributes, which lowers classification performance. If sufficient key attributes are collected in the future, overall accuracy can improve. (c) The importance of the key attributes for identifying CVD is listed in a top-down order, which may provide helpful information.
(3)
The technical implications of the classifiers used: Based on the experimental results of the two medical datasets, no classifier or model is suitable for all practical datasets used in different applications. Thus, an appropriate classifier must be first found and defined to address specific data in the machine learning community.
(4)
The management implications of healthcare issues: Regarding management implications, a set of standards was provided from the entropy-based decision rules of a tree-like structure (e.g., Figure 3 and Figure 4) to help prevent or solve CVD problems in advance. (a) First, this can indirectly remind CVD patients how to self-manage. (b) Second, the experimental results allow doctors to provide patients with managerial suggestions, such as changes in eating habits, self-measurement and self-control of blood pressure, cessation of smoking, and increased exercise. (c) Finally, the Framingham risk attributes used to identify CVD with the classification techniques listed in this study can be regarded as an effective prediction model for processing CVD. The analytical results can be stored for doctors and health personnel as clinical references.
Table 11. Key attributes of comparison for CVD issues.

5. Conclusions

This study proposes a hybrid method to integrate and model Framingham risk attributes and five notable classification techniques (RST, DT, RF, MLP, and SVM) for the identification of key attributes that influence CVD and to highlight preventive practices in healthcare services. The study’s contribution consists of calculating the score of Framingham risk attributes and identifying CVD using a suitable classifier for different datasets in hybrid medical applications. This contribution differs from those of previous studies []. For verification using the three evaluation criteria (accuracy, sensitivity, and specificity), 1190 instances in the CVD dataset available from Taiwan’s regional teaching hospital and 2019 examples from the public Framingham dataset were used. SVM showed the best performance in terms of accuracy (99.67%), sensitivity (99.93%), and specificity (99.71%) in all of the F-attributes in the CVD dataset, and RS showed the best performance in terms of accuracy (85.11%), sensitivity (86.06%), and specificity (85.19%) in most of the F-attributes for the Framingham dataset. Consequently, three main points can be made regarding the contribution, specificity, and novelty of this study: (1) Regarding its contribution, this study provides meaningful entropy-based knowledgeable rules visualized in a tree structure and differentiates the classifiers on the two datasets, resulting in three useful research findings and three helpful management implications for subsequent medical research and other interested parties. (2) Regarding its novelty, the study results provide evidence that no single classifier or model suits all practical datasets in medical applications. Thus, finding an appropriate classifier to address specific medical data is highly important.
Furthermore, this study is the first to calculate and identify the use of key Framingham risk attribute scores, integrated with the five classification techniques noted above and with the DT technique, to produce entropy-based decision rules of knowledge sets. This has not been achieved in previous studies. (3) Regarding the specificity of the study, the knowledgeable rule sets created by DT provide reasonable solutions to simplifying the processes of preventive medicine by standardizing the formats and codes used in medical data to address CVD problems. The specificity of these rules is thus significant compared to those of past research.
Five conclusive and important directions are indicated by the analytical results of the experiments using the two medical datasets:
(1)
According to the empirical results, the CVD and Framingham datasets were processed into useful entropy-based tree information/knowledge by applying five classifiers to knowledge discovery in the databases. Furthermore, machine learning tools were found to be useful in medical applications with the integration of Framingham risk attributes. Regarding the experimental results, the support vector machine method showed a better performance using Framingham score attributes and five classifier techniques in terms of accuracy, sensitivity, and specificity in the CVD dataset. However, the rough set method outperformed the other classifiers in the Framingham dataset. In addition, through the entropy-based decision rule’s visualization of trees, the if–then–else control statement provides an understanding of how all decision items with a class status of “Y” or “N” (i.e., Yes or No) are identified in the two medical datasets to address CVD problems.
(2)
Based on the literature review, machine learning and knowledge discovery using medical databases have attracted a significant amount of interest from academia and other fields. Given this interest in CVD applications, this study provides insights into data-mining techniques of machine learning and statistical methods with industry databases.
(3)
This study concerns practical CVD applications and is a trial of machine learning techniques. It involves the extraction of helpful entropy-based decision rules, the anticipation of decision-making challenges for real-life applications of knowledge discovery, understanding of current results and insights, and exploration of future research directions.
(4)
In Taiwan, the number of CVD patients has increased during the past 10 years due to changes in lifestyle and eating habits. Furthermore, CVD, cerebrovascular disease, diabetes, and hypertension were ranked second, third, fifth, and tenth among the top causes of death. It is clear that CVD is a serious issue; thus, early diagnosis and prevention strategies of CVD are important for healthcare and to decrease the use of medical resources.
(5)
Several issues can be examined in subsequent research: (a) to further evaluate the proposed method, more attributes of Framingham risk can be collected to better predict CVD; (b) more classification techniques can be used and measured to predict CVD; (c) an alternative to the proposed method can be constructed with a variety of evaluation standards; and (d) more knowledge-based decision rules, which are not based on a variation of DT entropy, such as RSES of the rough set theory, can be provided for medical applications.

Author Contributions

Conceptualization, J.-Y.J.; methodology, C.-H.C. and Y.-S.C.; software, S.-F.C.; validation, Y.-S.C. and S.-F.C.; formal analysis, S.-F.C.; investigation, Y.-S.C. and C.-H.C.; resources, Y.-S.C.; data curation, S.-F.C.; writing—original draft preparation, J.-Y.J.; writing—review and editing, Y.-S.C. and C.-H.C.; visualization, Y.-S.C. and C.-H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported and funded by the Ministry of Science and Technology, Taiwan, grant number MOST 109-2221-E-146-003.

Acknowledgments

The authors sincerely thank the funder for the above financial support of this paper, and the editors of Entropy and the anonymous reviewers for their constructive suggestions, which improved the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
