C-MWCAR: Classification Based on Multiple Weighted Class Association Rules
Abstract
1. Introduction
2. Background
3. Materials and Methods
3.1. Formalization of Basic Associative Classifier
3.2. Implementation of the Proposed C-MWCAR
3.2.1. Workflow of C-MWCAR
3.2.2. Dictionary Order-Based CAR Mining Algorithm (DOCMA)
Algorithm 1: Dictionary order-based CAR mining algorithm (DOCMA)
Input: User-specified support (userSup), user-specified confidence (userConf), and all the frequent 1-itemSets, i.e., L1.
Output: The complete set of CARs, denoted as RuleSet.
1. RuleSet = Ø; CCARk = Ø; ICCARk = Ø; CARk = Ø; Ck = Ø; // initialization
2. for (k = 2; Lk−1 ≠ Ø; k++) do
3. // delete frequent (k − 1)-itemSets that cannot generate frequent k-itemSets
4. for (i = 1; i ≤ Lk−1.size(); i++) do
5. numTemp = the number of k-supersequences of Lk−1(i) in Lk−1
6. if (numTemp < k − 1) then
7. delete all the k-supersequences of Lk−1(i) from Lk−1
8. end if
9. end for
10. // optimized connection of frequent (k − 1)-itemSets
11. for (i = 1; i < Lk−1.size(); i++) do
12. for (j = i + 1; j ≤ Lk−1.size(); j++) do
13. if (Lk−1(i) and Lk−1(j) are connectable) then
14. Ck = Ck + Lk−1(i) ⋈ Lk−1(j)
15. else
16. break;
17. end if
18. end for
19. end for
20. // select the CCARk in the current pass
21. CCARk = {c ∈ Ck | c contains exactly one item derived from the class attribute}
22. // optimized pruning of ICCARk
23. for (i = 1; i ≤ CCARk.size(); i++) do
24. for (j = 1; j ≤ CCARk(i).(k − 1)-subsequence.size(); j++) do
25. if (the jth (k − 1)-subsequence of CCARk(i) is not in Lk−1) then
26. ICCARk = ICCARk + CCARk(i)
27. break;
28. end if
29. end for
30. end for
31. // remove the ICCARs from CCARk in advance
32. FCCARk = CCARk − ICCARk
33. // select the CARk in the current pass based on the values of sup and conf
34. for (i = 1; i ≤ FCCARk.size(); i++) do
35. if (FCCARk(i).sup ≥ userSup && FCCARk(i).conf ≥ userConf) then
36. CARk = CARk + FCCARk(i)
37. end if
38. end for
39. RuleSet = RuleSet ∪ CARk
40. end for
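The core of DOCMA can be illustrated with a small Python sketch. It is a minimal, non-authoritative reconstruction under stated assumptions: items are (attribute, value) tuples, the class attribute is named by the hypothetical constant CLASS_ATTR, and the pre-pruning step follows our reading of steps 3–9 (an itemSet that shares k − 2 items with fewer than k − 1 other frequent (k − 1)-itemSets cannot be part of any frequent k-itemSet). Unlike the pseudocode, the sketch also counts support to build Lk for the next pass, which the algorithm needs but does not spell out.

```python
from itertools import combinations

CLASS_ATTR = "class"  # hypothetical name of the class attribute


def support_count(itemset, data):
    """Number of instances that contain every item of `itemset`."""
    return sum(1 for t in data if set(itemset) <= t)


def class_items(itemset):
    return [item for item in itemset if item[0] == CLASS_ATTR]


def docma_sketch(data, user_sup, user_conf):
    """Apriori-style CAR mining with a dictionary-order join (illustrative).

    data: list of frozensets of (attribute, value) items; each instance holds
    exactly one (CLASS_ATTR, label) item. Returns a list of
    (antecedent, class_item, sup, conf) tuples.
    """
    n = len(data)
    min_count = user_sup * n
    items = sorted({item for t in data for item in t})
    # L1: frequent 1-itemSets as sorted tuples (dictionary order).
    prev = [(i,) for i in items if support_count((i,), data) >= min_count]
    rule_set = []
    k = 2
    while prev:
        prev_set = set(prev)
        # Pre-pruning (our reading): keep an itemSet only if at least k-1
        # other frequent (k-1)-itemSets share exactly k-2 items with it.
        pruned = [
            s for s in prev
            if sum(1 for o in prev if o != s and len(set(s) & set(o)) == k - 2) >= k - 1
        ]
        candidates = []
        for a in range(len(pruned)):
            for b in range(a + 1, len(pruned)):
                x, y = pruned[a], pruned[b]
                # Sorted order keeps itemSets with the same (k-2)-prefix
                # adjacent, so the first non-connectable partner ends the scan.
                if x[:-1] != y[:-1]:
                    break
                c = x + (y[-1],)
                # Apriori pruning: every (k-1)-subset must be frequent.
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.append(c)
        current = []
        for c in candidates:
            if len(class_items(c)) > 1:
                continue  # a CAR has exactly one class item
            cnt = support_count(c, data)
            if cnt == 0 or cnt < min_count:
                continue
            current.append(c)  # frequent k-itemSet, input to the next pass
            if len(class_items(c)) == 1:
                cls = class_items(c)[0]
                antecedent = tuple(i for i in c if i != cls)
                conf = cnt / support_count(antecedent, data)
                if conf >= user_conf:
                    rule_set.append((antecedent, cls, cnt / n, conf))
        prev = sorted(current)
        k += 1
    return rule_set
```

The `break` in the join loop is what the dictionary order buys: because the (k − 1)-itemSets are kept sorted, all connectable partners of an itemSet are contiguous, so the scan can stop at the first mismatch instead of testing every pair.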
3.2.3. Branch-Based CAR Selection Algorithm (BCSA)
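Algorithm 2: Branch-based CAR selection algorithm (BCSA)
Input: The complete set of CARs (RuleSet), the instance set T, the redundancy threshold R, and the coverage threshold C.
Output: The final set of selected CARs (FinalCARs).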
1. branches = RuleSet.divide(k) // divide RuleSet into k branches
2. // select the CARs with the biggest Conf and the biggest Sup from each branch
3. for (i = 1; i ≤ branches.size(); i++) do
4. rCars = rCars + branches(i).biggestConf() // rCars is initially empty
5. rCars = rCars + branches(i).biggestSup()
6. end for
7. sortedCars = sort(rCars) // sortedCars holds the sorted representative CARs
8. FinalCARs = Ø; CV = 0; // CV is the computed coverage value
9. for (i = 1; i ≤ sortedCars.size(); i++) do
10. Num = 0;
11. for (j = 1; j ≤ T.size(); j++) do
12. if (T(j) matches sortedCars(i)) then
13. Num++;
14. end if
15. end for
16. if (Num > 0 && sortedCars(i).redundancy < R && CV < C) then
17. FinalCARs = FinalCARs + sortedCars(i)
18. end if
19. update(CV); // update the current coverage value
20. if (CV ≥ C) then
21. break;
22. end if
23. end for
24. return (FinalCARs);
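A minimal Python sketch of the BCSA control flow is given below. The pseudocode does not define how branches are formed, how CARs are sorted, or how redundancy and coverage are computed, so every one of those choices here is a hypothetical stand-in: branches group rules by their first antecedent item, sorting uses confidence, then support, then antecedent length, redundancy is the share of a rule's matched instances that are already covered, and coverage is the share of T matched by at least one selected CAR. Rules are (antecedent, class_item, sup, conf) tuples.

```python
from collections import defaultdict


def matches(instance, rule):
    """An instance matches a rule when it contains every antecedent item."""
    return set(rule[0]) <= instance


def bcsa_sketch(rule_set, instances, max_redundancy, target_coverage):
    """Branch-based CAR selection with assumed metric definitions.

    rule_set: (antecedent, class_item, sup, conf) tuples.
    instances: list of frozensets of items (the set T).
    max_redundancy, target_coverage: play the roles of R and C.
    """
    # Hypothetical branching rule: group by the first antecedent item.
    branches = defaultdict(list)
    for rule in rule_set:
        branches[rule[0][0]].append(rule)
    # One highest-confidence and one highest-support rule per branch
    # (a set, so a rule that is both is kept only once).
    representatives = set()
    for rules in branches.values():
        representatives.add(max(rules, key=lambda r: r[3]))
        representatives.add(max(rules, key=lambda r: r[2]))
    # Assumed precedence: conf, then sup (both descending), then shorter
    # antecedent; antecedent and class break any remaining ties.
    sorted_cars = sorted(
        representatives, key=lambda r: (-r[3], -r[2], len(r[0]), r[0], r[1])
    )
    final_cars, covered = [], set()
    for rule in sorted_cars:
        matched = {j for j, t in enumerate(instances) if matches(t, rule)}
        if not matched:
            continue  # Num == 0 in the pseudocode
        # Assumed redundancy: share of matched instances already covered.
        redundancy = len(matched & covered) / len(matched)
        if redundancy < max_redundancy:
            final_cars.append(rule)
            covered |= matched
        # Assumed coverage: share of instances matched by a selected CAR.
        if len(covered) / len(instances) >= target_coverage:
            break
    return final_cars
```

Deduplicating the representatives is a design choice of this sketch; the pseudocode simply appends both picks, which can add the same CAR twice when it has both the biggest Conf and the biggest Sup in its branch.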
3.2.4. Multiple Weighted CARs-Based Classifier (MWCC)
4. C-MWCAR Applied to Hypertension Diagnosis
4.1. Data Collection and Preprocessing
4.2. Feature Extraction and Discretization
4.2.1. Feature Extraction
4.2.2. Feature Discretization
4.3. Hypertension Classification
5. Experimental Evaluation
5.1. Experimental Setup
5.2. Evaluation Results
5.2.1. Classification Performance Analysis
5.2.2. Efficiency of the Proposed DOCMA
5.2.3. Parameter Optimization
5.2.4. Visualization of the Mined CARs
6. Discussion
7. Conclusions and Future Work
- To accelerate the extraction of CARs, a dictionary order-based CAR mining algorithm (DOCMA) was designed by optimizing the Apriori algorithm [26]. Specifically, we designed and proved a set of theorems and corollaries that greatly simplify the most time-consuming steps, i.e., the connection of frequent (k − 1)-itemSets and the pruning of infrequent k-itemSets. In addition, DOCMA pre-prunes the set of frequent (k − 1)-itemSets, which further reduces the number of connection operations required. Experimental results showed that DOCMA reduced the running time by up to 81.2% and outperformed three state-of-the-art CAR mining methods.
- To deal with the large number of CARs, a branch-based CAR selection algorithm (BCSA) was designed to select the most representative and concise CARs. It first grouped all the CARs into non-overlapping branches and then selected the most representative CARs from each branch. After that, we designed two metrics, named coverage and redundancy, which were used to filter out CARs that contain little new information but much redundant information, making the final set of selected CARs more concise and effective.
- To obtain higher classification performance, we proposed a multiple weighted CARs-based classifier (MWCC) that classifies an instance by using multiple weighted CARs. Specifically, it first picks out a set of CARs that are most similar to the given instance and then fuses their weighted classification abilities to compute the final classification result. This strategy not only handles instances that match no CAR but also yields more accurate and robust classification results.
- We applied the proposed C-MWCAR to a real classification task, namely hypertension diagnosis. Experimental results showed that C-MWCAR outperformed four state-of-the-art baselines, achieving an accuracy, sensitivity, and specificity of 93.3%, 93.8%, and 92.7%, respectively. In addition, we found that the generated CARs intuitively reflect the patients’ physiological status, which suggests that C-MWCAR is interpretable to some extent.
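The MWCC idea of fusing several weighted, similar CARs instead of firing a single matching rule can be sketched as follows. This is an illustrative Python sketch only: the similarity measure (share of a rule's antecedent items present in the instance), the rule weight (conf × sup), the parameter top_n, and the function name are our assumptions, not the definitions used in C-MWCAR. Rules are (antecedent, class_item, sup, conf) tuples.

```python
from collections import defaultdict


def mwcc_predict(instance, rules, top_n=3):
    """Classify `instance` by fusing the top-n most similar weighted CARs.

    instance: frozenset of (attribute, value) items.
    rules: non-empty list of (antecedent, class_item, sup, conf) tuples.
    """
    def similarity(rule):
        antecedent = rule[0]
        return sum(1 for item in antecedent if item in instance) / len(antecedent)

    # Rank by similarity (then confidence): partially matching rules still
    # contribute when no rule matches the instance exactly.
    ranked = sorted(rules, key=lambda r: (similarity(r), r[3]), reverse=True)
    scores = defaultdict(float)
    for rule in ranked[:top_n]:
        weight = rule[3] * rule[2]  # hypothetical weight: conf x sup
        scores[rule[1]] += similarity(rule) * weight
    return max(scores, key=scores.get)
```

Because similarity is graded rather than all-or-nothing, an instance such as {a = 1, b = 0} that fully matches one rule and half-matches another still receives a combined vote from both.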
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Liu, F.; Zhou, X.; Wang, Z.; Ni, H.; Wang, T. OSA-weigher: An automated computational framework for identifying obstructive sleep apnea based on event phase segmentation. J. Ambient Intell. Humaniz. Comput. 2019, 10, 1937–1954.
- Ma, J.; Sun, L.; Wang, H.; Zhang, Y.; Aickelin, U. Supervised anomaly detection in uncertain pseudoperiodic data streams. ACM Trans. Internet Technol. 2016, 16, 4.
- Ren, S.; He, K.; Girshick, R.; Zhang, X.; Sun, J. Object detection networks on convolutional feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1476–1481.
- Wu, Z.; Xu, Q.; Li, J.; Fu, C.; Xuan, Q.; Xiang, Y. Passive indoor localization based on CSI and naive Bayes classification. IEEE Trans. Syst. Man Cybern. Syst. 2017, 48, 1566–1577.
- Paul, S.K.; Nine, M.S.Q.Z.; Hasan, M.; Amin, M.A. Cognitive Task Classification from Wireless EEG. In Proceedings of the 8th International Conference, BIH 2015, London, UK, 30 August–2 September 2015; pp. 13–22.
- Liu, F.; Zhou, X.; Cao, J.; Wang, Z.; Wang, H.; Zhang, Y. Arrhythmias classification by integrating stacked bidirectional LSTM and two-dimensional CNN. In Proceedings of the 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, 14–17 April 2019; pp. 136–149.
- Liu, B.; Hsu, W.; Ma, Y. Integrating classification and association rule mining. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; pp. 27–31.
- Li, W.; Han, J.; Pei, J. CMAR: Accurate and efficient classification based on multiple class-association rules. In Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, CA, USA, 29 November–2 December 2001; pp. 369–376.
- Wen, J.; Zhong, M.; Wang, Z. Activity recognition with weighted frequent patterns mining in smart environments. Expert Syst. Appl. 2015, 42, 6423–6432.
- Veloso, A.; Meira, W.; Zaki, M.J. Lazy associative classification. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06), Hong Kong, China, 12–18 December 2006; pp. 645–654.
- Kliegr, T.; Kuchař, J.; Sottara, D.; Vojíř, S. Learning business rules with association rule classifiers. In Lecture Notes in Computer Science, 1st ed.; Springer: Cham, Switzerland, 2014; Volume 8620, pp. 236–250.
- Nguyen, L.T.; Vo, B.; Hong, T.-P.; Thanh, H.C. CAR-Miner: An efficient algorithm for mining class-association rules. Expert Syst. Appl. 2013, 40, 2305–2311.
- Huynh-Thi-Le, Q.; Le, T.; Vo, B.; Le, B. An efficient and effective algorithm for mining top-rank-k frequent patterns. Expert Syst. Appl. 2015, 42, 156–164.
- Kargarfard, F.; Sami, A.; Ebrahimie, E. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm. J. Biomed. Inform. 2015, 57, 181–188.
- Park, S.H.; Reyes, J.A.; Gilbert, D.R.; Kim, J.W.; Kim, S. Prediction of protein-protein interaction types using association rule based classification. BMC Bioinf. 2009, 10, 36.
- Soni, J.; Ansari, U.; Sharma, D. Intelligent and effective heart disease prediction system using weighted associative classifiers. Int. J. Comput. Sci. Eng. 2011, 3, 2385–2392.
- Soni, S.; Vyas, O. Using associative classifiers for predictive analysis in health care data mining. Int. J. Comput. Appl. 2010, 4, 33–37.
- Nguyen, D.; Vo, B.; Le, B. Efficient strategies for parallel mining class association rules. Expert Syst. Appl. 2014, 41, 4716–4729.
- Poddar, M.G.; Kumar, V.; Sharma, Y.P. Linear-nonlinear heart rate variability analysis and SVM based classification of normal and hypertensive subjects. J. Electrocardiol. 2013, 46, e25.
- Ni, H.; Cho, S.; Mankoff, J. Automated recognition of hypertension through overnight continuous HRV monitoring. J. Ambient Intell. Humaniz. Comput. 2018, 9, 2011–2023.
- Melillo, P.; Izzo, R.; Orrico, A. Automatic prediction of cardiovascular and cerebrovascular events using heart rate variability analysis. PLoS ONE 2015, 10, e0118504.
- Zhao, M.; Cheng, X.; He, Q. An algorithm of mining class association rules. In Proceedings of the 4th International Symposium on Intelligence Computation and Applications, ISICA 2009, Huangshi, China, 23–25 October 2009; pp. 269–275.
- Vo, B.; Le, B. A novel classification algorithm based on association rules mining. In Proceedings of the Pacific Rim Knowledge Acquisition Workshop, PKAW 2008, Hanoi, Vietnam, 15–16 December 2008; pp. 61–75.
- Nguyen, L.T.; Vo, B.; Hong, T.-P.; Thanh, H.C. Classification based on association rules: A lattice-based approach. Expert Syst. Appl. 2012, 39, 11357–11366.
- Dong, G.; Zhang, X.; Wong, L.; Li, J. CAEP: Classification by aggregating emerging patterns. In Proceedings of the Second International Conference, DS'99, Tokyo, Japan, 6–8 December 1999; pp. 30–42.
- Agrawal, R.; Srikant, R. Fast algorithms for mining association rules. In Proceedings of the 20th VLDB, Santiago de Chile, Chile, 12–15 September 1994; pp. 487–499.
- World Health Organization. A global brief on hypertension: Silent killer, global public health crisis. In World Health Day 2013; WHO: Geneva, Switzerland, 2013; pp. 1–39.
- Poddar, M.G.; Kumar, V.; Sharma, Y.P. Automated diagnosis of coronary artery diseased patients by heart rate variability analysis using linear and non-linear methods. J. Med. Eng. Technol. 2015, 39, 331–341.
- Poddar, M.G.; Kumar, V.; Sharma, Y.P. Heart rate variability based classification of normal and hypertension cases by linear-nonlinear method. Def. Sci. J. 2014, 64, 542–548.
- Sommermeyer, D.; Zou, D.; Eder, D.N.; Hedner, J.; Ficker, J.H.; Randerath, W.; Priegnitz, C.; Penzel, T.; Fietze, I.; Sanner, B.; et al. The use of overnight pulse wave analysis for recognition of cardiovascular risk factors and risk: A multicentric evaluation. J. Hypertens. 2014, 32, 276–285.
- Tejera, E.; Areias, M.J.; Rodrigues, A.I.; Nieto-Villar, J.M.; Rebelo, I. Blood pressure and heart rate variability complexity analysis in pregnant women with hypertension. Hypertens. Pregnancy 2012, 31, 91–106.
- Yue, W.-W.; Yin, J.; Chen, B.; Zhang, X.; Wang, G.; Li, H.; Chen, H.; Jia, R.-Y. Analysis of heart rate variability in masked hypertension. Cell Biochem. Biophys. 2014, 70, 201–204.
- Liu, F.; Zhou, X.; Wang, Z. Identifying Obstructive Sleep Apnea by Exploiting Fine-Grained BCG Features Based on Event Phase Segmentation. In Proceedings of the 2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE), Taichung, Taiwan, 31 October–2 November 2016; pp. 293–300.
- Inan, O.T.; Migeotte, P.-F.; Park, K.-S.; Etemadi, M.; Tavakolian, K.; Casanella, R.; Zanetti, J.; Tank, J.; Funtova, I.; Prisk, G.K.; et al. Ballistocardiography and seismocardiography: A review of recent advances. IEEE J. Biomed. Health Inform. 2014, 19, 1414–1427.
- Li, W.; Wang, R.; Huang, D. Assessment of Micro-movement Sensitive Mattress Sleep Monitoring System (RS611) in the detection of obstructive sleep apnea hypopnea syndrome. Chin. J. Gerontol. 2015, 35, 1160–1162.
- Vollmann, D.; Sossalla, S.; Schroeter, M.R. Renal artery ablation instead of pulmonary vein ablation in a hypertensive patient with symptomatic, drug-resistant, persistent atrial fibrillation. Clin. Res. Cardiol. 2013, 102, 315–318.
- MAP Health Watcher. Available online: https://maphealthwatch.com/ (accessed on 20 December 2018).
- Singh, J.P.; Larson, M.G.; Tsuji, H.; Evans, J.C.; O’Donnell, C.J.; Levy, D. Reduced heart rate variability and new-onset hypertension: Insights into pathogenesis of hypertension: The Framingham Heart Study. Hypertension 1998, 32, 293–297.
- Shyma, P.; Pal, G.K.; Habeebullah, S.; Shyjus, P. Decreased total power of HRV with increased LF power in early part of pregnancy predicts development PIH in Indian population. Biomedicine 2008, 28, 104–107.
- Mussalo, H.; Vanninen, E.; Ikäheimo, R.; Laitinen, T.; Laakso, M.; Länsimies, E.; Hartikainen, J. Heart rate variability and its determinants in patients with severe or mild essential hypertension. Clin. Physiol. Funct. Imaging 2001, 21, 594–604.
Symbols | Definition |
---|---|
li[j] | The jth item of the ith itemSet |
Lk | The set of all the frequent k-itemSets |
Lk(i) | The ith element of Lk |
Ck | The set of the candidate k-itemSets |
Ck(i) | The ith element of Ck |
CCARk | The set made up of all the CCARs that contain k items |
ICCARk | The set made up of all the ICCARs that contain k items |
CARk | The set made up of all the CARs that contain k items |
RuleSet | The complete set of the extracted CARs |
D | The dataset, with Di representing the ith instance in D |
Y | The class label set, with Yi representing the ith label in Y |
I | The set of the items, with Ii representing the ith item in I |
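For reference, the support and confidence of a CAR X → y are usually defined as below in associative classification; we assume C-MWCAR follows these standard definitions, treating each instance Di as the set of its items together with its class label, with X ⊆ I and y ∈ Y. A CAR is kept when sup ≥ userSup and conf ≥ userConf.

$$
\mathrm{sup}(X \rightarrow y) = \frac{\left|\{D_i \in D : X \cup \{y\} \subseteq D_i\}\right|}{|D|},
\qquad
\mathrm{conf}(X \rightarrow y) = \frac{\left|\{D_i \in D : X \cup \{y\} \subseteq D_i\}\right|}{\left|\{D_i \in D : X \subseteq D_i\}\right|}
$$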
Subject Information | Hypertensive | Normotensive |
---|---|---|
Number | 61 | 67 |
Sex (Male/Female) | 33/38 | 35/32 |
Age (years) | 55.6 ± 7.9 | 53.2 ± 9.2 |
Heart Rate (bpm) | 77.1 ± 9.2 | 73.6 ± 8.3 |
Body Mass Index (kg/m²) | 24.3 ± 3.6 | 23.7 ± 3.3 |
Systolic Blood Pressure (mmHg) | 155.6 ± 11.2 | 112.1 ± 15.7 |
Diastolic Blood Pressure (mmHg) | 103.6 ± 8.2 | 74.4 ± 6.3 |
Type | Features | Definition | Description |
---|---|---|---|
Time domain | Max | The maximum value of the heart-beat intervals | To describe the distribution of the heart-beat intervals |
Time domain | Min | The minimum value of the heart-beat intervals | To describe the distribution of the heart-beat intervals |
Time domain | MEAN | The mean value of the heart-beat intervals | To describe the distribution of the heart-beat intervals |
Time domain | MEDIAN | The median value of the heart-beat intervals | To describe the distribution of the heart-beat intervals |
Time domain | SDNN | The standard deviation (SD) of the heart-beat intervals | To measure the overall variation of the heart-beat interval sequence |
Time domain | RMSSD | The root mean square of the successive differences between heart-beat intervals | To reflect the rapid variation of the heart-beat interval sequence |
Time domain | PNN50 | The percentage of successive heart-beat interval differences larger than 50 ms | To describe the degree of change in the heart-beat interval sequence |
Frequency domain | vLF | The power in the 0.0033 Hz–0.04 Hz band | To reflect vascular mechanisms caused by negative emotions |
Frequency domain | LF | The power in the 0.04 Hz–0.15 Hz band | To reflect sympathetic modulation of heart rate |
Frequency domain | HF | The power in the 0.15 Hz–0.4 Hz band | To reflect parasympathetic activity |
Frequency domain | LF/HF | The ratio of power in the LF and HF bands | To reflect the balance between sympathetic and parasympathetic activity |
Non-linear domain | SampEn | The sample entropy value with r = 0.15 × SD | To investigate the complexity of the heart-beat interval sequence |
Non-linear domain | DFA | The short-term coefficient of detrended fluctuation analysis | To investigate the inner correlation of successive heart-beat intervals |
Non-linear domain | SDS | The SD of the short-term variability of the Poincaré plot | To reflect the short-term variability of the heart-beat intervals |
Non-linear domain | SDL | The SD of the long-term variability of the Poincaré plot | To reflect the long-term variability of the heart-beat intervals |
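To make the time-domain and Poincaré definitions concrete, here is a minimal Python sketch that computes them from a list of heart-beat (RR) intervals. It follows the usual HRV conventions (RMSSD and PNN50 on successive differences; SDS and SDL as the standard deviations of the Poincaré points rotated by 45°, often called SD1 and SD2). The function name, the dictionary keys, the seconds unit, and returning PNN50 as a fraction are our assumptions; the frequency-domain features, SampEn, and DFA are omitted.

```python
import math
import statistics


def time_domain_hrv(rr):
    """Time-domain and Poincare HRV features from heart-beat (RR) intervals.

    rr: RR intervals in seconds (at least three values). Features keep the
    units of the input, except pnn50, which is a fraction in [0, 1].
    """
    pairs = list(zip(rr, rr[1:]))  # (RR_n, RR_n+1) Poincare points
    diffs = [b - a for a, b in pairs]  # successive differences
    # Rotate the Poincare points by 45 degrees: across / along the identity line.
    across = [(b - a) / math.sqrt(2) for a, b in pairs]
    along = [(b + a) / math.sqrt(2) for a, b in pairs]
    return {
        "max": max(rr),
        "min": min(rr),
        "mean": statistics.fmean(rr),
        "median": statistics.median(rr),
        "sdnn": statistics.stdev(rr),
        "rmssd": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "pnn50": sum(abs(d) > 0.05 for d in diffs) / len(diffs),  # 50 ms
        "sds": statistics.stdev(across),  # SD1, short-term variability
        "sdl": statistics.stdev(along),   # SD2, long-term variability
    }
```

For an alternating sequence such as [0.8, 0.9, 0.8, 0.9], every successive difference is 0.1 s, so RMSSD is 0.1 s and all differences exceed 50 ms, while SDL is zero because every Poincaré point lies at the same distance along the identity line.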
Type | Features | Normotensive | Hypertensive |
---|---|---|---|
Time domain features | Max | 1.1704 ± 0.1596 | 1.0767 ± 0.1804 |
Time domain features | Min | 0.6800 ± 0.1296 | 0.7206 ± 0.1152 |
Time domain features | MEAN | 0.8977 ± 0.1034 | 0.9010 ± 0.1301 |
Time domain features | MEDIAN | 0.8860 ± 0.1311 | 0.9102 ± 0.1405 |
Time domain features | SDNN | 0.1034 ± 0.0594 | 0.0539 ± 0.0401 |
Time domain features | RMSSD | 0.1351 ± 0.0854 | 0.0698 ± 0.0628 |
Time domain features | PNN50 | 0.9496 ± 0.0331 | 0.8689 ± 0.0845 |
Frequency domain features | vLF | 5.7261 ± 4.0433 | 3.6910 ± 3.6003 |
Frequency domain features | LF | 3.3063 ± 3.2327 | 1.1621 ± 1.7799 |
Frequency domain features | HF | 6.0722 ± 6.3137 | 1.9928 ± 3.7635 |
Frequency domain features | LF/HF | 0.6995 ± 0.3277 | 0.9367 ± 0.6504 |
Non-linear domain features | SampEn | 0.63 ± 0.14 | 0.6831 ± 0.1538 |
Non-linear domain features | DFA | 0.7 ± 0.2 | 0.65 ± 0.23 |
Non-linear domain features | SDS | 22.99 ± 2.56 | 18.67 ± 1.76 |
Non-linear domain features | SDL | 51.87 ± 6 | 37.39 ± 3.42 |
Type | Por.* 1 | Por. 2 | Por. 3 | Por. 4 | Por. 5 | Por. 6 | Por. 7 | Por. 8 | Por. 9 | Por. 10 | SUM |
---|---|---|---|---|---|---|---|---|---|---|---|
Hypertensive | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 7 | 61 |
Normotensive | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 6 | 68 |
Subtotal | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 128 |
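The table spreads each class over ten portions, as in stratified 10-fold cross-validation. One generic way to build such portions is sketched below in Python; the round-robin dealing, the fixed seed, and the function name are our assumptions, not the authors' documented procedure.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Split instance indices into n_folds portions with a similar class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    offset = 0  # carry the dealing position across classes to balance sizes
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_folds].append(idx)
        offset += len(indices)
    return folds
```

Dealing each shuffled class round-robin gives every portion either the floor or the ceiling of (class size / number of portions) instances of that class, and carrying the offset across classes keeps the portion sizes within one instance of each other.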
Combinations | ACC | SEN | SPE |
---|---|---|---|
NONE | 78.8% | 79.6% | 78.3% |
BCSA | 83.8% | 85.2% | 83.7% |
MWCC | 85.9% | 87.7% | 84.6% |
BCSA + MWCC | 93.3% | 93.8% | 92.7% |
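ACC, SEN, and SPE are the usual confusion-matrix ratios: accuracy over all instances, sensitivity as the recall of the positive class, and specificity as the recall of the negative class. A minimal Python sketch, assuming hypertensive is the positive class (the function name and label strings are hypothetical); it requires at least one positive and one negative instance.

```python
def diagnostic_metrics(y_true, y_pred, positive="hypertensive"):
    """Accuracy, sensitivity, and specificity for a binary diagnosis task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "acc": (tp + tn) / len(pairs),  # ACC = (TP + TN) / N
        "sen": tp / (tp + fn),          # SEN = TP / (TP + FN)
        "spe": tn / (tn + fp),          # SPE = TN / (TN + FP)
    }
```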
Algorithms | Strategies Used to Mine CARs | Time Overhead (Sup = 0.2) | Time Overhead (Sup = 0.25) | Time Overhead (Sup = 0.3) |
---|---|---|---|---|
DOCMA | Use the Apriori-like algorithm with optimized strategies to generate rules | 3976.3 s | 1369.7 s | 129.9 s |
CAR-Miner [12] | Mine CARs based on the modified ECR-tree with Obidset | 4123.5 s | 2156.8 s | 421.8 s |
ECR-CARM [22] | Use equivalence class rule tree (ECR-tree) to generate candidate rules | 6425.8 s | 3482.5 s | 761.3 s |
PMCAR [24] | Mine CARs with parallel and distributed approaches | 3379.4 s | 1624.4 s | 249.5 s |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, G.; Liu, F.; Wu, C.; Yao, Y.; Wu, G.; Wang, Z.; Zhang, Y. C-MWCAR: Classification Based on Multiple Weighted Class Association Rules. Appl. Sci. 2023, 13, 8082. https://doi.org/10.3390/app13148082