A Speedy Cardiovascular Diseases Classifier Using Multiple Criteria Decision Analysis

Each year, some 30 percent of global deaths are caused by cardiovascular diseases. This figure is worsening due to both the growing elderly population and severe shortages of medical personnel. The development of a cardiovascular diseases classifier (CDC) for auto-diagnosis will help address the problem. Former CDCs did not achieve quick evaluation of cardiovascular diseases. In this letter, a new CDC to achieve speedy detection is investigated. This investigation incorporates the analytic hierarchy process (AHP)-based multiple criteria decision analysis (MCDA) to develop the feature vectors used by a Support Vector Machine. The MCDA facilitates the efficient assignment of appropriate weightings to potential patients, thus scaling down the number of features. Since the new CDC adopts only the most meaningful features for discriminating between healthy persons and cardiovascular disease patients, a speedy detection of cardiovascular diseases has been successfully implemented.


Introduction
Electrocardiogram (ECG) signals, characterized by P waves, Q waves, S waves, QRS complexes and T waves, provide important information for cardiovascular disease diagnosis by cardiologists. Automating such a diagnosis requires the development of a cardiovascular diseases classifier (CDC). Generally, a CDC mainly comprises feature vector extraction and classifier construction via machine learning algorithms such as an Artificial Neural Network or a Support Vector Machine. Features can be divided into three categories: non-fiducial features, fiducial features, and hybrid features. Non-fiducial features normally refer to features that do not characterize the ECG signals using P waves, Q waves, S waves, QRS complexes and T waves [1–5], whereas fiducial features do [6,7]. Hybrid features refer to feature vectors constructed from both non-fiducial and fiducial features [8–10].
In this investigation, a Support Vector Machine (SVM) is utilized to construct the CDC for the four most common types of cardiovascular diseases, namely bundle branch block, myocardial infarction, heart failure, and dysrhythmia. Seven criteria indicative of the speed and accuracy of detection, namely overall accuracy (OA), sensitivity (Se), specificity (Sp), area under the curve (AUC), training time (Tr), testing time (Te), and number of features (Nf), are used as the essential parameters to compute the analytic hierarchy process (AHP) score, which aids the multiple criteria decision analysis (MCDA) in evaluating the optimal CDC. Traditional work usually aims at the highest overall accuracy and/or lowest testing time. In reality, every end user has to specify the weights between criteria. It is not uncommon for the ratio to be set by intuition, or for a direct 1:1 assignment to be adopted, so that the practical needs of volunteers are neglected or not targeted. In the new method, the weightings of the criteria are devised through AHP analysis, which enables the needs of volunteers to be taken into account. This letter is organized as follows: the design of an optimal CDC is presented in Section 2. Multiple criteria decision analysis of the optimal CDC is given in Section 3. In Section 4, the AHP is formulated and a performance score is obtained, from which the performance is analyzed and compared to traditional schemes. Finally, conclusions are drawn in Section 5.

Data Preprocessing and Features Construction
The data is obtained from an online, open access database [11,12]. A group of healthy candidates as well as candidates with the four most common types of cardiovascular diseases are selected: 52 healthy control candidates, 15 bundle branch block candidates, 148 myocardial infarction candidates, 18 heart failure candidates and 14 dysrhythmia candidates. An unequal sample size in each class would bias the SVM classifier [13]. The Lead I ECG signal is therefore partitioned into 30 s sub-signals to obtain 500 samples of healthy candidates and 125 samples of unhealthy candidates for each type of cardiovascular disease. This process aims at equalizing the number of samples in each class (healthy and unhealthy). Before the four diseases are introduced, the notation is briefly explained. The RR-interval denotes the time between the R points of consecutive heartbeats. The QRS complex duration is the time between the Q wave and the S wave, with point R lying between them. Similarly, the QT interval refers to the time between the Q wave and the T wave. The background of the four diseases is as follows: (i) Myocardial Infarction: Irregular heartbeat and thus irregular RR-interval may occur in the ECG signal of the patients [14]; (ii) Bundle Branch Block: Patients have a QRS complex duration exceeding 0.12 s [15]; (iii) Dysrhythmia: The heart rate can be more than 100 beats per minute or less than 60 beats per minute, so the RR-interval differs from that of a normal ECG signal. Also, the QT interval may increase if the dysrhythmia is a ventricular arrhythmia [16]; (iv) Heart Failure: A prolonged QT interval is found in the ECG signals of the patients [17].
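The partitioning into 30 s sub-signals can be sketched as follows. This is a minimal illustration only: the function name `segment_signal`, the 1000 Hz sampling rate, and the use of non-overlapping windows are assumptions, since the letter does not state how the windows are placed or what the sampling rate is.

```python
def segment_signal(samples, fs_hz, window_s=30.0):
    """Split a 1-D signal into consecutive, non-overlapping windows.

    samples  : list of ECG amplitude values (one lead)
    fs_hz    : sampling rate in Hz (assumed; not given in the letter)
    window_s : window length in seconds (30 s in the letter)
    Trailing samples that do not fill a whole window are dropped.
    """
    window_len = int(round(fs_hz * window_s))
    if window_len <= 0:
        raise ValueError("window length must be positive")
    n_windows = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len]
            for i in range(n_windows)]


# Synthetic 100 s record at an assumed 1000 Hz sampling rate
record = [0.0] * 100_000
segments = segment_signal(record, fs_hz=1000)
print(len(segments), len(segments[0]))
```

With overlapping windows more sub-signals could be drawn from each record; the letter does not say which variant was used.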
As a result, the Q wave, R wave, S wave, QRS complex, and RR-interval are representative features for distinguishing healthy persons from cardiovascular patients. The feature vector consists of 10 features: the average and standard deviation of each of these five parameters. Before the features are detected and computed, the ECG signals undergo data preprocessing [18]. The maximum frequency of an ECG signal is typically less than 60 Hz, thus a bandpass filter with cutoff frequencies at 1 Hz and 60 Hz is implemented. A derivative filter is then applied to sharpen the Q, R, and S waves. Finally, signal squaring and sliding window integration are utilized to locate the Q, R, and S waves.
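The later preprocessing stages and the feature summary can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the bandpass filter is omitted, the derivative is a plain first difference, the window width is arbitrary, and the population standard deviation is used because the letter does not specify which estimator applies. All function names are hypothetical.

```python
from statistics import mean, pstdev


def derivative(x):
    """First difference; emphasizes the steep slopes of the QRS complex."""
    return [x[i] - x[i - 1] for i in range(1, len(x))]


def square(x):
    """Point-wise squaring; makes all values non-negative and amplifies peaks."""
    return [v * v for v in x]


def moving_window_integrate(x, width):
    """Running average over the last `width` samples (start is zero-padded)."""
    out, running = [], 0.0
    for i, v in enumerate(x):
        running += v
        if i >= width:
            running -= x[i - width]
        out.append(running / width)
    return out


def summarize(measurements):
    """Map {parameter: per-beat values} to average and standard deviation features."""
    features = {}
    for name, values in measurements.items():
        features[name + "_avg"] = mean(values)
        features[name + "_std"] = pstdev(values)  # population std: an assumption
    return features


# Toy signal with a single spike, standing in for an R peak
ecg = [0.0, 0.0, 1.0, 0.0, 0.0]
envelope = moving_window_integrate(square(derivative(ecg)), width=2)

# Hypothetical per-beat RR intervals in seconds
rr_features = summarize({"RR": [0.8, 0.8, 1.0, 1.0]})
```

Applying `summarize` to all five parameters (Q, R, S, QRS complex, RR-interval) would yield the 10-dimensional feature vector described above.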

Cardiovascular Diseases Classifier Construction
The CDC is constructed by employing an SVM with a 10-dimensional feature vector. This algorithm uses Lagrange multipliers with a set of support vectors, a set of weights and an offset bias [19,20]. This letter focuses on the design of the CDC.
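For reference, the standard kernel SVM decision function has the form below. It is given as the textbook formulation rather than as the exact expression used by the authors. Here $S$ is the set of support vectors, $\alpha_i$ are the Lagrange multipliers, $y_i \in \{-1, +1\}$ are the class labels, and $b$ is the offset bias. The kernel parameters $\gamma$ and $c$ are assumptions that the letter does not report; the exponent 3 corresponds to a third-order polynomial kernel.

```latex
f(\mathbf{x}) = \mathrm{sign}\left( \sum_{i \in S} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \left( \gamma \, \mathbf{x}_i^{\top} \mathbf{x} + c \right)^{3}
```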
The performance of the CDC is characterized by OA, Se, Sp, AUC, Tr, Te, and Nf. The CDC directly classifies the ECG signal as belonging to a healthy (negative response) candidate or an unhealthy (positive response) candidate. OA, Se, Sp, and AUC relate to the accuracy of the CDC. Tr is the time required to train the CDC and Te is the time needed to classify the ECG signal. In this investigation, the CDC is trained and validated with the ECG datasets. For the negative response (Class 0), 500 samples from healthy candidates are used. For the positive response (Class 1), 125 bundle branch block samples, 125 myocardial infarction samples, 125 heart failure samples and 125 dysrhythmia samples are retrieved from the database. Table 1 lists the datasets for the CDC with a binary classifier. The CDC utilizes 10-fold cross validation for performance evaluation [21], and a third-order polynomial kernel function is utilized for the SVM. There is a total of 1023 combinations of the 10 features ($2^{10} - 1$ non-empty subsets). The settings include (iii) Feature vector: The maximum dimensionality is 10, which consists of: {Q wave average, Q wave standard deviation, R wave average, R wave standard deviation, S wave average, S wave standard deviation, QRS complex average, QRS complex standard deviation, RR-interval average, RR-interval standard deviation}; (iv) Kernel function: third-order polynomial; (v) Fold of cross validation: ten-fold. A total of 1023 classifiers are constructed, one for each of the 1023 configurations; the results are tabulated in Table 2.
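The count of 1023 configurations and the accuracy criteria can be illustrated with a short sketch. The feature labels, the function names, and the confusion counts are hypothetical. The order in which subsets are generated here is not necessarily the scenario numbering used in Table 2.

```python
from itertools import combinations

# Hypothetical short labels for the 10 features listed above
FEATURES = ["Q_avg", "Q_std", "R_avg", "R_std", "S_avg", "S_std",
            "QRS_avg", "QRS_std", "RR_avg", "RR_std"]


def all_feature_subsets(features):
    """Return every non-empty subset of the feature list as a tuple."""
    subsets = []
    for k in range(1, len(features) + 1):
        subsets.extend(combinations(features, k))
    return subsets


def confusion_metrics(tp, fn, tn, fp):
    """Overall accuracy, sensitivity and specificity from confusion counts.

    Positive = unhealthy (Class 1), negative = healthy (Class 0).
    """
    oa = (tp + tn) / (tp + fn + tn + fp)
    se = tp / (tp + fn)   # fraction of unhealthy samples detected
    sp = tn / (tn + fp)   # fraction of healthy samples correctly cleared
    return oa, se, sp


subsets = all_feature_subsets(FEATURES)
print(len(subsets))  # 2**10 - 1 non-empty subsets

# Made-up counts, for illustration only
oa, se, sp = confusion_metrics(tp=90, fn=10, tn=80, fp=20)
```

Each subset would be used to train and cross-validate one SVM, with Nf equal to the subset size.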

Multiple Criteria Decision Analysis of the Optimal CDC
In Table 2, seven criteria, namely OA, Se, Sp, AUC, Tr, Te, and Nf, are employed for performance evaluation of the 1023 scenarios. Multiple criteria decision making (MCDM) has been utilized in many areas since the 1990s [22]. Here it is applied using the particular characteristics of cardiovascular diseases. By allocating appropriate weightings, the analytic hierarchy process (AHP) is adopted to evaluate the 1023 scenarios investigated and to identify the best among them. The allocation of weightings is based on the feedback from an AHP analysis of 200 volunteers, from which a 7 × 7 pairwise comparison matrix $A_m$ (m = 1, …, 200) is formulated for each volunteer. It is intuitively understood that Te should be as low as possible and that the accuracy should be kept at an acceptable level. Since the speed of detection is the prime factor of importance, the MCDA reveals that high weightings should be assigned to OA, Se, Sp, AUC, and Te. These five parameters are referred to as the primary parameters. While Nf is typically preferred to be small for speedy detection, it is noted that Tr does not affect the detection time. Hence Nf and Tr are classified as the secondary parameters.
The volunteers are required to fill in the entries $a_{m,ij}$, where i and j are between 1 and 7, in Table 3. The AHP-based MCDA CDC is referred to as the new classifier (NC). Traditional classifiers (TC) in [3,7,8] are also evaluated. Both the NC and the TC are applied to the three feature groups (non-fiducial features, fiducial features and hybrid features) in [3,7,8]. The performance comparison between the NC and the TC is tabulated in Table 4. Based on the discussion of the AHP formulation, the values of $a_{m,ij}$ are assigned according to the following guidelines: (i) Write 1 if i and j are of equal importance; (ii) Write 3 if i is slightly more important than j; (iii) Write 5 if i is more important than j; (iv) Write 7 if i is strongly more important than j; (v) Write 9 if i is absolutely more important than j. The 7 × 7 pairwise comparison matrix $A_m$ is then normalized, and $A_m^{norm}$ is obtained by converting the matrix entries $a_{m,ij}$ in $A_m$ into the matrix entries $a_{m,ij}^{norm}$ in $A_m^{norm}$. To avoid inconsistency in the construction of pairwise comparison matrices, the optimal CDC is determined from the highest value of $AHP_q$ [23]. The evaluation shows that the optimal CDC is obtained from scenario $f_{652}$, whose feature vector is composed of the average of Q, standard deviation of Q, standard deviation of S, average of QRS, standard deviation of QRS, average of RR-interval, and standard deviation of RR-interval. The criteria values behind $AHP_{652}$ are as follows: OA = 0.988, Se = 0.992, Sp = 0.985, AUC = 0.982, Tr = 4.5 s, Te = 2.8 s, Nf = 7.
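The normalization step and a weighted score can be sketched with the conventional AHP procedure. This is an assumption-laden illustration: the letter's own normalization formula and its definition of $AHP_q$ are not reproduced in the text, so the code uses the common column normalization followed by row averaging, and a plain weighted sum. The function names and the 3 × 3 example matrix are hypothetical, and the example relies on the standard reciprocal convention $a_{ji} = 1 / a_{ij}$.

```python
def normalize_columns(A):
    """Divide each entry by its column sum (conventional AHP normalization)."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def priority_weights(A):
    """Criterion weights: row means of the column-normalized matrix."""
    norm = normalize_columns(A)
    n = len(A)
    return [sum(row) / n for row in norm]


def weighted_score(weights, criteria_values):
    """Weighted sum of criteria values.

    Assumes every criterion is already rescaled so that larger is better;
    how Tr, Te and Nf are rescaled is not specified in the letter.
    """
    return sum(w * v for w, v in zip(weights, criteria_values))


# Hypothetical 3-criterion example using the 1/3/5 scale and reciprocals
A = [
    [1.0,     3.0,     5.0],
    [1 / 3.0, 1.0,     3.0],
    [1 / 5.0, 1 / 3.0, 1.0],
]
w = priority_weights(A)
score = weighted_score(w, [0.9, 0.8, 0.7])
```

In the 7-criterion setting of the letter, each volunteer's matrix would give one weight vector. How the 200 vectors are combined is not described, so no aggregation is shown here.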
The analysis reveals that in the NC, the speed of detection has been increased by 30%-40% while the accuracy is retained at ~99%-99.5% of that of the TC. The reductions in OA, Se, and Sp are each less than 1%. Thus the AHP-based MCDA CDC is a reliable and speedy detection scheme for cardiovascular diseases.

Conclusions
In this letter, an optimal cardiovascular diseases classifier (CDC) has been proposed and implemented by using an analytic hierarchy process (AHP) to facilitate multiple criteria decision analysis (MCDA). The four most common types of cardiovascular diseases, namely bundle branch block, myocardial infarction, heart failure, and dysrhythmia, are considered. Seven criteria, namely OA, Se, Sp, AUC, Tr, Te, and Nf, are chosen for deriving the AHP score of the MCDA to achieve the optimal CDC. The optimal CDC, the new classifier, achieves the following scores: OA = 0.988, Se = 0.992, Sp = 0.985, AUC = 0.982, Tr = 4.5 s, Te = 2.8 s, Nf = 7. Analysis and comparison with previous works show that the speed of detecting cardiovascular diseases has been increased by 30%-40% while the accuracy is retained at ~99%-99.5% of that of traditional classifiers. In conclusion, the AHP-based MCDA CDC is a reliable and speedy detection scheme for cardiovascular diseases.