Detection of Coronary Artery Disease Using Multi-Domain Feature Fusion of Multi-Channel Heart Sound Signals

Heart sound signals reflect valuable information about heart condition. Previous studies have suggested that the information contained in single-channel heart sound signals can be used to detect coronary artery disease (CAD). But accuracy based on single-channel heart sound signal is not satisfactory. This paper proposed a method based on multi-domain feature fusion of multi-channel heart sound signals, in which entropy features and cross entropy features are also included. A total of 36 subjects enrolled in the data collection, including 21 CAD patients and 15 non-CAD subjects. For each subject, five-channel heart sound signals were recorded synchronously for 5 min. After data segmentation and quality evaluation, 553 samples were left in the CAD group and 438 samples in the non-CAD group. The time-domain, frequency-domain, entropy, and cross entropy features were extracted. After feature selection, the optimal feature set was fed into the support vector machine for classification. The results showed that from single-channel to multi-channel, the classification accuracy has increased from 78.75% to 86.70%. After adding entropy features and cross entropy features, the classification accuracy continued to increase to 90.92%. The study indicated that the method based on multi-domain feature fusion of multi-channel heart sound signals could provide more information for CAD detection, and entropy features and cross entropy features played an important role in it.


Introduction
Coronary artery disease (CAD) has been the leading cause of death in cardiovascular disease globally [1] and is still increasing at an alarming rate. Therefore, there is an urgency to develop convenient and accurate options for CAD detection of large-scale populations. Coronary angiography (CAG) [2] is widely regarded as the gold standard for detecting CAD. But it is not suitable as a routine examination method for early screening due to its invasive and high price defects. Medical research has confirmed that when blood flows through the stenosis of a blood vessel, it will impact the wall of the blood vessel and form turbulence. The turbulence can cause murmurs in heart sound signals [3]. Therefore, as a non-invasive detection method, heart sound analysis has the potential to become a cost-effective screening tool to achieve the early detection of CAD [4].
The correlation between diastolic murmur and stenosis was proved by Akay et al. [5]. In that research, four analysis methods were used, including fast Fourier transform, autoregressive, autoregressive moving average, and minimum norm. The results obtained by Semmlow et al. [6] also indicated that an above-normal percentage of high-frequency en-ergy is closely related to narrowed coronary arteries. Some researchers used heart sound-based risk assessment to help detect CAD, and the results demonstrated the potential use of heart sound to identify CAD [7,8]. In order to make further use of diastolic murmurs for CAD detection, many scholars analyzed diastolic heart sound signals in the frequency domain. Schmidt et al. [9] analyzed the frequency distribution of the diastolic period, and identified new features to describe an increase in low-frequency power in CAD patients. Gauthier et al. [10] used the energy ratio of high and low-frequency components as classification features. Zhao et al. [11] proposed a novel approach based on Hilbert-Huang transform to analyze the diastolic murmurs of CAD. In addition to frequency-domain features, wavelet-based feature sets in the time-frequency domain are also used to classify abnormal heart sounds [12]. Nonlinear analysis is an effective way to re-flect nonlinearity and complexity. It was proven that the correlation dimension can be used for CAD detection [13]. As a nonlinear feature, entropy is very suitable for the analy-sis of non-stationary signals. Akay et al. [14] used approximate entropy of heart sounds to identify CAD. Among these studies based on single-channel heart sound signals, the highest accuracy of detecting CAD is 78%. The accuracy of detecting CAD based on heart sound signals needs to be further improved.
Considering multiple auscultation areas, researchers have collected heart sound signals from multiple locations on the chest for disease detection [15]. Akanksha et al. [16] used a cross power spectrum to analyze the heart sound signals collected from four positions of the chest for CAD detection. Rujoie et al. [17] diagnosed and assessed the severity of tricuspid regurgitation using heart sounds recorded in seven channels. Pathak et al. [18] collected multi-channel heart sounds and eliminated environmental noise by using the delayed propagation of heart sounds between different channels. Griffel et al. [19] evaluated the effect of automutual information function for CAD detection by using two-channel heart sound signals at three positions measured in sequence. The above studies confirmed that the results of the joint analysis of multi-channel heart sound sig-nals are better than those of a single channel.
The analysis of entropy measures can provide a valuable tool for quantifying the regularity of physiological time series [20]. Sample entropy (SampEn) and fuzzy entropy (FuzzyEn) are widely used in physiological signals for they overcome the shortcomings of ApEn, such as bias and relative inconsistency [21,22]. However, SampEn and FuzzyEn need to manually set parameters according to the data, which depends on experience and is not conducive to the standardization of the formula. Distribution entropy (DistEn) can preclude the dependence upon input parameters, and it has shown superiority for the analysis of short-term physiological signals compared to SampEn and FuzzyEn [23,24]. Cross entropy analysis can enable the measurement of the synchrony or similarity of patterns between two channel signals. Previous studies have shown that cross-sample entropy (XSampEn) [25], cross fuzzy entropy (FuzzyEn) [26], and joint distribution entropy (JDistEn) [27] have great potential for physiological signal analysis. These three cross entropy features have been developed from the above three entropy features, respectively, so these six features are explored in this study.
To improve the accuracy of detecting CAD based on heart sound signals, this paper collected five-channel heart sound signals, and then proposed a method based on multidomain feature fusion of multi-channel heart sound signals to detect CAD. First, the time-domain, frequency-domain, entropy, and cross entropy features of heart sound signals were extracted as features, and different feature sets were composed of these features. Then, recursive feature elimination based on support vector machine (SVM-RFE) was used for feature selection, as it iteratively obtains the optimal feature subset. Meanwhile, information gain was also used for feature ranking. Subsequently, support vector machines (SVM) were used for classification. Results showed that this study provided an effective computer-aided method for the identification of CAD patients. Multi-channel feature, entropy feature, and cross entropy feature were all helpful for classification performance. Figure 1 depicts a system block diagram for detecting CAD using multi-domain feature fusion of multi-channel heart sound signals. information gain was also used for feature ranking. Subsequently, support vector machines (SVM) were used for classification. Results showed that this study provided an effective computer-aided method for the identification of CAD patients. Multi-channel feature, entropy feature, and cross entropy feature were all helpful for classification performance. Figure 1 depicts a system block diagram for detecting CAD using multi-domain feature fusion of multi-channel heart sound signals.

Data Acquisition
This study was conducted under the principles of the Helsinki Declaration and its subsequent amendments and obtained the approval of the Institutional Review Board (No. 034). All the subjects were from Qi Lu Hospital of Shandong University, and were provided with informed consent before participation. The inclusion criterion was subjects that were scheduled to undergo a CAG within two days. Three types of subjects were excluded from the study: (a) subjects who had previously undergone percutaneous coronary intervention or coronary artery bypass surgery, (b) subjects who had valvular heart disease verified by echocardiography, (c) subjects who had acute myocardial infarction. Subjects with ≥50% stenosis in at least one of three main coronary artery branches (i.e., left anterior descending, left circumflex, and right coronary artery) were categorized as CAD, otherwise as non-CAD. This study enrolled 36 subjects, including 21 CAD patients and 15 non-CAD subjects. All CAD patients had left anterior descending stenosis, in which there were 7 subjects with first diagonal branch stenosis, 2 subjects with second diagonal branch stenosis, 1 subject with septal artery stenosis, 1 subject with middle branch stenosis, and 16 subjects with left circumflex artery stenosis. The basic characteristics of all subjects are given in Table 1. Mann-Whitney U tests were used for continuous variables, for they did not conform to the normal distribution. Since the gender group is a binary categorical variable, and considering the sample size, this study adopted Fisher's exact test as the statistical test method, and the p value is listed in Table 1.

Data Acquisition
This study was conducted under the principles of the Helsinki Declaration and its subsequent amendments and obtained the approval of the Institutional Review Board (No. 034). All the subjects were from Qi Lu Hospital of Shandong University, and were provided with informed consent before participation. The inclusion criterion was subjects that were scheduled to undergo a CAG within two days. Three types of subjects were excluded from the study: (a) subjects who had previously undergone percutaneous coronary intervention or coronary artery bypass surgery, (b) subjects who had valvular heart disease verified by echocardiography, (c) subjects who had acute myocardial infarction. Subjects with ≥50% stenosis in at least one of three main coronary artery branches (i.e., left anterior descending, left circumflex, and right coronary artery) were categorized as CAD, otherwise as non-CAD. This study enrolled 36 subjects, including 21 CAD patients and 15 non-CAD subjects. All CAD patients had left anterior descending stenosis, in which there were 7 subjects with first diagonal branch stenosis, 2 subjects with second diagonal branch stenosis, 1 subject with septal artery stenosis, 1 subject with middle branch stenosis, and 16 subjects with left circumflex artery stenosis. The basic characteristics of all subjects are given in Table 1. Mann-Whitney U tests were used for continuous variables, for they did not conform to the normal distribution. Since the gender group is a binary categorical variable, and considering the sample size, this study adopted Fisher's exact test as the statistical test method, and the p value is listed in Table 1. A cardiovascular function detection device (CVFD, Huiyironggong Technology Co., Ltd., Jinan, China) was used to record the heart sound signals. Since CAD mostly occurs in the left coronary artery, under the recommendations of the guidelines and cardiovascular experts, an acquisition channel on the left was added on the basis of the original four auscultation areas. The five detectors of electronic stethoscope were respectively placed in the second intercostal space on the right edge of the sternum, the second intercostal space on the left edge of the sternum, the third intercostal space on the left edge of the sternum, the fourth intercostal space on the left edge of the sternum, and the intersection of the fourth intercostal space and the midclavicular line. For each subject, heart sound signals in five different locations were simultaneously recorded for 5 min at a sampling rate of 2 kHz. The collected data were numbered as channel 1 to channel 5.

Signal Preprocessing
In order to remove the interference of respiration, filtering is a necessary step in heart sound signal preprocessing. The advantage of the Butterworth filter is that the amplitude-frequency characteristic is flat and monotonous in the passband [28]. Although the attenuation of this filter in stopband is relatively slow, the fifth-order filter is still acceptable in this study. Therefore, a fifth-order Butterworth high pass filter with a cut-off frequency of 30 Hz was applied to remove the low-frequency noise and the baseline drift. Subsequently, a 50 Hz notch filter was used to remove power frequency interference. The comparison before and after preprocessing is shown in Figure 2. To enlarge the sample size, each five-minute recording was cropped to 30 segments lasting 10 s. Segments with wheezing of asthma or serious noise interference were considered unqualified. After quality evaluation and elimination [29], a total of 991 samples were generated, including 553 CAD and 438 non-CAD samples. The heart sounds from channel 1 to channel 5 of a non-CAD subject and a CAD patient are given in Figure 3. It can be observed that there are differences between the five channels, such as the amplitude ratio of first heart sound and second heart sound. There are also differences between the non-CAD subject and the CAD patient. The heart sound signals of the CAD group have more heart murmur.
A cardiovascular function detection device (CVFD, Huiyironggong Technology Co., Ltd., Jinan, China) was used to record the heart sound signals. Since CAD mostly occurs in the left coronary artery, under the recommendations of the guidelines and cardiovascular experts, an acquisition channel on the left was added on the basis of the original four auscultation areas. The five detectors of electronic stethoscope were respectively placed in the second intercostal space on the right edge of the sternum, the second intercostal space on the left edge of the sternum, the third intercostal space on the left edge of the sternum, the fourth intercostal space on the left edge of the sternum, and the intersection of the fourth intercostal space and the midclavicular line. For each subject, heart sound signals in five different locations were simultaneously recorded for 5 min at a sampling rate of 2 kHz. The collected data were numbered as channel 1 to channel 5.

Signal Preprocessing
In order to remove the interference of respiration, filtering is a necessary step in heart sound signal preprocessing. The advantage of the Butterworth filter is that the amplitudefrequency characteristic is flat and monotonous in the passband [28]. Although the attenuation of this filter in stopband is relatively slow, the fifth-order filter is still acceptable in this study. Therefore, a fifth-order Butterworth high pass filter with a cut-off frequency of 30 Hz was applied to remove the low-frequency noise and the baseline drift. Subsequently, a 50 Hz notch filter was used to remove power frequency interference. The comparison before and after preprocessing is shown in Figure 2. To enlarge the sample size, each five-minute recording was cropped to 30 segments lasting 10 s. Segments with wheezing of asthma or serious noise interference were considered unqualified. After quality evaluation and elimination [29], a total of 991 samples were generated, including 553 CAD and 438 non-CAD samples. The heart sounds from channel 1 to channel 5 of a non-CAD subject and a CAD patient are given in Figure 3. It can be observed that there are differences between the five channels, such as the amplitude ratio of first heart sound and second heart sound. There are also differences between the non-CAD subject and the CAD patient. The heart sound signals of the CAD group have more heart murmur.

Features Extraction
The segmentation of the fundamental heart sounds is an essential step in the automatic analysis of the heart sound signal. There are two main components in a cardiac cycle: The first heart sound (S1), caused by the closure of the mitral and tricuspid valves and their vibrations; the second heart sound (S2), generated by the closure of the aortic and pulmonary valves and their vibrations. The systole interval is the window between S1 and S2, and the diastole interval is from S2 to the beginning of S1 in the next heart cycle. For each cardiac cycle, the PCG signal was segmented into four states: S1, systole, S2, and diastole, using the algorithm proposed by Springer et al. [30]. The segmentation diagram is described in Figure 4. In this study, 20 time-domain, 16 frequency-domain, and 12 entropy features of each channel were extracted, and there were 240 single-channel features. 3 cross entropy features were extracted from every two channels. Since there were 10 combinations of five channels, 30 cross entropy features were extracted in this study. Heat sound signal Derived states Diastole S2 Systole S1 Figure 3. Collected heart sound signals. (a1-a5) The heart sound signals from channel 1 to channel 5 of a non-CAD subject; (b1-b5) The heart sound signals from channel 1 to channel 5 of a CAD patient.

Features Extraction
The segmentation of the fundamental heart sounds is an essential step in the automatic analysis of the heart sound signal. There are two main components in a cardiac cycle: The first heart sound (S1), caused by the closure of the mitral and tricuspid valves and their vibrations; the second heart sound (S2), generated by the closure of the aortic and pulmonary valves and their vibrations. The systole interval is the window between S1 and S2, and the diastole interval is from S2 to the beginning of S1 in the next heart cycle. For each cardiac cycle, the PCG signal was segmented into four states: S1, systole, S2, and diastole, using the algorithm proposed by Springer et al. [30]. The segmentation diagram is described in Figure 4. In this study, 20 time-domain, 16 frequency-domain, and 12 entropy features of each channel were extracted, and there were 240 single-channel features. 3 cross entropy features were extracted from every two channels. Since there were 10 combinations of five channels, 30 cross entropy features were extracted in this study.

Features Extraction
The segmentation of the fundamental heart sounds is an essential step in the automatic analysis of the heart sound signal. There are two main components in a cardiac cycle: The first heart sound (S1), caused by the closure of the mitral and tricuspid valves and their vibrations; the second heart sound (S2), generated by the closure of the aortic and pulmonary valves and their vibrations. The systole interval is the window between S1 and S2, and the diastole interval is from S2 to the beginning of S1 in the next heart cycle. For each cardiac cycle, the PCG signal was segmented into four states: S1, systole, S2, and diastole, using the algorithm proposed by Springer et al. [30]. The segmentation diagram is described in Figure 4. In this study, 20 time-domain, 16 frequency-domain, and 12 entropy features of each channel were extracted, and there were 240 single-channel features. 3 cross entropy features were extracted from every two channels. Since there were 10 combinations of five channels, 30 cross entropy features were extracted in this study.   The duration of each cardiac activity state often reflects changes in the state of the heart, since they are generated by the specific cardiac activities. The amplitude of heart sound can represent the intensity of cardiac mechanical activity, which may be potentially helpful for the detection of CAD. In this study, the mean value and standard deviation (SD) of interval durations, duration ratios, and average amplitude ratios were calculated [31]. The details are given in Table 2. Table 2. Extracted time-domain features during a cardiac cycle of each channel.

CC
The cardiac cycle duration IntS1 The S1 interval duration IntS2 The S2 interval duration IntSys The systolic interval duration IntDia The diastolic interval duration Ratio_SysCC The ratio of systolic interval to the cardiac cycle duration Ratio_DiaCC The ratio of diastolic interval to the cardiac cycle duration Ratio_SysDia The ratio of systole interval to the diastole interval Ratio_Amp_SysS1 The ratio of average amplitude during systole to that during S1 Ratio_Amp_DiaS2 The ratio of average amplitude during diastole to that during S2

Frequency-Domain Features (16 × 5 Features)
Frequency spectrum analysis is the most widely used approach in heart sound analysis. The fast Fourier transform (FFT) was used in this study. The normal heart sound signal generally had a frequency band below 200 Hz, and the noise related to diseases was generally ranging between 200 and 800 Hz [32]. At the same time, heart sound signals of CAD patients and non-CAD subjects were also significantly different at the low-frequency power of 25-60 Hz, especially at 31.5 Hz [9]. According to the existing research conclusions [33], 200 Hz and 50 Hz were used as the thresholds of high and low frequency. The spectrum ratios were extracted as frequency domain features, and a detailed description is presented in Table 3. First, the spectrum values of each state were calculated using fast Fourier transform. Then, the proportions of high frequency (above 200 Hz) and low frequency (below 50 Hz) components in the spectrum of four state spectra were obtained separately. Their mean values and SD were calculated as features. Table 3. Extracted frequency-domain features during a cardiac cycle of each channel.

HFAll_S1
The proportion of high-frequency component in total spectrum S1s LFAll_S1 The proportion of low-frequency component in total spectrum S1s HFAll_S2 The proportion of high-frequency component in total spectrum S2s LFAll_S2 The proportion of low-frequency component in total spectrum S2s HFAll_Sys The proportion of high-frequency component in total spectrum systoles LFAll_Sys The proportion of low-frequency component in total spectrum systoles HFAll_Dia The proportion of high-frequency component in total spectrum diastoles LFAll_Dia The proportion of low-frequency component in total spectrum diastoles 2.3.3. Entropy Features (12 × 5 Features) Entropy features were extracted including SampEn, FuzzyEn, and DistEn. For this work, the mean value and SD of entropy features in systole and diastole were calculated, and the details are given in Table 4.

SampEn_Sys
The sample entropy of systolic SampEn_Dia The sample entropy of diastolic FuzzyEn_Sys The fuzzy entropy of systolic FuzzyEn_Dia The fuzzy entropy of diastolic DistEn_Sys The distribution entropy of systolic DistEn_Dia The distribution entropy of diastolic 1.
SampEn is a nonlinear feature to calculate the probability of generating new patterns in signals [20]. It is also a common method to measure the complexity of time series [21]. SampEn can be calculated as follows: where N is the length of signals, m is the embedding dimension, r is the threshold parameter, and B (m) i (r) is the probability that any two epochs match each other.

2.
FuzzyEn [34] is actually a refined algorithm of SampEn. The difference between them lies in the thresholding procedure. The fuzzy membership function to determine the fuzzy similarity S m ij between X m i and X m j is: where d ij is the distance between X m i and X m j . In this study, for SampEn and FuzzyEn, the pattern length m was set to 2, and matching tolerance r was set to 0.2 times the SD of the input time series [35]. It has been shown in published studies that the introduction of the fuzzy membership function significantly improves the stability and consistency of the algorithm [36].

3.
DistEn [23] uses empirical probability distribution functions (ePDF) to achieve the global measurement of the distance matrix, avoiding the parameter dependence caused by local evaluation. The ePDF of d m ij is estimated using a histogram with a predefined bin number B. Then DistEn is defined by the Shannon formula for entropy: Thus, the range of DistEn should be within [0, 1]. In this study, B was set to 2ˆ8.

Cross Entropy Features (3 × 10 Features)
Coupling, also known as synchrony, was first proposed by Huygens [37]. XSampEn, XFuzzyEn, and JDistEn used in this study are accepted methods to measure coupling [38]. As the cross entropy features were extracted from every two channels, there were 10 combinations of five channels. Therefore, 30 cross entropy features were extracted in this study.

1.
XSampEn [20] is developed from SampEn. It measures the synchronization of two signals by focusing on the similarity of patterns between two signals. XSampEn is defined as: where m is the embedding dimension, τ is the time delay and r is the threshold parameter. 2.
XFuzzyEn [26] has the same algorithm framework as XSampEn. FuzzyEn substituted a Gaussian function for the Heaviside function as the membership function, i.e., the The parameter m was set to 2 in this study. Since the signal is normalized, the SD is 1.
In order to find out the best parameter r, the XSampEn and XFuzzyEn with r = 0.1, r = 0.15, r = 0.2, r = 0.25, and r = 0.3 were calculated. After comparing the results, r was set to 0.2.

3.
The JDistEn algorithm [27] is developed by combining the joint distance matrix and DistEn. The ePDF of d m ij is estimated by histogram with a predefined bin number B, which is denoted by P t where t = 1, 2, . . . B. Then JDistEn is defined by the Shannon formula for entropy: JDistEn has been shown to have especially good performance in short-length data [27].
In this study, the number of histogram bins B was set to 2ˆ8.

Feature Set Construction
In order to explore whether the multi-channel signal features perform better and whether the two types of entropy features can improve the classification accuracy, five types of feature sets were established. Single-channel feature set 1 was composed of one-channel features without entropy features, which was abbreviated as 'Sin-feature set 1'. Single-channel feature set 2 was composed of one-channel features with entropy features, which was abbreviated as 'Sin-feature set 2'. Multi-channel feature set 1 included five-channel features without entropy features, which was abbreviated as 'Mul-feature set 1'. Multi-channel feature set 2 included five-channel features with entropy features, which was abbreviated as 'Mul-feature set 2'. Multi-channel feature set 3 included fivechannel features with entropy features and cross entropy features, which was abbreviated as 'Mul-feature set 3'. Sin-feature set 1 and Sin-feature set 2 represented five feature sets from channel 1 to channel 5, respectively.

Statistical Analysis
The generalized linear mixed model (GLMM) [39] is used for the statistical analysis in this paper. GLMM can be regarded as the fusion of the generalized linear model and linear mixed model, whose dependent variable need not satisfy the normal distribution. GLMM is suitable for processing repeated measurement data. The dependent variable of this study was the binary categorical variable, so the distribution of the fitted mixed model was set as binomial distribution, and the link function was set as a logit function. Statistical significance was set a priori at p < 0.05.

Feature Selection
Feature selection is particularly important. It is difficult to obtain a satisfactory performance by directly inputting the features into the classifier. This study used two feature selection methods including information gain and SVM-RFE to reduce feature dimension and enhance classification performance.

1.
Information gain [40] is a statistic used to describe the ability to distinguish data samples. Features with larger information gain values are considered to contribute more to classification. Information gain is defined as information entropy minus conditional entropy. 2. SVM-RFE can repeatedly build SVM models to obtain the optimal feature subset. Features with the lowest contribution are iteratively eliminated from the training set, and the ranking from salient to non-salient features is generated [41]. Thus, the optimal feature subset is constructed by selecting the appropriate feature number.

Classification
The task of CAD classification is a typical binary classification problem. SVM was chosen in this study because of its excellent performance in small sample binary classification problems [42]. In n-dimensional space, SVM separates input data into the classes using hyperplanes. When the sample cannot be divided linearly, the kernel function is used to map the sample to a higher latitude space, and then find the hyperplane. The radial basis function kernel is a common kernel function of SVM, which contains two important hyper-parameters: C and gamma. The cost parameter C is used to control the overfitting of the model, and gamma is used to control the non-linear degree of the model [43]. According to previous experimental experience and relevant research [44], the detailed parameter configuration of the SVM classifier is shown in Table 5.

Performance Evaluation
Five-fold cross validation was performed in this work, and the final classification result was the average of five cross validations to make the evaluation more realistic. In order to ensure that the segments of the training group and the validation group came from completely different subjects, the recordings were divided into five parts firstly, and then every recording in each part was cropped into 30 segments lasting 10 s. Stratified sampling was used to ensure the balance of positive and negative samples.
In this study, the standard metrics including sensitivity (Se.), specificity (Sp.), and accuracy (Acc.) were used to measure the classification performance [45]. The equations associated with these metrics are calculated as where TP, TN, FP, and FN stand for the number of the true positives, true negatives, false positives, and false negatives, respectively.

Results
In this study, the data pre-processing, feature extraction, and machine learning code were executed in Matlab R2019a. The entire experiment was implemented on a PC with a 3.70 GHz Intel Core i7-8700 k CPU, 16 GB of RAM, and a Windows 10 operating system.

Results Based on Statistical Analysis
In this paper, all the features were fitted by GLMM for statistical analysis. A total of 31 features were proven to be statistically different, including 5 time-domain features, 10 frequency-domain features, 2 entropy features, and 18 cross entropy features. The details of features with statistical differences are shown in Table 6. In the statistical analysis, 1 represented subject with CAD and 0 represented subject without CAD. Therefore, the odds ratio represented the increment of CAD odds for each 1 unit increased in the feature. In the comparison of the features of different domains, cross entropy features accounted for the largest proportion of the features with statistical differences, although the number of them was the least. The feature with the largest odds ratio was the frequency-domain feature. Note: 'Cro-en' is short for 'Cross entropy', the odds ratio is represented by scientific notation, and the suffix number indicates the channel.
The boxplots of entropy and cross entropy features are shown in Figure 5. The abscissa '1_s' in (a)-(f) means systolic period of channel 1, and '1_d' means diastolic period of channel 1. The abscissa '1-2' in (g)-(h) represents the cross entropy feature extracted jointly by channel 1 and channel 2. Features marked with * are statistically significantly different. It can be seen that the XSampEn and XFuzzyEn on most channels had statistically significant differences between CAD patients and non-CAD subjects. The eigenvalues of XSampEn, XFuzzyEn, and JDistEn of CAD patients were generally larger than those of non-CAD subjects. Entropy 2021, 23, x FOR PEER REVIEW 11 of 18

Ranking Results Based on Information Gain
The value of information gain reflects the importance of features. In this study, information gain values of 270 features were calculated, and were sorted from large to small. In order to explore the importance of different domain features for CAD detection, the numbers of different domain features in the top 10, top 20, and top 30 are counted and shown in Figure 6. In Figure 6, Mul-feature set 1 was used in (a), Mul-feature set 2 was used in (b), and Mul-feature set 3 was used in (c). It can be seen that the cross entropy

Ranking Results Based on Information Gain
The value of information gain reflects the importance of features. In this study, information gain values of 270 features were calculated, and were sorted from large to small. In order to explore the importance of different domain features for CAD detection, the numbers of different domain features in the top 10, top 20, and top 30 are counted and shown in Figure 6. In Figure 6, Mul-feature set 1 was used in (a), Mul-feature set 2 was used in (b), and Mul-feature set 3 was used in (c). It can be seen that the cross entropy features and frequency domain features perform excellently in feature ranking, while entropy features perform mediocrely. As always, the performance of frequency domain features is better than that of the time domain feature. features and frequency domain features perform excellently in feature ranking, while entropy features perform mediocrely. As always, the performance of frequency domain features is better than that of the time domain feature.

Classification Performance
The information gain and SVM-RFE method were used to select features. After being sorted and selected, the features were put into the SVM classifier to compare whether the features from multi-channel signals performed better and whether entropy and cross entropy features can improve the classification accuracy. The number of features from single-channel feature sets was selected incrementing at step size 2, and the number of features from multi-channel feature sets was at step size 10. In order to explore the impact of multi-channel features on classification performance, classification accuracy based on single-channel feature sets and multi-channel feature sets were compared. The results are shown in Figure 7. It is worth noting that the number of features from different feature sets is different, so the abscissa of Figure 7 is a percentage of the total number of features. It can be clearly seen that multi-channel feature sets have advantages over single-channel feature sets in detecting CAD. Figure 8 uses the same data as Figure 7, it is drawn to explore the role of entropy and cross entropy features in classification. For the single-channel feature set, the highest classification accuracy among the five channels was used to draw Figure 7 and Figure 8. When using information gain, the feature set of channel 3 had the highest classification accuracy. When using SVM-RFE, the feature set of channel 2 had the highest classification accuracy. But the classification accuracy of the two channels differed by only 0.85%. It can be seen from Figure 8 that the accuracy of classification is improved by adding entropy and cross entropy features to either the single-channel feature set or the multi-channel feature set. Table 7 and Table 8 shows the highest classification accuracy of each feature sets. After feature selection, the top 30 features of Mul-feature set 3 selected by SVM-RFE achieved the best performance with an accuracy of 90.92%. Besides, the top 22 features of

Classification Performance
The information gain and SVM-RFE method were used to select features. After being sorted and selected, the features were put into the SVM classifier to compare whether the features from multi-channel signals performed better and whether entropy and cross entropy features can improve the classification accuracy. The number of features from single-channel feature sets was selected incrementing at step size 2, and the number of features from multi-channel feature sets was at step size 10. In order to explore the impact of multi-channel features on classification performance, classification accuracy based on single-channel feature sets and multi-channel feature sets were compared. The results are shown in Figure 7. It is worth noting that the number of features from different feature sets is different, so the abscissa of Figure 7 is a percentage of the total number of features. It can be clearly seen that multi-channel feature sets have advantages over single-channel feature sets in detecting CAD. Sin-feature set 2 selected by SVM-RFE achieved the best performance of single-channel features with an accuracy of 83.02 %.  Figure 8 uses the same data as Figure 7, it is drawn to explore the role of entropy and cross entropy features in classification. For the single-channel feature set, the highest classification accuracy among the five channels was used to draw Figures 7 and 8. When using information gain, the feature set of channel 3 had the highest classification accuracy. When using SVM-RFE, the feature set of channel 2 had the highest classification accuracy. But the classification accuracy of the two channels differed by only 0.85%. It can be seen from Figure 8 that the accuracy of classification is improved by adding entropy and cross entropy features to either the single-channel feature set or the multi-channel feature set.  83.03 ± 14.60 Note: the bold format represents the highest classification accuracy in each selecting method, 'Ch-1′ means 'Channel 1′, and data are expressed as mean value ± standard deviation.

Discussion
Considering the location of coronary artery occlusion, all CAD patients had left anterior descending stenosis, in which there were 7 subjects with first diagonal branch stenosis, 2 subjects with second diagonal branch stenosis, 1 subject with septal artery stenosis, 1 subject with middle branch stenosis and 16 subjects with left circumflex artery stenosis. That is to say, most coronary artery occlusion occurred in the left coronary artery. Among the five auscultation locations designed in this study, the positions of channel 2 and channel 3 mainly detected the left coronary artery. Therefore, channel 2 and channel 3 had excellent classification performance, which is consistent with our results. Previous studies [5,6] showed that coronary stenosis produces high-frequency sounds due to the turbulent blood flow in partially occluded arteries. This is consistent with the conclusion that the feature with the largest odds ratio in our study is the frequency domain feature.
Statistical analysis is an important step in exploring the validity of features. This study expands the sample size by segmenting the heart sound signal, so the samples of the same person are not independent of each other. Data segmentation is equivalent to the repeated measurement of data, which is suitable for analysis using GLMM. GLMM includes both fixed effects and random effects, and random effects can eliminate the influence of feature correlation within the group. The role of information gain is to estimate the importance of extracted features. The results show that the cross entropy feature not only accounts for the largest proportion in the top 10, top 20, and top 30, but also has the largest number of features that are statistically different. Considering that there are only 30 cross entropy features in the total 270 features, this result is more encouraging. SVM-RFE is a greedy algorithm for finding the optimal feature subset. Although it is time-consuming, it can enormously improve the accuracy of classification. The index of SVM-RFE is based on classification. Therefore, it is more reliable in improving the accuracy of classification. Figure 7 shows that the classification performance using features extracted from multichannel signals is better than that from single-channel signals. Compared with singlechannel signals, multi-channel signals can provide more information about detecting CAD from suspected patients. The murmurs generated by coronary artery occlusion are more likely to be picked up by multiple heart sound sensors located at different locations. In other words, multi-channel signals acquisition can increase the probability of detecting CAD. In addition, in this study, the application of multi-domain features also plays a significant role. Previous studies have proven that features from multiple domains are more conducive to feature classification [33,46]. Figure 8 shows the comparison of classification accuracy before and after adding the entropy feature. It can be seen that the accuracy of classification is improved by adding entropy and cross entropy features. Figure 5g-i show that the cross entropy features of CAD patients have a consistent increase compared with non-CAD. Cross entropy is a physical quantity to represent synchronization. The increase of the cross entropy represents a decrease in the synchronization of the two signals. For CAD patients, the stenosis of blood vessels will lead to myocardial ischemia and reduction of myocardial contractility. In cases of myocardial ischemia, the energy supply decreases, which can result in systolic dysfunction, such as delayed contraction, decreased contraction force, and non-synchronized motion in the myocardium. Changes in the state of the heart affect the flow of blood, which is reflected in the heart sound and captured by a microphone on the body surface. Previous studies concluded that there is disturbed myocardial synchrony in CAD patients, with greater dyssynchrony than in the control group [47]. This is consistent with the increase of the cross entropy of CAD patients in Figure 5. Table 9 summarizes the existing studies that use heart sound signals for CAD detection. Among these studies, the highest accuracy of CAD detection using heart sound signals is 84% [17]. Most of the studies in Table 9 identified CAD patients and healthy subjects. Obviously, it is easier to identify CAD patients and healthy subjects, because of their obvious differences in clinical symptoms and examination results. However, for the similarity of clinical symptoms, metabolism and electrocardiogram between CAD and suspected CAD patients [48], it is very difficult for physicians to accurately diagnose them. A previous study concluded that 10-30% of patients who received CAG due to angina pectoris had "normal" or "near normal" coronary arteries during CAG [49], which causes an additional significant burden on patients, families, and society. In this study, CAD and suspected CAD patients were identified, and the classification accuracy of the multi-domain features extracted from multi-channel signals is 90.92%. Table 9. Summary of the existing studies on the detection of CAD using PCG signals.

Conclusions
Among all the single-channel features, the highest classification accuracy was 83.02%. After collecting heart sound signals from five different locations, the classification accuracy of the multi-channel features was 86.70%. After adding entropy features and cross entropy features, the classification accuracy improved to 90.90%. It is concluded that multi-channel heart sounds can provide further information for CAD detection, and entropy features and cross entropy features have the advantage of improving classification accuracy. Due to the advantages of non-invasive, low cost, and simple operation, the use of heart sound signals will inevitably provide great help in disease screening and detection. Cross entropy features have shown great potential in statistical analysis and feature ranking, and more in-depth research can be done in the future. Multi-domain feature fusion of multi-channel heart sound signals can provide additional information, which will play an important role in the process of preventing and overcoming cardiovascular diseases. In future work, signals from more subjects are necessary to test the performance of the proposed method further. Moreover, we will pay attention to exploring more useful features and classification methods.