Biometric Identification Method for Heart Sound Based on Multimodal Multiscale Dispersion Entropy

In this paper, a new method for biometric characterization of heart sounds based on multimodal multiscale dispersion entropy is proposed. Firstly, the heart sound is periodically segmented, and each single-cycle heart sound is then decomposed into a group of intrinsic mode functions (IMFs) by improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN). These IMFs are then divided into a series of frames, which are used to calculate the refined composite multiscale dispersion entropy (RCMDE) as the characteristic representation of the heart sound. In simulation experiment I, carried out on the open heart sound databases Michigan, Washington and Littman, this feature representation was combined with the heart sound segmentation method based on logistic regression (LR) and hidden semi-Markov models (HSMM), and feature selection was performed through the Fisher ratio (FR). Finally, the Euclidean distance (ED) and the close principle were used for matching and identification, achieving a recognition accuracy of 96.08%. To improve the practical application value of this method, in experiment II the proposed method was applied to a database of 80 heart sound recordings collected from 40 volunteers to examine the effect of single-cycle heart sounds with different starting positions on performance. The experimental results show that single-cycle heart sounds starting at the onset of the first heart sound (S1) give the highest recognition rate, 97.5%. In summary, the proposed method is effective for heart sound biometric recognition.


Introduction
Heart sound is a complex, non-stationary and quasi-periodic signal consisting of multiple heartbeats or cardiac cycles, which mainly contain components such as the first heart sound S1, the second heart sound S2, the systolic murmur and the diastolic murmur. Heart sound originates from the opening and closing of the heart valves and the turbulence of blood. It carries physiological information about the atria, ventricles, major vessels, cardiovascular system and the functional status of the valves, and can reflect the mechanical activity and structural status of the heart. As a biometric characteristic, the main advantages of heart sound are universality, stability, uniqueness and collectability [1]. Several studies have verified the feasibility of heart sound signals for biometric identification. The heart sound signal as an option for biometric identification was first introduced by Beritelli and Spadaccini [2]. Their method locates and describes S1 and S2, applies the chirp-z transform (CZT) to obtain the feature set and, finally, uses the Euclidean distance (ED) as the classifier. In another study, Phua et al. [3] introduced the linear frequency band cepstrum (LFBC) for heart sound feature extraction and used two classifiers, vector quantization (VQ) and the Gaussian mixture model (GMM). More recently, refined composite multiscale dispersion entropy (RCMDE) was proposed to improve computing speed and stability, which makes it more applicable to the short and noisy signals encountered in biomedical applications.
To avoid errors, the testing data should be consistent with the length of the corresponding training data. Furthermore, the heart sound signal is pseudo-periodic, and each cardiac cycle contains the dynamic acoustic characteristics of the heart structure. The cardiac cycle differs between individuals, and therefore also reflects physiological differences between them. This work therefore takes the single-cycle heart sound as the input of the proposed method, and RCMDE is combined with ICEEMDAN to quantify the important biometric information contained in every cardiac cycle. A heart sound signal longer than one cycle is first periodically segmented, and each single cycle is then decomposed into a group of IMFs by ICEEMDAN. These IMFs are then divided into a series of frames, which are used to calculate the RCMDE as a characteristic representation of the heart sound. In addition, feature selection is performed through the Fisher ratio (FR) to remove redundant features, and ED is then used to measure and match the features, finally forming a new method based on ICEEMDAN-RCMDE-FR-ED. At the same time, ICEEMDAN-RCMDE-FR can be considered to generate a kind of biometric characterization of heart sounds, which is named the multimodal multiscale dispersion entropy. The feature generation flowchart of the multimodal multiscale dispersion entropy is shown in Figure 1.


Mathematical Model of Heart Sound
In heart sound biometric recognition, heart sound is a non-stationary and quasi-periodic signal due to the rhythmicity of the heartbeat. Although the waveform of each cardiac cycle shows slight differences in time and magnitude, heart sound can be approximated by a periodic signal in the mathematical model. At the same time, since each cardiac cycle contains four main components, namely the first heart sound S1, the second heart sound S2, the systolic murmur and the diastolic murmur, the cardiac cycles of the heart sound are considered the main object of the biometric. The mathematical model of heart sound can be described as follows:

x(i) = x_T(i − ⌊(i − 1)/L⌋ · L), i = 1, 2, ..., N · L
x_T(i) = S_1(i) + Sysmur(i) + S_2(i) + Diasmur(i), i = 1, 2, ..., L     (1)

In the first step of Formula (1), x_T(i), i = 1, 2, ..., L, represents any period of the heart sound; the length of one period is L = T · F_s, where T is the cardiac cycle and F_s is the sampling frequency; x(i), i = 1, 2, ..., N · L, represents a heart sound signal containing N cardiac cycles. In the second step of Formula (1), S_1(i) represents the first heart sound S1, Sysmur(i) the systolic murmur, S_2(i) the second heart sound S2 and Diasmur(i) the diastolic murmur.
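As a minimal illustration of this model, the sketch below builds an N-cycle signal x by repeating a synthetic single-cycle template x_T of length L = T · F_s. The two Gaussian bursts standing in for S1 and S2, and all parameter values, are purely hypothetical.

```python
import math

Fs = 2000                      # sampling frequency F_s (Hz)
T = 0.8                        # cardiac cycle T (s)
L = int(T * Fs)                # samples per cycle, L = T * Fs
N = 3                          # number of cycles

# hypothetical single-cycle template: two short bursts standing in for S1 and S2
x_T = [math.exp(-((i - 0.1 * L) / (0.02 * L)) ** 2)
       + 0.6 * math.exp(-((i - 0.5 * L) / (0.02 * L)) ** 2)
       for i in range(L)]

# x(i) = x_T(i mod L): the N-cycle signal is the periodic extension of x_T
x = [x_T[i % L] for i in range(N * L)]
```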

ICEEMDAN Method
Empirical mode decomposition (EMD) is an adaptive method for analyzing non-stationary signals originating from nonlinear systems. It decomposes the original signal into a sum of intrinsic mode functions (IMFs) and can be described as follows [17]:
(1) Set k = 0 and find all extrema of r_0 = x.
(2) Interpolate between the minima (maxima) of r_k to obtain the lower (upper) envelope e_min (e_max).
(3) Calculate the mean envelope: me = (e_min + e_max)/2.
(4) Calculate the candidate IMF: d_{k+1} = r_k − me.
(5) If d_{k+1} is an IMF, save d_{k+1}, calculate the residual r_{k+1} = x − Σ_{i=1}^{k+1} d_i, set k = k + 1 and use r_k as the input data in step (2); otherwise, use d_{k+1} as the input data in step (2).
(6) Continue the cycle until the final residual r_k meets a predefined stopping criterion.
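The sifting loop in steps (2)-(4) can be sketched as follows. This is only a structural illustration, not the implementation used in the paper: the envelopes are interpolated linearly rather than with cubic splines, and the IMF stopping test of step (5) is omitted.

```python
def local_extrema(x):
    """Return indices of local maxima and minima of a sequence (step 1)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def interp(indices, values, n):
    """Piecewise-linear envelope through (indices, values), length n."""
    env = [0.0] * n
    # extend to the borders so the envelope covers the whole signal
    idx = [0] + indices + [n - 1]
    val = [values[0]] + values + [values[-1]]
    for a in range(len(idx) - 1):
        i0, i1 = idx[a], idx[a + 1]
        for i in range(i0, i1 + 1):
            t = 0 if i1 == i0 else (i - i0) / (i1 - i0)
            env[i] = val[a] * (1 - t) + val[a + 1] * t
    return env

def sift_once(r):
    """One sifting pass: candidate IMF d = r - mean envelope (steps 2-4)."""
    maxima, minima = local_extrema(r)
    if len(maxima) < 2 or len(minima) < 2:
        return None                      # residual is monotonic: stop
    e_max = interp(maxima, [r[i] for i in maxima], len(r))
    e_min = interp(minima, [r[i] for i in minima], len(r))
    me = [(a + b) / 2 for a, b in zip(e_min, e_max)]
    return [a - b for a, b in zip(r, me)]
```

A full EMD would repeat `sift_once` until the candidate satisfies the IMF conditions, subtract the saved IMF and continue on the residual.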
The improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) has been proved suitable for processing biomedical signals. The algorithm not only overcomes the mode-mixing problem of EMD but also eliminates the spurious modes of CEEMDAN. Let E_k(·) denote the operator that extracts the kth mode obtained by EMD, let ω^(i) denote the ith realization of white noise with zero mean and unit variance, and let M(·) denote the operator that calculates the local mean of a signal. The steps of the ICEEMDAN algorithm are as follows [18,19]:
(1) Calculate the local mean of the I noisy realizations x^(i) = x + β_0 E_1(ω^(i)) by EMD to obtain the first residual r_1 = ⟨M(x^(i))⟩, where ⟨·⟩ represents the averaging operator.
(2) Calculate the first mode: d_1 = x − r_1.
(3) Calculate the second residual r_2 = ⟨M(r_1 + β_1 E_2(ω^(i)))⟩ and the second mode d_2 = r_1 − r_2.
(4) For k = 3, 4, ..., calculate the kth residual: r_k = ⟨M(r_{k−1} + β_{k−1} E_k(ω^(i)))⟩.
(5) Calculate the kth mode: d_k = r_{k−1} − r_k.
(6) Return to step (4) for the next k until all modes are obtained.
The constant β_{k−1} is selected to adjust the signal-to-noise ratio (SNR) between the residual and the added noise. For k = 1, β_0 = ε_0 · std(x)/std(E_1(ω^(i))), where std(·) is the standard deviation and ε_0 is the reciprocal of the desired SNR between the input signal x and the first added noise. For k ≥ 2, β_k = ε_0 · std(r_k).

RCMDE Method
Multiscale dispersion entropy (MDE) combines coarse-graining with dispersion entropy, and refined composite multiscale dispersion entropy (RCMDE) improves MDE by using, for each scale factor τ, the τ coarse-grained time series with different starting points. Based on the multiscale technique, the main steps for calculating RCMDE are as follows [20-22]:
(1) Construct the continuous coarse-grained time series. For a univariate signal x = {x_1, x_2, ..., x_N}, the kth coarse-grained series x_k^(τ) = {x_{k,1}^(τ), x_{k,2}^(τ), ...} at scale factor τ is

x_{k,j}^(τ) = (1/τ) Σ_{i=k+τ(j−1)}^{k+τj−1} x_i,  1 ≤ j ≤ N/τ, 1 ≤ k ≤ τ,

where τ is the scale factor; the original time series x is scaled by controlling the value of τ.
(2) Map each coarse-grained sample y_j to (0, 1) with the normal cumulative distribution function:

u_j = (1/(σ√(2π))) ∫_{−∞}^{y_j} exp(−(t − µ)²/(2σ²)) dt,

where σ and µ represent the standard deviation and mean of the coarse-grained series.
(3) Map u_j to an integer from 1 to c using a linear algorithm: z_j^(c) = round(c · u_j + 0.5), where c is the number of classes.
(4) Define the embedding vectors z_i^(τ,c,m) = {z_i^(c), z_{i+d}^(c), ..., z_{i+(m−1)d}^(c)} with embedding dimension m and time delay d. Each embedding vector corresponds to a dispersion pattern π_{v_0 v_1 ... v_{m−1}}. For each dispersion pattern, the relative frequency is obtained as

p(π_{v_0 v_1 ... v_{m−1}}) = Number(π_{v_0 v_1 ... v_{m−1}}) / (N/τ − (m − 1)d),

where Number(π_{v_0 v_1 ... v_{m−1}}) is the number of embedding vectors assigned to the dispersion pattern π_{v_0 v_1 ... v_{m−1}}.
(5) For each scale τ, the dispersion-pattern probabilities are averaged over the τ coarse-grained series, and RCMDE is the Shannon entropy of the averaged probabilities: RCMDE(x, m, c, d, τ) = −Σ p̄(π) ln p̄(π).
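The steps above can be sketched compactly in pure Python; this is an illustrative implementation of the general RCMDE recipe (function names are ours), not the exact code used in the paper.

```python
import math
from collections import Counter

def dispersion_patterns(x, c, m, d):
    """Steps (2)-(4): map x to classes 1..c via the normal CDF, then
    count the dispersion patterns of the embedding vectors."""
    mu = sum(x) / len(x)
    sigma = (sum((v - mu) ** 2 for v in x) / len(x)) ** 0.5 or 1.0
    # normal CDF maps samples to (0,1); then round to classes 1..c
    y = [0.5 * (1 + math.erf((v - mu) / (sigma * math.sqrt(2)))) for v in x]
    z = [min(c, max(1, round(c * v + 0.5))) for v in y]
    pats = [tuple(z[i + k * d] for k in range(m))
            for i in range(len(z) - (m - 1) * d)]
    return Counter(pats), len(pats)

def rcmde(x, c=3, m=2, d=1, tau=2):
    """Refined composite MDE at scale tau: average the pattern
    probabilities over the tau shifted coarse-grained series (step 1),
    then take the Shannon entropy of the averaged distribution (step 5)."""
    total = Counter()
    for k in range(tau):                      # tau different start points
        cg = [sum(x[i:i + tau]) / tau         # coarse-graining
              for i in range(k, len(x) - tau + 1, tau)]
        cnt, n = dispersion_patterns(cg, c, m, d)
        for p, v in cnt.items():
            total[p] += v / n                 # accumulate probabilities
    probs = [v / tau for v in total.values()]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

With c = 3 and m = 2 there are at most c^m = 9 dispersion patterns, so the entropy is bounded by ln 9.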

Feature Selection
The Fisher ratio (FR) is based on the Fisher criterion. It measures the classification and recognition ability of features and was successfully used by Pruzansky and Mathews in speech recognition research [23]. In this paper, the Fisher ratio is used to select the optimal features, with the following steps: (1) Calculate the inter-class dispersion σ_between, which measures the degree of dispersion of the rth-dimensional feature parameter between the heart sound signals of the various categories:

σ_between(r) = (1/M) Σ_{i=1}^{C} n_i (µ_r^(i) − µ_r)²,

where M represents the total number of heart sound samples, C the number of categories, n_i the number of samples of the ith category, µ_r^(i) is the mean value of the rth-dimensional feature parameter of the ith type of heart sound signal, and µ_r is the mean value of the rth-dimensional feature parameter over all heart sound signals.
(2) Calculate the intra-class dispersion σ_within, which measures the degree of dispersion of the rth-dimensional feature parameter within a given type of heart sound signal:

σ_within(r) = (1/M) Σ_{i=1}^{C} Σ_{j=1}^{n_i} (x_r^(i,j) − µ_r^(i))²,

where n_i is the number of heart sound samples of the ith type of heart sound signal and x_r^(i,j) is the rth-dimensional feature parameter of the jth heart sound sample of the ith type.
(3) Calculate the Fisher ratio:

F_r = σ_between(r) / σ_within(r),

where F_r is the Fisher ratio of the rth-dimensional characteristic parameter.
(4) Sort the Fisher ratios of all feature dimensions in descending order: F_{r_1} ≥ F_{r_2} ≥ ... ≥ F_{r_R}, where r_i ∈ {1, 2, ..., R}, i = 1, 2, ..., R, and R is the dimension of the characteristic parameter. (5) The larger the Fisher ratio, the stronger the classification and recognition ability of that feature dimension. According to this principle, the top N_r feature dimensions ranked first in step (4) are selected as the optimal features.
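The five steps can be sketched as follows; this is a plain illustration of the FR selection idea (function names and normalizations are ours), not the paper's code.

```python
def fisher_ratio(features, labels):
    """Per-dimension Fisher ratio: inter-class dispersion divided by
    intra-class dispersion, following steps (1)-(3) above.
    `features` is a list of samples, each a list of R feature values."""
    classes = sorted(set(labels))
    M, R = len(features), len(features[0])
    ratios = []
    for r in range(R):
        col = [s[r] for s in features]
        mu_all = sum(col) / M
        between = within = 0.0
        for cl in classes:
            vals = [col[i] for i in range(M) if labels[i] == cl]
            mu_c = sum(vals) / len(vals)
            between += len(vals) * (mu_c - mu_all) ** 2 / M
            within += sum((v - mu_c) ** 2 for v in vals) / M
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios

def select_top(features, labels, n_r):
    """Steps (4)-(5): keep the n_r dimensions with the largest ratios."""
    ratios = fisher_ratio(features, labels)
    order = sorted(range(len(ratios)), key=lambda r: -ratios[r])
    return order[:n_r]
```

A dimension whose class means differ while its within-class spread is small receives a large ratio and survives the selection.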

Matching Recognition
This paper adopts the Euclidean distance (ED) and the close principle to realize pattern recognition of the user's heart sound. The idea of the algorithm is as follows: given the data and labels in the training set, compare the one-dimensional feature vector of the test data with the corresponding feature vectors in the training set to find the most similar training feature vector; the category of the test feature vector is then the category of that training feature vector. The algorithm steps are: (1) Calculate the distance between the test data and each training data. For the feature vector v of the test data and the feature vector v_A of the Ath training data in the heart sound database, the Euclidean distance d_A in the D-dimensional Euclidean space is

d_A = √( Σ_{k=1}^{D} (v_k − v_{A,k})² ),

where A = 1, 2, ..., C, C is the number of training data and D is the dimension of the feature vector.
(2) Sort the C distances in ascending order. (3) According to the close principle, the smaller the distance, the higher the degree of matching between the two data. The category corresponding to the closest feature vector is selected as the predicted class of the test data.
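This nearest-neighbour matching can be sketched in a few lines (illustrative names, not the paper's code):

```python
def euclidean(v, w):
    """Euclidean distance between two D-dimensional feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(v, w)) ** 0.5

def match(test_vec, train_vecs, train_labels):
    """Steps (1)-(3): compute the distance to every training vector and
    return the label of the closest one (the close principle)."""
    dists = [euclidean(test_vec, t) for t in train_vecs]
    return train_labels[dists.index(min(dists))]
```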

Evaluation Methods
This paper uses the following three indicators to evaluate the proposed algorithm [24]: (1) Average test accuracy: the correct recognition rate (CRR) averaged over J experiments is used as the final experimental result, as shown in Equation (9). Considering both computational cost and accuracy, J = 200 is used in the following experiments on parameter selection and algorithm comparison.
(2) Kappa coefficient: the kappa coefficient is an index that measures the accuracy of multi-class classification. Its calculation formula is

κ = (p_0 − p_e) / (1 − p_e),

where p_0 is the sum of the number of correctly classified samples of each class divided by the total number of samples, i.e., the overall classification accuracy. Suppose the number of true samples in each class is a_1, a_2, ..., a_c, the number of predicted samples in each class is b_1, b_2, ..., b_c, and the total number of samples is num; then the chance agreement is

p_e = (a_1·b_1 + a_2·b_2 + ... + a_c·b_c) / num².
The kappa coefficient usually lies between 0 and 1 and can be divided into five groups representing different levels of classification accuracy: 0.0 to 0.20, extremely low; 0.21 to 0.40, general; 0.41 to 0.60, high; 0.61 to 0.80, very high; and 0.81 to 1, extremely high classification accuracy.
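The kappa computation follows directly from these definitions; a minimal sketch (names are ours):

```python
def kappa(true_labels, pred_labels):
    """Cohen's kappa: p0 is the overall accuracy, pe is the chance
    agreement computed from the class totals a_i and b_i."""
    num = len(true_labels)
    classes = set(true_labels) | set(pred_labels)
    p0 = sum(t == p for t, p in zip(true_labels, pred_labels)) / num
    pe = sum(true_labels.count(c) * pred_labels.count(c)
             for c in classes) / (num * num)
    return (p0 - pe) / (1 - pe) if pe < 1 else 1.0
```

Perfect agreement gives κ = 1, while predictions at chance level give κ = 0.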
(3) t-test: the t-test uses t-distribution theory to infer the probability that a difference occurs, thereby assessing whether the difference between two averages is significant. This paper repeatedly trains and tests by randomly dividing the data into training and test sets multiple times, which yields multiple test accuracies. The t-test can therefore be used to verify whether the CRR_200 selected in this paper can serve as the generalization accuracy. Assuming the generalization accuracy µ_0 = CRR_200, we obtain n test accuracies CRR(i), i = 1, 2, ..., n; the average test accuracy µ and variance σ² are

µ = (1/n) Σ_{i=1}^{n} CRR(i),  σ² = (1/(n − 1)) Σ_{i=1}^{n} (CRR(i) − µ)².

Considering that these n test accuracies can be regarded as independent samples of the generalization accuracy µ_0, the variable t = √n(µ − µ_0)/σ follows a t-distribution with n − 1 degrees of freedom. This paper uses the following t-test steps: (1) Establish the hypotheses and determine the test level α: H_0: µ = µ_0 (null hypothesis), H_1: µ ≠ µ_0 (alternative hypothesis), using a two-sided test; commonly used values of α are 0.05 and 0.1, and the test level is α = 0.05 in this paper.
(2) Calculate the test statistic, look up the corresponding critical value t_(α,n) in the critical value table, and conclude: if the value of t lies within the critical range [−t_(α,n), t_(α,n)], the hypothesis H_0: µ = µ_0 cannot be rejected, and the generalization accuracy can be taken to be µ with confidence level 1 − α; otherwise, the hypothesis is rejected, that is, at this significance level the generalization accuracy µ_0 is considered significantly different from the average test accuracy µ.
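The statistic and decision rule above can be sketched as follows; the critical value t_(α,n) would come from a t-table and is simply passed in here.

```python
def t_statistic(accs, mu0):
    """t = sqrt(n) * (mean - mu0) / s for n test accuracies, where s
    is the sample standard deviation (n - 1 denominator)."""
    n = len(accs)
    mu = sum(accs) / n
    var = sum((a - mu) ** 2 for a in accs) / (n - 1)
    return n ** 0.5 * (mu - mu0) / var ** 0.5

def reject_h0(accs, mu0, t_crit):
    """Two-sided test: reject H0 (mu == mu0) when |t| exceeds the
    critical value looked up for the chosen alpha and n - 1 dof."""
    return abs(t_statistic(accs, mu0)) > t_crit
```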

Data Sources
To verify the effectiveness of the proposed method, both open databases of heart sound recordings and the heart sound database built by our research group are analyzed. The open database used in this paper consists of 72 heart sounds from the three open heart sound databases Michigan, Washington and Littman, including 18 normal heart sounds and 54 abnormal heart sounds. Among them, 23 and 16 heart sounds were obtained from the Michigan and Washington heart sound databases, and 33 heart sounds were selected from 3M's Littman heart sound database; 3 heart sounds that did not meet the experimental conditions (i.e., did not contain at least two cardiac cycles) were discarded. For the Michigan and Washington heart sound databases [25,26], the sampling frequency is 44.1 kHz and the acquisition time is about 60 s, with 23 and 16 heart sounds respectively. For the Littman heart sound database [27], the sampling frequency is 11.025 kHz and the acquisition time is about 3 s. The heart sound database built by our research group consists of 80 heart sound recordings from college student and teacher volunteers, collected using the Ω shoulder-belt wireless heart sound sensor self-developed by our research group (patent number: 201310454575.6) with a sampling frequency of 11,025 Hz. Every volunteer was recorded twice, at least one hour apart, each recording lasting approximately 5 s, with the sensor properly contacting the skin of the front chest wall of the subject, as shown in Figure 2. The heart sound recordings are taken from the apex area, located slightly inside the midline of the fifth left intercostal space. In addition, the heart sound recordings are collected with the subjects in a calm state and are stored in .wav format.
Figure 2. Heart sound database collected by our group: (a) Ω shoulder-belt wireless heart sound sensor; (b) the process of collecting heart sound.


Pretreatment
The original signal is preprocessed before feature extraction and matching recognition. The preprocessing module includes setting labels for the heart sounds, downsampling, denoising and cycle segmentation. Firstly, the 72 heart sound recordings of the three open heart sound databases are labeled 1-72 to distinguish the individual corresponding to each heart sound. Then the downsampling frequency is set to 2000 Hz, and the background noise introduced when collecting heart sounds is eliminated using wavelet packet multi-threshold denoising, which sets a threshold for each layer of wavelet packet coefficients to quantify each coefficient, retaining useful data and eliminating the rest. Different wavelets can produce different denoising effects; therefore, the biorthogonal HS wavelet developed for heart sound signals [28] is used for filtering in this work. The specific process is as follows: (1) perform a four-layer HS wavelet packet transform on the noisy signal to obtain a set of wavelet packet coefficients wpt_i, i = 1, 2, ..., 16; (2) threshold each wpt_i separately, selecting the threshold with the Heursure rule, and use it to remove the useless data in wpt_i; (3) perform discrete wavelet reconstruction using the denoised coefficients wpt_i; the reconstructed signal is the denoised signal.
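The transform-threshold-reconstruct structure of this process can be illustrated with a single-level Haar transform standing in for the four-layer HS wavelet packet transform; the HS wavelet itself is custom, and the Heursure threshold is replaced here by a fixed soft threshold, so this is only a structural sketch, not the paper's filter.

```python
def haar_step(x):
    """One level of the Haar transform (stand-in for the HS wavelet
    packet transform, which uses a custom biorthogonal wavelet)."""
    approx = [(x[2 * i] + x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    return approx, detail

def soft_threshold(coeffs, thr):
    """Shrink coefficients toward zero; coefficients below thr vanish."""
    return [max(abs(c) - thr, 0.0) * (1 if c >= 0 else -1) for c in coeffs]

def haar_inverse(approx, detail):
    """Invert haar_step exactly."""
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / 2 ** 0.5, (a - d) / 2 ** 0.5]
    return out

def denoise(x, thr):
    """Sketch of threshold denoising: transform, threshold the detail
    coefficients, reconstruct (steps (1)-(3) above, one level only)."""
    a, d = haar_step(x)
    return haar_inverse(a, soft_threshold(d, thr))
```

With `thr = 0` the signal is reconstructed exactly, which is a quick sanity check on the transform pair.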

Periodic Segmentation
Since the proposed feature extraction method is based on single-cycle heart sounds, the logistic regression (LR) and hidden semi-Markov model (HSMM) heart sound segmentation method proposed by Springer et al. is used in this work; it was proven in the 2016 PhysioNet/CinC Challenge to accurately segment heart sounds in noisy real-world recordings [29-31]. In this paper, the segmentation method is first used to assign the four states S1 (the first heart sound), systole, S2 (the second heart sound) and diastole to the preprocessed heart sound recordings. The time point of the first jump from the initial state of the current recording to the next state is used as the initial split point. Four situations may be obtained: (1) a series of cardiac cycles segmented from the beginning of S1 to the beginning of the next S1; (2) from the beginning of the systole to the beginning of the next systole; (3) from the beginning of S2 to the beginning of the next S2; (4) from the beginning of the diastole to the beginning of the next diastole. The schematic diagram of the heart sound cycle segmentation corresponding to these four cases is shown in Figure 3.

Figure 3. Four methods of heart sound cycle segmentation. (a) A series of cardiac cycles segmented from the beginning of S1 to the beginning of the next S1 of the current heart sound recording; (b) a series of cardiac cycles segmented from the beginning of the systole to the beginning of the next systole; (c) a series of cardiac cycles segmented from the beginning of S2 to the beginning of the next S2; (d) a series of cardiac cycles segmented from the beginning of the diastole to the beginning of the next diastole.
By the above segmentation method, the 72 heart sound recordings in the open heart sound databases are divided into 2005 single-cycle heart sounds, where each recording yields 2-101 single-cycle heart sounds according to its length. In each of the following experiments, one single-cycle heart sound is randomly selected from the single-cycle heart sounds of each recording as test data, so that the test data contain 72 single-cycle heart sounds from different individuals, and the remaining 1933 single-cycle heart sounds are used as training data.


Framing and Windowing
Similar to the speech signal, heart sound is a non-stationary and time-varying signal. Therefore, the heart sound signal is divided into a set of frames to analyze its characteristic parameters. The length of each frame is called the frame length. The standard speech frame length of 20 ms to 25 ms is not suitable for heart sounds because of their pseudo-periodicity. Reference [32] argues that the frame length for heart sounds should be longer than 20-25 ms and is best set to 256 ms. In this paper, the frame length of the heart sound is related to the cardiac cycle, and different frame lengths are set according to the cardiac cycle. Further, the distance from the start of one frame to the start of the subsequent frame is called the frame shift. To let the feature parameters change smoothly, adjacent frames usually overlap. To prevent spectrum leakage, each frame of heart sound is usually windowed, typically with a Hanning or Hamming window.
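Framing with an overlapping Hanning window can be sketched as follows; `frame_len` and `frame_shift` are in samples (e.g. a quarter and an eighth of the cycle length), and the helper name is ours.

```python
import math

def frame_signal(x, frame_len, frame_shift):
    """Split a single-cycle heart sound into overlapping frames and
    apply a Hanning window to each, as described above."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1))
           for i in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, frame_shift):
        seg = x[start:start + frame_len]
        frames.append([s * w for s, w in zip(seg, win)])
    return frames
```

Each windowed frame would then be fed to the RCMDE computation, and the per-frame entropies concatenated into the feature vector.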
The single-cycle heart sound obtained after preprocessing and period segmentation is framed with overlapping windows, and the RCMDE features of each frame are calculated. The RCMDE features of all frames of the single-cycle heart sound are then concatenated into a one-dimensional feature vector. When calculating RCMDE, four important parameters may strongly affect the results, namely the scale factor τ, the number of classes c, the embedding dimension m and the time delay d. In this experiment, extensive tests show that the algorithm performs best when the scale factor τ = 20, the number of classes c = 3, the embedding dimension m = 2 and the time delay d = 1. In Figure 4a, the RCMDE features of two different single-cycle heart sounds of the same person after windowing and framing are compared. As can be seen from the figure, all feature points of the two single-cycle heart sounds are distributed near the 45° line, showing that the two single-cycle heart sounds are close in their corresponding feature values and match well. In Figure 4b, the RCMDE features of two single-cycle heart sounds of different people are compared. More feature points are distributed far from the 45° line, which indicates that the two single-cycle heart sounds differ considerably in their corresponding feature values and do not match well. In Figure 4, the frame length is T/4, the frame shift is T/8 and the Hanning window is used.
Further, the distance from the start of the frame to the start of the subsequent frame is called the frameshift. To smoothly change the feature parameters, a part of the overlap between adjacent frames is often provided in the case of framing. To prevent spectrum leakage, windowing is usually performed for each frame of heart sounds, usually a Hanning window or a Hamming window.
The single-cycle heart sound obtained after preprocessing and period segmentation is framed by overlap windowing, and then the RCMDE features of each frame are calculated. Then, the RCMDE features of each frame of the single-cycle heart sound are combined into a one-dimensional feature vector. When calculating RCMDE, four important parameters in RCMDE that may have a greater impact on the results, namely scale factor τ, categories c, embedding dimension r and delay time τ. In this experiment, a large number of experiments show when the scaling factor τ = 20, the categories c = 3, the embedding dimension m = 2 and the delay time d = 1, the algorithm performance is the best. In Figure 4a, the RCMDE characteristics of two different single-cycle heart sounds of the same person after windowing and framing are compared. As can be seen from the figure, all feature points of the two single-cycle heart sounds are distributed near the 45° line. It shows that the two single-cycle heart sounds are close in their corresponding eigenvalues, and they are relatively matched. In Figure 4b, the RCMDE characteristics of two single-cycle heart sounds of different people after windowing and framing are compared. It can be seen that the two single-cycle heart sounds have more feature points distributed farther from the 45° line, which indicates that the two single-cycle heart sounds have relatively large differences in corresponding feature values, and are not well matched. In Figure 4, the frame length is taken as T/4, the frameshift is taken as T/8 and the Hanning window is used.  From the above analysis, it can be known that the RCMDE feature of single-cycle heart sounds after windowing and framing is feasible for the identification of different individuals. The effect of setting different frame lengths and frameshifts on the performance of the algorithm based on the cardiac cycle is discussed below. 
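A minimal sketch of the dispersion entropy underlying RCMDE is given below, with the parameters named as in the text (classes c, embedding dimension m, delay d, scale factor τ). The `mde` helper computes the plain multiscale profile; the refined composite variant used in the paper additionally averages dispersion-pattern probabilities over the τ shifted coarse-grained series at each scale. Function names are ours:

```python
import numpy as np
from math import erf, sqrt

def dispersion_entropy(x, c=3, m=2, d=1):
    """Dispersion entropy with normal-CDF mapping into c classes,
    embedding dimension m and delay d."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    # map each sample into (0, 1) with the normal CDF, then to classes 1..c
    y = np.array([0.5 * (1.0 + erf((v - mu) / (sigma * sqrt(2.0)))) for v in x])
    z = np.clip(np.round(c * y + 0.5), 1, c).astype(int)
    # enumerate dispersion patterns: length-m vectors taken with delay d
    n = len(z) - (m - 1) * d
    patterns = np.stack([z[i:i + (m - 1) * d + 1:d] for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mde(x, c=3, m=2, d=1, tau=20):
    """Plain multiscale DE: entropy of the mean-coarse-grained series at
    each scale 1..tau (RCMDE refines this by averaging pattern counts
    over the shifted coarse-grainings at each scale)."""
    x = np.asarray(x, dtype=float)
    out = []
    for s in range(1, tau + 1):
        g = x[:len(x) // s * s].reshape(-1, s).mean(axis=1)
        out.append(dispersion_entropy(g, c, m, d))
    return np.array(out)
```

With c = 3 and m = 2 there are at most c^m = 9 dispersion patterns, so the entropy is bounded by ln 9, which keeps feature values on a common scale across frames.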
Here, the control variable method is adopted and the above-mentioned parameters remain unchanged. First, the frame length is taken as win = T/i (i = 1, 2, ..., 20) with no frame overlap, and the corresponding CRR is shown in the left half of Table 1. The result shows that the optimum frame length is T/4. Then, with the frame length fixed at T/4, the frameshift is taken as inc = win/i (i = 1, 2, ..., 10), and the corresponding CRR is shown in the right half of Table 1. The result shows that the best performance is achieved when the frameshift is win/5, and adding further frame overlap does not improve the CRR.
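The two sweeps above amount to a simple control-variable grid search, which can be sketched as follows; `evaluate_crr` is a hypothetical callback (not from the paper) that trains and scores the recognizer for a given frame length T/win_div and frameshift T/inc_div:

```python
def tune_framing(evaluate_crr, max_win_div=20, max_inc_div=10):
    """Control-variable search: first sweep the frame length T/i with no
    overlap, then sweep the frameshift win/i at the best frame length.
    Returns (i_win, i_inc) with win = T/i_win and inc = win/i_inc."""
    # step 1: frame length win = T/i, frameshift = win (no overlap)
    best_win = max(range(1, max_win_div + 1),
                   key=lambda i: evaluate_crr(win_div=i, inc_div=i))
    # step 2: frameshift inc = win/i with the frame length held fixed,
    # i.e. inc_div = i_win * i as a fraction of T
    best_inc = max(range(1, max_inc_div + 1),
                   key=lambda i: evaluate_crr(win_div=best_win,
                                              inc_div=best_win * i))
    return best_win, best_inc
```

With the paper's result (i_win = 4, i_inc = 5) this corresponds to win = T/4 and inc = T/20.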

ICEEMDAN-RCMDE-FR-ED Algorithm
To achieve a higher CRR, the ICEEMDAN algorithm is used to decompose the training/test cycles into a group of IMFs, and a Hamming window with a window size of T/4 and a window shift of T/20 is then used to frame these IMFs. The training/test IMFs are framed separately, and RCMDE is calculated for each of the obtained frames. The result is sent to the ED algorithm, and the obtained CRR is shown in Table 2. Here, the parameters of the ICEEMDAN algorithm are selected as follows: the noise standard deviation is Nstd = 0.2, the number of EMD realizations is NR = 100, the maximum number of sifting iterations allowed is MaxIter = 5000, and SNRFlag = 1 indicates that the signal-to-noise ratio (SNR) increases with each EMD realization. Since ICEEMDAN is an adaptive decomposition algorithm, the number of modes obtained from different heart sound cycles may differ. For comparison, only the smallest number of IMFs obtained by ICEEMDAN over the heart sound database is shown here. Experiments show that the first three IMFs of the heart sound cycle, each used as the algorithm input, obtain a higher CRR than the others. This indicates that the first three IMFs not only contain the majority of the information in the heart sound cycle but also capture the information of the entire cycle in a more detailed way. Therefore, the features from these three IMFs are concatenated into one feature vector as a new heart sound feature. The original heart sound feature representation is shown in Figure 5a, and the new representation in Figure 5b. The red dots in the figure represent the single-cycle features used for training, and the green dots represent the single-cycle features used for testing.
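The assembly of the multimodal feature vector can be sketched as below. Note that `decompose` is a crude moving-average stand-in for ICEEMDAN (not the real algorithm; a proper EMD library implementation would be substituted), and the per-frame statistic here is a simple variance placeholder where the paper uses the RCMDE vector of each frame:

```python
import numpy as np

def decompose(x, n_imfs=3, widths=(5, 25, 125)):
    """Placeholder for ICEEMDAN: successive moving-average residues give
    a coarse fine-to-smooth decomposition (NOT the real algorithm)."""
    imfs, residue = [], np.asarray(x, dtype=float)
    for w in widths[:n_imfs]:
        smooth = np.convolve(residue, np.ones(w) / w, mode="same")
        imfs.append(residue - smooth)       # detail at this scale
        residue = smooth
    return imfs

def cycle_feature(cycle, n_imfs=3, win_div=4, inc_div=20):
    """Frame the first n_imfs components with a Hamming window
    (win = T/4, inc = T/20) and concatenate one statistic per frame;
    in the paper the per-frame statistic is the RCMDE vector."""
    T = len(cycle)
    win, inc, w = T // win_div, T // inc_div, np.hamming(T // win_div)
    feats = []
    for imf in decompose(cycle, n_imfs):
        for s in range(0, T - win + 1, inc):
            frame = imf[s:s + win] * w
            feats.append(frame.var())       # placeholder for RCMDE(frame)
    return np.array(feats)
```

Concatenating the three per-IMF feature blocks is what makes the representation "multimodal": each block describes the same cycle at a different intrinsic scale.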
Since there are many more single-cycle heart sounds used for training than for testing, the green dots in the figure appear wrapped by the red dots. It can be found from Figure 5 that the merged features have twice as many dimensions as the original features and contain considerable redundancy. Therefore, the Fisher ratio (FR) is used for feature selection: the feature dimensions are ranked by their Fisher ratio, and the first N_r dimensions are retained as the new heart sound feature. Experimental verification shows that the recognition performance is optimal when N_r = 300. The CRR_200 and Kappa coefficients obtained with ED and the close principle, using the original heart sound feature and the new heart sound feature respectively, are shown in Table 3. It can be seen from Table 3 that the CRR_200 and Kappa coefficients on the three public heart sound databases obtained with the ICEEMDAN-RCMDE-FR feature extraction method are higher than those of the RCMDE-based method, achieving an average recognition rate of 96.08%. The Kappa coefficient is between 0.8 and 1, which indicates that the classification accuracy is extremely high. A t-test is then used to verify whether the CRR_200 = 96.08% obtained above can be regarded as the generalization accuracy. Here, n random experiments are performed with n = 10, 20, 30, 50, 100, 200, 300, 400, 500, 600, respectively; the average test accuracy µ and standard deviation σ for each n are shown in the left half of Table 4. It is assumed that the generalization accuracy is µ_0 = CRR_600 and the test level α is 0.05; the t value for each n is then obtained according to the t-test steps in Section 2.6, and the corresponding critical value range is also given.
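Fisher-ratio ranking can be sketched as the ratio of between-class to within-class variance computed independently for each feature dimension (function names are ours):

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-dimension Fisher ratio: variance of the class means about the
    global mean over the pooled within-class variance."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for cl in np.unique(y):
        Xc = X[y == cl]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)   # guard against zero variance

def select_features(X, y, n_r=300):
    """Indices of the n_r dimensions with the largest Fisher ratio."""
    return np.argsort(fisher_ratio(X, y))[::-1][:n_r]
```

Dimensions with a high ratio separate individuals well while varying little within one individual, which is exactly the redundancy filter the merged ICEEMDAN-RCMDE feature needs.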
It can be seen from Table 4 that the t values corresponding to the n experiments all fall within the critical value range [−t(α,n), t(α,n)], so the average test accuracy µ for each n can be regarded as the generalization accuracy µ_0 with a confidence level of 0.95. It can also be found from Table 4 that when the number of experiments n is greater than 200, the average test accuracy µ has essentially stabilized at 96.08% and the t value has stabilized near 0. Therefore, considering both stability and computational cost, the number of experiments is taken as n = 200, and the best generalization accuracy is µ_0 = 96.08%.
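The check above is a one-sample t-test, sketched below: the statistic t = √n(µ − µ₀)/σ (with σ the sample standard deviation) is compared against the two-sided critical value, which would be taken from a t table as in Table 4 (function names are ours):

```python
import math

def t_statistic(accuracies, mu0):
    """One-sample t statistic for n repeated test accuracies against an
    assumed generalization accuracy mu0."""
    n = len(accuracies)
    mu = sum(accuracies) / n
    var = sum((a - mu) ** 2 for a in accuracies) / (n - 1)  # sample variance
    return math.sqrt(n) * (mu - mu0) / math.sqrt(var)

def consistent_with(accuracies, mu0, t_crit):
    """True if mu cannot be distinguished from mu0 at the chosen level,
    i.e. t falls inside [-t_crit, t_crit]."""
    return abs(t_statistic(accuracies, mu0)) <= t_crit
```

When the test accepts, the observed mean accuracy may be reported as the generalization accuracy at the corresponding confidence level.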
In summary, the feature extraction method based on ICEEMDAN-RCMDE-FR proposed in this paper achieves a generalization accuracy of 96.08% on three public heart sound databases with a confidence level of 0.95, which shows that the multimodal multiscale dispersion entropy generated by the ICEEMDAN-RCMDE-FR algorithm characterizes heart sounds well and is suitable for the field of biometrics. The classifier currently used is rough, which may be one reason why the CRR cannot be further improved. Therefore, different classifiers such as SVM and KNN are compared with ED, and the results are shown in Table 5. The SVM classifier used here is parameter-tuned: its penalty parameter c and kernel function parameter g are 64 and 0.001, respectively, and the number of nearest neighbors of the KNN classifier is taken as k = 5, 3, 2, respectively. From the results in Table 5, the difference between the best performances of the three classifiers is within 1%. It can also be found that the smaller the KNN parameter k, the higher the CRR; when k = 1 or 2, KNN is equivalent to the ED classifier. The heart sound recordings in the open databases differ in length, so the data distribution in the single-cycle heart sound database generated by segmentation is unbalanced, which may be why classifier performance cannot be improved further. Since the ED classifier is simpler, its matching and recognition time is also shorter. Taking these factors together, the ED classifier is most suitable for this heart sound database.
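The ED classifier with the close principle is nearest-neighbor matching under Euclidean distance, which can be sketched in a few lines (names are ours):

```python
import numpy as np

def ed_classify(train_X, train_y, test_X):
    """Assign each test vector the label of its nearest training vector
    under Euclidean distance (the 'close principle')."""
    train_X = np.asarray(train_X, dtype=float)
    test_X = np.asarray(test_X, dtype=float)
    # pairwise distance matrix of shape (n_test, n_train)
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    return np.asarray(train_y)[d.argmin(axis=1)]
```

This is exactly KNN with k = 1, which is why the paper notes that for small k the two classifiers coincide.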

Practical Application of ICEEMDAN-RCMDE-FR-ED Algorithm
Although the previous section assumed that the single-cycle heart sound used as test data is aligned with the corresponding single-cycle heart sound in the database, the position of the initial split point is not the same when the heart sound cycle is divided. In the practical application of a heart sound biometric identification system, when single-cycle heart sounds are segmented from a randomly collected heart sound signal, the position of the initial segmentation point must be fixed and kept consistent. Therefore, the heart sound segmentation method based on LR-HSMM proposed by Springer et al. [31] is first used to assign the states of the heart sound recordings of 40 volunteers collected in a natural environment, and the following four initial dividing points are then used to obtain four kinds of single-cycle heart sounds for training: (1) the starting position of the first S1 appearing in the heart sound recording; (2) the starting position of the first systole; (3) the starting position of the first S2; (4) the starting position of the first diastole. At least one hour later, the heart sound recordings of the 40 volunteers were collected again, and the four kinds of single-cycle heart sounds used for testing were obtained in the same manner as those used for training. A schematic diagram of the four heart sound cycle segmentation methods is shown in Figure 6.

Experiments show that the four segmentation methods (a), (b), (c) and (d) obtain the same number of single-cycle heart sounds: 209 single-cycle heart sounds for training and 190 for testing.
Since the current data set is relatively balanced, the feature processing here adopts a different, one-to-one method: the feature vectors of all single-cycle heart sounds used as training/testing for each individual are averaged to obtain one average feature vector, so that each individual corresponds to only one average feature vector for training/testing. The average feature vectors for training/testing obtained from (a), (b), (c) and (d) are then used as the input of the ICEEMDAN-RCMDE-FR-ED algorithm for verification experiments. The experimental parameters are the same as those selected in the previous section. The experimental results in Table 6 show that the first segmentation method (a) achieves the highest CRR of 97.5%, which may be because the Springer algorithm segments S1 more accurately, so segmentation (a) makes the training and testing features closer. The differences in CRR obtained by the four segmentation methods are between 0% and 5%, and the differences in Kappa coefficients are between 0 and 0.0513. The overall results are very close, which may be due to the use of the average feature vector, which enhances the robustness of the algorithm so that a few bad single-cycle heart sounds do not affect the result. Therefore, the ICEEMDAN-RCMDE-FR-ED algorithm proposed in this paper, combined with the heart sound segmentation method based on logistic regression and hidden semi-Markov models (LR-HSMM), has high practical application value in the field of biometric identification. Table 7 lists performance comparisons between the proposed study and other existing heart sound biometric work. Phua et al. [3] introduced linear frequency band cepstrum (LFBC) for heart sound feature extraction and used two classifiers, vector quantization (VQ) and the Gaussian mixture model (GMM), for classification and recognition.
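The one-to-one template scheme can be sketched as follows: all cycle feature vectors of each individual are averaged into a single template, and a test vector is matched to the nearest template by Euclidean distance (names are ours):

```python
import numpy as np

def build_templates(features, labels):
    """Average all feature vectors of each individual into one template.

    Returns (ids, templates) with one row of `templates` per identity."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    return ids, np.stack([features[labels == i].mean(axis=0) for i in ids])

def identify(test_vec, ids, templates):
    """Return the enrolled identity whose template is nearest (ED)."""
    d = np.linalg.norm(templates - np.asarray(test_vec, dtype=float), axis=1)
    return ids[d.argmin()]
```

Averaging suppresses occasional badly segmented cycles, which is the robustness effect the paper credits for the four segmentation methods performing so similarly.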
The database used is composed of 10 users, and the correct recognition rate is 96%. Fatemian et al. [6] proposed a PCG signal identification and verification system based on wavelet preprocessing, feature extraction using the short-time Fourier transform (STFT), feature dimension reduction using linear discriminant analysis (LDA) and majority voting with Euclidean distance (ED) for classification. The recognition result for 21 subjects was 100%, and the equal error rate (EER) in verification was 33%. Tran et al. [7] used eight feature sets, namely temporal shape, spectral shape, Mel-frequency cepstral coefficients (MFCC), linear frequency cepstral coefficients (LFCC), harmonic features, rhythmic features, cardiac features and the GMM-supervector, as heart sound biometric features, applied two feature selection techniques and used SVM to classify 52 users; the first experiment achieved more than 80% accuracy and the second more than 90%. Jasper and Othman [32] applied the wavelet transform (WT) to analyze the signals in a time-frequency representation, selected the Shannon energy envelogram (SEE) as the feature set, and tested its performance on a database of 10 people with an accuracy of 98.67%. Cheng et al. [1] introduced a human feature extraction method based on an improved circular convolution (ICC) slicing algorithm combined with independent subband functions (ISF). The technique uses two recognition steps to obtain different human heart sound characteristics to ensure validity, and then uses similarity distances for heart sound pattern matching. The method was verified on 10 recorded heart sounds, and the two-step recognition accuracy was 85.7%. Cheng et al. [8] used the heart sound linear band frequency cepstrum (HS-LBFC) for feature extraction and similarity distances for classification. The results were obtained on 12 heart sound signals, with a verification rate as high as 95%, a false acceptance rate of 1% to 8%, and a false rejection rate of less than 3%. Zhao et al. [11] used a heart sound database of 280 samples constructed from 40 users to test their proposed marginal spectrum (MS) features and validated them using 80 samples randomly selected from the open database HSCT-11. Gautam and Deepesh [33] proposed a new method for heart sound recognition based on preprocessing with a low-pass filter, cardiac cycle detection by autocorrelation, and segmentation of S1 and S2 by windowing and thresholding. The method used WT for feature extraction and a back-propagation multilayer perceptron artificial neural network (BP-MLP-ANN) for classification; the accuracy on 10 volunteers reached 90.52% and the EER 9.48%. Tan et al. [34] demonstrated a new method for heart sound authentication in which preprocessing is based on low-pass filtering, the heart sounds are segmented using zero-crossing rate (ZCR) and short-term amplitude (STA) techniques to extract the S1 and S2 sounds, features are extracted using MFCC, and classification uses a sparse representation classifier (SRC). Fifteen users were randomly selected, and a best accuracy of 85.45% was achieved. Verma and Tanuja [35] proposed a heart sound-based biometric recognition system that uses MFCC for feature extraction and SVM for classification; they studied 30 subjects with an accuracy rate of 96%. Abo-Zahhad et al. [36] proposed a heart sound recognition system based on 17 subjects with an accuracy rate of 99%: features were extracted using MFCC, LFCC, bark frequency cepstral coefficients (BFCC) and the discrete wavelet transform (DWT), fused using canonical correlation analysis (CCA), and classified using GMM and Bayesian rules. Abo-Zahhad et al. [37] used the HSCT-11 and BioSec databases to compare the biometric performance of MFCC, LFCC, wavelet packet cepstral coefficients (WPCC) and non-linear frequency cepstral coefficients (NLFCC), concluding that WPCC and NLFCC perform better in high-noise scenarios.

Comparison with Related Literature
Compared with the above works using different feature extraction methods, classification methods and heart sound databases, our method achieves the best performance on heart sound databases of comparable size. The previous methods were evaluated on normal healthy subjects without taking heart disease into account, whereas this paper conducts research on the open pathological heart sound databases Michigan, Washington and Littman. Unlike previous methods, we first use the LR-HSMM-based heart sound segmentation method proposed by Springer [31] to segment the pre-processed heart sound recordings into a series of single-cycle heart sounds, and frame and window them based on each cycle length to ensure that every single-cycle heart sound yields the same number of frames. We also introduce RCMDE features for heart sound biometrics for the first time, and selectively combine RCMDE with the ICEEMDAN, FR and ED methods, striving to improve the recognition rate for mixed normal and pathological heart sounds through a finer characterization. The proposed method not only achieved a correct recognition rate of 96.08% on the open heart sound databases, but also achieved a recognition rate of 97.5% on the 80-recording heart sound database constructed by our research group from 40 healthy subjects, with the conclusion that the single-cycle heart sound spanning from the start of one first heart sound (S1) to the next S1 gives the highest recognition rate.

Conclusions
In the current research, based on the characteristics of the heart sound signal, improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), refined composite multiscale dispersion entropy (RCMDE), the Fisher ratio (FR) and Euclidean distance (ED) are used to study the mixed recognition of normal and pathological heart sounds, and the following conclusions are reached: (1) Given the quasi-periodic and non-stationary characteristics of heart sound signals, this paper first uses LR-HSMM-based heart sound segmentation to divide heart sounds into a series of single-cycle heart sounds, and frames and windows them based on each cycle length to ensure that every single-cycle heart sound yields the same number of frames.
(2) To solve the problem of uniformly representing heart sound frames of different lengths, this paper first introduces RCMDE for heart sound biometric identification and selectively combines RCMDE with the ICEEMDAN, FR and ED methods for heart sound biometric characterization.
(3) The recognition rate of this method on the open pathological heart sound databases Michigan, Washington and Littman reaches 96.08%; that is, the method can effectively recognize both normal and pathological heart sounds.
(4) To enhance its practical application value, this paper applies the proposed method to a self-built heart sound database. The results show that the single-cycle heart sound spanning from one first heart sound (S1) to the next S1 gives the highest recognition rate, 97.5%.
Although the features proposed in this article have been shown to work well for heart sound biometrics, every biometric has its limitations, and future research is bound to integrate features with outstanding performance and then use the latest powerful classifiers, such as deep learning methods, to achieve optimal recognition. It is even possible to combine feature extraction techniques for different signals: for example, Abo-Zahhad et al. [38] proposed using both ECG and PCG signals in a multimodal biometric authentication system, and Bugdol et al. [39] proposed a multimodal biometric system combining ECG and sound signals. Of course, the impact of subject age, database size, race, gender and disease status on the performance of the biometric system must also be considered. In the future, features that are less affected by these factors can be fused, or specific features can be used to biometrically identify specific populations, for example using the method in this article to identify people with heart disease; thus the research in this article can serve as a foundation for future biometric identification research.