A Robust and Real-Time Capable Envelope-Based Algorithm for Heart Sound Classification: Validation under Different Physiological Conditions

This paper proposes a robust and real-time capable algorithm for classification of the first and second heart sounds. The classification algorithm is based on the evaluation of the envelope curve of the phonocardiogram. For the evaluation, in contrast to other studies, measurements on 12 subjects were conducted under different physiological conditions: for each measurement, the auscultation point, posture or physical stress was varied. The proposed envelope-based algorithm is tested with two different methods for envelope curve extraction: the Hilbert transform and the short-time Fourier transform. The performance of the classification of the first heart sounds is evaluated by using a reference electrocardiogram. Overall, the algorithm performs better with the Hilbert transform, both in terms of the F1-score and the computational effort. For the S1 classification, the proposed algorithm achieves an F1-score of up to 95.7% and of 90.5% on average. The algorithm is robust against variations in age, BMI, posture, heart rate and auscultation point (except for measurements on the back).


Introduction
Cardiovascular diseases are the leading cause of death worldwide [1,2]. According to [3], this trend will continue and deteriorate in the future. Apart from the personal consequences, the healthcare costs are a huge burden for society [3].
One approach to tackle this problem is the daily monitoring of vital parameters, e.g., from the electrocardiogram (ECG) and phonocardiogram (PCG) (Supplementary Materials) by use of wearable sensors [4], since abnormal properties can indicate cardiac diseases. For the latter type of signals, automatic heart sound detection and classification algorithms are under research. The most successful approaches are the envelope-based, probabilistic-based and the feature-based methods [5].
Feature-based methods, as proposed for example in [6][7][8][9][10], extract features such as the Shannon entropy, discrete wavelet transform (DWT), continuous wavelet transform (CWT) or mel-frequency cepstral coefficients (MFCC) from the PCG signal. A classifier (for example a support vector machine (SVM), twin support vector machine (TWSVM) or deep neural network (DNN)) then determines whether the features correspond to a heart sound [5]. However, feature-based methods have the disadvantages of high computational effort and a strong dependency on the training datasets [5]. As the PCG varies strongly between subjects as well as with the posture, heart rate and auscultation point, this would require separate datasets for each of these conditions.

Heart Cycle and Heart Sounds
The heart cycle consists of the systole and diastole, which correspond to the contraction and relaxation of the heart, respectively. The beginning of the systole is marked by the beginning of the first heart sound S1, and the diastole starts with the second heart sound S2. Compared to the ECG, the beginning of S1 is synchronous with the peak of the R-wave (Figure 1). Within the diastole, the third (S3) and fourth (S4) heart sounds occur. However, the third and fourth heart sounds are only heard occasionally during an auscultation [24]. Since S4 has less diagnostic value, it is omitted in Figure 1 [25]. The frequency spectrum of heart sounds is approximately 20-200 Hz [21,26]. The ratio of the systole to the diastole is 1:2 at the resting heart rate and shifts towards 1:1 at higher heart rates, as the diastole shortens. The optimal auscultation point is called Erb's point. At this point the heart sounds have the highest amplitudes and, therefore, the best results for auscultation can be achieved [27].
Figure 1. Wiggers diagram: the blue curve shows the ECG and the grey curve the related heart sounds (PCG). A heart cycle is defined as a sequence of a systole and a diastole [27].

Methods for Envelope Curve Extraction
The PCG is a periodic signal, but it is non-linear and non-stationary; in consequence, its frequency content changes over time. For the detection of the heart sounds, the PCG is therefore transformed into a simpler signal, the envelope curve, which exposes its intrinsic characteristic.

Hilbert Transform (HT)
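The envelope curve of the PCG can be extracted with the HT. This method is appropriate for narrowband signals like heart sounds and can be computed at low computational cost [28]. The HT of a real-valued, time-continuous signal x(t) is defined as [29,30]
$$\mathcal{H}(x(t)) = \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty} \frac{x(\tau)}{t-\tau}\,\mathrm{d}\tau .$$
The related envelope curve is the magnitude of the analytic signal,
$$x_{\mathrm{env}}(t) = \left| x(t) + j\,\mathcal{H}(x(t)) \right| = \sqrt{x(t)^2 + \mathcal{H}(x(t))^2} .$$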
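As an illustration of the envelope extraction with the HT, the following minimal sketch computes the discrete analytic signal by zeroing the negative-frequency half of the spectrum and takes its magnitude. It is not the MATLAB implementation used in this work: the function names `dft` and `hilbert_envelope`, the naive O(N²) transform and the amplitude-modulated 50 Hz test tone are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Envelope |x + j*H(x)| obtained from the discrete analytic signal."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and the Nyquist bin for even n), double the positive
    # frequencies and zero the negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        gain[k] = 2.0
    analytic = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    return [abs(z) for z in analytic]


# Demo (assumed values): 50 Hz tone with a 5 Hz amplitude modulation, 1 kHz sampling.
fs = 1000
n = 200
amplitude = [1.0 + 0.5 * math.cos(2 * math.pi * 5 * t / fs) for t in range(n)]
signal = [a * math.cos(2 * math.pi * 50 * t / fs) for t, a in enumerate(amplitude)]
envelope = hilbert_envelope(signal)
```

Because the test tone contains an integer number of periods, the recovered envelope should coincide with the imposed amplitude modulation; for real PCG data, edge effects at the segment boundaries have to be expected.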

Short-Time Fourier Transform (STFT)
The Fourier transform (FT) extracts the spectral components of a signal. However, if the signal is non-stationary, the FT gives no information about when these components occur, since the relation between time and frequency is lost. Thus, the STFT is introduced, which solves this problem by computing the FT only within a limited time window of length b. This includes the assumption that the signal is stationary during this time window [31]. The window is shifted along the time axis, with an overlap between consecutive windows, which is denoted as k and given in percent. To restrict the full signal x(t) to the window interval, x(t) is multiplied by a window function w centred at the time shift τ. The STFT is computed as proposed in [32]; in its standard form it reads
$$X(\tau,\omega) = \int_{-\infty}^{\infty} x(t)\,w(t-\tau)\,e^{-j\omega t}\,\mathrm{d}t .$$
Rabiner et al. found that a Hamming window fits best for non-stationary audio signals (like PCGs) [33]. The choice of the window length and the overlap is essential, since the time and frequency resolution depend on it. A short window yields a high time resolution and a low frequency resolution, whereas a long window yields a low time resolution and a high frequency resolution. Therefore, k and b are optimized for the heart sound classification.
For detecting the heart sounds, the envelope curve is extracted by computing the power spectral density P, which is also called the spectrogram. It is derived from the Fourier coefficients X(τ, ω) of the STFT [31]:
$$P(\tau,\omega) = \left| X(\tau,\omega) \right|^2 .$$
For extracting the envelope curve, the maximum of P over all frequencies is determined for each time step τ.
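The STFT-based envelope can be sketched in the same spirit: each Hamming-windowed frame is transformed, and the largest squared magnitude over the frequency bins is kept as the envelope value of that frame. The function names, the window length of 64 samples, the 50% overlap and the synthetic 60 Hz burst are assumptions for illustration and not the optimized parameters of this work.

```python
import cmath
import math


def hamming(length):
    """Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def stft_envelope(x, window_length, overlap_percent):
    """Return (frame start indices, maximum spectral power per frame)."""
    hop = max(1, round(window_length * (1 - overlap_percent / 100)))
    window = hamming(window_length)
    starts, envelope = [], []
    for start in range(0, len(x) - window_length + 1, hop):
        frame = [x[start + i] * window[i] for i in range(window_length)]
        peak_power = 0.0
        # For a real-valued signal the non-negative frequency bins suffice.
        for k in range(window_length // 2 + 1):
            coeff = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / window_length)
                        for i in range(window_length))
            peak_power = max(peak_power, abs(coeff) ** 2)
        starts.append(start)
        envelope.append(peak_power)
    return starts, envelope


# Demo (assumed values): silence with a 60 Hz burst between samples 200 and 299.
fs = 1000
signal = [0.0] * 500
for t in range(200, 300):
    signal[t] = math.sin(2 * math.pi * 60 * t / fs)
starts, envelope = stft_envelope(signal, window_length=64, overlap_percent=50)
loudest_start = starts[envelope.index(max(envelope))]
```

The envelope has one value per frame rather than one per sample, which illustrates why the time resolution of the STFT-based envelope is coarser than that of the HT-based one.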

Algorithm for Heart Sound Classification
The classification algorithm was implemented in MATLAB R2019a. The scheme of the algorithm for the detection and classification of heart sounds is illustrated in Figure 2. The algorithm sequence started with the data preprocessing, which includes filtering the raw data and synchronizing the PCG and ECG signals. The initial 8 s of each 60 s PCG signal, which were used for synchronization (see Section 3.1.2), were discarded. The remaining PCG signal was divided into five intervals of equal duration, which were analysed separately. In the next step, the envelope curve was extracted from the filtered PCG signal with both the HT and the STFT, and the algorithm was applied to each of the two envelope curves separately. The peaks of the envelope curve were detected and the heart rate was estimated by using the autocorrelation function (ACF). Based on this estimate, the proposed algorithm distinguished between increased (>80 bpm) and normal heart rates and classified the heart sounds with one of two different approaches. The individual steps of the algorithm are explained in the following.
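The segmentation step described above can be sketched as follows. The function name `split_after_sync` is hypothetical, and dropping the few samples that do not fit into five equal parts is an assumption; the sketch only illustrates the bookkeeping, not the MATLAB implementation.

```python
def split_after_sync(signal, fs, discard_s=8.0, parts=5):
    """Drop the synchronization segment and split the rest into equal parts."""
    start = int(round(discard_s * fs))
    usable = signal[start:]
    size = len(usable) // parts  # leftover samples at the end are dropped
    return [usable[i * size:(i + 1) * size] for i in range(parts)]


# Demo (assumed values): a 60 s recording at 100 Hz, represented by sample indices.
fs = 100
recording = list(range(60 * fs))
segments = split_after_sync(recording, fs)
```

For a 60 s recording, each of the five segments then covers 10.4 s.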

Filtering of the Raw-Data
To eliminate noise caused by respiration, human speech, lung sounds or movement of the stethoscope, the PCG was filtered. The applied filter was a Butterworth bandpass, consisting of a low-pass and a high-pass filter, with a passband from f_lower to f_upper and filter orders N_LP and N_HP, respectively. These parameters were tuned in the optimization of the algorithm (see Section 6.1). Figure 3 compares the raw data with the filtered PCG.

Synchronization of the PCG and ECG Signal
As the R-peaks in the ECG are simultaneous with the first heart sounds S1 (see Figure 1), the R-peaks were used to evaluate the correct detection and classification of S1. For that purpose, the PCG and ECG were synchronised by means of artefacts induced by knocking three times on the electrodes of the ECG. The first 8 s of the PCG signals, which contain these artefacts, were therefore excluded.

Envelope Curves
The basis of the algorithm was the envelope curve of the PCG, which was extracted by two different methods, namely the HT and the STFT (Figure 4). The first subplot shows a filtered PCG signal and the corresponding envelope curve, derived by the HT, is shown in the second subplot. In the third subplot, the spectral power in dB/Hz is plotted over time and frequency. The heart sounds have a high power density, so for detecting S 1 and S 2 the maximum value of the power spectral density is computed for every time step. These values are shown in the fourth subplot of Figure 4. The resulting curve was similar to the envelope curve derived by the HT.

Peak Detection
The peaks in the envelope curve were detected by computing the gradient. A local maximum is characterized by a sign change of the gradient from positive to negative. All peaks with an amplitude larger than a defined threshold were considered for the heart sound classification. The threshold is derived from the envelope curve x_env(t) and a tunable parameter n, which is optimized (see Section 6.1). Furthermore, it was essential to prevent two maxima from being detected within the duration of a single heart sound. Therefore, a time window of 150 ms, which approximately corresponds to the maximal length of a heart sound [5,34], was applied. The global maximum within the window was assigned as the detected maximum and, therefore, as a potential heart sound. In consequence, the algorithm was able to deal with split heart sounds.
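The peak detection can be sketched as below. The threshold is passed in as a plain number, since its dependence on the parameter n is not reproduced here; anchoring the 150 ms window at the first candidate of a group is an assumption of this sketch, as is the function name `detect_peaks`.

```python
def detect_peaks(envelope, fs, threshold, window_s=0.150):
    """Return indices of candidate heart sound peaks in an envelope curve."""
    window = int(round(window_s * fs))
    # Local maxima above the threshold: gradient changes from positive to negative.
    candidates = [
        i for i in range(1, len(envelope) - 1)
        if envelope[i] - envelope[i - 1] > 0
        and envelope[i + 1] - envelope[i] < 0
        and envelope[i] > threshold
    ]
    peaks = []
    group_start = None
    best = None
    for i in candidates:
        if group_start is not None and i - group_start < window:
            # Same heart sound: keep only the largest maximum in the window.
            if envelope[i] > envelope[best]:
                best = i
        else:
            if best is not None:
                peaks.append(best)
            group_start = i
            best = i
    if best is not None:
        peaks.append(best)
    return peaks


# Demo (assumed values): split sound at samples 10 and 14, noise at 40, sound at 60.
fs = 100
env = [0.0] * 100
env[10], env[14], env[40], env[60] = 0.8, 1.0, 0.1, 0.9
peaks = detect_peaks(env, fs, threshold=0.3)
```

In the demo, the two maxima of the split sound collapse into the larger one and the small noise peak is rejected by the threshold.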

Extracting the Heart Rate
As shown in Figure 5, the heart cycle, as well as the length of the systole, can be computed by using the ACF, which is a robust and well-established tool for heart rate estimation. For this purpose, the local maxima of the ACF have to be extracted. The PCG is a quasi-periodic signal of finite length; therefore, the local maxima of the ACF are periodic with decaying amplitudes, since the overlap between the signal and its shifted copy shrinks as the shift grows. On the left side of Figure 5, the envelope of the PCG is shown; in the respective rows, it is shifted in time. For reasons of clarity, dashed lines are not drawn for all heart sounds. On the right side, the corresponding ACF is shown.
The first major maximum occurs when the original signal and the shifted signal fully overlap. At the first minor maximum, the second heart sounds of the original signal overlap with the first heart sounds of the shifted signal, whereas the second minor maximum appears when the first heart sounds of the original signal overlap with the second heart sounds of the shifted signal. The second major maximum appears when the shifted signal again fully overlaps with the original signal. Thus, the average heart cycle corresponds to the distance between the first two major maxima. The distance between the first major maximum and the first minor maximum corresponds to the average length of the systole (SYS). The second major maximum is extracted by a global maximum search within an interval of 1.5 s after the first major maximum. This corresponds to a heart rate of 40 bpm, which is chosen as the lower boundary of the heart rate for the proposed algorithm. In [35,36] an alternative approach for estimating the systolic length is presented. It is stated that the length of a systole decreases linearly with the heart rate (HR). Therefore, the HR extracted from the ACF can be used to estimate the length of a systole in ms according to the empirical formula
$$\mathrm{SYS} = \begin{cases} -1.14\,\mathrm{HR} + 371.55\ \mathrm{ms} & \text{if } \mathrm{HR} > 80\ \mathrm{bpm},\\ -6.58\,\mathrm{HR} + 766.44\ \mathrm{ms} & \text{otherwise.} \end{cases}$$
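The ACF-based estimation can be illustrated with a short sketch. The guard interval of 0.2 s, which skips the main lobe around zero lag, and the choice of the first local maximum as the first minor maximum are assumptions of this sketch, as are the function names; noisy envelopes would need a more careful search.

```python
def autocorrelation(x, max_lag):
    """Raw autocorrelation r[k] = sum_t x[t] * x[t + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def estimate_cycle_and_systole(envelope, fs, guard_s=0.2, max_cycle_s=1.5):
    """Estimate heart cycle and systole length (in seconds) from the ACF."""
    max_lag = int(round(max_cycle_s * fs))
    guard = max(1, int(round(guard_s * fs)))
    acf = autocorrelation(envelope, max_lag)
    # Second major maximum: global maximum within 1.5 s after zero lag.
    cycle_lag = max(range(guard, max_lag + 1), key=lambda k: acf[k])
    # First minor maximum: first local maximum between the guard and the cycle lag.
    systole_lag = next(
        (k for k in range(guard, cycle_lag)
         if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]),
        None,
    )
    cycle_s = cycle_lag / fs
    systole_s = None if systole_lag is None else systole_lag / fs
    return cycle_s, systole_s


# Demo (assumed values): S1 every 0.8 s, S2 0.3 s after each S1, 100 Hz sampling.
fs = 100
envelope = [0.0] * 500
for s1 in range(0, 500, 80):
    envelope[s1] = 1.0
    if s1 + 30 < 500:
        envelope[s1 + 30] = 0.7
cycle_s, systole_s = estimate_cycle_and_systole(envelope, fs)
heart_rate_bpm = 60.0 / cycle_s
```

For this idealized envelope the estimate recovers the imposed cycle of 0.8 s (75 bpm) and the systole of 0.3 s.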
The two methods for the estimation of the systolic length were both tested for the heart sound classification. The results of the comparison are presented in Section 6.2. The average HR was the reciprocal of the average heart cycle. The average diastole length (DIA) in ms is the remainder of the average heart cycle,
$$\mathrm{DIA} = \frac{60{,}000}{\mathrm{HR}} - \mathrm{SYS},$$
with HR in bpm. The length of a systole is defined as the distance between the beginning of S1 and the beginning of S2 (see Section 2). However, for the algorithm, the length of a systole is determined by the distance between the major peaks of S1 and S2. Heart sounds can be split, especially during inhalation [37][38][39] (like the first S1 in Figure 10 or the first S2 in Figure 4). In the peak detection process, the global maximum within the maximal width of a heart sound [34] was detected, which could be located anywhere in that interval. Therefore, the distance between the first and the second heart sound could differ from the actual length of the corresponding systole. For this reason, a tolerance of 175 ms was applied to the length of a systole, which also took into account that the heart cycle can vary from one cycle to another. The tolerance was composed of the approximate maximal duration of a heart sound (150 ms) [34] and the standard deviation of 25 ms for a systolic length [35]. The minimal and maximal systolic lengths follow from SYS and this tolerance; corresponding bounds are computed for the length of a diastole.
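The empirical estimate of the systolic length can be written as a small helper. Taking the diastole as the remainder of the heart cycle (60,000 ms divided by the HR in bpm, minus the systole) is an assumption that follows from the heart cycle being the sum of systole and diastole; the function names are hypothetical.

```python
def systole_ms_from_hr(hr_bpm):
    """Empirical systolic length in ms as a function of the heart rate in bpm."""
    if hr_bpm > 80:
        return -1.14 * hr_bpm + 371.55
    return -6.58 * hr_bpm + 766.44


def diastole_ms(hr_bpm, sys_ms):
    """Diastolic length in ms, taken as heart cycle minus systole (assumption)."""
    return 60000.0 / hr_bpm - sys_ms


# Demo: a resting and an increased heart rate.
sys_rest = systole_ms_from_hr(60)
dia_rest = diastole_ms(60, sys_rest)
sys_high = systole_ms_from_hr(100)
```

The tolerance bounds on the systolic and diastolic length are deliberately left out of this sketch.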

Peak Classification
For the peak classification, two different approaches were developed for two different heart rate domains. At normal heart rates, the amplitude of S1 is not necessarily higher than that of S2 [40]. The amplitude of S1 increases approximately linearly with the heart rate [41]. Thus, at increased heart rates, the amplitude of S1 is higher than that of S2 [42], and the first and second heart sounds can be distinguished based on their different amplitudes. In addition, at increased heart rates, noise and artefacts are negligible compared to the amplitudes of S1 and S2.

Simple Heart Sound Classification for Increased Heart Rates (>80 bpm)
The peaks are classified into S1 and S2 by a condition on ∆x_i, the i-th distance between two detected peaks, and y_i, the amplitude of the i-th peak. If the condition is fulfilled, peak i is classified as S1 and peak i + 1 as S2 (Figure 6a). If one S2 is not detected, a second condition classifies the peak as S1 (Figure 6b).

Complex Heart Sound Classification for Normal Heart Rates (<80 bpm)
For normal heart rates, the simple algorithm has to be extended by additional steps. Peaks (e.g., caused by S3, S4 or artefacts) that would lead to wrong heart cycles have to be neglected. Those extra peaks lead to invalid diastoles and have to be removed before the classification of the peaks. For this purpose, a condition on the distance ∆x_i is checked; if it is fulfilled, the right peak of ∆x_i is neglected (Figure 7a). For the special case that no S2 is detected in ∆x_{i+2} and condition (15) is true, the peak is removed as shown in Figure 7b. The peaks removed by the aforementioned condition are those that lead to an invalid diastole length. However, if an extra peak occurs shortly before S1, it cannot be removed in this way, since the corresponding length of the diastole is valid. Therefore, in the next step the remaining peaks between two valid systoles are removed with condition (16) (see Figure 8a). In the case that one S2 is missing and a further condition is true, the right peak of ∆x_i is removed (Figure 8b). In the case that two consecutive S2 are missing and a corresponding condition is true, the affected peaks are classified as S1. Because the diastolic length deviates more strongly (heart rate variability), the algorithm performs more stably when the systolic length is used for these conditions.
The remaining distances ∆x are correct systoles, diastoles and heart cycles, therefore, with the information of the derived heart rate, systolic and diastolic length, the corresponding peaks can be classified into S 1 and S 2 .
Figure 8. Heart sound classification for normal heart rates: (a) One extra sound (red), which is near to an S1, exists within the diastole and no S2 is missing; (b) One extra sound (red), which is near to an S1, exists within the diastole and one S2 is missing (dashed lines).

Statistical Evaluation and Optimization
For the first heart sounds, the R-peaks of the ECG are used as a reference, whereas the classification of the second heart sounds is not evaluated with the ECG. Therefore, the performance of the classification algorithm is statistically evaluated in terms of the sensitivity (S), specificity (Spec), accuracy (Acc), precision (P) and F1-score for S1 only. These parameters are defined as
$$S = \frac{t_p}{t_p + f_n}, \qquad \mathrm{Spec} = \frac{t_n}{t_n + f_p}, \qquad \mathrm{Acc} = \frac{t_p + t_n}{t_p + t_n + f_p + f_n},$$
$$P = \frac{t_p}{t_p + f_p}, \qquad F_1 = \frac{2\,P\,S}{P + S} .$$
For evaluating the proposed algorithm, a tolerance window TW, which was applied around the peak of the R-wave, was introduced. The window was necessary, since the synchronization of the PCG and ECG was only an approximation and the maximum of a heart sound did not always occur at the beginning of the corresponding heart sound. Taking the maximal duration of a heart sound into consideration, a tolerance window of 150 ms was appropriate [34,43]. If a peak that was classified as S1 lay within the window, the classification was counted as correct. Accordingly, t_p is the number of correctly classified S1 peaks, f_p is the number of wrongly classified S1 peaks, which lay outside of TW, f_n is the number of S1 heart sounds that were not detected by the algorithm, and t_n is the number of peaks that were correctly not classified as S1.
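The evaluation against the R-peaks can be sketched as follows. Interpreting the 150 ms tolerance window as ±75 ms around each R-peak is an assumption of this sketch, as are the function names and the demo values; the number of true negatives is passed in separately because it depends on how rejected peaks are counted.

```python
def count_matches(detected_s, r_peaks_s, half_width_s=0.075):
    """Count tp, fp, fn for S1 detections against R-peak times (in seconds)."""
    unmatched = list(r_peaks_s)
    tp = fp = 0
    for t in detected_s:
        hit = next((r for r in unmatched if abs(t - r) <= half_width_s), None)
        if hit is None:
            fp += 1  # classified as S1 but outside every tolerance window
        else:
            unmatched.remove(hit)  # each R-peak can confirm only one S1
            tp += 1
    return tp, fp, len(unmatched)


def ratio(a, b):
    """a / b, defined as 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def scores(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, precision and F1-score."""
    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Demo (assumed values): four R-peaks, one missed S1 and one spurious S1.
r_peaks = [1.0, 1.8, 2.6, 3.4]
detected = [1.03, 1.85, 2.2, 3.42]
tp, fp, fn = count_matches(detected, r_peaks)
result = scores(tp, fp, fn, tn=0)
```

In the demo, three detections fall inside a tolerance window, one lies outside and one R-peak remains unmatched.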
All performance parameters were computed for all measurements with both the HT and the STFT. The classification of the heart sounds was optimized to achieve the highest F1-score. For high heart rates, the threshold n for peak detection was optimized separately. The results of the optimization process can be found in Section 6.1.

Measurement Devices and PC Setup
For recording the PCG, the electronic stethoscope 3M™ Littmann® Model 3200 was used. The recorded data was sampled at 4 kHz. Chen et al. stated that sampling rates above 5 kHz are not necessary for heart sound recording, since higher sampling rates can include irrelevant sound components [7]. The integrated microphone of the stethoscope amplifies frequencies between 20 and 200 Hz, since heart sounds are within this frequency range (see Section 2.1).
The classification of the first heart sounds was validated by an ECG. For this purpose, a COR12 ECG device from Corscience was used, which has 12 channels and a sampling rate of 500 Hz. The classification algorithm was run in MATLAB R2019a on a PC with an Intel® Xeon® E-2136 processor at 3.3 GHz and 32 GB RAM. The computational time of the algorithm for the heart sound detection and classification was assessed for both the HT and the STFT.

Study Population and Protocol
For the study, the PCG and ECG of 12 healthy male subjects with no known heart diseases were recorded. Their ages ranged from 24 to 68 years. An overview of the probands is given in Table 1. For every person, 10 different measurements were conducted. The duration of each measurement was 60 s. Unless indicated otherwise, the measurement was conducted at Erb's point while the test person was sitting. The different types of measurements are listed in Table 2. As the first measurement was conducted under optimal conditions, it serves as the reference. For the other measurements, the posture, physical stress and auscultation position were varied.

Results of Optimization
The performance of the presented algorithm was optimized with regard to the F1-score. The values of the optimized parameters are listed in Table 3. The threshold parameters n_normal and n_high were greater for the HT, since the averaging effect of the STFT yields a smoother envelope curve. The cut-off frequencies of the high-pass filter were 40 Hz for the HT and 20 Hz for the STFT, and the cut-off frequencies of the low-pass filter were 190 Hz for the HT and 120 Hz for the STFT. For both the HT and the STFT, the filter order was 10 for the low-pass filter and 4 for the high-pass filter.
The filter suppressed noise, which was caused by human voice, respiration or lung sounds. The fundamental frequency of human voice is approximately 120 Hz for male and 190 Hz for female, respectively [44]. Therefore, the applied filter eliminated the majority of human voices. However, no study about the influence of speaking during the measurements was made. The frequency range of lung sounds and respiration is approximately 60-1200 Hz [45,46]. Thus, lung sounds and breathing were partly suppressed by the filtering. However, artefacts from the lung could not be fully eliminated, since heart sounds occurred within that frequency range. The cut-off frequencies were the result of the optimization process, regarding the average F 1 -score.

Comparison of the Two Approaches for Systolic Length Estimation
As introduced in Section 3, two different methods were considered for the systolic length extraction: one based on the ACF and one based on the empirical formula (Equation (6)). For each of these methods and for both the HT-based and the STFT-based approach, F1-scores were calculated. The results for these combinations are shown in Figure 9. The proposed algorithm achieves a better performance by using the empirical formula for the systolic length estimation. Therefore, in the following the algorithm is evaluated by using the empirical approach.
Figure 9. Comparison of the ACF-based and the empirical-based method for the systolic length estimation: the average F1-score is plotted over the ten conducted measurements.

Results of Heart Sound Classification
The respective average values of the performance parameters are listed in Table 4. Furthermore, the average performance parameters were calculated without measurements 4, 8 and 9, since the reference ECG was noisy for measurement 4 and the measurements on the back (8 and 9) had a noisy PCG. The evaluation results for the F1-score for the individual probands and measurements are shown in Table 5 and Table 6. Measurement 1 was the reference for the other measurements. It showed the best results, since it was conducted under optimal circumstances at Erb's point.

Varying the Auscultation Point
Measurements 1, 7, 8 and 9 were conducted in a sitting position and in the resting state; only the auscultation point was varied. As the results in Tables 4-6 suggest, the F1-score was best for Erb's point (measurement 1), as expected. Measurement 7 was performed at the sternum; its average F1-score was lower than that of the reference measurement 1 and varied widely between the individual subjects. The reason is the lower amplitude of the heart sounds away from Erb's point (see Section 2.1). Consequently, the signal-to-noise ratio suffered and noise could be misinterpreted as heart sounds.
Measurements 8 and 9, which were performed on the back of the subjects, provided poor results for heart sound classification. The reason is the weak acoustic signal, which is attenuated by the lungs and the backbone. In consequence, placing a wearable system for heart sound monitoring on the back, as suggested in [47], is not advisable. Therefore, the average of the evaluation parameters was also calculated without measurements 8 and 9.

Varying the Posture
The measurements 1, 3, 4, 5 and 6 were conducted with different postures of the probands. Measurement 3, 5 and 6, where the probands were lying on the back, lying on the right side and lying on the left side, showed similar values regarding the performance parameters. However, compared to measurement 1, the results were slightly worse.
The results for measurement 4 were poor, since during the measurement the subjects were lying on the stomach. This led in some cases to noise in the ECG, which was caused by movement of the electrodes. In consequence, the reference signal was distorted and the evaluation of the classification performance suffered. However, the PCG was not affected.

Varying the Physical Stress
Measurement 2 was conducted during deep breathing and measurement 10 after 5 min of physical exercise; both reflect physical stress situations. The results of measurement 10 show that the classification of heart sounds worked well for increased heart rates. The average ratio of the amplitudes of S1 and S2 for increased heart rates was 1.8 for the STFT and 3.4 for the HT. Therefore, S1 could be distinguished easily from noise as well as from S2. Moreover, the results of measurement 2 showed that deep breathing hardly affected the classification algorithm.

Influence of BMI
The probands were sorted by BMI and divided into two groups of equal size. Group "low BMI" consists of probands 1, 3, 4, 6, 8 and 10 and group "high BMI" of probands 2, 5, 7, 9, 11 and 12. The average F1-score without measurements 4, 8 and 9 was computed for both the HT and the STFT and compared between the groups (see Table 7). For both the HT and the STFT, the average F1-score of the group "high BMI" was approximately 4% lower than that of the group "low BMI", since the heart sounds are more strongly attenuated at higher BMIs. Given this small difference, the algorithm was quite robust towards a variation of the BMI for both envelope extraction methods.

Comparison of HT and STFT
Overall, the average F1-score obtained with the HT for extracting the envelope curve was approx. 5% better than that obtained with the STFT. Because the STFT is computed within a time window, its time resolution is limited, since an appropriate frequency resolution is needed. In consequence, the number of samples of the envelope curve was reduced and, therefore, the derived length of the systoles was less accurate, resulting in incorrectly removed S1. This effect was even stronger in the case of split S1.
Furthermore, the classification of S2 showed that the HT performed better. This is because the STFT was computed within a time interval, which led to an averaging of the amplitudes. Therefore, in some cases the maximal power spectral density, which was used as the envelope curve for the classification, was reduced. An example is shown in Figure 10: the fourth and fifth S2 were not detected when the STFT was used for envelope curve extraction.
Figure 10. Example of classified heart sounds, where the detection of S2 failed in some cases for the STFT (red). The reference ECG is shown in green.
Regarding the goal of a wearable sensor solution for daily health monitoring, the computational cost and time are essential. A comparison between the computational effort of the HT and STFT is given in Table 8. This means that a 60 s PCG signal was classified within approximately 140 ms for the HT and 480 ms for the STFT. In consequence, both methods can be regarded as real-time capable; nevertheless, the HT-based algorithm ran about 3 times faster than the STFT-based one. Wearable systems have limited computational capacity as well as a limited power supply. Thus, it is essential to use an algorithm of low computational complexity for the real-time monitoring of daily life activities. Hence, the HT can be ranked as more appropriate for this purpose than the STFT approach.

Comparison with other Approaches
In the following, the performance parameters for the S1 classification of the proposed algorithm are compared to those of other algorithms for heart sound classification (see Table 9). As mentioned in Section 1, there are three well-established groups of algorithms for heart sound classification: the feature-based, probabilistic-based and envelope-based methods. Therefore, the performance of algorithms that represent the state of the art was compared to that of the presented algorithm. However, it has to be noted that the performance parameters cannot be compared directly, since the proposed data set differs from the others. Therefore, the reference measurement of the proposed data set was used, since it was conducted under optimal conditions, as is usual in the literature. Furthermore, no standards for measurements and evaluation of the algorithms exist, which leads to non-uniform performance parameters. Since the performance of the presented algorithm is best with the HT, the HT-based results were used for the comparison.
Table 9. Comparing the performance of the proposed algorithm with the state-of-the-art heart sound classification algorithms. The performance parameters S, P, Acc and F1 correspond to the sensitivity, precision, accuracy and F1-score, respectively. The performance parameters are only related to S1, if not stated otherwise in the notes.

The database size in the literature is in most cases very small compared to the proposed one (120 × 60 s). Only Springer et al. used a larger database than the proposed one [12]. Furthermore, many approaches include very short recordings in their database, for example ~1 s by Renna et al. [14], or 87 heart sounds in total by Chen et al. [7]. Moreover, many researchers use a database like PhysioNet and do not declare their study population or recording length [6,9,39]. Other researchers conducted their measurements under optimal conditions (apart from [39]), and no variation of the posture, auscultation point, physical stress and breathing was considered within their studies. Furthermore, the proposed study population includes only healthy subjects; this was also the case in [16,18,39].
The feature-based methods have the best performance parameters [6,9]. However, since feature-based methods have a strong dependency on their training datasets and a high computational effort, they are not the favourable methods for a low-complexity wearable sensor platform that monitors daily activities in real time. This also holds true for the probabilistic-based methods presented in [12,14]. In [39], however, a real-time capable probabilistic-based method was realized on a low-cost smartphone platform. Its performance is nevertheless poorer than that of the state of the art, including the proposed algorithm.
Even the average performance of the presented algorithm over different physiological conditions (e.g., physical stress, posture, BMI, auscultation point, breathing) is quite good compared with the state of the art. In consequence, the developed algorithm is robust and appropriate for a wearable sensor platform. The presented algorithm is not able to deal with more than one extra peak, nor is it able to classify pathological sounds (e.g., murmurs). Thus, the performance suffers if more than one extra peak is detected within a diastole. For the probabilistic-based methods, even one extra peak can be problematic, since it can lead to wrong states in their sequence.

Conclusions
This paper presents an envelope-based and real-time capable algorithm for the detection and classification of the heart sounds S1 and S2 in phonocardiograms (PCG). The peaks of the envelope curve were classified and the detected S1 were compared to the reference ECG. The algorithm was tested using the Hilbert transform (HT) and the short-time Fourier transform (STFT) as methods to extract the envelope curve from the PCG. The results for the heart sound classification suggest that the HT is more favourable, due to the better performance parameters and the lower computational effort. The developed algorithm is robust against variations of the posture, heart rate, BMI, age and auscultation point, except for the back, where the PCG signals are attenuated by the lungs and backbone. As expected, the auscultation at Erb's point provides the best result, followed by the sternum. The posture and physical effort hardly affect the performance of the proposed algorithm for heart sound classification. Furthermore, the algorithm is adapted to deal with additional peaks caused by noise and with an equal length of the systole and the diastole at an increased heart rate. Thus, the conducted measurements reflect daily situations of the probands.
In the future, the envelope curves of the HT and STFT will be combined in order to increase the accuracy of the classification, since both envelope curves contain different information. Moreover, the heart rate could be estimated with the non-negative matrix factorization (NMF) out of the spectrogram, as suggested by [23]. Therefore, the algorithm with the STFT approach could be improved. Furthermore, the presented algorithm will be combined with activity classification, as proposed in [4]. For this purpose, the computational effort of the proposed algorithm must be reduced by an optimization of the implemented code as well as a reduction of the sampling rate of the PCG.