Article

Transfer Learning Models for Detecting Six Categories of Phonocardiogram Recordings

1 School of Biomedical Engineering, Dalian University of Technology, Dalian 116024, China
2 School of Instrument Science and Engineering, Southeast University, Nanjing 214135, China
* Author to whom correspondence should be addressed.
J. Cardiovasc. Dev. Dis. 2022, 9(3), 86; https://doi.org/10.3390/jcdd9030086
Submission received: 5 February 2022 / Revised: 9 March 2022 / Accepted: 14 March 2022 / Published: 16 March 2022

Abstract

Background and aims: Auscultation is a cheap and fundamental technique for detecting cardiovascular disease effectively. However, doctors' auscultation skills vary, and misdiagnosis can occur even when auscultation is performed by an experienced doctor. Hence, accurate computational tools to assist auscultation are needed, especially in developing countries. Artificial intelligence technology can be an efficient diagnostic tool for detecting cardiovascular disease. This work proposes an automatic multi-class method for detecting cardiovascular disease from heart sound signals. Methods and results: In this work, a 1D heart sound signal is translated into its corresponding 3D spectrogram using the continuous wavelet transform (CWT). In total, six classes of heart sound data are used in this experiment. We combine an open database (containing five classes of heart sound data: aortic stenosis, mitral regurgitation, mitral stenosis, mitral valve prolapse and normal) with one class (pulmonary hypertension) of heart sound data collected by ourselves. To make the method robust in noisy environments, a background deformation technique is applied before training. Then, 10 transfer learning networks (GoogleNet, SqueezeNet, DarkNet19, MobileNetv2, Inception-ResNetv2, DenseNet201, Inceptionv3, ResNet101, NasNet-Large, and Xception) are compared. Furthermore, other models (LSTM and CNN) are also compared with the proposed algorithm. The experimental results show that four transfer learning networks (ResNet101, DenseNet201, DarkNet19 and GoogleNet) outperformed their peer models, detecting the multiple heart diseases with an accuracy of 0.98. Performance was validated on both the original and the augmented heart sounds using 10-fold cross validation, and the results of all 10 folds are reported. Conclusions: Our method achieves high classification accuracy even against a noisy background, which suggests that the proposed method could be used in auxiliary diagnosis of cardiovascular diseases.

1. Introduction

Heart disease morbidity and mortality are increasing year after year, and heart disease has become a serious threat to human health. There are various methods for diagnosing cardiovascular diseases [1]. Among them, the most common are the electrocardiogram (ECG) and the phonocardiogram (PCG). The ECG assesses the working condition of the heart directly; however, in some cases, the ECG cannot reflect all existing disorders, such as the presence of heart murmurs [2].
In a clinical examination, doctors first listen to the sounds on the surface of the patient's chest with a stethoscope. These sounds are called heart sounds (HSs), and a recording of the HSs is called a phonocardiogram (PCG). The PCG reflects the condition of the cardiovascular system comprehensively and contains pathological and physiological information about the heart. Therefore, the PCG is of great value in assisting doctors to diagnose or analyze different kinds of heart disease [3].
As we know, there are four valves (the mitral, tricuspid, pulmonary and aortic valves) in the heart. If any of these valves fails to open or close properly, the heart is damaged, which may cause heart valve disease. Valvular heart diseases usually involve mitral stenosis (MS), mitral regurgitation (MR), aortic stenosis (AS), and mitral valve prolapse (MVP). These different valvular diseases produce different features in the heart sounds.
Mitral stenosis: Mitral stenosis is commonly caused by rheumatic heart disease. In diastole, blood flows from the left atrium through the narrowed mitral valve into the left ventricle, generating a low-pitched murmur. The murmur is heard best at the apex.
Mitral regurgitation: The murmur of mitral regurgitation is generated as blood regurgitates from the left ventricle into the left atrium. The first heart sound (S1) is very soft, and a pan-systolic murmur is heard best at the apex of the heart.
Aortic stenosis: The murmur of aortic stenosis is a systolic ejection murmur that peaks early in systole. It is heard best at the second right interspace.
Mitral valve prolapse: If mitral valve prolapse is present, then a mid-systolic click may be heard, followed by a late systolic murmur.
In addition, we also provide data on patients with pulmonary hypertension (PH).
Pulmonary hypertension: PH is a hemodynamic and pathophysiological condition in which the pulmonary artery pressure rises above a certain threshold. Characteristic heart sound findings include an augmented second heart sound (such as an accentuated P2 component), a tricuspid regurgitation murmur, and a third heart sound (S3) gallop.
However, doctors are not always able to diagnose heart diseases accurately simply by listening to or observing an HS record. For this reason, studies on the PCG have increased to make diagnosis easier for doctors. In recent years, computer-assisted detection technology for the processing and analysis of heart sound signals has made remarkable progress and aroused great interest [4,5,6,7,8].
Currently, smart PCG detection technology has not been widely used in real-life clinical diagnosis, and the main method for detecting heart sound abnormalities is still manual auscultation. Therefore, research on and application of computer-aided heart sound detection techniques will greatly facilitate developments in the field of cardiovascular disease diagnosis. In the existing research literature, the detection of cardiovascular disease mainly involves four stages: (1) pre-processing of the heart sound signals, (2) segmentation of the first heart sounds (S1s) and the second heart sounds (S2s) or division of cardiac cycles, (3) extraction of features, and (4) recognition of normal and abnormal HS recordings. In general, key features are first extracted from the PCG signals, manually or algorithmically. The patient's monitoring sequence is then compared with a tagged database. Finally, more intuitive diagnostic results can be obtained automatically.
In earlier years, many researchers paid close attention to locating the boundaries of HS components (such as S1s and S2s) [9,10,11,12,13,14,15]. However, these segmentation methods may be inaccurate given the massive growth of today's databases, and if the segmentation is inaccurate, the detection of cardiovascular disease will be even less accurate. Therefore, most current methods detect heart diseases through feature extraction rather than segmentation of S1s and S2s. In our research, we classify the PCGs without segmenting the HSs.
In the feature-extraction stage, it is worth noting that some features of the one-dimensional signals are similar across diverse cardiovascular diseases, and these similar features may degrade the multi-class results. Consequently, it is particularly important to magnify the differences among the features of the various heart diseases. Many researchers have extracted handcrafted features [16,17,18]. Most of these handcrafted features have physiological interpretations, such as the amplitude, time interval, kurtosis, energy ratio, MFCC, and entropy. These features have usually been used by previous researchers for binary classification (normal PCG vs. abnormal PCG). Their computation is small and simple, but they may generalize poorly to multi-class tasks and new databases, and when fed into complex, deep network structures, such handcrafted features may yield poor classification performance. Hence, deep features are needed for the multi-class classification of heart diseases. Some researchers have used deep-learning models, such as CNNs or other ANN models, to extract deeper features automatically [19,20,21], and their results were better than those of manual feature extraction.
Table 1 shows a detailed comparison with some recent excellent work. However, there are some limitations in the field of HS classification due to the scarcity of clinical databases. Most studies have focused on binary classification, and most training and validation was based on a single database (such as PASCAL or an open heart sound database), owing to the absence of multi-label heart sound databases with corresponding category annotations. To solve this problem, we combine the database from [22] with data collected by ourselves, yielding six categories of heart sound signals in total (normal, mitral stenosis, mitral regurgitation, mitral valve prolapse, aortic stenosis and pulmonary hypertension). Furthermore, the proposed method is validated under data augmentation and, thanks to the heart sound augmentation method, works well on recordings with different noise conditions.
This paper is organized as follows: Section 2 introduces the two databases used in our research. Section 3 describes the method in detail, including the CWT used to create the time–frequency images and the transfer learning models trained on the augmented databases. Section 4 presents the results of the 10 transfer learning models and compares them with other multi-class classification results. Section 5 concludes the paper.

2. Database Details

(Database A) The phonocardiogram database [22] is used as one of our databases. It includes 1000 audio recordings (the exact number of subjects is unclear) in wav format. The sampling rate is 8000 Hz. There are five categories of heart sound signals: normal (N) and four major valvular heart diseases, mitral stenosis (MS), mitral valve prolapse (MVP), mitral regurgitation (MR) and aortic stenosis (AS). Each category has 200 HS recordings. The duration of the heart sound signals in database A ranges from 1.1556 s to 3.9929 s; we truncate every HS signal to 1.1556 s, the minimum signal length in database A. The five categories of original heart sounds can be obtained at: https://github.com/yaseen21khan/Classification-of-Heart-Sound-Signal-Using-Multiple-Features-/blob/master/README.md (accessed on 10 September 2021).
(Database B) The second database was collected at the Second Hospital of Dalian Medical University. All subjects were informed about the study and signed the study participation consent. Database B contains 74 PH subjects with 102 recordings in total. The sampling rate is 2000 Hz. From each original recording in database B, we randomly select two non-overlapping segments of 1.1556 s, giving 204 segments after re-splitting. To be consistent with the counts in database A, we take the first 200 heart sound recordings as database B.
The details of the two databases are described in Table 2, and typical examples of the PCG signals of the represented classes are shown in Figure 1. The PH database can be obtained at: https://github.com/wangmiao1992/pulmonary-hypertension-database/tree/main (accessed on 12 January 2022).

3. Methodology

The main objective of this work is to automatically detect major cardiac diseases from HS recordings using transfer learning networks. Figure 2 shows the framework of the proposed approach. In summary, the research is divided into four steps: (1) acquire the heart sound recordings, one set from the online database and the other from the PH subjects' recordings we collected at the hospital; (2) signal pre-processing, including denoising, amplitude normalization and data augmentation; (3) convert the one-dimensional heart sound signals into three-dimensional time–frequency images, which helps improve the multi-class classification results; (4) apply transfer learning architectures to classify these images, training and testing the models with 10-fold cross validation. The proposed pipeline could be used for automatic multi-class diagnosis of major heart diseases from PCG signals.
The code used in the experiments has been uploaded to GitHub: https://github.com/wangmiao1992/pulmonary-hypertension-database/tree/main (accessed on 12 January 2022). The proposed method was implemented in Matlab 2021a.

3.1. Signal Preprocessing

The sampling frequency of database A is 8000 Hz, whereas that of database B is 2000 Hz. To eliminate this difference, database A is downsampled to 2000 Hz, while database B is kept at its original rate. After resampling, each heart sound signal in the two databases has a fixed length of 2312 samples (1.1556 s at 2000 Hz).
The signal quality of database A is good, while the heart sound recordings from database B contain slight noise. As is well known, the frequency content of heart sound signals usually lies between 50 Hz and 150 Hz [28], so digital filters can be used to remove the low- and high-frequency components. In this paper, each HS signal is passed through a third-order Butterworth band-pass filter (15–150 Hz); the filtered sequence is then reversed and run back through the filter, which removes the noise outside the band while avoiding any time delay (zero-phase filtering). Subsequently, the signals in both database A and database B are normalized using Equation (1):
$$x_{\mathrm{norm}} = \frac{x}{\left| x_{\max} \right|} \tag{1}$$
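As an illustration, this preprocessing chain can be sketched in MATLAB, the platform used in this work. This is a minimal sketch rather than the authors' released code; the file name and variable names are hypothetical, and filtfilt is used here to realize the forward and backward (zero-phase) Butterworth filtering described above.

```matlab
% Minimal preprocessing sketch (hypothetical file name, not the authors' code).
fsA = 8000; fsTarget = 2000;             % database A rate and the common target rate
[x, ~] = audioread('sample_AS.wav');     % read one recording from database A
x = resample(x, fsTarget, fsA);          % downsample 8000 Hz -> 2000 Hz
x = x(1:2312);                           % fixed length: 1.1556 s at 2000 Hz

% Third-order Butterworth band-pass (15-150 Hz); filtfilt runs the filter
% forward and then backward, i.e., the zero-phase filtering described above.
[b, a] = butter(3, [15 150] / (fsTarget / 2), 'bandpass');
x = filtfilt(b, a, x);

x = x / max(abs(x));                     % amplitude normalization, Equation (1)
```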

3.2. PCG Augmentations

The heart sound signal is a time series, and its characteristics and individual differences hinder the application of traditional data augmentation methods to heart sound signals. Hence, exploring a more effective and suitable augmentation method for the original heart sound signal is an important problem when building a multi-label heart sound diagnosis system.
Data augmentation usually includes flips, rotations/reflections, shifts, zooms, and contrast, color and noise disturbances [29,30,31]. However, these image-domain augmentation methods only change basic information such as position and angle from a macro perspective; they suit simple computer vision tasks such as image recognition and cannot be applied to the augmentation of heart sound signals.
In this research, the PCG augmentation method applies a 1D signal augmentation mechanism. The augmentation covers HS signals under various conditions so that the trained model has stronger generalization performance. The method explores background deformations, and the transfer learning models are consequently able to categorize heart sound signals even in noisy circumstances.
Given a heart sound signal, denoted 'original_signal', a background transformation of the same size, denoted 'random_signal', is generated stochastically; 'delta' is the deformation control parameter and belongs to the interval (0, 1). An augmented signal is calculated by Equation (2), mixing random background noise with the original heart sound signal. Note that no data augmentation is applied to the test set. Figure 3 illustrates the effect of the augmentation: Figure 3a shows the original heart sound signal; Figure 3b shows the signal augmented through Equation (2); Figure 3c shows the denoised version of Figure 3b. Table 3 summarizes the recording distribution after data augmentation. Finally, the database contains 2400 PCG recordings in total, with 400 PCG recordings per class.
$$\text{augmentation\_signal} = \text{original\_signal} + \text{delta} \times \text{random\_signal} \tag{2}$$
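Equation (2) can be sketched in MATLAB as follows; the Gaussian background and the particular value of delta below are assumptions for illustration, since the text constrains delta only to the interval (0, 1).

```matlab
% Background-deformation augmentation, Equation (2) (illustrative sketch).
delta = 0.3;                                   % deformation control, in (0, 1)
random_signal = randn(size(original_signal)); % stochastic background, same size
random_signal = random_signal / max(abs(random_signal));
augmented_signal = original_signal + delta * random_signal;
```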

3.3. Creating Time–Frequency Representations

Time–frequency transformation is a common approach in the classification of speech events for extracting a time–frequency representation of sound. A time–frequency representation converts a one-dimensional signal into a three-dimensional image. The features extracted from this transformation are then used to identify the most likely source of the sound. Based on the investigation in [32], among three time–frequency representations (the short-time Fourier transform (STFT), the Wigner distribution, and the continuous wavelet transform (CWT)), the CWT gives the clearest presentation of the time–frequency content of PCG signals.
The CWT spectrogram is produced with the analytic Morse wavelet. A magnitude spectrogram of the heart sound signal is calculated for each sample, and these spectrograms are used to train and test the transfer learning models. The CWT of a heart sound signal x(t) is defined in Equation (3), and Equation (4) gives the analytic wavelet:
$$W(a,b) = \int_{-\infty}^{+\infty} x(t) \, \frac{1}{\sqrt{a}} \, \psi^{*}\!\left( \frac{t-b}{a} \right) dt \tag{3}$$

$$\psi(t) = e^{-t^{2}} \cos\!\left( \pi \sqrt{\tfrac{2}{\ln 2}} \, t \right) \tag{4}$$
where x(t) is the heart sound signal, ψ(t) is the mother wavelet, and a and b are the parameters that control the scaling and translation of the wavelet, respectively. The CWT is calculated by varying a and b continuously over the range of scales and the length of the heart sound signal, respectively.
The CWT provides superior time and frequency resolution, allowing different-sized analysis windows at different frequencies. The spectrograms of the heart sound signals show the frequency content at different times and provide a visual presentation that can be used to tell apart the various heart sounds. The CWT creates 3D scalogram data, which are stored as RGB images. To match the inputs of the different transfer learning architectures, each RGB image is resized to an array of size n-by-m-by-3; for the GoogLeNet architecture, for example, the RGB image is resized to 224-by-224-by-3. Six typical spectrograms of HS signals are shown in Figure 4: Figure 4a shows spectrograms of the original heart sound signals; Figure 4b shows spectrograms of the augmented heart sound signals.
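The scalogram generation can be sketched as follows. MATLAB's cwt function uses an analytic Morse wavelet by default, consistent with the description above; the colormap and the JPEG output format are assumptions, and the target image size follows the network being trained (224-by-224-by-3 for GoogLeNet).

```matlab
% CWT scalogram of a preprocessed 2000 Hz heart sound segment x (sketch).
fs = 2000;
wt  = cwt(x, fs);                                  % analytic Morse wavelet by default
cfs = abs(wt);                                     % magnitude scalogram
img = ind2rgb(im2uint8(rescale(cfs)), jet(256));   % map magnitudes to an RGB image
img = imresize(img, [224 224]);                    % match GoogLeNet's input size
imwrite(img, 'AS_sample.jpg');                     % store for the image datastore
```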

3.4. Architecture of Transfer Learning for PCG Multiple Classification

Transfer learning aims to utilize knowledge acquired in a source domain to address problems in different but related areas. This approach may be a better choice than some simple CNN structures; however, transfer learning has rarely been reported or considered for classifying PCG signals.
In this work, each heart sound signal is converted into its corresponding CWT spectrogram, and the two HS databases are merged into one database for experimentation. Ten existing transfer learning models (Squeezenet, Googlenet, NasNet-Large, Inceptionv3, Densenet201, DarkNet19, Mobilenetv2, Resnet101, Xception and Inceptionresnetv2) are used to classify the heart sound signals into six categories (N, AS, MR, MS, MVP, PH). The parameters of these transfer learning models are shown in Table 4. It is worth noting that the models have different image input sizes, so the generated images must match each model's input size, also listed in Table 4. Figure 5 illustrates the transfer learning workflow.
In this research, the pre-trained networks' parameters are modified and some of the architectures are fine-tuned. The earlier layers identify more general image features, such as blobs, edges, and colors; subsequent layers focus on more specific characteristics that differentiate the categories.
For example, the original GoogLeNet is pretrained to categorize images into 1000 target categories. In this research, we retrain GoogLeNet to solve the PCG classification problem. To prevent over-fitting of the transfer learning model, a dropout layer is used: the final dropout layer ('pool5-drop_7x7_s1') is replaced with a dropout layer of probability 0.6. Furthermore, we replace the 'loss3-classifier' and 'output' layers with a new fully connected layer and output layer adapted to the new data, and the initial learning rate is set to 0.001. Training is an iterative process that minimizes the loss function: a gradient descent algorithm evaluates the loss gradient in each iteration and updates the weights accordingly. We set the mini-batch size to 10 and the maximum number of epochs to 15, and apply the stochastic gradient descent with momentum optimizer. The other transfer learning models use the same settings.
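These modifications can be expressed as the following MATLAB sketch. The GoogLeNet layer names ('pool5-drop_7x7_s1', 'loss3-classifier', 'output') are those cited above, and the option values mirror the listed settings; names such as 'new_fc' and the shuffling option are illustrative assumptions.

```matlab
% Fine-tuning GoogLeNet for six heart sound classes (sketch of the settings above).
net    = googlenet;                       % pretrained on 1000 ImageNet categories
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph, 'pool5-drop_7x7_s1', dropoutLayer(0.6, 'Name', 'new_dropout'));
lgraph = replaceLayer(lgraph, 'loss3-classifier', fullyConnectedLayer(6, 'Name', 'new_fc'));
lgraph = replaceLayer(lgraph, 'output', classificationLayer('Name', 'new_output'));

options = trainingOptions('sgdm', ...    % stochastic gradient descent with momentum
    'InitialLearnRate', 0.001, ...
    'MiniBatchSize',    10, ...
    'MaxEpochs',        15, ...
    'Shuffle',          'every-epoch');
trainedNet = trainNetwork(imdsTrain, lgraph, options); % imdsTrain: scalogram datastore
```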
The analysis and model development are performed in a workstation with hardware/software configuration and specification as follows: DELL(R) Precision T3240 i7-10700, Graphical Processing Units (GPU) NVIDIA Quadro RTX3000, 64GB RAM, and 64-bit Windows 10.

3.5. Model Training and Testing

The proposed methods use diverse HS data for training, validation and testing, based on 10-fold cross validation: nine folds are used for training the transfer learning models while one fold is used for testing, and the process iterates so that the entire database is covered under both training and testing conditions. The folds are not split by independent subjects, because all recordings from all patients are pooled into one collection. The training data comprise all of the augmented data plus 90% of the original heart sound data (3800 heart sound recordings, comprising 2000 augmented and 1800 original recordings); the testing data comprise only the remaining 10% of the original heart sound data (200 original recordings).
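A sketch of this fold handling is given below, under the assumption that the scalogram images are organized into one subfolder per class; cvpartition stratifies the recording-level split by class label.

```matlab
% 10-fold cross validation over scalogram images (sketch; recording-level split).
imds = imageDatastore('scalograms', 'IncludeSubfolders', true, ...
                      'LabelSource', 'foldernames');    % six class subfolders
cv = cvpartition(imds.Labels, 'KFold', 10);             % stratified by class
for k = 1:10
    imdsTrain = subset(imds, find(training(cv, k)));    % nine folds (plus augmented data)
    imdsTest  = subset(imds, find(test(cv, k)));        % held-out original recordings
    % ... train the network on imdsTrain and evaluate on imdsTest
end
```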

3.6. Assessment Indicators

To evaluate the performance of the methodology in this paper, four indicators are used: accuracy, precision, recall and F1-score. Accuracy (ACC) measures all correctly recognized events. Precision and recall are informative estimates when the database is imbalanced, and the F1-score is defined as the harmonic mean of precision and recall. These metrics are calculated as follows, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$F1\_\mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$
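Given a confusion matrix, these indicators follow directly. A short MATLAB sketch (assuming categorical label vectors trueLabels and predLabels; confusionmat places true classes on rows and predicted classes on columns):

```matlab
% Per-class metrics from the confusion matrix (rows: true, columns: predicted).
C = confusionmat(trueLabels, predLabels);
acc       = sum(diag(C)) / sum(C(:));    % Equation (5), overall accuracy
precision = diag(C) ./ sum(C, 1)';       % Equation (6), TP ./ (TP + FP)
recall    = diag(C) ./ sum(C, 2);        % Equation (7), TP ./ (TP + FN)
f1 = 2 * precision .* recall ./ (precision + recall);   % Equation (8)
```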

4. Experiment Results and Discussions

4.1. Experiment Results

Table 5 shows one transfer learning model's results: the accuracy and loss of GoogleNet during training and testing on the augmented PCG database and on the original PCG database, respectively. The experiment uses 10-fold cross validation. Table 5 reports, for each fold, the number of training and test samples, the training accuracy (Acc), testing accuracy (Val Acc), training loss (Loss) and testing loss (Val Loss). Table 5a shows the 10-fold cross validation results on the augmented PCG database; Table 5b shows the results on the original database. As is visible from Table 5a,b, the proposed method achieves an average accuracy of 98% in classifying the six categories, whether trained on the augmented or the original data. These results demonstrate the efficiency of the method. Furthermore, we also evaluate the impact of the PCG augmentation method with additional background deformation.
Figure 6 and Figure 7 present the confusion matrices for all 10 folds of the multi-class classification. At the same time, Table 6 shows the performance (precision, recall and F1-score) of the GoogleNet architecture for the multi-class classification of the different heart diseases across all 10 folds, on both the augmented and the original PCG database.
Figure 8 shows the receiver operating characteristic (ROC) curve for the GoogleNet model results of multiple classifiers for six categories of heart sounds with AUC area. There are six colors which represent different categories of heart sounds. Figure 8a shows the ROC curve on the augmented PCG database; Figure 8b shows the ROC curve on the original PCG database.
Figure 9 shows the confusion matrices of the other transfer learning models for the multi-class task: the Xception convolutional neural network, the NASNet-Large convolutional neural network, Resnet101, Inceptionv3, Densenet201, Inception-ResNet-v2, Mobilenetv2, Darknet19, and Squeezenet. Table 7 presents the corresponding accuracy, recall, precision, and F1-scores; Resnet101, Densenet201, Darknet19 and the GoogleNet model discussed above obtained good accuracy in comparison with their peers.

4.2. Experiment Discussions

The proposed method considers only one depiction of heart sound signals: the spectrogram of the HSs. The spectrogram is an image representation of a sound signal in the time–frequency domain. The input heart sound signals are first converted into their respective spectrograms and then classified into six categories by the transfer learning models.
In addition, Table 8 describes methods without transfer learning, showing their accuracy compared with the transfer learning models in the multi-class classification of heart diseases; the comparison is performed on the same database. The results in Table 8 show much lower accuracy. We conduct two controlled trials: (1) the original 1D PCG signals are input to a Bi-LSTM network for six-category heart sound classification, achieving an accuracy of only 21.67%; (2) the 3D heart sound spectrogram images are input to a simple CNN with only three convolution layers, achieving an accuracy of only 76.67%. A sketch of this CNN baseline is given after this paragraph.
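For reference, the simple CNN baseline of Table 8 could be sketched in MATLAB as below; the filter counts and kernel sizes are assumptions, since the text specifies only three convolution blocks with normalization, ReLU and max pooling.

```matlab
% Sketch of the simple CNN baseline (layer widths are illustrative assumptions).
layers = [
    imageInputLayer([224 224 3])
    convolution2dLayer(3, 8,  'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(6)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm', 'MaxEpochs', 15);   % 15 epochs, as in Table 8
```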
Compared with B-mode ultrasonography, magnetic resonance imaging, computed tomography and other modalities, phonocardiography is non-invasive and non-destructive, with good repeatability, simple operation and low cost, so it can be applied to the prevention, preliminary diagnosis and long-term monitoring of related diseases. With the development of digital medical and biological technology, demands on the processing and analysis of heart sound signals have increased in related fields. Automatic analysis methods for medical sequence signals can share the workload of the medical domain and provide long-term monitoring of disease; at the same time, they can help medical staff grasp a patient's condition better and then work out plans for disease prevention and treatment, thereby enhancing the overall health of society.
Despite advances in the automatic diagnosis of heart sounds, some limitations must still be overcome to develop this technology further, for example, database deficiencies, heavy feature extraction and low accuracy in the multi-class classification of heart disease. Solving these challenges can allow deep-learning technology to achieve a breakthrough in the field of human health. In this paper, we provide a heart sound database of pulmonary hypertension, the first heart sound database related to this condition. Furthermore, handcrafted feature extraction of heart sounds often takes a long time, which is a limitation; for this reason, we transform the one-dimensional signal into a three-dimensional image for training and testing, so that features are generated automatically by the convolutional layers. Finally, we apply transfer learning technologies to diagnose multiple heart sounds and obtain good performance. Transfer learning overcomes the independent learning paradigm by applying previously learned knowledge to similar problems; for small-sample-size data, the pre-trained weights make training more efficient and yield better performance.
In this work, we suspect that the diversity of the augmented data contributes to the networks' ability to generalize to unseen data during the training stage. Data augmentation improves the robustness of training. Comparing traditional methods with transfer learning, the transfer learning networks performed better on the task than simpler networks (such as a convolutional neural network and a long short-term memory network) across several performance metrics in all experiments. This approach has the potential to provide physicians with an efficient and accurate means to triage patients.
The proposed approach could have a significant impact in clinical settings by assisting medical doctors in decision making regarding different kinds of heart disease. Our model performs efficiently in predicting the occurrence of an abnormality in a recorded signal. Moreover, it is tested on specific valvular diseases and on patients with pulmonary hypertension, which is not easily diagnosed early.
In summary, this work makes three contributions. (1) We provide a new type of heart sound database (the PH database), and our methods are validated under different HS database conditions. (2) We use an HS data augmentation strategy for fully automatic heart disease diagnosis; the augmentation improves the robustness of the diagnosis, especially in noisy environments. (3) According to the published literature, transfer learning is rarely applied in the field of heart sound classification. We use 10 transfer learning models to verify the classification method, obtaining a low error rate and high accuracy (0.98 for six categories of heart sounds) for the multi-class classification of heart diseases.

5. Conclusions

Heart sound signals carry important information about the function of the heart valves during heartbeats; therefore, these signals are very important for diagnosing heart problems at an early stage. To detect heart problems with high precision, we apply transfer learning architectures based on the CWT method under background deformation for classifying PCGs. In total, we use 10 transfer learning models to build the method. The classification results are good even when validated on a fusion of two different databases, which shows that the method is robust. It may be particularly useful for screening in remote areas or community hospitals.

Author Contributions

The PH data was collected by M.W.; The study was proposed and written by M.W.; H.T. verified the data and modified the paper. B.G. and Z.Z. analyzed the data. Y.H. corrected grammar issues. C.L. provided experimental help. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61971089 and 61471081), the National Key R&D Program of the Ministry of Science and Technology of China (2020YFC2004400), and the Open Research Fund of the State Key Laboratory of Bioelectronics (Grant No. Sk1b2021-k01), Southeast University.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Biological and Medical Ethics Committee in Dalian University of Technology and The Second Hospital of DaLian Medical University (protocol code: DUTIEE190311_05, 2019/3/11).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement

The data presented in this study are openly available at https://github.com/wangmiao1992/pulmonary-hypertension-database/tree/main (accessed on 12 January 2022).

Acknowledgments

The authors would like to thank G.W. at The Second Hospital of DaLian Medical University. He helped us to collect the clinical PH data.

Conflicts of Interest

The authors declare that there was no financial interest or personal relationship that appeared to affect the work reported in this paper.

References

  1. Li, S.; Li, F.; Tang, S.; Xiong, W. A Review of Computer-Aided Heart Sound Detection Techniques. BioMed Res. Int. 2020, 2020, 5846191. [Google Scholar] [CrossRef] [PubMed]
  2. Abdollahpur, M.; Ghaffari, A.; Ghiasi, S.; Mollakazemi, M.J. Detection of pathological heart sounds. Physiol. Meas. 2017, 38, 1616–1630. [Google Scholar] [CrossRef] [PubMed]
  3. Ismail, S.; Siddiqi, I.; Akram, U. Localization and classification of heart beats in phonocardiography signals—A comprehensive review. EURASIP J. Adv. Signal Process. 2018, 2018, 26. [Google Scholar] [CrossRef] [Green Version]
  4. Aziz, S.; Khan, M.U.; Alhaisoni, M.; Akram, T.; Altaf, M. Phonocardiogram Signal Processing for Automatic Diagnosis of Congenital Heart Disorders through Fusion of Temporal and Cepstral Features. Sensors 2020, 20, 3790. [Google Scholar] [CrossRef] [PubMed]
  5. Tiwari, S.; Jain, A.; Sharma, A.K.; Almustafa, K.M. Phonocardiogram Signal Based Multi-Class Cardiac Diagnostic Decision Support System. IEEE Access 2021, 9, 110710–110722. [Google Scholar] [CrossRef]
  6. Wei, W.; Zhan, G.; Wang, X.; Zhang, P.; Yan, Y. A novel method for automatic heart murmur diagnosis using phonocardiogram. In Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, Dublin, Ireland, 17–19 October 2019; pp. 1–6. [Google Scholar]
  7. Firuzbakht, F.; Fallah, A.; Rashidi, S.; Khoshnood, E.R. Abnormal heart sound diagnosis based on phonocardiogram signal processing. In Proceedings of the Electrical Engineering (ICEE), Iranian Conference on, Mashhad, Iran, 8–10 May 2018; pp. 1450–1455. [Google Scholar]
  8. Lv, J.; Dong, B.; Lei, H. Artificial intelligence-assisted auscultation in detecting congenital heart disease. Eur. Heart J.-Digit. Health 2021, 2, 119–124. [Google Scholar] [CrossRef]
  9. Liang, H.; Lukkarinen, S.; Hartimo, I. Heart sound segmentation algorithm based on heart sound envelogram. In Computers in Cardiology 1997; IEEE: Lund, Sweden, 1997; pp. 105–108. [Google Scholar]
  10. Kumar, D.; de Carvalho, P.; Antunes, M.; Henriques, J.; Melo, A.S.E.; Schmidt, R.; Habetha, J. Third Heart Sound Detection Using Wavelet Transform-Simplicity Filter. In Proceedings of the 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Lyon, France, 22–26 August 2007; pp. 1277–1281. [Google Scholar]
  11. Moukadem, A.; Dieterlen, A.; Hueber, N.; Brandt, C. A robust heart sounds segmentation module based on S-transform. Biomed. Signal Process. Control 2013, 8, 273–281. [Google Scholar] [CrossRef] [Green Version]
  12. Springer, D.B.; Tarassenko, L.; Clifford, G.D. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans. Biomed. Eng. 2016, 63, 822–832. [Google Scholar] [CrossRef]
  13. Naseri, H.; Homaeinezhad, M.R. Detection and boundary identification of phonocardiogram sounds using an expert frequency-energy based metric. Ann. Biomed. Eng. 2013, 41, 279–292. [Google Scholar] [CrossRef]
  14. Boutana, D.; Benidir, M.; Barkat, B. Segmentation and identification of some pathological phonocardiogram signals using time-frequency analysis. IET Signal Process. 2011, 5, 527–537. [Google Scholar] [CrossRef]
  15. Tang, H.; Li, T.; Qiu, T.; Park, Y. Segmentation of heart sounds based on dynamic clustering. Biomed. Signal Process. Control 2012, 7, 509–516. [Google Scholar] [CrossRef]
  16. Tang, H.; Wang, M.; Hu, Y.; Guo, B.; Li, T. Automated Signal Quality Assessment for Heart Sound Signal by Novel Features and Evaluation in Open Public Datasets. BioMed Res. Int. 2021, 2021, 7565398. [Google Scholar] [CrossRef] [PubMed]
  17. Kui, H.; Pan, J.; Zong, R.; Yang, H.; Wang, W. Heart sound classification based on log Mel-frequency spectral coefficients features and convolutional neural networks. Biomed. Signal Process. Control 2021, 69, 102893. [Google Scholar] [CrossRef]
  18. Deperlioglu, O. Heart sound classification with signal instant energy and stacked autoencoder network. Biomed. Signal Process. Control 2021, 64, 102211. [Google Scholar] [CrossRef]
  19. Bilal, E.R.M. Heart sounds classification using convolutional neural network with 1D-local binary pattern and 1D-local ternary pattern features. Appl. Acoust. 2021, 180, 108152. [Google Scholar]
  20. Iqtidar, K.; Qamar, U.; Aziz, S.; Khan, M.U. Phonocardiogram signal analysis for classification of Coronary Artery Diseases using MFCC and 1D adaptive local ternary patterns. Comput. Biol. Med. 2021, 138, 104926. [Google Scholar] [CrossRef]
  21. Baghel, N.; Dutta, M.K.; Burget, R. Automatic diagnosis of multiple cardiac diseases from PCG signals using convolutional neural network. Comput. Methods Prog. Biomed. 2020, 197, 105750. [Google Scholar] [CrossRef]
  22. Yaseen; Son, G.-Y.; Kwon, S. Classification of Heart Sound Signal Using Multiple Features. Appl. Sci. 2018, 8, 2344. [Google Scholar] [CrossRef] [Green Version]
  23. Li, S.; Li, F.; Tang, S.; Luo, F. Heart Sounds Classification Based on Feature Fusion Using Lightweight Neural Networks. IEEE Trans. Instrum. Meas. 2021, 70, 4007009. [Google Scholar] [CrossRef]
  24. Arora, V.; Verma, K.; Leekha, R.S.; Lee, K.; Choi, C.; Gupta, T.; Bhatia, K. Transfer Learning Model to Indicate Heart Health Status Using Phonocardiogram. Comput. Mater. Contin. 2021, 69, 4151–4168. [Google Scholar] [CrossRef]
  25. Tuncer, T.; Dogan, S.; Tan, R.-S.; Acharya, U.R. Application of Petersen graph pattern technique for automated detection of heart valve diseases with PCG signals. Inf. Sci. 2021, 565, 91–104. [Google Scholar] [CrossRef]
  26. Alkhodari, M.; Fraiwan, L. Convolutional and recurrent neural networks for the detection of valvular heart diseases in phonocardiogram recordings. Comput. Methods Prog. Biomed. 2021, 200, 105940. [Google Scholar] [CrossRef]
  27. Deperlioglu, O. Classification of phonocardiograms with convolutional neural networks. BRAIN—Broad Res. Artif. Intell. Neurosci. 2018, 9, 22–33. [Google Scholar]
  28. Iyer, V.K.; Ramamoorthy, P.A.; Fan, H.; Ploysongsang, Y. Reduction of Heart Sounds from Lung Sounds by Adaptive Filtering. IEEE Trans. Biomed. Eng. 1987, 33, 1141–1148. [Google Scholar] [CrossRef] [PubMed]
  29. Salamon, J.; Bello, J.P. Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Process. Lett. 2017, 24, 279–283. [Google Scholar] [CrossRef]
  30. Frid-Adar, M.; Diamant, I.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 2018, 321, 321–331. [Google Scholar] [CrossRef] [Green Version]
  31. Li, X.; Zhang, W.; Ding, Q.; Sun, J.-Q. Intelligent rotating machinery fault diagnosis based on deep learning using data augmentation. J. Intell. Manuf. 2020, 31, 433–452. [Google Scholar] [CrossRef]
  32. Meintjes, A.; Lowe, A.; Legget, M. Fundamental heart sound classification using the continuous wavelet transform and convolutional neural networks. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 409–412. [Google Scholar]
Figure 1. Typical examples of the PCG signals. (a) One example of AS; (b) One example of MR; (c) One example of MS; (d) One example of MVP; (e) One example of N; (f) One example of PH.
Figure 2. Flowchart of PCG classification by transfer learning models.
Figure 3. Examples of original PCG, augmented PCG and denoised PCG. (a) An original PCG signal (normal heart sound signal) after denoising; (b) an augmented PCG signal with noise; (c) a denoised PCG signal.
Figure 4. Time–frequency representations of PCG signals. (a) Time–frequency graphs of the original heart sound signals; (b) time–frequency graphs of the augmented heart sound signals.
Figure 5. The process of transfer learning.
Figure 6. 10-fold cross validation confusion matrices for the GoogleNet model on the augmented database. The accuracy of each fold is: (a): 0.98; (b): 0.99; (c): 0.98; (d): 0.99; (e): 0.95; (f): 0.98; (g): 0.98; (h): 0.98; (i): 0.98; (j): 0.99.
Figure 7. 10-fold cross validation confusion matrices for the GoogleNet model on the original heart sound database. The accuracy of each fold is: (a): 0.98; (b): 0.98; (c): 0.97; (d): 0.98; (e): 0.98; (f): 0.98; (g): 0.95; (h): 0.97; (i): 0.99; (j): 0.98.
Figure 8. ROC curve for (a) the augmented heart sound classification based on GoogleNet; (b) the original heart sound classification based on GoogleNet.
Figure 9. Comparison of the confusion matrices of the other nine transfer learning models. (a) Xception convolutional neural network; (b) NASNet-Large convolutional neural network; (c) Resnet101; (d) Inceptionv3; (e) Densenet201; (f) Inception-ResNet-v2; (g) Mobilenetv2; (h) Darknet19; (i) Squeezenet.
Table 1. A comparative performance of existing work for the cardiac disease classification.
| Year | Related Work | Database | Condition | Feature Extraction | Method | Accuracy |
|------|--------------|----------|-----------|--------------------|--------|----------|
| 2021 | Haoran Kui et al. [17] | Collected by themselves | Two and four classes | MFSC | CNN | 93.89% (two classes); 86.25% (multi-class) |
| 2021 | Omer Deperlioglu et al. [18] | PASCAL B-training | Three classes | Instant energy | Stacked autoencoder network | 99.61% |
| 2020 | Neeraj Baghel et al. [21] | Yaseen database | Five classes | 7 conv layers | Improved network architecture | 98.60% |
| 2018 | Yaseen et al. [22] | Yaseen database | Five classes | MFCC + DWT | SVM, DNN, KNN | 97% |
| 2021 | Suyi Li et al. [23] | PhysioNet database | Two classes | Time–frequency feature fusion | Lightweight neural network model | 95.50% |
| 2021 | Vinay Arora et al. [24] | PhysioNet 2016 and PASCAL 2011 | Two classes | CWT | MobileNet, Xception, VGG16, ResNet, DenseNet, and InceptionV3 | 92.96% |
| 2021 | Turker Tuncer et al. [25] | Yaseen database | Five classes | Petersen graph pattern | Decision tree, linear discriminant, bagged tree, and support vector | 100% |
| 2021 | Mohanad Alkhodari et al. [26] | Yaseen database | Five classes | Maximal overlap discrete wavelet transform | CNN-BiLSTM network | 97.87% |
| 2018 | Omer Deperlioglu et al. [27] | PASCAL | Three classes | Heartbeat | CNN | 97.9% |
Note: MFCC (Mel-Frequency Cepstrum Coefficient); MFSC (Mel-frequency spectral coefficients); DWT (Discrete Wavelet Transform).
Table 2. Original heart sound database.
| Heart Disease | Recording Size | Sample Frequency |
|---------------|----------------|------------------|
| Normal (N) | 200 | 8000 Hz |
| Aortic Stenosis (AS) | 200 | 8000 Hz |
| Mitral Regurgitation (MR) | 200 | 8000 Hz |
| Mitral Stenosis (MS) | 200 | 8000 Hz |
| Mitral Valve Prolapse (MVP) | 200 | 8000 Hz |
| Pulmonary Hypertension (PH) | 200 (with 74 subjects) | 2000 Hz |
Table 3. Recording distribution after data augmentation.
| Heart Disease | Recording Size | Sample Frequency |
|---------------|----------------|------------------|
| Normal (N) | 400 | 8000 Hz |
| Aortic Stenosis (AS) | 400 | 8000 Hz |
| Mitral Regurgitation (MR) | 400 | 8000 Hz |
| Mitral Stenosis (MS) | 400 | 8000 Hz |
| Mitral Valve Prolapse (MVP) | 400 | 8000 Hz |
| Pulmonary Hypertension (PH) | 400 | 2000 Hz |
Table 4. Overall view of the transfer learning models used in this study.
| No. | Network | Depth | Size | Parameters (million) | Image Input Size |
|-----|---------|-------|------|----------------------|------------------|
| 1 | Squeezenet | 18 | 5.2 MB | 1.24 | 227 × 227 × 3 |
| 2 | Googlenet | 22 | 27 MB | 7.0 | 224 × 224 × 3 |
| 3 | Inceptionv3 | 48 | 89 MB | 23.9 | 299 × 299 × 3 |
| 4 | Densenet201 | 201 | 77 MB | 20.0 | 224 × 224 × 3 |
| 5 | Mobilenetv2 | 53 | 13 MB | 3.5 | 224 × 224 × 3 |
| 6 | Resnet101 | 101 | 167 MB | 44.6 | 224 × 224 × 3 |
| 7 | Xception | 71 | 85 MB | 22.9 | 299 × 299 × 3 |
| 8 | Inceptionresnetv2 | 164 | 209 MB | 55.9 | 299 × 299 × 3 |
| 9 | nasnetlarge | * | 332 MB | 88.9 | 331 × 331 × 3 |
| 10 | darknet19 | 19 | 78 MB | 20.8 | 256 × 256 × 3 |
* The nasnetlarge networks do not consist of a linear sequence of modules.
Table 5. (a) Results of 10-fold cross validation on augmented heart sound database on GoogleNet. (b) 10-fold cross validation result on the original dataset on GoogLeNet.
(a) 10-fold cross validation on the augmented heart sound database.

| Fold | Train Samples | Test Samples | Acc | Val Acc | Loss | Val Loss |
|------|---------------|--------------|-----|---------|------|----------|
| 1 | 2280 | 120 | 1.00 | 0.99 | 0.10 | 0.07 |
| 2 | 2280 | 120 | 1.00 | 0.99 | 0.12 | 0.10 |
| 3 | 2280 | 120 | 1.00 | 0.98 | 0.09 | 0.02 |
| 4 | 2280 | 120 | 1.00 | 0.95 | 0.14 | 0.12 |
| 5 | 2280 | 120 | 1.00 | 0.98 | 0.19 | 0.04 |
| 6 | 2280 | 120 | 1.00 | 0.98 | 0.15 | 0.04 |
| 7 | 2280 | 120 | 1.00 | 0.98 | 0.13 | 0.06 |
| 8 | 2280 | 120 | 1.00 | 0.98 | 0.16 | 0.14 |
| 9 | 2280 | 120 | 1.00 | 0.98 | 0.16 | 0.05 |
| 10 | 2280 | 120 | 1.00 | 0.99 | 0.10 | 0.07 |
| Mean | | | | 0.98 | | |

(b) 10-fold cross validation on the original heart sound database.

| Fold | Train Samples | Test Samples | Acc | Val Acc | Loss | Val Loss |
|------|---------------|--------------|-----|---------|------|----------|
| 1 | 1080 | 120 | 0.98 | 0.98 | 0.00 | 0.20 |
| 2 | 1080 | 120 | 1.00 | 0.98 | 0.00 | 0.30 |
| 3 | 1080 | 120 | 1.00 | 0.98 | 0.10 | 0.07 |
| 4 | 1080 | 120 | 1.00 | 0.95 | 0.00 | 0.02 |
| 5 | 1080 | 120 | 1.00 | 0.97 | 0.00 | 0.05 |
| 6 | 1080 | 120 | 1.00 | 0.97 | 0.00 | 0.08 |
| 7 | 1080 | 120 | 1.00 | 0.98 | 0.00 | 0.09 |
| 8 | 1080 | 120 | 1.00 | 0.99 | 0.00 | 0.10 |
| 9 | 1080 | 120 | 1.00 | 0.98 | 0.00 | 0.06 |
| 10 | 1080 | 120 | 1.00 | 0.98 | 0.00 | 0.06 |
| Mean | | | | 0.98 | | |
Table 6. Results for multiple class classification on GoogleNet.
The first six data columns report results on the augmented database; the last six report results on the original database.

| Fold | Indicator | AS | MR | MS | MVP | N | PH | AS | MR | MS | MVP | N | PH |
|------|-----------|----|----|----|-----|---|----|----|----|----|-----|---|----|
| 1 | Precision | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 | 1.00 |
| 1 | Recall | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.95 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 |
| 1 | F1-Score | 0.98 | 1.00 | 1.00 | 0.97 | 1.00 | 1.00 | 0.97 | 0.95 | 1.00 | 0.98 | 1.00 | 1.00 |
| 2 | Precision | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 |
| 2 | Recall | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 |
| 2 | F1-Score | 0.98 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 |
| 3 | Precision | 1.00 | 0.95 | 0.95 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.91 |
| 3 | Recall | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 |
| 3 | F1-Score | 1.00 | 0.98 | 0.92 | 0.98 | 1.00 | 0.97 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 0.95 |
| 4 | Precision | 0.83 | 1.00 | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 | 0.86 | 1.00 | 0.90 | 1.00 | 0.95 |
| 4 | Recall | 1.00 | 0.80 | 0.90 | 1.00 | 1.00 | 1.00 | 0.95 | 0.90 | 0.95 | 0.90 | 1.00 | 1.00 |
| 4 | F1-Score | 0.91 | 0.89 | 0.95 | 0.98 | 1.00 | 0.98 | 0.97 | 0.88 | 0.97 | 0.90 | 1.00 | 0.98 |
| 5 | Precision | 1.00 | 1.00 | 0.95 | 0.91 | 1.00 | 1.00 | 1.00 | 1.00 | 0.87 | 0.95 | 1.00 | 1.00 |
| 5 | Recall | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 0.90 | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 | 0.90 |
| 5 | F1-Score | 1.00 | 0.97 | 0.98 | 0.95 | 1.00 | 0.95 | 1.00 | 0.97 | 0.93 | 0.95 | 1.00 | 0.95 |
| 6 | Precision | 0.91 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 0.95 | 0.95 | 1.00 | 0.95 |
| 6 | Recall | 1.00 | 0.90 | 1.00 | 0.95 | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 |
| 6 | F1-Score | 0.95 | 0.95 | 0.98 | 0.97 | 1.00 | 1.00 | 0.92 | 1.00 | 0.98 | 0.92 | 1.00 | 0.98 |
| 7 | Precision | 0.95 | 1.00 | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 | 0.95 | 0.91 | 1.00 | 1.00 | 1.00 |
| 7 | Recall | 1.00 | 0.95 | 1.00 | 0.90 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 0.90 | 1.00 | 1.00 |
| 7 | F1-Score | 0.98 | 0.97 | 1.00 | 0.92 | 1.00 | 0.98 | 1.00 | 0.95 | 0.95 | 0.95 | 1.00 | 1.00 |
| 8 | Precision | 0.87 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | Recall | 1.00 | 0.85 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | F1-Score | 0.93 | 0.92 | 1.00 | 1.00 | 1.00 | 1.00 | 0.97 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 |
| 9 | Precision | 1.00 | 0.95 | 1.00 | 0.91 | 1.00 | 1.00 | 1.00 | 0.91 | 1.00 | 1.00 | 1.00 | 1.00 |
| 9 | Recall | 1.00 | 0.90 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 | 1.00 |
| 9 | F1-Score | 1.00 | 0.92 | 1.00 | 0.95 | 1.00 | 0.97 | 1.00 | 0.95 | 0.95 | 1.00 | 1.00 | 1.00 |
| 10 | Precision | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 0.95 |
| 10 | Recall | 1.00 | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.90 | 1.00 | 1.00 |
| 10 | F1-Score | 1.00 | 1.00 | 0.98 | 0.97 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 0.95 | 1.00 | 0.98 |
Table 7. Nine transfer learning models of classification performance based on augmented heart sound database.
| Model | ACC | Indicator | AS | MR | MS | MVP | N | PH |
|-------|-----|-----------|----|----|----|-----|---|----|
| xception | 0.90 | Precision | 0.86 | 0.94 | 0.86 | 0.94 | 1.00 | 0.86 |
| | | Recall | 0.95 | 0.80 | 0.90 | 0.85 | 1.00 | 0.95 |
| | | F1-Score | 0.90 | 0.86 | 0.88 | 0.89 | 1.00 | 0.90 |
| Resnet101 | 0.98 | Precision | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.95 |
| | | Recall | 1.00 | 0.95 | 0.95 | 1.00 | 1.00 | 1.00 |
| | | F1-Score | 1.00 | 0.97 | 0.95 | 1.00 | 1.00 | 0.98 |
| NASNet_large | 0.92 | Precision | 0.95 | 1.00 | 0.94 | 0.77 | 1.00 | 0.95 |
| | | Recall | 0.90 | 0.85 | 0.85 | 1.00 | 1.00 | 0.95 |
| | | F1-Score | 0.92 | 0.92 | 0.89 | 0.87 | 1.00 | 0.95 |
| Inception-v3 | 0.94 | Precision | 0.95 | 0.94 | 0.86 | 1.00 | 1.00 | 0.91 |
| | | Recall | 1.00 | 0.75 | 0.95 | 0.95 | 1.00 | 1.00 |
| | | F1-Score | 0.98 | 0.83 | 0.90 | 0.97 | 1.00 | 0.95 |
| DenseNet-201 | 0.98 | Precision | 1.00 | 1.00 | 0.95 | 1.00 | 1.00 | 0.95 |
| | | Recall | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 | 1.00 |
| | | F1-Score | 1.00 | 0.97 | 0.98 | 0.97 | 1.00 | 0.98 |
| Inception-ResNet-v2 | 0.95 | Precision | 0.79 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 |
| | | Recall | 0.95 | 1.00 | 1.00 | 0.75 | 1.00 | 1.00 |
| | | F1-Score | 0.86 | 0.98 | 1.00 | 0.86 | 1.00 | 1.00 |
| MobileNet-v2 | 0.95 | Precision | 1.00 | 0.95 | 0.86 | 0.95 | 1.00 | 0.95 |
| | | Recall | 1.00 | 0.95 | 0.95 | 0.90 | 1.00 | 0.90 |
| | | F1-Score | 1.00 | 0.95 | 0.90 | 0.92 | 1.00 | 0.92 |
| Darknet19 | 0.98 | Precision | 1.00 | 1.00 | 0.91 | 1.00 | 1.00 | 1.00 |
| | | Recall | 1.00 | 0.95 | 1.00 | 0.95 | 1.00 | 1.00 |
| | | F1-Score | 1.00 | 0.97 | 0.95 | 0.97 | 1.00 | 1.00 |
| Squeezenet | 0.97 | Precision | 0.95 | 0.95 | 1.00 | 1.00 | 0.91 | 1.00 |
| | | Recall | 1.00 | 1.00 | 1.00 | 0.80 | 1.00 | 1.00 |
| | | F1-Score | 0.98 | 0.98 | 1.00 | 0.89 | 0.95 | 1.00 |
Table 8. Results for multiple class classifications based on other deep-learning networks.
| Model | Network Architecture | Features | Accuracy |
|-------|----------------------|----------|----------|
| Bi-LSTM | Fully connected layer of size 2 and a softmax layer; maximum number of epochs: 30 | Original one-dimensional PCG signals | 21.67% |
| CNN | 3 convolution layers, each with normalization, ReLU and max pooling, followed by a fully connected layer and softmax; maximum number of epochs: 15 | Spectrograms of the PCG signals | 76.67% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
