1. Introduction
Heart sounds result from myocardial movement and the opening and closing of the valves; they are strongly affected by the hemodynamics and electrical activity of the heart muscle [1]. In the early stage of cardiovascular disease, heart sound auscultation serves as a preliminary screening tool: it helps differentiate abnormal heart sound signals from normal ones and therefore provides useful information for the auxiliary diagnosis of cardiovascular diseases. Any dysfunction or anatomical defect in the heart can be reflected in the temporal, spectral, and morphological characteristics of the heart sound [2]. Although the electrocardiogram (ECG) signal contains a great deal of physiological information about the cardiovascular system, it cannot reveal a lesion in the early stage of cardiovascular disease, because the lesion is not yet pronounced enough. Heart sounds, by contrast, can reveal a lesion at this early stage. Heart sound signals therefore carry important physiological information, and their study has considerable clinical value for the early diagnosis of cardiovascular diseases. Segmentation and classification are currently the most commonly used methods for studying the heart sound signal.
Heart sound segmentation, a common step in heart sound signal processing, aims to divide the heart sound cycle into four states, and it is also an important processing step for heart sound classification [3]. The heart sound cycle of a normal adult mainly comprises the first heart sound (S1), the systolic period (sys), the second heart sound (S2), and the diastolic period (dia), as shown in Figure 1. S1 occurs when the mitral and tricuspid valves close, which marks the beginning of ventricular contraction. S2 occurs when the aortic and pulmonary valves close, marking the beginning of ventricular diastole. The normal contraction and relaxation of the heart are the basis of human blood circulation. Abnormalities in the heart show up in the heart sound signal, and different diseases produce different signals. For example, in several common heart valve diseases, murmurs often appear during systole in mitral regurgitation, during diastole in mitral valve stenosis, during systole in pulmonary valve stenosis, during systole in ventricular septal defect, during systole in aortic valve stenosis, and during diastole in aortic valve insufficiency. Based on the abnormal part of the heart sound, experienced clinicians can make a preliminary diagnosis and order further examinations as the patient's condition requires. The abnormal part of the heart sound can only be located once the state of the heart sound is determined, which is what heart sound segmentation provides.
The common methods for heart sound segmentation mainly include ECG signal-based methods [4,5,6], envelope-based methods [7,8,9,10,11], feature-based methods [12,13,14,15], machine learning-based methods [16,17,18,19,20], and Hidden Markov Model (HMM)-based methods [16,21,22,23,24]. Early progress was limited to segmentation methods based on ECG signals and envelopes. Later, Schmidt et al. were the first to use the hidden semi-Markov model (HSMM) to explicitly model the expected duration of heart sounds within the HMM [23]. Building on the work of Schmidt et al., Springer et al. extended this method by using logistic regression for emission probability estimation and by modifying the Viterbi algorithm to decode the most probable heart sound state sequence, yielding the logistic regression-based HSMM (LR-based HSMM) [24]. As of 2016, this method was regarded as a very good method for heart sound segmentation, and it was recommended as the segmentation algorithm by the 2016 PhysioNet/CinC Challenge. Many participants in the competition used it to segment heart sounds before classification and achieved good rankings [25,26,27]. However, this method requires the expected duration of each state to be supplied, and it relies on emission probabilities estimated by logistic regression based on a Gaussian distribution, together with the extended Viterbi algorithm, to predict the next states [24]; it is also prone to errors for long cycles and irregular sinus rhythm [20]. In contrast, the method based on the convolutional neural network (CNN) does not require preliminary detection of phonocardiogram (PCG) segments. The CNN is used directly to learn, from the heart sound signal itself or from features extracted from it, the sound characteristics that minimize segmentation errors [19].
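As an illustration of the four-state description above, the following minimal sketch collapses a per-sample state sequence into contiguous segments and checks that the states follow the cyclic order S1, sys, S2, dia. The integer coding of the states, the function names, and the toy label sequence are assumptions made for this example only; they are not taken from any of the cited methods.

```python
from itertools import groupby

# State names follow the text; the integer coding is an assumption of this sketch.
STATES = {0: "S1", 1: "sys", 2: "S2", 3: "dia"}
ORDER = ["S1", "sys", "S2", "dia"]


def labels_to_segments(labels):
    """Collapse per-sample labels into (state, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for code, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((STATES[code], start, start + length))
        start += length
    return segments


def follows_cycle_order(segments):
    """Return True if consecutive runs follow S1 -> sys -> S2 -> dia -> S1."""
    for (prev, _, _), (curr, _, _) in zip(segments, segments[1:]):
        expected = ORDER[(ORDER.index(prev) + 1) % len(ORDER)]
        if curr != expected:
            return False
    return True


# Toy label sequence (hypothetical, not the output of a real segmenter).
toy_labels = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 0, 0]
segs = labels_to_segments(toy_labels)
print(segs)
print(follows_cycle_order(segs))
```

A check of this kind is one simple way to flag physiologically implausible outputs, for example a jump from S1 directly to S2.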
Heart sound classification determines whether a heart sound is normal or abnormal. Classification methods fall into two groups: classification without segmentation and classification with segmentation. In classification without segmentation, features of the entire heart sound are extracted after preprocessing, and the classifier is trained and applied using these features. In recent years, Hamidi, Arora, Yaseen et al. have conducted related research [28,29,30]. In classification with segmentation, the features of S1, sys, S2, and dia are extracted from the segmented heart sound, and a new feature set is formed by combining these per-state features with other features of the entire heart sound. The classifier is then trained and applied on the new feature set. Pedro Narváez, Kucharski, Li Fan et al. have done related work in this area [31,32,33]. Compared with classification without segmentation, classification with segmentation yields the state labels of the heart sound, which enables clinicians to locate the abnormal part of the heart sound, such as a diastolic or systolic murmur, and helps to determine which heart valve is responsible for the disease. "Classification of Heart Sound Recordings: The PhysioNet/Computing in Cardiology Challenge 2016" was a competition on heart sound classification [34] that aimed to encourage the development of algorithms to classify heart sound recordings and to identify whether the subject of a recording should be referred for expert diagnosis. PhysioNet provides a baseline method for heart sound segmentation and a large number of heart sound signals, which were widely used in the 2016 competition and in later research. Among the participants, Potes et al. applied this segmentation algorithm before classifying heart sounds and took first place [25]. However, in subsequent research, Renna et al. pointed out that a more complex classifier improves classification only to a limited extent, and that improving the segmentation algorithm may be the best way to obtain a more significant improvement in heart sound classification. They applied a CNN to heart sound segmentation and obtained good experimental results, but they did not examine whether the segmentation network leads to good classification performance [19]. In 2020, Khan et al. compared the classification results of segmented and unsegmented heart sound signals and concluded that using segmented signals contributes to better classification. However, in their experiment, they used an improved empirical wavelet transform and normalized average Shannon energy to preprocess and automatically segment the signals, identifying only the systolic and diastolic intervals rather than segmenting all four states [35].
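The idea of classification with segmentation, in which per-state features are combined with whole-recording features, can be sketched as follows. This is a minimal illustration under stated assumptions: the feature used (mean absolute amplitude), the function names, and the toy signal are hypothetical and do not reproduce the feature sets of the cited works.

```python
STATE_NAMES = ("S1", "sys", "S2", "dia")


def per_state_features(signal, labels, states=STATE_NAMES):
    """Mean absolute amplitude of the samples assigned to each state."""
    sums = {s: 0.0 for s in states}
    counts = {s: 0 for s in states}
    for x, s in zip(signal, labels):
        sums[s] += abs(x)
        counts[s] += 1
    # A state with no samples contributes 0.0 instead of raising an error.
    return [sums[s] / counts[s] if counts[s] else 0.0 for s in states]


def build_feature_vector(signal, labels):
    """Concatenate the four per-state features with one whole-recording feature."""
    overall = sum(abs(x) for x in signal) / len(signal)
    return per_state_features(signal, labels) + [overall]


# Toy signal and state labels (hypothetical values for illustration only).
signal = [0.8, -0.6, 0.1, -0.1, 0.5, -0.5, 0.0, 0.2]
labels = ["S1", "S1", "sys", "sys", "S2", "S2", "dia", "dia"]
features = build_feature_vector(signal, labels)
print(features)
```

The resulting vector has one entry per state plus one entry for the whole recording, so a murmur confined to systole or diastole changes a specific, identifiable feature rather than being averaged over the full cycle.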
Therefore, this paper studies heart sound segmentation using a deep CNN and further combines the segmentation network with heart sound classification. First, the heart sound is preprocessed; then the signal is segmented by a multi-channel deep CNN; finally, it is classified by a CNN classifier and an AdaBoost classifier. Heart sound segmentation, a necessary stage of heart sound signal analysis, is the focus of this paper: the proposed approach does not need to know the expected duration of each state in advance and directly uses the deep CNN to learn the sound features that minimize segmentation errors. Considering the rising incidence of cardiovascular diseases and the existing shortage of medical resources, we intend to apply this study in practice through an auxiliary diagnosis system consisting of hardware and software, as shown in Figure 2. The hardware mainly comprises electronic stethoscopes (a simple one and a professional one), and the software comprises recording and analysis software for computers and mobile phones. The simple electronic stethoscope is intended to be as readily available in every household as a clinical thermometer. When a person feels unwell, the device is used to collect the signal, and a preliminary diagnosis is obtained on the mobile phone. If the signal is abnormal, the person goes to the hospital for treatment. In the hospital, the doctor collects the signal with the professional equipment, analyzes it on the computer (as with an ECG), and arranges the next examination.
5. Discussion
In this paper, a CNN was used to segment the heart sound signal and was then applied to classification. For segmentation, by analogy with image segmentation, a U-net network built from a deep CNN was used. A CNN classifier was then used to classify the segmented heart sounds as normal or abnormal. We discussed the impact of data length, network depth, convolution kernel size, and optimizer on the segmentation results. The fixed-length experiments show that increasing the data length can improve the segmentation accuracy. As Table 2 shows, however, the amount of data decreases as the fixed length increases. Although Ronneberger et al. pointed out that the U-net network can also perform well on small data sets [36], a smaller amount of data reduces the credibility of the optimized model. The best segmentation results were obtained when the data length was set to 512. The influence of the amount of fixed-length data on the results needs to be explored further with more data. Increasing the network depth can effectively improve the performance of the network, which is consistent with the results of Krizhevsky, Simonyan et al. [38,39]. However, the experiments showed that increasing the network depth also increases the complexity of the model: the number of parameters grows exponentially, and too many parameters consume a great deal of memory and training time. Once a good segmentation result has been reached, it is not worthwhile to spend much more memory and training time on a further gain in accuracy, which is why the network depth was set to 5. The choice of optimizer has a considerable impact on the segmentation results. With the SGD optimizer, the segmentation results were poor. With the Adam optimizer, the results improved to a certain extent, and Adam performed best among the optimizers tested. However, the study by Keskar et al. suggests that starting from the most basic SGD optimizer [40] and gradually adding optimization terms (such as first-order and second-order momentum) according to the research object can improve the model. For heart sound signals, this approach could be used to explore optimizers further and improve the performance of the model.
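The trade-off between fixed data length and amount of data discussed above can be made concrete with a small sketch. It assumes non-overlapping windows with the incomplete tail discarded, a hypothetical recording of 4000 samples, and candidate lengths other than 512 chosen only for illustration; the windowing actually used in the experiments may differ.

```python
def split_fixed_length(signal, length):
    """Cut a signal into non-overlapping windows, dropping the incomplete tail."""
    n_windows = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n_windows)]


# Hypothetical recording: 4000 samples (placeholder values, not real PCG data).
recording = list(range(4000))

# Candidate lengths are assumptions for illustration; only 512 appears in the text.
for length in (128, 256, 512, 1024):
    windows = split_fixed_length(recording, length)
    print(length, len(windows))
```

Under these assumptions, each doubling of the window length roughly halves the number of training examples obtained from the same recording, which is the effect that limits how far the data length can be increased.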
6. Conclusions
This paper proposed a CNN-based method for studying heart sound signals that involves segmentation and classification. For segmentation, a U-net network built from a deep CNN was applied; the model parameters were determined by optimizing the network structure and comparing the segmentation results under different optimizers and input data lengths, yielding a model that segments heart sounds well. For classification, the trained segmentation model was used to segment the heart sounds, which were then classified by an AdaBoost classifier and a CNN classifier, and the classification results were compared to select the better classifier. Without knowing the expected duration of each state, the trained segmentation model achieved an overall accuracy of 0.996; the accuracies for S1, sys, S2, and dia were 0.991, 0.996, 0.996, and 0.997, respectively, and the average accuracy was 0.995. In the subsequent classification, the CNN classifier achieved an accuracy of 0.964, a sensitivity of 0.781, and a specificity of 0.873. A preliminary conclusion can therefore be drawn that the CNN, as a basic deep learning architecture, can be applied to heart sound research, and we believe it will play a prominent role in future work combining heart sounds and deep learning. In addition, because changes in heart sounds in different periods often accompany different types of cardiovascular disease, and given the advent of big data, the rapid development of artificial intelligence, and the increasing incidence of heart disease, future work can analyze different types of diseases, study heart sound signals in different periods, and even investigate specific diseases.
In the future, we will focus on the study of heart sound signals in different periods and extend the classification of normal and abnormal heart sounds to the screening of specific diseases in specific periods. At the same time, we will seek more opportunities to collaborate with clinicians in order to collect more heart sound data and optimize the model structure.
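For reference, the classification metrics named in these conclusions (accuracy, sensitivity, and specificity) can be computed from binary predictions as in the minimal sketch below. The function name, the choice of 1 as the abnormal (positive) class, and the toy labels are assumptions for illustration; the values are not the experimental data of this paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: fraction of abnormal recordings detected as abnormal.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of normal recordings detected as normal.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels (hypothetical): 1 = abnormal, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```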