Deep Learning Methods for Classification of Certain Abnormalities in Echocardiography

This article experiments with deep learning methodologies in echocardiography (echo), a promising and vigorously researched technique. The paper involves two different kinds of classification in echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) is performed using 2D echo images, 3D Doppler images, and videographic images. Secondly, videographic echo images are classified by type of regurgitation, namely Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), and combinations of the three. Two deep learning methodologies are used for these purposes: a Recurrent Neural Network (RNN) based methodology (Long Short Term Memory (LSTM)) and an Autoencoder based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguishes this work from existing work using SVM (Support Vector Machine), and the application of deep learning methodologies is among the first in this particular field. It was found that deep learning methodologies perform better than the SVM methodology in normal or abnormal classification. Overall, VAE performs better on 2D and 3D Doppler images (static images), while LSTM performs better on videographic images.


Introduction
With the advances in the field of biomedical imaging, digital images play a vital role in the early detection of abnormalities or diseases in any system of the human body. Many intricate systems exist in the human body, namely the nervous system, the cardiac system, the endocrine system, etc., that are important for survival. Out of these, the cardiac system is considered to be one of the most delicate. Cardiology is viewed as a complex subject of practice due to limited exposure to the intricacies of the relevant technologies. Medical imaging has become a tool for diagnosis and provides information about anatomic structures with the assistance of computers through imaging modalities like Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Angiogram, Electrocardiograph (ECG) and others [1].
Amongst these, the echocardiogram (echo) is perhaps the most frequently used tool for examining the cardiac system. It is used mainly because it enables early diagnosis and management of heart diseases. It is a simple, non-invasive, and inexpensive technique that can precisely show the pressure gradient of heart lesions. Since it uses sound waves instead of radiation, echo is considered to be safe [2]. Echo uses standard

1. Mitral Regurgitation (MR): It is the most common valvular involvement in children and rheumatic heart diseases. In color flow, LA size is increased, and the MR jet can be seen [10].

2. Aortic Regurgitation (AR): AR is known to be less frequent than MR, but most patients having AR have associated mitral valve disease. It results from distorted aortic leaflets, for which careful analysis of the aortic valve is a must [9]. Color flow and Doppler give an estimate of the severity of AR.

3. Tricuspid Regurgitation (TR): It is often common in people who smoke [11]. It can also be seen in 20% of rheumatic heart disease patients. Using Doppler and color flow, TR can be seen, and depending on the TR jet, the severity of TR can be revealed. The tricuspid valve is similar to the mitral valve, with more variability [9].
The three types of regurgitation mentioned above are acquired heart diseases (caused during one's lifetime). Not all kinds of regurgitation are acquired, as some can be congenital (present since birth). In our case, all patient data were observed to be acquired cases. Figure 1 shows Doppler echo images with AR, MR, and TR abnormalities, respectively. Valvular regurgitation is an important cause of mortality and morbidity [9]. Echo plays an essential role in regurgitation assessment, and with Doppler echo the types of regurgitation can be distinguished more clearly. But this has to be done by a cardiologist, who must precisely locate and assess the visualization in the form of video. To detect the presence or absence of any abnormalities, an image or images must be extracted from a videographic echo. From the visualization of echo, a cardiologist can assess the function of the valves and identify defective parts, if any. However, it requires trained cardiologists to interpret the findings accurately and give reports. Cardiologists often resort to catheterization [12], which is a surgically invasive and expensive procedure. Automated methods would help in the accurate diagnosis of heart abnormalities and also reduce the need for invasive procedures. There are no automated facilities that can detect the presence of abnormalities or any disease in the heart. Thus, a way to detect such abnormalities using automated algorithms is needed. Attempts using machine learning algorithms have been made in the past, in which SVM provided better results using static images. But the use of videographic images was never explored. For this purpose, work using videography has been introduced to reduce the workload of a cardiologist and to provide efficient and effective results, as this will help in the early detection and diagnosis of heart diseases.
Deep learning and machine learning techniques help to a great extent in handling such fine details. This work aims to classify images into normal and abnormal and into the different types of regurgitation. It has been observed that the performance of a small architecture is quite similar to that of a more complex architecture. We wanted to examine the clinical capability of our method in classifying the different types of images. Firstly, an effort has been made to classify the echo images into two classes (i.e., abnormal and normal). Then videographic echo images have been considered for further classification, based on the regurgitation present, using two types of deep learning-based models (i.e., LSTM and VAE-CNN). The first is an RNN based model using LSTM, chosen for its capability to retain temporal context and predict the next image or frame. The second uses a Variational Auto Encoder (VAE) with a Convolutional Neural Network (CNN), where the CNN is used for extracting features and for space reduction. A comparison with the well-known SVM methodology is also performed using the static images of the echo.
The main contributions of the paper are as follows:

1. Work based on videographic images has been proposed as an initiative to find out its usefulness in the diagnosis of different types of abnormalities. Videographic images are used for classification into six types of regurgitation and for two-class (normal or abnormal) classification.

2. Work on 2D images for classification into normal or abnormal in PLAX view [13] has been done and compared to an existing method, i.e., SVM [14,15].

3. Using color Doppler 3D images, classification into normal or abnormal was done.

4. Classification was done using RNN based and CNN based VAE deep learning methodologies, along with the existing SVM technique used for 2D classification.

5. Classification is performed using images captured in the Radiology laboratory and validated with the help of a cardiologist.
Works related to regurgitation classification are described in Section 2. In Section 3, the flowcharts of the different methodologies used are explained in brief. Section 4 provides the experimental results, and Section 5 consists of the conclusion and future work.

Related Works
Work based on the cardiac system has become one of the most popular fields for many researchers. This is because it is one of the most important systems of the human body, and cardiovascular disease is the leading cause of morbidity and mortality in patients with kidney disease in the United States [16]. The heart is one of the main organs for blood supply, driving blood circulation, and thus plays a vital role. During echo visualization, any abnormal inflow or outflow of blood can be a sign of abnormalities or diseases in the heart. For this reason, works related to heart abnormalities are considered in this paper and are discussed further. Many of the related works are not based on the classification of heart abnormalities but are included as they deal with classification in the heart-related field.
Work related to cardiac classification can be found in Allan et al. [14], where the classification of Mitral Regurgitation (MR) was carried out using SVM as a classification method with an accuracy of 82% for moderate or severe MR. The apical view was taken for this purpose using 2D echo with 6993 studies obtained from the Clinical Medical Research Ethics Board of Vancouver Coastal Health [14]. Balaji et al. have worked on the classification of different views of echo: in [17], parasternal short axis (PSAX), parasternal long axis (PLAX), apical two-chamber (A2C), and apical four-chamber (A4C) views were classified using histogram and statistical features with 87.5% accuracy, and in [18], PLAX, A2C, and A4C views were classified using Connected Component Labelling with 94.56% accuracy. Nandagopalan [19] has also worked on view classification, where PSAX, PLAX, A2C, and A4C views were classified using a proposed method with 96% accuracy. Pinjari [20] used the Proximal Isovelocity Surface Area (PISA) method for the classification of mild Mitral Regurgitation (MR), moderate Aortic Regurgitation (AR) and severe Aortic Regurgitation (AR). It was done using color Doppler images of MR and AR. The images were first converted into YCbCr space, and filtering techniques like Wiener and Gaussian filters were applied. Segmentation was then done using Fuzzy C Means. Another work can be seen in [21], where heart valve disease for AR is assessed. It is assessed using a gradient, Aortic Stenosis (AS) grading, peak velocity, velocity ratio, Aortic Valve Area (AVA), Indexed AVA, and mean gradient. The type of AS is determined based on the ratio obtained. Also, Strunic et al. worked on the classification of murmurs from heart sounds using ANN as a classification technique [22].
Many papers have considered Left Ventricle (LV) segmentation as an important aspect in finding abnormalities considering LV as the largest part of the heart where the flow of blood can be witnessed [23,24]. A review work was done on machine learning for heart disease prediction in [25], and work on the classification of heart diseases based on the counts of heart beat could be seen in [26].
Along with these methods, other emerging state-of-the-art techniques in different fields of biomedical imaging and computer vision are deep learning methods like the Convolutional Neural Network (CNN), Autoencoders, and the Recurrent Neural Network (RNN). Deep learning has several families, including fully connected networks like autoencoders, convolutional neural networks like AlexNet and LeNet, recurrent neural networks like LSTM, and deep belief networks. A deep learning architecture like CNN has an advantage over other deep learning methodologies in that features are extracted during the process. These architectures have shown excellent performance in many fields and have even gained popularity in the field of image segmentation.
Works related to normal or abnormal heart images have not been done previously. Such work is necessary to help physicians identify the presence or absence of any abnormalities. This work has been taken up in this paper with the hope that automated methodologies can reduce human exertion, serve as a tool in places where no expert is available, and ease the process of diagnosis.
In this paper, work based on classification has been taken as an initial step for the prediction of a specific region of interest. Two types of classification have been carried out, namely, classification into normal and abnormal images and classification into different types of regurgitation.

Classification of Heart Abnormalities Using Different Architectures
Classification plays an important role in the prediction of an area or region containing abnormalities for the diagnosis of any disease. It assigns each input to one of several classes. In this work, Long Short Term Memory (LSTM) and Variational Autoencoder + Convolutional Neural Network (VAE-CNN), along with SVM, are used for classification. The 2D static echo images and 3D static Doppler images are classified into two classes, namely normal or abnormal. Videographic echo images are also classified into two classes (normal or abnormal) and into six classes of regurgitation using the same methodologies.

Data Acquisition
The raw data were obtained from a cardiac clinic, namely Hope Clinic, located in Shillong, India, using echo as a tool under the supervision of a specialist in the relevant field.
A sample image used in the work is shown in Figure 1. The data obtained are 2D JPEG images, colored 3D bitmap images, and 2D videographic images in Audio Video Interleave (AVI) format. Data from a total of 120 patients were collected, comprising cases with one or more abnormalities and a few normal cases. The different types of abnormalities are MR, AR, TR, and a few cases having a mixture of these. All the data are validated with the help of a cardiologist.

Image Preprocessing and Data Augmentation
An overall flowchart depicting the working methodologies is shown in Figure 2. Our scheme starts with taking an input image (a frame in the case of video), which is then cropped and converted into grayscale for 2D classification of images and videos. This conversion is important as grayscale images are more detailed and give a better representation of the image. The image is then filtered using the Gaussian filter as in [20]. Filtering removes noise and unwanted data, and the Gaussian filter gave comparatively better results. A few images showing a mixture of two or more abnormalities were augmented so as to obtain data for 10 patients for experimental purposes in the six-class video classification. Augmentation was done using cropping.
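The preprocessing chain described above (crop, grayscale conversion, Gaussian filtering) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the crop box format, the luma weights, and the default `sigma` are all hypothetical choices, since the paper does not specify them.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to grayscale (assumed BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights

def gaussian_kernel1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_filter(gray, sigma=1.0):
    """Separable Gaussian blur: filter the rows, then the columns."""
    radius = int(3 * sigma + 0.5)
    k = gaussian_kernel1d(sigma, radius)
    # Edge padding keeps the output the same size as the input.
    padded = np.pad(gray, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def preprocess(rgb, crop_box, sigma=1.0):
    """crop_box = (top, bottom, left, right) in pixels; a hypothetical format."""
    top, bottom, left, right = crop_box
    cropped = rgb[top:bottom, left:right]
    return gaussian_filter(to_grayscale(cropped), sigma)
```

For example, `preprocess(image, (2, 18, 5, 25))` returns a 16 × 20 smoothed grayscale array. A library routine for Gaussian filtering would normally be preferred in practice; the explicit kernel is shown only to make the operation visible.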

Classification Using LSTM, VAE-CNN and SVM Methodologies
After preprocessing, the images are saved into two Comma-Separated Values (CSV) files, one for the training phase and one for the testing phase. The training CSV file consists of a labelled dataset and is used for the training and validation phases. The images are then processed using the different methodologies (LSTM/VAE-CNN/SVM) for validation purposes. The testing CSV file consists of unlabelled data. The test images are then predicted, and the output obtained is class 0 or 1 in the case of two-class classification and class 0, 1, 2, 3, 4, or 5 in the case of six-class regurgitation classification.

Steps Involved in Classification of Video
The steps involved in processing videographic images are as follows:

1. Extract each frame and operate on each of them. This is known as spatiotemporal deep learning.

2. Each frame is assigned a class in the training and validation phase (labelled frames).

3. Frames are cropped and resized to 224 × 224. The size was chosen arbitrarily, following earlier networks such as AlexNet.

4. The training set is then passed into the network (LSTM and VAE-CNN) for classification.

5. Output classes 0, 1, 2, 3, 4, 5 in the case of six-class classification and 0 and 1 in the case of two-class classification were obtained.

6. Testing was done on the remaining unlabelled frames of each video.

7. Steps 3 to 5 are repeated.
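Steps 2 and 3 above can be sketched as follows. This is a hedged illustration that assumes the video has already been decoded into a NumPy array of frames; the center crop and nearest-neighbour resize are stand-ins, since the paper does not state which crop region or interpolation was used, and all function names are hypothetical.

```python
import numpy as np

def center_crop(frame, size):
    """Crop a size x size window from the middle of the frame (assumed crop region)."""
    h, w = frame.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return frame[top:top + size, left:left + size]

def resize_nearest(frame, out_h=224, out_w=224):
    """Nearest-neighbour resize using integer index maps."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows[:, None], cols]

def prepare_frames(video, label):
    """video: array of shape (num_frames, H, W); every frame receives the same class label."""
    side = min(video.shape[1], video.shape[2])
    frames = np.stack([resize_nearest(center_crop(f, side)) for f in video])
    labels = np.full(len(frames), label)
    return frames, labels
```

Calling `prepare_frames(video, 3)` on a clip of 5 frames yields an array of shape (5, 224, 224) and five labels equal to 3, which is the per-frame labelling the list describes.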

Long Short Term Memory (LSTM)
LSTM is an RNN based model. LSTM methods are used in speech and Natural Language Processing (NLP). Since 1997, when Hochreiter introduced LSTM, it has become prevalent in the field of text classification [27]. This RNN technique has also been found suitable for videos: it can help in predicting the next frame in a video and is applied in many fields of videography [28]. For this reason, this method has been used in our case.
This paper uses the LSTM variant of RNN, as it is an improved version of RNN [28]. RNN has many variants, including LSTM, GRU, and other modified versions. Here, taking video as input is challenging compared to images, as videos are a collection of frames.
LSTM is designed to overcome long term dependencies and to solve the vanishing gradient problem [29]. LSTM improves gradient flow and is most suitable when time is taken as a factor. In a way, LSTM is similar to ResNet (Residual networks) [29].
In this paper, the LSTM model is used without any change in architecture. It has input gate units and output gate units, and the resulting, more complicated units are called memory cells [27]. It also consists of a forget gate, memory cell inputs, and a memory cell output. Gates are used for the memorizing process [30]. A diagram showing the working components of LSTM is shown in Figure 3. In the standard formulation, the elements of LSTM can be calculated as:

$$
\begin{aligned}
F_t &= \sigma\left(W_F [H_{t-1}, X_t] + b_F\right) \\
I_t &= \sigma\left(W_I [H_{t-1}, X_t] + b_I\right) \\
G_t &= \tanh\left(W_G [H_{t-1}, X_t] + b_G\right) \\
O_t &= \sigma\left(W_O [H_{t-1}, X_t] + b_O\right) \\
C_t &= F_t \odot C_{t-1} + I_t \odot G_t \\
H_t &= O_t \odot \tanh(C_t)
\end{aligned}
\tag{1}
$$

where $F_t$ is the forget gate, $I_t$ is the input gate, $G_t$ is the candidate cell state, $W$ is the weight, $b$ is the bias, $H$ is the output, $X$ is the input, $O_t$ is the output of the sigmoid gate, $C_t$ is the cell state, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication. For our purpose, the input is passed to a convolutional layer for feature extraction, and the features are then passed to the LSTM architecture. The output is classified into class 0 or 1 (normal or abnormal) for 2D classification, Doppler classification, and two-class video classification using a sigmoid classifier, and into classes 0 to 5 for six-class classification of regurgitation for videographic images using a softmax classifier.
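The gate computations of a standard LSTM cell can be sketched in a few lines of NumPy, using the symbols defined above (F, I, G, O for the gates, C for the cell state, H for the output). This is an illustrative single time step under the usual textbook formulation, with bias terms and a dictionary layout for the weights that are assumptions of this sketch; it is not the trained network used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps each gate name ("F", "I", "G", "O") to a matrix of shape
    (hidden, hidden + input); b maps each gate name to a bias of shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])      # stacked previous output and current input
    f_t = sigmoid(W["F"] @ z + b["F"])     # forget gate
    i_t = sigmoid(W["I"] @ z + b["I"])     # input gate
    g_t = np.tanh(W["G"] @ z + b["G"])     # candidate cell state
    o_t = sigmoid(W["O"] @ z + b["O"])     # output gate
    c_t = f_t * c_prev + i_t * g_t         # new cell state
    h_t = o_t * np.tanh(c_t)               # new output
    return h_t, c_t
```

Applied frame by frame to the convolutional features of a video, the pair `(h_t, c_t)` carries information from earlier frames forward, which is the property the paper relies on for videographic images.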

Variational Autoencoder + Convolutional Neural Network (VAE-CNN)
Combining CNN with other methods helps the network excel at spatial relationships [31]. Convolutional layers are a significant building block in deep neural networks. However, the gradient computation of the convolution network remains a challenge in TensorFlow. Many researchers argue that even random convolutions are content [32]. On the other hand, the Autoencoder is a powerful generative model that can build on the CNN idea and is useful for reconstructing its input through encodings.
The overall diagram of how CNN is combined with VAE is shown in Figure 4. VAE has been taken as a method as it works with a diverse range of data [33]. The overall procedure starts with the input being passed to a convolutional layer with filter size 3 × 3 and stride of 2 × 2 followed by another convolutional layer with the same filter size and no stride. Then it is passed to the Maxpooling layer with size 2 × 2 that helps in reducing the image size which is then passed to a fully connected layer with 4096 nodes. It is then followed by VAE where a first dense layer of 500 is used, followed by another dense layer of 120 and then followed by vector generation of µ and σ, which will produce a sample vector of 30 [33]. µ is mean, and σ is the standard deviation. It is then passed to a classification layer that produces output class using sigmoid and softmax classifiers for two-class and six-class, respectively. It can also be passed to a decoder but was not done, as in our case, our purpose is classification.
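The VAE part of the pipeline (4096 features, dense layers of 500 and 120, then µ and σ producing a sample vector of size 30) can be sketched as below. This is a minimal sketch with randomly initialized weights for illustration only: the ReLU activations, the use of `exp` to keep σ positive, and every function and parameter name are assumptions not stated in the paper. The sampling step is the usual reparameterization z = µ + σ·ε.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, sizes=(4096, 500, 120), latent=30):
    """Random weights for illustration only; a trained model would learn these."""
    params = {}
    d_in = sizes[0]
    for idx, d_out in enumerate(sizes[1:], start=1):
        params[f"W{idx}"] = rng.standard_normal((d_out, d_in)) * 0.01
        params[f"b{idx}"] = np.zeros(d_out)
        d_in = d_out
    for name in ("mu", "ls"):              # heads for the mean and log standard deviation
        params[f"W_{name}"] = rng.standard_normal((latent, d_in)) * 0.01
        params[f"b_{name}"] = np.zeros(latent)
    return params

def encode_and_sample(features, params, rng):
    h = relu(params["W1"] @ features + params["b1"])      # dense layer of 500
    h = relu(params["W2"] @ h + params["b2"])             # dense layer of 120
    mu = params["W_mu"] @ h + params["b_mu"]              # mean vector
    sigma = np.exp(params["W_ls"] @ h + params["b_ls"])   # assumption: exp keeps sigma > 0
    eps = rng.standard_normal(mu.shape)                   # noise drawn from N(0, 1)
    z = mu + sigma * eps                                  # sample vector of size 30
    return mu, sigma, z
```

Because `eps` is redrawn on every call, the same input yields a slightly different `z` each time, which relates to the varying encoding on every pass that the paper later names as a disadvantage of VAE-CNN.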

Support Vector Machine (SVM) Methodology
SVM is one of the most widely used supervised learning methods for classification in different fields of medical imaging [14,34-36]. SVM is memory efficient and effective in high dimensional spaces. It is used not only for classification but for regression as well. For this paper, the SVM methodology was taken from [14]. Although that paper did not mention the type of SVM being used, for our purpose SVM with a linear kernel is used, as it is popular in many fields. The linear kernel can be represented as [37]:

$$
f(I) = W^{T} I + b
$$

where $W^{T} I$ is the sum of the inner product and $b$ is the bias term. The output class obtained is 0 or 1: 0 for the normal (absence of abnormalities) class and 1 for the abnormal (presence of abnormalities) class.
The difference in parameters used in LSTM and VAE-CNN is provided in Table 1. After a brief discussion on the different methods used in this paper, the next section is provided with experimental results and conclusions.

Experiment and Result Analysis
The implementation was carried out using Jupyter Notebook, a readily available, open-source web application for the Python programming language, and Google Colab. The results are divided into two parts: firstly, classification into normal or abnormal, and secondly, classification into different types of regurgitation. We have used k fold cross validation as our first approach, with 2, 5, and 10 folds. The second approach tests generalization capability in medical diagnostics, where no observation seen in the training phase appears in the testing phase [38]. If this separation is not maintained, the model may learn a spurious relationship between the predicted status and patient identity, which leads to unrealistic results. Here the data are organized by patient, so a patient used in the training and validation phase (train-test split) is not used in the testing phase (separate CSV file). The exception is two patient cases in the six-class classification, where the same data was augmented and kept in the same CSV file. The training data is labelled and the testing data is unlabelled. The optimizer used is Adam, with a batch size of 50.
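The patient-level separation described above can be illustrated with a small helper that splits records by patient rather than by image, so that no patient contributes frames to both sides. This is a sketch under assumed conventions: the record layout `(patient_id, image_path, label)`, the function name, and the 20% test fraction are hypothetical and not taken from the paper.

```python
import random

def split_by_patient(records, test_fraction=0.2, seed=0):
    """records: list of (patient_id, image_path, label) tuples (hypothetical layout)."""
    patients = sorted({pid for pid, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    # All images of a patient go to exactly one side of the split.
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test
```

Splitting on individual images instead would let near-identical frames of one patient appear in both training and testing, which is the leakage the generalization approach is meant to avoid.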

Performance Metrics
Performance metrics are used to evaluate the quality of an algorithm's performance. Accuracy is considered to be one of the most widely applied performance metrics in classification. The different performance metrics used in the paper are as follows:

Classification Accuracy
It is a measure that calculates the ratio of correct classifications to the total number of samples. For classes having the same number of samples, accuracy itself is sufficient as a metric.

Logarithmic Loss
It works by penalizing false classifications. The lower the loss, the better the accuracy. It works well for multi-class classification. Here, binary cross-entropy is used as the loss function.
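For reference, binary cross-entropy averages the negative log-probability assigned to the true class over all samples. The following is a minimal sketch of that formula; the function name and the clipping constant `eps` are illustrative assumptions.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)    # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

A confident wrong prediction (for example, probability 0.9 for a sample whose true class is 0) contributes far more to the loss than a confident correct one, which is the penalizing behaviour described above.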

Confusion Matrix
It is a matrix that describes the complete performance of a model. The following can be calculated based on the confusion matrix.

Precision
It is the ratio of True Positives (TP) to the sum of True Positives and False Positives (FP). For two-class classification, precision can be calculated using

$$
\text{Precision} = \frac{TP}{TP + FP} \tag{8}
$$

For six-class classification, the precision of class $i$ can be calculated using

$$
\text{Precision}_i = \frac{TP_i}{TP_i + FP_i}
$$

where $i$ ranges from 0 to 5.

Recall
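It is the ratio of True Positives (TP) to the sum of True Positives and False Negatives (FN). For two-class classification, recall can be calculated as

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

For six-class classification, it can be generalized as

$$
\text{Recall}_i = \frac{TP_i}{TP_i + FN_i}
$$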

F1 Score
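It is the harmonic mean of precision and recall. For two-class classification, it is given by:

$$
F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

For six-class classification, it can be calculated from recall ($R_i$) and precision ($P_i$) as:

$$
F1_i = \frac{2 \, P_i \, R_i}{P_i + R_i}
$$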
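Accuracy, precision, recall, and F1 score can all be read off a confusion matrix. The sketch below computes them per class and works for both the two-class and six-class settings; the convention that rows are true classes and columns are predicted classes, and the function name, are assumptions of this example.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp               # predicted as class i but belonging elsewhere
    fn = cm.sum(axis=1) - tp               # belonging to class i but predicted elsewhere
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        denom = precision + recall
        f1 = np.where(denom > 0, 2 * precision * recall / denom, 0.0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1
```

For the two-class matrix `[[8, 2], [1, 9]]`, this gives an accuracy of 0.85, recall values of 0.8 and 0.9, and precision values of 8/9 and 9/11.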

Dataset
Here, classification into normal or abnormal is carried out using two types of echo images. The data obtained are 2D images in Joint Photographic Experts Group (JPEG) format and 3D color Doppler images in BitMap (BMP) format. Data were collected from Hope Clinic, Shillong. For the validation phase, 10% of the data were held out from the training set. For the testing phase, data are kept in a separate, unused folder (later saved in a CSV file) and are tested after validation. The numbers of images used for 2D image classification and 3D Doppler image classification are 1070 and 540, respectively, of which 10% are for validation and the rest for training. In addition, there are 38 2D images and 10 Doppler images for testing purposes. Testing data are kept separate from training and validation data in the experiment for prediction purposes. The total number of 2D images is 1108, and the total number of 3D Doppler images is 550. For k fold cross validation, the data from all the phases are combined. The k fold cross validation was run multiple times to obtain the same number of data in both methodologies for plotting the confusion matrix. This is done so that the comparison can be made with the same number of data, even though the patterns obtained in the two cases are different. Table 2 shows the results in terms of accuracy, precision, recall, and F1 score. From the graph in Figure 15, plotted for all performance metrics, and the output obtained without k fold, we can conclude that VAE-CNN gives better output in almost all cases. In testing on 2D images, SVM is almost equivalent to VAE-CNN. In other cases, deep learning methodologies are better than SVM in the classification of heart images. It can also be seen that, for 2D images in the validation phase, accuracy is better in VAE-CNN, precision in SVM, recall in VAE-CNN, and F1 score in VAE-CNN.
In the case of color Doppler, accuracy is better in VAE-CNN, precision in VAE-CNN, recall in SVM, and F1 score in VAE-CNN in the validation phase. During prediction (testing phase) on 2D images, accuracy is better in SVM, precision in LSTM, recall in VAE-CNN, and F1 score in both VAE-CNN and SVM. Overall, VAE-CNN gives better output than the other two methods. Using k fold cross validation on 2D images, VAE-CNN performs better with 2 folds and 5 folds and is almost equivalent to the others with 10 folds. For 3D Doppler images, VAE-CNN performs better than LSTM and SVM for all three fold settings. Overall, VAE-CNN gives better output when using k fold, as is also the case in the generalization approach.

Statistical Significance Test
Statistical tests are used for the comparison of classifiers. Several statistical tests are available, out of which the paired T-test has been used for comparing LSTM and VAE-CNN with the existing SVM methodology. As k fold cross validation is already considered a statistical procedure, the T-test has not been calculated for it. The paired T-test is used to determine whether the mean difference between two sets of paired observations is zero [39]. It shows the significance of a model through the p-value obtained from the test. Mathematically it can be calculated using:

$$
t = \frac{\bar{d}}{\sqrt{s^{2} / n}}
$$

where $\bar{d}$ is the mean difference, $s^{2}$ is the sample variance and $n$ is the number of samples.
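The paired T-test statistic follows directly from the quantities named above: the mean difference, the sample variance of the differences, and the number of paired samples. The sketch below computes only the statistic; the function name and the example scores are illustrative, and obtaining the p-value additionally requires the Student t distribution with n − 1 degrees of freedom, which a statistics library would normally provide.

```python
import math

def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (e.g. scores of two classifiers)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)
```

With `a = [2, 4, 6]` and `b = [1, 2, 3]`, the differences are 1, 2, 3, so the mean difference is 2, the sample variance is 1, and the statistic equals 2·√3. The result is then compared against the chosen significance level, 0.05 in this paper.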
Based on the statistical test in Table 3, the results obtained by both deep learning methodologies dominate those of SVM, taking 0.05 as the significance level. Of the 8 cases in total, 6 show a statistically significant improvement over SVM, although in some of them the margin of significance is small. Only in 1 case of VAE-CNN and 1 case of LSTM was there no improvement at all. Using the obtained data, the deep learning methodologies are more effective based on the value of the paired T-test. Based on the test of the three methodologies, a difference is observed between the groups, and the result is therefore statistically significant.

1. It can be observed that the traditional method could properly classify the different classes as compared to deep learning methodologies.

2. Using different views and types of image format gives an almost equivalent output, which means that these methods work for any view of echo.

Classification into Types of Regurgitation
Classification into 6 types of regurgitation has been done using videographic images, namely: class 0, mitral regurgitation (MR); class 1, aortic regurgitation (AR); class 2, tricuspid regurgitation (TR); class 3, mitral regurgitation and tricuspid regurgitation (MR+TR); class 4, aortic regurgitation and mitral regurgitation (AR+MR); and class 5, aortic regurgitation, mitral regurgitation and tricuspid regurgitation (AR+MR+TR). These classes were selected based on the availability of data for the types of regurgitation. Classification into two types, i.e., class 0 (normal) or class 1 (abnormal), was also done using videographic images.

Dataset
The data obtained were in video format, and for each class, data from 10 patients were used. The number of frames ranges from 33 to 150 for each patient. The total number of images used for two-class classification during training is 2430, of which 243 are for validation purposes. For the testing phase, the total number of images is 539. The total number of images used for six-class classification during training is 5160, of which 516 are for validation purposes. For the testing phase, the total number of images is 736. In the case of k fold cross validation, the data were combined from all the phases. Here too, the methodologies were run multiple times to obtain the same number of data in both methodologies for plotting the confusion matrix.

Output
Two methods, namely LSTM and VAE-CNN, are used for comparison purposes. They were not compared with other methods, as no such work has been done in the same field using video. These two methods were used to check the applicability of deep learning methodologies in this field. From the output obtained in Tables 4-6 and the graphs in Figures 16 and 17, it can be seen that classification into normal or abnormal gives very accurate results for both methodologies in the training and validation phase. When it comes to prediction, VAE-CNN gives a better result. Both methods underperformed, as accuracy during prediction could not exceed 80% in the case of two-class classification. Nonetheless, it can be observed that deep learning methodologies could classify correctly, with the highest accuracy of 100% during validation and 95% during testing/prediction in six-class classification using LSTM. It can also be observed from the output in Figure 18, Tables 6-8, and the graphs plotted in Figures 19 and 17 that classification into 6 types of regurgitation gives better accuracy and overall performance than normal or abnormal classification. This is due to the higher similarity of patterns between classes. A few examples showing classification into six-class regurgitation can be seen in Figure 20.
Using k fold cross validation, it can be observed that LSTM performed better than VAE-CNN in both cases (normal or abnormal classification and types of regurgitation classification). In the case of two-class classification, LSTM gives a 100% accuracy score in all three fold settings, while VAE-CNN gives 99% and 98%, unlike in the generalization approach. In the case of six-class classification, VAE-CNN gives a different result than in the generalization approach: the accuracy improves, with a best accuracy of 86%. VAE-CNN did not underperform, but could not overtake LSTM in either case.

Table 5. Confusion matrix for validation and testing into six-class (type of regurgitation) classification.

Summary
It can be observed that accuracy, precision, recall, and F1 score are better in the validation phase for two-class classification than for six-class classification. Another result can be observed in the testing phase (generalization approach), where LSTM is found to be better than VAE-CNN using color Doppler for six-class classification. In the testing phase (generalization approach), the output of VAE-CNN is better than that of LSTM for two-class classification. Low accuracy can occur due to the small amount of data used. Overall, deep learning can be applied and used instead of the SVM method.
As the number of images increases, VAE performance increases too. For all other cases, the pattern of output is similar or the same, except in six-class classification, where LSTM performs better in testing using the generalization approach (train-test split and prediction). This is due to the inadequate patient data available, where a few repeated data had to be used for two different classes. This causes misclassification, where a patient having both AR and MR is treated as having only MR or AR. This shows that VAE cannot classify properly for classes having two abnormalities in the same frame. LSTM performs better than VAE-CNN on video images. Since LSTM is an RNN based method, it takes time as a factor and can predict the next class based on the present and previous inputs. LSTM sometimes fails to classify images of a different class, taking the previously obtained class as the next class. The varying encoding on every single pass makes it difficult for VAE-CNN to classify frames of the same class, which is its disadvantage. However, VAE-CNN has the properties of both CNN and VAE, which provides a continuous latent space and makes interpolation simpler and sampling easier. In conclusion, we can say that VAE-CNN gives better output using static images and LSTM using videographic images, based on both k fold cross validation and the generalization approach.

Conclusions and Future Work
Classification of heart abnormalities has been little explored in the field of cardiology. It is an important aspect of detecting future diseases. Any step that makes diagnosis more accessible and provides a future tool to support human intervention is worthwhile. Several works have been done in the past, but the use of deep learning methodologies or machine learning models has not been explored much in this field. This paper presented two such methodologies in the quest for an algorithm that can better classify the types of regurgitation and distinguish between classes with and without abnormalities. From the obtained output, it can be concluded that with deep learning methodologies, regurgitation can be classified better than with the well-known SVM method. LSTM and VAE prove to be an efficient and effective way of detecting abnormalities, where the accuracy in most cases is high. Such algorithms provide a solution for cardiologists and ease the process of diagnosis. This will reduce human effort and can be used in early detection and for better diagnosis. In this paper, we have used clinical data, not processed data, which could be the reason for lower than expected accuracy in some cases. Work can be done in the future to obtain a greater amount of properly processed data using videographic echo with keyframe extraction and segmentation. This paper is an initiative for the application of deep learning in this kind of work, which can be further expanded. More experiments are needed for a better diagnosis that can ease and even replace human exertion to some extent.