Image Segmentation for Mitral Regurgitation with Convolutional Neural Network Based on UNet, Resnet, Vnet, FractalNet and SegNet: A Preliminary Study

The mitral valve separates the left atrium from the left ventricle of the heart. Heart valve disease is fairly common, and one type is mitral regurgitation, an abnormality of the mitral valve on the left side of the heart in which the valve cannot close properly. A Convolutional Neural Network (CNN) is a type of deep learning model well suited to image analysis. Segmentation is widely used in medical image analysis because it simplifies an image by separating the objects to be analyzed (foreground) from everything else (background). This study builds a dataset from patients with mitral regurgitation and patients with normal hearts, and analyzes the heart valve images by segmenting the mitral valve. Several CNN architectures were applied, including U-Net, SegNet, V-Net, FractalNet, and ResNet. The experimental results show that the best architecture is U-Net3, with a Pixel Accuracy of 97.59%, Intersection over Union of 86.98%, Mean Accuracy of 93.46%, Precision of 85.60%, Recall of 88.39%, and Dice Coefficient of 86.58%.


Introduction
Every year in the United States, more than 2 million people experience a heart attack or stroke, and cardiovascular disease is the leading cause of death [1]. Each year, the Centers for Disease Control and Prevention, the National Institutes of Health, the American Heart Association (AHA), and other government agencies compile the latest statistics on heart disease, stroke, and other cardiovascular and metabolic diseases in the Heart Disease and Stroke Statistics update. According to data on the adult population of the United States collected in 2000 by the Coronary Artery Risk Development in Young Adults (CARDIA), Atherosclerosis Risk in Communities (ARIC), and Cardiovascular Health Study (CHS) cohorts, the most common heart valve disease was mitral regurgitation, at 1.7%, rising from 0.7% in participants aged 18 to 44 years to 11.3% in participants older than 75 years. The incidence of regurgitant mitral valve disease is estimated to be four times higher than that of stenotic aortic valve disease [2].
The heart has four valves, which keep blood flowing through the heart in the correct direction. Heart disease can be examined with echocardiography, which displays the four heart chambers and the heart valves.
Echocardiography with color Doppler is a robust imaging method for evaluating the geometry, dynamics, and function of the mitral valve (MV) in degenerative and functional MV regurgitation. Color Doppler also helps clinicians locate the regurgitant orifice and assess the severity of mitral regurgitation. Automated assessment of mitral regurgitation severity from color Doppler echocardiographic images is therefore of great value in helping surgeons perform MV repair [3,4].
The most commonly found congenital heart disease is Atrial Septal Defect (ASD), and its relationship with valvular heart disease has been recognized for years. In Indonesia, a study at Dr. Sardjito Hospital reviewed echocardiography records (acquired with a Vivid 7 echocardiograph) and found ASD in 103 adult patients, 16 men and 87 women aged 17 to 76 years, with a mean age of 36 years [5].
Deep learning is a type of machine learning that has become popular in recent years. It is well suited to health research because it can process large amounts of data and accept several types of input, such as images made up of many pixels, and map them to the appropriate output. Andre Esteva et al. reviewed deep learning techniques in computer vision for health care, which affect several medical fields, especially medical imaging. Their review describes how deep learning models scale to large datasets when run on dedicated computing hardware and how they can accept multiple data types as input, which suits the heterogeneous data found in health care [6].
Artificial Intelligence (AI) techniques based on neural networks and deep learning are increasingly used for segmentation, feature extraction, and classification to obtain more accurate results in medical diagnostics [7]. In recent years, many studies have used deep learning, especially Convolutional Neural Networks, to detect diseases. For example, one study used a deep learning approach to automatically detect melanoma from dermoscopic skin images, accurately classifying malignant versus benign lesions. The dermoscopic images, containing different cancer samples, were obtained from the International Skin Imaging Collaboration data repository (ISIC 2016, ISIC 2017, and ISIC 2020), and the models were evaluated on accuracy, precision, recall, specificity, and F1 score. The Deep Convolutional Neural Network (DCNN) classifier achieved high accuracy [7].
Segmentation is one of the keys to medical image analysis and can support the diagnosis of disease. Segmentation is the process of partitioning a digital image into regions, simplifying or transforming the representation of the image into something more meaningful that is easier to analyze and identify. Many algorithms have been used for image segmentation; among them, U-Net was developed specifically for biomedical image segmentation, and the medical imaging community has adopted it as a main tool for segmentation tasks.
The success of U-Net has been demonstrated on images from CT scans, MRI, X-rays, and microscopy [8]. B. Ait Skourt et al. applied U-Net, one of the most widely used deep learning architectures for image segmentation, to segment lung CT images. Their experimental results show that the segmentation is accurate and that U-Net provides an accurate delineation of the lungs for lung cancer detection [9].
Ronneberger et al. [10] proposed a network and training strategy that relies on data augmentation to use the available annotated samples more efficiently. The resulting U-Net architecture consists of a contracting path (encoder) on the left and an expansive path (decoder) on the right, and the encoder is built from stacked convolutional layers, the defining component of a CNN [10]. Convolutional Neural Networks have been applied to various medical image segmentation tasks and have been shown to perform better than traditional algorithms [11]. Q. Zhang et al. combined the ResNet and U-Net architectures by adding Residual Units to U-Net and applied the combination to ultrasound nerve data, obtaining their best Dice Coefficient of 69.15% [12]. In this paper, we use a U-Net variant, U-Net3, and obtain a Dice Coefficient of 86.58%, higher than that study, although on a different dataset.
Liciotti et al. introduced a new U-Net design, U-Net3, to detect and track people in crowded environments such as airports or stations. The design adds batch normalization after the first ReLU activation and after the max-pooling and upsampling operations. Tested on the public TVHeads dataset, it achieved an F1 score of around 90%. The authors also compared U-Net3 with FractalNet, ResNet, U-Net, U-Net2, and SegNet, and found U-Net3 superior to the other architectures [13].
Building on these results, we adopt U-Net3 as our proposed model; the difference is that we apply it to a medical dataset that we built ourselves rather than to public data. We also add one further architecture, V-Net, to the comparison.
Nova et al. proposed a CNN-based U-Net architecture that automatically segments the cardiac chambers to detect abnormalities (holes) in the cardiac septum, using a four-class segmentation model and comparing two architectures, U-Net and V-Net. Both achieved accuracy above 90%, with U-Net more accurate than V-Net; the authors concluded that CNNs can segment the heart chambers for detecting septal defects and can support the work of cardiologists [14]. That study compared its proposed model with one other architecture, whereas we compare our proposed model with six other architectures.
Kalane et al. used the U-Net architecture to detect COVID-19 automatically. Their dataset contains 1000 chest CT images, 448 from patients with COVID-19 and 552 from patients without COVID-19, obtained from a GitHub repository and the collection of the Italian Society of Medical and Interventional Radiology. The experimental results show that U-Net is effective, with an overall accuracy of 94.10% [15]. That study addressed COVID-19, whereas we address mitral regurgitation.
Based on the description above, this paper proposes a CNN-based model that segments four-chamber heart images, built on a dataset of mitral regurgitation (valvular heart disease) and normal images. We validate the U-Net3 model against six other architectures using pixel accuracy, intersection over union, mean accuracy, precision, recall, and dice coefficient. Section 2 describes the dataset, which consists of patients with mitral regurgitation and normal patients. The focus of this paper is segmentation of the proposed dataset using labeled images. Table 1 gives the number of filtered frames from each video, Table 2 the amount of data for training, testing, and unseen data, and Table 3 the parameters of the various CNN architectures.
Table 4 details the U-Net architecture. Performance was assessed with pixel accuracy, intersection over union, and dice coefficient, and the proposed U-Net3 was compared with six other architectures, namely SegNet, ResNet, FractalNet, U-Net, U-Net2, and V-Net (Table 5). Table 6 shows the performance on unseen data. The results show that U-Net3 is effective and performs best among the compared architectures.

Materials and Methods
The mitral regurgitation valve segmentation process begins with collecting a dataset of mitral regurgitation and normal heart valves, in the form of color Doppler echocardiogram videos from patients suspected of having heart valve abnormalities. For data preparation, the videos are then split into individual images. Each image is annotated to produce a ground truth, and once all images are annotated, the segmentation and prediction steps are carried out.

Data Acquisition
In this research, a private dataset was built from echocardiography video recordings of patients with mitral regurgitation and patients with normal hearts, in the four-chamber view, stored as *.avi files. The videos were obtained from Mohammad Hoesin Hospital in Palembang between December 2019 and December 2021 using Transthoracic Echocardiography (TTE) [2].
The data consisted of 42 patients and 923 images. We used only good-quality images and frames that showed all parts of the heart. Of these patients, 21 had mitral valve leakage (regurgitation), yielding 454 images, and 21 had normal hearts, yielding 469 images. A total of 777 images were used for training and testing: 621 for training and 156 for testing. The patients were divided into 37 patients for training and testing and 6 patients for unseen data. The unseen data from these 6 patients, who were not used in training or testing, comprised 146 images: four patients with mitral valve leakage (90 images) and two patients with normal hearts (56 images).
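Because consecutive frames of one echocardiogram are nearly identical, the unseen set should consist of whole patients, as described above. The exact splitting procedure is not given beyond the counts (621 training and 156 testing images, roughly 80/20 of the 777, plus 146 unseen images), so the following is only a minimal sketch under assumptions: frames are hypothetical `(patient_id, frame_id)` records, the unseen patients are chosen by hand, and the remaining frames are split at image level.

```python
import random

def split_by_patient(frames, unseen_patients, test_fraction=0.2, seed=0):
    """Split (patient_id, frame_id) records into train, test, and unseen sets.

    Whole patients listed in `unseen_patients` are held out as unseen data;
    the remaining frames are shuffled and split at image level into train/test.
    """
    unseen = [f for f in frames if f[0] in unseen_patients]
    pool = [f for f in frames if f[0] not in unseen_patients]
    random.Random(seed).shuffle(pool)              # reproducible shuffle
    n_test = round(len(pool) * test_fraction)
    return pool[n_test:], pool[:n_test], unseen

# Toy example: 10 hypothetical patients with 10 frames each, 2 held out.
frames = [(f"P{p}", i) for p in range(10) for i in range(10)]
train, test, unseen = split_by_patient(frames, {"P8", "P9"})
print(len(train), len(test), len(unseen))   # 64 16 20
```

An image-level train/test split lets frames of one patient fall on both sides, so the test score can be optimistic; the patient-level unseen set is what guards against that.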

Data Pre-Processing
Data preprocessing is the series of steps used to filter and prepare the data; the preprocessing pipeline is shown in Figure 1.
Pre-processing of the mitral regurgitation video data for segmentation starts by converting the videos in .avi format into collections of images. The frames are then filtered, keeping images of hearts with mitral regurgitation and of normal hearts taken when the mitral valve is closed; at that point, the color Doppler signal of the echocardiogram indicates the presence or absence of mitral valve disease. Finally, the filtered frames are labeled with the LabelMe application to produce the ground truth.
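LabelMe saves each annotation as a JSON file whose `shapes` list holds polygons as `[x, y]` vertex lists, together with `imageHeight` and `imageWidth`. A minimal NumPy sketch of turning such an annotation into a binary ground-truth mask, assuming polygon annotations and a hypothetical label name; in practice a library rasterizer such as OpenCV's `cv2.fillPoly` would normally replace the hand-written even-odd test shown here.

```python
import numpy as np

def polygon_to_mask(points, height, width):
    """Rasterize one polygon ([[x, y], ...] in pixels) with the even-odd rule,
    testing each pixel centre against every polygon edge."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5                       # pixel centres
    inside = np.zeros((height, width), dtype=bool)
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        crosses = (y1 > py) != (y2 > py)              # edge spans the centre's row
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            inside ^= crosses & (px < x_cross)        # toggle on each crossing to the right
    return inside

def labelme_to_mask(annotation, label):
    """Binary mask (uint8, 1 = foreground) from a parsed LabelMe JSON dict."""
    h, w = annotation["imageHeight"], annotation["imageWidth"]
    mask = np.zeros((h, w), dtype=np.uint8)
    for shape in annotation["shapes"]:
        if shape["label"] == label and shape.get("shape_type", "polygon") == "polygon":
            mask[polygon_to_mask(shape["points"], h, w)] = 1
    return mask

# Usage with a real file (file name and label name are hypothetical):
# import json
# with open("frame_0001.json") as f:
#     mask = labelme_to_mask(json.load(f), label="mitral")
```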

Model Architecture
The deep learning methods used in this research are U-Net, ResNet, SegNet, V-Net, and FractalNet, all CNN architectures designed for image processing. The U-Net architecture is shaped like a "U" with three parts: contraction, bottleneck, and expansion. The contraction section consists of several contraction blocks; each block applies two 3 × 3 convolution layers followed by 2 × 2 max pooling [2]. ResNet introduces a new block called the residual block, whose shortcut connections let the signal skip over layers so that a layer can copy its input to the next layer. The main idea of ResNet is that residual blocks add identity mappings on top of plain layers to perform residual learning, which improves feature extraction accuracy and mitigates the vanishing gradient problem [12]. V-Net is one of the most popular CNN architectures for medical image segmentation and has two main parts, a compression path and a decompression path [16].

SegNet
SegNet consists of an encoder network and a corresponding decoder network, followed by a final pixel-wise classification layer. The encoder consists of 13 convolutional layers that correspond to the first 13 convolutional layers of the VGG16 network designed for object classification [17].
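What distinguishes SegNet's decoder, as described in the original SegNet paper, is that it upsamples using the max-pooling indices stored by the matching encoder layer instead of learning the upsampling. A minimal single-channel NumPy sketch of that pooling/unpooling pair; the 13 VGG16 convolution layers and the decoder convolutions that densify the sparse upsampled map are omitted.

```python
import numpy as np

def max_pool_with_indices(x):
    """2 x 2 max pooling with stride 2 on an (H, W) array (H, W even).
    Returns the pooled map and, per output, the flat index of its maximum in x."""
    h, w = x.shape
    # windows[i, j, k] holds the 4 pixels of window (i, j); k = 2 * row_offset + col_offset
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    k = windows.argmax(axis=-1)
    rows = 2 * np.arange(h // 2)[:, None] + k // 2
    cols = 2 * np.arange(w // 2)[None, :] + k % 2
    return windows.max(axis=-1), rows * w + cols

def max_unpool(pooled, indices, shape):
    """Decoder upsampling: put each value back where the encoder found the max; zeros elsewhere."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [0., 0., 1., 0.],
              [6., 0., 0., 0.]])
pooled, idx = max_pool_with_indices(x)   # pooled == [[4, 5], [6, 1]]
sparse = max_unpool(pooled, idx, x.shape)
```

Storing only the indices is cheap compared with keeping the full encoder feature maps, which is the memory argument the SegNet authors make for this design.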

ResNet
ResNet is a CNN architecture built from residual networks. Its main idea is the residual block, which adds identity mappings on top of plain layers to perform residual learning, improving feature extraction accuracy and mitigating the vanishing gradient problem [12].
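The residual idea can be written as y = ReLU(F(x) + x): the layers learn only the residual F(x), and the identity shortcut carries x forward unchanged, which keeps gradients flowing through deep stacks. A minimal NumPy sketch, with dense matrices standing in, as a simplification, for the 3 × 3 convolutions and batch normalization of a real ResNet block:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with residual branch F(x) = ReLU(x @ w1) @ w2.
    Dense weight matrices stand in for the convolutions of a real ResNet block."""
    residual = relu(x @ w1) @ w2       # the part the block actually has to learn
    return relu(residual + x)          # identity shortcut adds the input back

x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))   # [[1. 0. 3.]]: F(x) = 0, so only the shortcut remains
```

With all-zero weights the block reduces to ReLU(x), i.e. the input passes straight through, which is why stacking more residual blocks does not make the network harder to optimize in the way plain layers can.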

V-Net
V-Net is one of the most popular CNN architectures for medical image segmentation. It has two main parts: a compression path, which progressively reduces the spatial resolution of the feature maps, and a decompression path, which restores them to the original resolution [16].
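In the V-Net paper, each compression stage halves the resolution with a 2 × 2 × 2 convolution of stride 2 rather than max pooling, and the decompression path restores it with transposed (de-)convolutions. A 2D, single-channel NumPy sketch of just these two resampling steps, with a hypothetical fixed kernel where V-Net learns its weights; the convolutions and residual connections inside each stage are omitted.

```python
import numpy as np

def down_conv(x, k):
    """Compression step: 2 x 2 convolution with stride 2 (no padding) halves H and W."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2)          # windows[i, a, j, b] = x[2i + a, 2j + b]
    return np.einsum("iajb,ab->ij", windows, k)

def up_conv(y, k):
    """Decompression step: 2 x 2 transposed convolution with stride 2 doubles H and W.
    Each input value writes a scaled copy of k into its own 2 x 2 output block."""
    h, w = y.shape
    return np.einsum("ij,ab->iajb", y, k).reshape(2 * h, 2 * w)

x = np.arange(16.0).reshape(4, 4)
k = np.ones((2, 2))                   # hypothetical kernel; V-Net learns these weights
small = down_conv(x, k)               # (2, 2): [[10, 18], [42, 50]]
restored = up_conv(small, k)          # (4, 4): the resolution is restored, not the original values
```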

Fractal-Net
FractalNet is a CNN architecture that avoids residual connections. It is built by repeatedly applying a simple expansion rule to create a fractal convolutional network, which contains interacting sub-paths of different lengths. Each internal signal is transformed by a filter before passing to the next layer [18].
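The expansion rule from the FractalNet paper is f_1(z) = conv(z) and f_{C+1}(z) = join(f_C(f_C(z)), conv(z)), where join is the element-wise mean of the incoming paths. The sketch below passes the layer and the join in as plain Python functions (hypothetical stand-ins for real convolution layers), which makes the path structure easy to check: a block with C columns contains 2^C - 1 convolutions, and its longest path is 2^(C-1) layers deep while its shortest is a single layer.

```python
def fractal_block(c, conv, join):
    """FractalNet expansion rule with C columns:
        f_1(z)     = conv(z)
        f_{C+1}(z) = join(f_C(f_C(z)), conv(z))
    `conv` and `join` are plain functions standing in for real layers."""
    if c == 1:
        return conv
    sub = fractal_block(c - 1, conv, join)
    return lambda z: join(sub(sub(z)), conv(z))

# Path lengths: let each "conv" add 1 and join with max (longest) or min (shortest).
longest = fractal_block(4, lambda d: d + 1, max)(0)    # 8 == 2 ** (4 - 1)
shortest = fractal_block(4, lambda d: d + 1, min)(0)   # 1

# Number of convolutions in one block: count how often conv is called.
calls = []
def counting_conv(z):
    calls.append(z)
    return z
fractal_block(4, counting_conv, lambda a, b: (a + b) / 2)(1.0)   # join = element-wise mean
n_convs = len(calls)   # 15 == 2 ** 4 - 1
```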

U-Net
U-Net is a CNN architecture developed for biomedical image segmentation. It is based on a fully convolutional network and consists of a contracting path and an expansive path, which give the architecture its U shape. U-Net combines convolution operations, max pooling, ReLU activation, and upsampling and downsampling layers. The downsampling path has five convolution blocks, each with two convolution layers using 3 × 3 filters [11].
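The U shape follows from simple size bookkeeping. A minimal sketch that traces (spatial size, channels) through five contracting blocks and the mirrored expansive path, assuming 'same'-padded 3 × 3 convolutions (the original U-Net paper used unpadded ones, which shrink the maps slightly), a 256 × 256 input, and 32 filters in the first block doubling at each level; these are illustrative assumptions, not values reported here.

```python
def unet_shapes(size=256, base_filters=32, depth=5):
    """Trace (spatial size, channels) through a U-Net whose 3 x 3 convolutions use
    'same' padding, so only pooling and upsampling change the spatial size."""
    if size % 2 ** (depth - 1):
        raise ValueError("size must be divisible by 2 ** (depth - 1)")
    encoder, s, c = [], size, base_filters
    for level in range(depth):
        encoder.append((s, c))              # output of the block's two 3 x 3 convs
        if level < depth - 1:
            s, c = s // 2, c * 2            # 2 x 2 max pool (stride 2), then double filters
    decoder = []
    for s_skip, c_skip in reversed(encoder[:-1]):
        s = s * 2                           # upsample to the skip connection's size
        assert s == s_skip                  # sizes must match to concatenate
        c = c_skip                          # convs after concatenation return to c_skip channels
        decoder.append((s, c))
    return encoder, decoder

encoder, decoder = unet_shapes()
# encoder: [(256, 32), (128, 64), (64, 128), (32, 256), (16, 512)]
# decoder: [(32, 256), (64, 128), (128, 64), (256, 32)]
```

The deepest block (16 × 16 with 512 channels under these assumptions) acts as the bottleneck, and each decoder level is concatenated with the encoder output of the same size before its convolutions.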

U-Net3
U-Net3 is a U-Net architecture modified by adding batch normalization after the ReLU activation and after the max-pooling and upsampling operations. It was introduced by Liciotti et al. in 2018. The architecture consists of two main parts: a contracting path on the left and an expansive path on the right. The contracting path follows a typical convolutional network, repeatedly applying two 3 × 3 convolutions, each followed by a ReLU, and a 2 × 2 max pooling operation with stride 2 for downsampling. At each downsampling step the number of feature channels is doubled, while in the expansive path each step upsamples the feature map, followed by a 2 × 2 convolution and ReLU. The final layer is a 1 × 1 convolution that maps each 32-component feature vector to the specified number of classes [13].
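Two U-Net3 ingredients described above can be sketched in NumPy: batch normalization applied after the ReLU activation, and the final 1 × 1 convolution, which is simply the same linear map applied at every pixel, taking each 32-component feature vector to the output classes. Assumptions: channels-last tensors, training-mode batch statistics (no running averages), a random stand-in for the convolution output, and a single sigmoid output channel for the binary mitral-valve mask, which fits the binary cross-entropy loss reported in the Conclusions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for channels-last (N, H, W, C) tensors:
    each channel is standardized with its own batch mean and variance."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def conv1x1(x, w, b):
    """1 x 1 convolution: the same linear map (C_in -> C_out) applied at every pixel."""
    return x @ w + b

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(4, 16, 16, 32))                          # stand-in for a 3 x 3 conv output
features = batch_norm(relu(conv_out), np.ones(32), np.zeros(32))    # BN after ReLU, as in U-Net3
logits = conv1x1(features, rng.normal(size=(32, 1)), np.zeros(1))   # 32 features -> 1 output channel
mask = 1.0 / (1.0 + np.exp(-logits)) > 0.5                          # sigmoid, then threshold at 0.5
```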

Performance Metric
The model produces predicted images, which are then evaluated against the ground-truth images of the test data, used as the reference for measuring segmentation performance. The metrics are Pixel Accuracy [19], Intersection over Union [20], Mean Accuracy, Precision, Recall, and Dice Coefficient [20], given by Equations (1)-(6).
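Equations (1)-(6) are not reproduced in this text. Assuming the standard pixel-level definitions in terms of true/false positives and negatives (TP, TN, FP, FN): Pixel Accuracy = (TP + TN)/(TP + TN + FP + FN), IoU = TP/(TP + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), Dice = 2TP/(2TP + FP + FN), and Mean Accuracy is the mean of the per-class accuracies TP/(TP + FN) and TN/(TN + FP). A minimal NumPy sketch for one binary mask; how the reported values are averaged over images (and whether IoU also averages in the background class) is not stated, so treat this as illustrative.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-level metrics for one binary mask (1 = foreground, 0 = background).
    A metric is NaN (with a RuntimeWarning) if its denominator is zero."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)      # foreground predicted as foreground
    tn = np.sum(~pred & ~truth)    # background predicted as background
    fp = np.sum(pred & ~truth)     # background predicted as foreground
    fn = np.sum(~pred & truth)     # foreground missed
    return {
        "pixel_accuracy": (tp + tn) / (tp + tn + fp + fn),
        "iou": tp / (tp + fp + fn),
        "mean_accuracy": (tp / (tp + fn) + tn / (tn + fp)) / 2,   # mean per-class accuracy
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }

m = segmentation_metrics([[1, 1, 0, 0]], [[1, 0, 1, 0]])
# tp = tn = fp = fn = 1, so iou = 1/3 and every other metric = 0.5
```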

Results
In this study, we demonstrate that a CNN-based U-Net architecture can successfully perform MR heart valve segmentation. This section presents the prediction results on the testing data. Each CNN architecture was trained, and after training all architectures were evaluated on data from different patients that were not used in training. Performance was analyzed with three metrics: Pixel Accuracy, Intersection over Union, and Dice Coefficient. Figure 2 shows the difference between a raw image and its ground truth. Table 5 shows that U-Net3 achieves the highest Pixel Accuracy and Dice Coefficient of all architectures, although the other metrics are similar across models.
Table 6 shows the results of all models on Pixel Accuracy, Intersection over Union, and Dice Coefficient. U-Net3 is the best model, with a Pixel Accuracy of 97.59%, Intersection over Union of 86.88%, Mean Accuracy of 93.46%, Precision of 85.60%, Recall of 88.39%, and Dice Coefficient of 86.58%. We also assessed performance on new data not used in training or testing (unseen data). Although U-Net3 performs similarly to the other models on the test data, on the unseen data it outperforms the other architectures, especially in Dice Coefficient.
The predictions of U-Net3 are close to the ground truth for both the MR and Normal categories, as seen in Figure 5.
Table 7 shows that U-Net3 also has a shorter training time than the other CNN models.

Discussion
Many studies address heart segmentation, but few address heart valve segmentation, in particular segmentation of mitral valve disease with the U-Net architecture. To our knowledge, this is the first report describing mitral regurgitation segmentation from color Doppler 2D echocardiograms. Another study assessed the severity of mitral regurgitation with a deep learning algorithm, the Mask R-CNN convolutional neural network, for automated qualitative MR evaluation on color Doppler echocardiography images. It achieved an average accuracy of around 90%, with classification accuracies of 0.90, 0.89, and 0.91 for mild, moderate, and severe MR, respectively [4]; in contrast, our proposed method uses the U-Net architecture. Table 8 compares the ground truth with the predicted frame from each architecture.
Based on these predictions, the proposed U-Net3 produces the results closest to the ground truth, whereas the other models fail to reproduce parts of the structure. In ResNet's predictions, the top and bottom regions merge. U-Net's predictions have the upper part cut off. U-Net2's image is blurred and its upper part remains black. The V-Net image still resembles the background of the original image, while in FractalNet's predictions the top and bottom regions also appear fused.
Nova et al. [14] used the U-Net architecture for automatic segmentation of cardiac septal defects. They proposed a CNN-based U-Net that automatically analyzes cardiac chamber segments to detect abnormalities (holes) in the cardiac septum, segmenting atrial septal defects (ASD), ventricular septal defects (VSD), atrioventricular septal defects (AVSD), and normal hearts. The method performed well in detecting cardiac septal defects: the four-class segmentation model achieved a Pixel Accuracy of 99.15%, a mean Intersection over Union (IoU) of 94.69%, and an F1 score of 94.88%. The study also compared U-Net with V-Net: the accuracy of the predicted contours was 99.01% for U-Net and 93.70% for V-Net, so U-Net was more accurate than V-Net. In our research, we use a developed version of U-Net, namely U-Net3. Our results (pixel accuracy of 97.62%, IoU of 86.93%, and F1 score of 86.51%) are slightly lower, but the two studies differ in the segmented object, the architecture, and the number of epochs. We segment mitral regurgitation with U-Net3, compare it with six other architectures (V-Net, SegNet, ResNet, FractalNet, U-Net, and U-Net2), and train for 500 epochs, whereas that study segmented septal abnormalities (holes) with U-Net, compared it only with V-Net, and trained for 1000 epochs.
Rachmatullah et al. [21] proposed the U-Net architecture for automatic segmentation of fetal heart ultrasound images, using 519 fetal heart images obtained from three videos. Combining U-Net with Otsu thresholding gave fairly good performance: 99.48% Pixel Accuracy, 94.92% IoU, and a 0.21% error rate. That study addressed the fetal heart, like several previous studies, whereas we address mitral regurgitation, which is still rarely studied. That research used 519 images, while we used 923 images.
In a previous study by Diniz et al., a U-Net model with a Concatenation Block (Concat U-Net) achieved good results for cardiac segmentation, with a Dice Coefficient of 87.95%. That study addressed cardiac CT scans, whereas we address mitral regurgitation and obtain a Dice Coefficient of 86.58%. Although the results are similar, research on this topic is still rare, and we also present ground-truth predictions, as shown in Table 9.
In this study, the proposed U-Net3 architecture segments regurgitant mitral valves and normal heart valves in four-chamber heart data, with a Pixel Accuracy of 97.59%, Intersection over Union of 86.88%, Mean Accuracy of 93.46%, Precision of 85.60%, Recall of 88.39%, and Dice Coefficient of 86.58%. With such high accuracy, the predictions are close to the original images. Although little segmentation research exists on heart valves, the results show that U-Net3 performs well on the proposed dataset, achieving the highest Pixel Accuracy, Intersection over Union, Recall, and Dice Coefficient. Besides its Dice Coefficient on the unseen data (Table 6), U-Net3 also trains faster than the other CNN models (Table 7).
Many approaches use CNNs to segment various medical objects. Table 9 compares the results of several previous studies with the current approach.

Conclusions
In this paper, we proposed a segmentation model using a CNN-based U-Net3 architecture that successfully segments the heart valve disease mitral regurgitation.
We evaluated the proposed model using Pixel Accuracy, IoU, Mean Accuracy, Precision, Recall, and F1 score, obtaining 97.59%, 86.98%, 93.46%, 85.60%, 88.39%, and 86.58%, respectively. The best performance was obtained with U-Net3 using a batch size of 64 and a binary cross-entropy loss function. We also compared the proposed model with six other architectures, namely SegNet, ResNet, FractalNet, V-Net, U-Net, and U-Net2, and tested all seven architectures on unseen data, i.e., new data not used during training. The experiments show that U-Net3 produces the predictions closest to the ground truth. In future work, we will increase

Figure 3. Illustration of the implemented U-Net3 architecture.

Figure 4. Model accuracy and model loss during training and testing for every architecture.


Table 1. Number of images for training and testing.

Table 2. Dataset for training, testing, and unseen data.

Table 3. Parameters for the various CNN architectures.

Table 5. Performance measurement with split data.

Table 6. Performance measurement with unseen data.

Table 7. Training time performance.

Table 9. Comparison with other methods on different data for segmentation.