Fast and Automated Segmentation for the Three-Directional Multi-Slice Cine Myocardial Velocity Mapping

Three-directional cine multi-slice left ventricular myocardial velocity mapping (3Dir MVM) is a cardiac magnetic resonance (CMR) technique that allows the assessment of cardiac motion in three orthogonal directions. Accurate and reproducible delineation of the myocardium is crucial for accurate analysis of peak systolic and diastolic myocardial velocities. In addition to the conventionally available magnitude CMR data, 3Dir MVM also provides three orthogonal phase velocity mapping datasets, which are used to generate velocity maps. These velocity maps may also be used to facilitate and improve the myocardial delineation. Based on the success of deep learning in medical image processing, we propose a novel fast and automated framework that improves the standard U-Net-based methods on these CMR multi-channel data (magnitude and phase velocity mapping) by cross-channel fusion with an attention module and the shape information-based post-processing to achieve accurate delineation of both epicardial and endocardial contours. To evaluate the results, we employ the widely used Dice Scores and the quantification of myocardial longitudinal peak velocities. Our proposed network trained with multi-channel data shows superior performance compared to standard U-Net-based networks trained on single-channel data. The obtained results are promising and provide compelling evidence for the design and application of our multi-channel image analysis of the 3Dir MVM CMR data.


Introduction
Three-directional cine multi-slice left ventricular myocardial velocity mapping (3Dir MVM) is a promising cardiac magnetic resonance (CMR) technique to assess cardiovascular conditions. In addition to conventionally available magnitude data, this technique also acquires velocity-encoded phase velocity mapping data from the phase-contrast MRI without ionizing radiation [1][2][3]. These phase velocity mapping data are used to generate velocity maps in three orthogonal directions throughout the cardiac cycle to gather both spatial and temporal information from the cardiac movement. The cine data also provided us with an opportunity to observe the cardiac movements and segment the left ventricular myocardium by employing the available temporal data.
Delineating the left ventricle (LV) myocardium from the CMR data is crucial in the determination of the myocardial motion from this data. Manual segmentation of both of human inter-observer variations in the myocardium segmentation in terms of their Dice Scores.

Data Acquisition and Pre-Processing
The CMR data were acquired from 18 healthy subjects (8 of them were acquired twice, giving 26 datasets in total) at the Royal Brompton Hospital. There was an additional subject acquired twice for independent testing (Section 3.2). All 19 subjects (18 for training (26 datasets) and 1 for independent testing (2 datasets)) had normal sinus rhythms. Further demographic information of these 19 subjects is included in Table 1.
For each subject, high temporal resolution (reconstructed to 50 frames per cardiac cycle, i.e., per R-R interval in the electrocardiogram (ECG)) cine spiral MVM with non-Cartesian SENSE reconstruction and 3 orthogonal directions of velocity encoding were acquired in a single breath-hold [21]. There were 121 cine slices (giving 6050 cine frames in total) in total from all these datasets (3-5 cine slices for each dataset). The acquired spatial resolution was 1.7 × 1.7 mm 2 , and data were reconstructed into a 512 × 512 matrix (reconstructed spatial resolution 0.85 × 0.85 mm 2 ). Data were acquired in short-axis slices from base to apex of the left ventricle.
All subjects gave their informed consent for inclusion before they participated in the study with approval from the local institutional review board (in accordance with the Declaration of Helsinki). The institutional review board approved the CMR scanning (Development of Cardiovascular Magnetic Resonance Sequences for Future Clinical Use V3.0 October 10, 2018, approved by the NRES London East Research Ethics Committee). The ground truth of the myocardium segmentation was performed by a senior CMR physicist with 20+ years' experience in CMR.
We augmented the data by a random horizontal flipping (probability of 0.5) and a random rotation (angle = [0, 90 • ]) before the training. Table 1. Demographic information of the 19 subjects (18 for training and 1 for independent testing). ECG R-R intervals were approximately +/−100ms. For 10 of the subjects, volunteers were asked to provide their weights and heights without having them actually measured.

Network Architectures for the Myocardium Segmentation
We summarized the available standard strategies on supervised learning-based deep learning as below, where they are all U-Net-based approaches.
In addition, we designed the strategies below, allowing the model to understand the more latent relationships between different channels.
[d] We proposed a novel network structure ( Figure 1) based on standard U-Net structures and its encoders and decoders, where we input the magnitude (512 × 512 × 1) and phase velocity mapping (512 × 512 × 3) data separately into the network. Each input had its respective encoder to allow the model to extract information that would otherwise be fused at early stages. We then added a novel MCAB to disentangle the multi-channel information better before it entered the decoder following standard U-Net structure at each depth of the network. This model is abbreviated as AMU_MP below.
Within MCAB, we firstly concatenated the outputs of two encoders at each depth and added a convolutional block that was of the same setting as the encoder to allow the model to disentangle the latent relationship between the information between the two channels. Then, the convoluted data stream went through an attention block [23] before being delivered to the decoder at each depth. phase velocity mapping (512 × 512 × 3) data separately into the network. Each input had its respective encoder to allow the model to extract information that would otherwise be fused at early stages. We then added a novel MCAB to disentangle the multi-channel information better before it entered the decoder following standard U-Net structure at each depth of the network. This model is abbreviated as AMU_MP below. Within MCAB, we firstly concatenated the outputs of two encoders at each depth and added a convolutional block that was of the same setting as the encoder to allow the model to disentangle the latent relationship between the information between the two channels. Then, the convoluted data stream went through an attention block [23] before being delivered to the decoder at each depth.

Training Procedure
For the 121 CMR cine slices (each of 50 frames, giving 6050 2D cine frames in total) introduced in Section 2.1, we augmented each cine slice 4 times. Then, we stacked all augmented cine CMR cine frames and the original cine CMR cine frames on their frame axis, giving a total of 30250 stacked frames (6050 × 4 + 6050 = 30250) for training. All models specified in Section 2.2 were trained from scratch and evaluated using a five-fold crossvalidation scheme. As 121 is not a multiple of 5, each validation round takes 23-25 original CMR cine slices (1150, 1200 or 1250 testing frames) for testing and the augmented and original cine frames corresponding to the rest of the cine slices for training. We trained our network with cross-entropy loss using the Adam optimizer [24]. For hyperparameters per Kingma et al. [24] we set α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1× 10 −7 .
For all experiments, we defined the batch size as 8 to accommodate the high image resolution of 512 × 512. The model was trained for 5 epochs. The training was completed with an NVIDIA TESLA V100 GPU.

Testing Procedure
The input for testing was prepared in the same way as training as part of the fivefold cross-validation. The testing was firstly performed on an NVIDIA TESLA V100 GPU and then repeated on a machine without a GPU to test its efficiency on a modern computer without a powerful GPU. This can also prove the feasibility of our model to be deployed in a clinical environment, in which normally a decent GPU may not be available.

Training Procedure
For the 121 CMR cine slices (each of 50 frames, giving 6050 2D cine frames in total) introduced in Section 2.1, we augmented each cine slice 4 times. Then, we stacked all augmented cine CMR cine frames and the original cine CMR cine frames on their frame axis, giving a total of 30250 stacked frames (6050 × 4 + 6050 = 30250) for training. All models specified in Section 2.2 were trained from scratch and evaluated using a five-fold cross-validation scheme. As 121 is not a multiple of 5, each validation round takes 23-25 original CMR cine slices (1150, 1200 or 1250 testing frames) for testing and the augmented and original cine frames corresponding to the rest of the cine slices for training. We trained our network with cross-entropy loss using the Adam optimizer [24]. For hyperparameters per Kingma et al. [24] we set α = 0.001, β 1 = 0.9, β 2 = 0.999, ε = 1× 10 −7 .
For all experiments, we defined the batch size as 8 to accommodate the high image resolution of 512 × 512. The model was trained for 5 epochs. The training was completed with an NVIDIA TESLA V100 GPU.

Testing Procedure
The input for testing was prepared in the same way as training as part of the five-fold cross-validation. The testing was firstly performed on an NVIDIA TESLA V100 GPU and then repeated on a machine without a GPU to test its efficiency on a modern computer without a powerful GPU. This can also prove the feasibility of our model to be deployed in a clinical environment, in which normally a decent GPU may not be available.

Post-Processing
For healthy subjects, the endocardial and epicardial borders of their CMR short-axis data were expected to be approximately ellipsoid. Any "broken" segmentations, which did not conform to this shape constraint, passed through an additional postprocessing stage. Examples of "broken" segmentation included semi-ellipses and ellipses that were not closed.
At the post-processing stage, we followed Halir et al. [25] to regress the predicted segmentation around the centre of the segmentation into a 3-pixel width ellipse, and we output the final prediction as the regressed ellipse on top of the originally predicted segmentation so that any "broken" segmentation could be rectified. Then, the largest and the second largest contours obtained were considered as the final coordinates of the epicardium and the endocardium layer, respectively, for the generation of CMR MVM. A flow chart is included ( Figure 2) for a clearer explanation of the workflow.
In addition, the analysis of the myocardial velocities in Section 2.6.2 requires the segmentation results generated to be a close ring-like shape, so it can calculate the velocity of movement for the whole myocardium. Therefore, this post-processing also helped to shape the model-generated results to be an acceptable input into the myocardium velocity analysis tool, so we can analyse these velocity data and markers as described. For healthy subjects, the endocardial and epicardial borders of their CMR short-axis data were expected to be approximately ellipsoid. Any "broken" segmentations, which did not conform to this shape constraint, passed through an additional postprocessing stage. Examples of "broken" segmentation included semi-ellipses and ellipses that were not closed.
At the post-processing stage, we followed Halir et al. [25] to regress the predicted segmentation around the centre of the segmentation into a 3-pixel width ellipse, and we output the final prediction as the regressed ellipse on top of the originally predicted segmentation so that any "broken" segmentation could be rectified. Then, the largest and the second largest contours obtained were considered as the final coordinates of the epicardium and the endocardium layer, respectively, for the generation of CMR MVM. A flow chart is included ( Figure 2) for a clearer explanation of the workflow.
In addition, the analysis of the myocardial velocities in Section 2.6.2 requires the segmentation results generated to be a close ring-like shape, so it can calculate the velocity of movement for the whole myocardium. Therefore, this post-processing also helped to shape the model-generated results to be an acceptable input into the myocardium velocity analysis tool, so we can analyse these velocity data and markers as described.

Dice Similarity Coefficient (or Dice Scores)
Dice Similarity Coefficient (DSC) is a broadly used metric to evaluate the segmentation performance based on the overlap between the automated segmentation result and the ground truth [26,27]. Let Sauto and Sgt be the automated segmentation result and the manually annotated ground truth, respectively; the DSC can be defined as the Equation (1) below [26,27].
To assess and compare the performance of each model, we first evaluated the accuracy of predictions using Dice Scores on raw predicted segmentation data before postprocessing (1) per cine frame, (2) per cine slice (50 cine frames) and (3)   Dice Similarity Coefficient (DSC) is a broadly used metric to evaluate the segmentation performance based on the overlap between the automated segmentation result and the ground truth [26,27]. Let S auto and S gt be the automated segmentation result and the manually annotated ground truth, respectively; the DSC can be defined as the Equation (1) below [26,27].
To assess and compare the performance of each model, we first evaluated the accuracy of predictions using Dice Scores on raw predicted segmentation data before post-processing (1) per cine frame, (2) per cine slice (50 cine frames) and (3) per subject (3-5 cine slices).
The three Dice Score metrics allowed a comprehensive observation and examination of the model's performance.

CMR MVM Velocity Data
While the Dice similarity coefficients examine the overlaps per segmentation data array, the CMR MVM velocity analysis observes the temporal derivatives of the 3D movements of the segmented LV myocardium, i.e., velocities, by three axes-the longitudinal, radial and circumferential [21].
The LV myocardial velocity analysis, especially regionally, has the potential to be recognised under clinical settings to analyse the function of the myocardium in terms of its temporal movements (i.e., LV contraction and relaxation).
After segmentation, global longitudinal, radial and circumferential LV myocardial velocities were calculated and recorded from simulated temporal LV myocardial movements within a cardiac cycle according to the LV myocardial segmentations. By analysing the curves, global peak radial/longitudinal velocities were recorded for systolic (PS), diastolic (PD) and atrial systolic (PAS) velocities at all slices across the axial direction. Additionally, from the circumferential velocity-time curves, we recorded the typically shown two early circumferential systolic peaks (C1 and C2) and a third circumferential early diastolic peak (C3). All these peaks are widely applied as markers in the myocardial movements under clinical settings and in reproducibility and reliability analysis [28,29]. In addition to the global velocity data, we split the myocardium into 6 regions in accordance with the advice from the American Heart Association [30]-anterior (A), mid anteroseptal (AS), inferoseptal (IS), inferior (I), inferolateral (IL) and anterolateral (AL), and reported all peak velocities for each region as well. From these peak velocity data collected, we can see how these variations may be able to help to form clinically valuable evidence for diagnosis.
In this study, we were able to evaluate these velocity curves and markers for a total of 8 subjects, each of whom was scanned twice, giving 16 datasets. We were not able to analyse the rest of subject population, since these datasets were acquired and processed by another system, which was incompatible with our current CMR MVM myocardium velocity analysis tool. However, the analysed 16 datasets are representative.
To assess the clinical significance of the predicted segmentation and its accuracy for obtaining clinical measurements, we compared the longitudinal, radial and circumferential markers of peak velocities from the LV myocardial velocity curves generated from the manual segmentation and the model-generated delineation using two-way mixed, single score inter-class coefficients (ICCs) [31][32][33] and also the Bland-Altman analysis. For ICC, we classified them as excellent (>0.75), good (0.6 < ICC ≤ 0.75), fair (0.4 < ICC ≤ 0.6) and poor (<0.4) per each region for regional peak values and per whole myocardium for global peak values [31].

Inter-Observer Study
We then invited another very skilled and experienced clinician (10+ years in CMR) as our second observer to delineate two randomly picked datasets independent of the first observer who previously defined our ground truth.
We compared the Dice Score distributions between the first observer and the second observer and the Dice Score distributions between the first observer and our best performing model (model [d] as defined in 2.2) in terms of their mean Dice Scores.
The model employed for testing here refers to one of the five cross cross-validation trained models. This model is trained by the corresponding augmented cine frames and original cine frames of 97 CMR cine slices.
This comparison can help us to see the variability of segmentation results of the model and the second human observer compared to the first human observer in terms of their segmentation overlaps by Dice Scores.

Cross-Validation Testing
We cross-validated all four models (Section 2.2 [a]-[d]) to demonstrate their capabilities in CMR MVM myocardium segmentation using the two evaluation methodologies. Then, we compared the results of different models by the mean and standard deviation. Results were tested for statistically significant differences using the Wilcoxon signed-rank test, and the tests were considered significant if associated p values were <0.05.
The quantitative results of the models validated by Dice Scores can be found in Table 2 and Figure 3. As the differences between models are relatively small, we conducted statistical tests to evaluate the differences more critically. By comparing the results between single-channel input-based models in Section 2.

Cross-Validation Testing
We cross-validated all four models (Section 2.2 [a]-[d]) to demonstrate their capabilities in CMR MVM myocardium segmentation using the two evaluation methodologies. Then, we compared the results of different models by the mean and standard deviation. Results were tested for statistically significant differences using the Wilcoxon signed-rank test, and the tests were considered significant if associated p values were <0.05.
The quantitative results of the models validated by Dice Scores can be found in Table  2 and Figure 3. As the differences between models are relatively small, we conducted statistical tests to evaluate the differences more critically. By comparing the results between single-channel input-based models in Section 2.     Following the segmentation outputs from the model, we performed CMR MVM velocity analysis for each model in the framework, where we considered the segmentation model and the post-processing stage as one segmentation framework. We excluded four cine slices whose myocardial velocities could not be generated from their segmentation from any of the models by the myocardial velocity analysis tool in the below comparison study. We firstly conducted ICC tests (Supplementary Table ST1) for each of these peak myocardial velocity values (PD, PA and PAS for the longitudinal velocity; PD, PA and PAS for the longitudinal velocity; and C1, C2 and C3 for the circumferential velocity, as specified in Section 2.6.2) generated from segmentation results. We then classified all ICCs per method specified in Section 2.6.2 (Table 4). We were able to observe that all ICCs of global values and the majority of the ICCs of regional values are classified as excellent. Additionally, we also made Bland-Altman plots for each of these peak myocardial velocity values (PD, PA and PAS for the longitudinal velocity; PD, PA and PAS for the longitudinal velocity; and C1, C2 and C3 for the circumferential velocity, as specified in Section 2.6.2) generated from different models (Supplementary Figure SF1). Table 4. Classification of inter-class coefficients (ICCs) of CMR MV markers across models.

Independent Testing
In addition, the trained model with the highest Dice Score (model [d]) also performed well on further two independent testing datasets, which were not included in the training and cross-validation process (Figures 4 and 5) with high Dice Score distributions ( Figure 6).
The independent testing by these two datasets further confirms the performance of our model [d] in addition to the cross-validation.

Inter-Observer Study
With the segmentation from the second observer (as specified in Section 2.7), we can calculate the Dice Scores between the first and the second observers as specified in Section 2.6.1, who are both very experienced CMR clinicians/physicists, as specified in Sections 2.1 and 2.7. This provides a reference for inter-observer manual variation of the myocardium segmentation in terms of the Dice Scores. Figure 5. Global velocity curves of the longitudinal, radial and circumferential myocardium velocities per cine slice for a sample cine slice from a subject generated from the manual segmentation and our automated segmentation system.  . Global velocity curves of the longitudinal, radial and circumferential myocardium velocities per cine slice for a sample cine slice from a subject generated from the manual segmentation and our automated segmentation system.
We then extracted the automated segmentation of the corresponding subjects by model [d] (highest performing as per Section 3.1) from Section 3.1 and calculated its distribution of Dice Scores against the first observer as per Section 2.6.1.

Inter-Observer Study
With the segmentation from the second observer (as specified in Section 2.7), we can calculate the Dice Scores between the first and the second observers as specified in Section 2.6.1, who are both very experienced CMR clinicians/physicists, as specified in Sections 2.1 and 2.7. This provides a reference for inter-observer manual variation of the myocardium segmentation in terms of the Dice Scores.
We then extracted the automated segmentation of the corresponding subjects by model [d] (highest performing as per Section 3.1) from Section 3.1 and calculated its distribution of Dice Scores against the first observer as per Section 2.6.1.
We were then able to observe a significantly higher distribution of Dice Scores per subject and per cine slice in the automated segmentation from their box plots of Dice Scores (Figure 7). Although in the box plot for per cine frame Dice Scores, we have a few outliers that fall below the distribution of the second observer, it did achieve statistical significance by Wilcoxon signed-rank tests against the distribution obtained from the second observer (Figure 7). We were then able to observe a significantly higher distribution of Dice Scores per subject and per cine slice in the automated segmentation from their box plots of Dice Scores (Figure 7). Although in the box plot for per cine frame Dice Scores, we have a few outliers that fall below the distribution of the second observer, it did achieve statistical significance by Wilcoxon signed-rank tests against the distribution obtained from the second observer (Figure 7).

Processing Time
By testing all models on an NVIDIA TESLA V100 GPU or using the option without the GPU as per Section 2.3, we were able to measure their processing time, respectively (Table 5). Within the processing time, the process of data loading and model loading solely takes less than 6 seconds for each of the models regardless of the employment of the GPU. We can conclude that all models are able to load and infer on a cine slice within 10.5 s on average using an NVIDIA TESLA V100 GPU as per Section 2.3. Even without the GPU, all models are able to complete segmentation of a single cine slice within 60.7 s on average. Thus, it is also worth noting that even without the GPU, the model is still able to infer each cine slice within a minute or slightly over a minute, as per the methodology mentioned in Section 2.4. This implied that a typical modern computer can also perform the automated segmentation within an acceptable period of time once the model is trained.
Its efficiency in segmentation, together with its efficacy and accuracy compared to the currently highly tedious and time-consuming manual segmentation processes, makes the developed framework welcomed for use by clinicians.

Processing Time
By testing all models on an NVIDIA TESLA V100 GPU or using the option without the GPU as per Section 2.3, we were able to measure their processing time, respectively (Table 5). Within the processing time, the process of data loading and model loading solely takes less than 6 seconds for each of the models regardless of the employment of the GPU. We can conclude that all models are able to load and infer on a cine slice within 10.5 seconds on average using an NVIDIA TESLA V100 GPU as per Section 2.3. Even without the GPU, all models are able to complete segmentation of a single cine slice within 60.7 seconds on average. Thus, it is also worth noting that even without the GPU, the model is still able to infer each cine slice within a minute or slightly over a minute, as per the methodology mentioned in Section 2.4. This implied that a typical modern computer can also perform the automated segmentation within an acceptable period of time once the model is trained.

Discussion
Our current work has served as a proof-of-concept study to introduce deep learning networks into the segmentation of left ventricles in 3Dir MVM CMR data. Sections 3.1 and 3.2 have provided evidence from both methodologies of crossvaluation and independent testing to evaluate the performance of the models. The results from Sections 3.1 and 3.2 have provided statistically strong evidence that the multi-channel can outperform the previously existing single-channel segmentation models. They have also proved that the integration of different channels (magnitudes and velocities) can provide more information and thus better performance for the automated segmentation framework. Our experiments have also provided evidence for the usefulness of these attention modules when integrating and processing multi-channel data.
Additionally, we performed myocardial velocity analysis, where we analysed the segmentation results in terms of the time derivatives of temporal physiological movements of the segmented myocardium, so we can see how similar the automatically segmented myocardium to the manually segmented myocardium when pumping blood across time frames across all three axes during an R-R interval in the ECG. From the very low bias (close to 0) and small limits of agreements in velocity curves and all peak values derived across all the longitudinal, radial and circumferential directions, we were then able to prove the efficacy and accuracy of our automated segmentation by the velocities derived from simulated temporal LV myocardium movements in addition to the Dice Scores evaluating the overlaps in the automated segmentation results.
Although we were not able to compare the segmentation results on all slices by myocardial velocity analysis as our myocardium velocity analysis tool has a very strict temporal and shape restriction on the segmentation result input, we could still analyse the majority of the cine slices and provided promising results as supporting evidence to our proposed method. We also believe that with the advancement of the algorithm with temporal and shape constraints on its output, we can mitigate this problem in future versions of this segmentation framework. From the inter-observer study, we are now able to observe the variability among human observers and have provided a reference for future technicians regarding the variation they should expect in the ground truths from different observers. The results have also provided strong evidence to suggest lower variations of our model [d] against its training dataset (or the first observer) than the second observer's and have further confirmed evidence regarding the existence of minor inter-observer variation in manual segmentation, which can be compensated and rectified by our automated segmentation framework. However, the inter-observer study can be applied to a greater number of subjects to supplement the generality of this study.
At the current stage, our proof-of-concept study has assessed the performance of the proposed deep neural networks on healthy subjects only. We expect that a more diverse group of subjects will be introduced in further development, where 3Dir MVM CMR images will be collected from patients with cardiac disease (e.g., dilated cardiomyopathy) and further analysis will be conducted. With a more diverse group of subjects, we will be able to critically evaluate and report the benefits and shortcomings of this methodology. Additionally, with the introduction of a more diverse range of subjects, we may need to adjust and re-design the post-processing stages, as the current design of the post-processing workflow is based on the assumption that all subjects are healthy and thus that the shapes of their LV myocardium should be ellipse-like. This may be considered as a limitation of our current work and may require another layer of post-processing to mitigate any "broken" segmentations. In this paper, we were able to assess the performance of models in our very unique data with a limited number of subjects. However, more experiments on multi-centre and multi-scanner datasets with a greater number of subjects may be needed to assess the generality and reproducibility of the proposed model design. Moreover, to facilitate a larger clinical trial, more comprehensive inter-and intra-observer study will be carried out that will involve manual labelling for the clustering of clinicians with various experience.
Another limitation of our work is that as the results show in Table 5 For the design of the model network, we have noticed that there are plenty of directions for further developments. This can include the incorporation of Long Short-Term Networks (LSTM) and/or Recursive Neural Networks (RNN) into our framework to process the temporal information of the cine slices.

Conclusions
In this study, we proved that the standard U-Net-based structure can be used for myocardium segmentation in 3Dir MVM CMR. Additionally, we proposed a multi-channel attention block to better incorporate the multi-channel information and used that to design our novel network architecture with post-processing stages to enhance the segmentation accuracy. In order to assess the performance of the designed frameworks, we evaluated them with relation to quantitative CMR MVM velocity peaks in addition to the Dice Scores. Our experiment results demonstrated the high efficacy and efficiency of our proposed automated segmentation framework.