Deep-Learning Segmentation of Epicardial Adipose Tissue Using Four-Chamber Cardiac Magnetic Resonance Imaging

In magnetic resonance imaging (MRI), epicardial adipose tissue (EAT) overload often remains overlooked because manual contouring of the images is tedious. Automated four-chamber EAT area quantification was proposed, leveraging deep-learning segmentation with multi-frame fully convolutional networks (FCN). The investigation involved 100 subjects, comprising healthy, obese, and diabetic patients, who underwent 3T cardiac cine MRI. Optimized U-Net and FCN (denoted FCNB) were trained on three consecutive cine frames to segment the central frame using a Dice loss. Networks were trained using 4-fold cross-validation (n = 80) and evaluated on an independent dataset (n = 20). Segmentation performance was compared to intra- and inter-observer bias using the Dice similarity coefficient (DSC) and relative surface error (RSE). Both systolic and diastolic four-chamber EAT areas were correlated with total EAT volume (r = 0.77 and 0.74, respectively). Network performance was equivalent to inter-observer bias (EAT: DSC_Inter = 0.76, DSC_U-Net = 0.77, DSC_FCNB = 0.76). U-Net outperformed FCNB on all metrics (p < 0.0001). Finally, the proposed multi-frame U-Net provided automated EAT area quantification with 14.2% precision for the clinically relevant upper three quarters of the EAT area range, classifying patients' risk of EAT overload with 70% accuracy. Exploiting a multi-frame U-Net on standard cine images provided automated EAT quantification over a wide range of EAT quantities. The method is made available to the community through an FSLeyes plugin.


Introduction
Epicardial adipose tissue (EAT) is a visceral fat depot surrounding the heart between the myocardium and the pericardium [1]. Quantification of its volume holds potential as a novel biomarker of coronary heart disease risk [2]. Pericardial fat, which merges EAT and paracardial adipose tissue (PAT), has been studied in association with atherosclerotic disease [3], but these results have since been heavily criticized [4]: treating two fat depots as a single entity may not reflect the distinct functions and clinical implications of each adipose tissue. Indeed, recent studies that separated EAT from PAT concluded that EAT alone was involved in the corresponding disease [5,6]. In contrast to PAT, EAT is a metabolically active adipose tissue [1]. Its accumulation and subsequent inflammation add to cardiovascular risk, potentially contributing to left ventricular (LV) diastolic dysfunction [7,8]. More recently, EAT overload has raised concern as a risk factor for generalized inflammation in COVID-19 [9,10]. It is now recognized that

Study Population
A retrospective single-center database totaling 153 subjects was defined, of which 100 exams could be used. The 100 enrolled subjects, including healthy controls, type 2 diabetic patients, and non-diabetic obese patients, were selected based on the four-chamber (4Ch) orientation and the absence of severe artifacts, as shown in Figure 1. Patients were defined as having type 2 diabetes mellitus if they fulfilled any of the WHO criteria: HbA1c ≥ 6.5%, fasting blood glucose (FBG) ≥ 7.0 mmol/L, oral glucose tolerance test result ≥ 11.1 mmol/L, or current treatment with antidiabetic agents. Obese non-diabetic patients were defined by the absence of all WHO criteria and a BMI ≥ 30 kg/m². All enrolled subjects had normal left ventricular function and no history of heart failure or coronary heart disease.
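The inclusion rule above can be expressed as a small function. This is a minimal sketch: the field names are hypothetical, a missing lab value is assumed not to meet its criterion, and the criteria for healthy controls are not stated in this section, so remaining subjects are left unclassified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    # Hypothetical field names; units follow the criteria in the text.
    hba1c_percent: Optional[float]
    fbg_mmol_l: Optional[float]
    ogtt_mmol_l: Optional[float]
    on_antidiabetic_treatment: bool
    bmi_kg_m2: float


def meets_who_diabetes_criteria(s: Subject) -> bool:
    """True if any one of the listed WHO criteria is met (missing value = not met)."""
    return (
        (s.hba1c_percent is not None and s.hba1c_percent >= 6.5)
        or (s.fbg_mmol_l is not None and s.fbg_mmol_l >= 7.0)
        or (s.ogtt_mmol_l is not None and s.ogtt_mmol_l >= 11.1)
        or s.on_antidiabetic_treatment
    )


def patient_group(s: Subject) -> str:
    if meets_who_diabetes_criteria(s):
        return "type 2 diabetes"
    if s.bmi_kg_m2 >= 30.0:
        return "obese non-diabetic"
    # Healthy-control criteria are not given in this section.
    return "unclassified"
```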

MRI Acquisition
All subjects underwent cardiac MRI, including the acquisition of a full stack of short-axis slices and a single-slice four-chamber cine, on a 3-T MRI system (Magnetom Verio, Siemens Healthineers, Erlangen, Germany) with a dedicated 32-channel cardiac coil array (Invivo, Gainesville, FL, USA). The cine series were acquired with a retrospectively ECG-gated balanced steady-state free precession (bSSFP) sequence with in-plane resolution varying from 1.3 × 1.3 mm² to 1.8 × 1.8 mm² (depending on the subject), slice thickness of 6 mm, TE/TR = 1.2/3.2 ms, GRAPPA 2 (24 auto-calibration signal lines), and temporal resolution of 28-35 ms, with 25 frames reconstructed. Further details of the cardiac MRI protocols were previously described [20,28-31]. N4 bias field correction [32] was applied to all image series before further processing.

EAT Segmentation
For reference, EAT volume was segmented by expert readers on full-stack short-axis series using the Argus viewer (Siemens Medical Solutions, Erlangen, Germany). In an independent session, two expert readers were provided with full 4Ch series and performed blinded segmentation of three labels using the FSLeyes viewer [33] (version 0.31, Paul McCarthy, University of Oxford, UK): heart ventricles (HV, including both ventricular muscles and blood pools), epicardial adipose tissue (EAT), and paracardial adipose tissue (PAT). EAT was defined as hyperintense signal within the pericardium around the ventricles. Peri-atrial fat was not included, as peri-ventricular EAT alone has been shown to correlate more strongly with coronary disease than total EAT [26]. All islands of peri-ventricular fat were included in the EAT area. PAT was defined as fat adjacent to but outside the pericardium. Segmentations were performed on three cardiac phases identified by the readers, who had the entire series at their disposal: first phase, peak systole, and late diastole. The three segmented masks were propagated to the remaining frames using an automatic label propagation algorithm based on non-linear registration, as previously described [34], resulting in 25 segmented images per subject. Series in the test dataset were segmented by both readers, and reader 1 repeated the blinded segmentations 6 weeks later.

Network Architecture
Two fully convolutional networks (FCNs) were investigated, both with 48 filters in the first layer: U-Net [35] and the FCN developed by Bai et al. [36], hereafter referred to as FCNB. Both networks are based on an encoder-decoder structure but differ in their decoder. The encoder processes an input image of arbitrary size and applies convolutional layers to extract image features, while the decoder upsamples and combines low-resolution feature maps back to the original input resolution. The absence of dense layers allows these networks to process images of various sizes.
U-Net [35] has been the most popular 2D segmentation network for biomedical images and a fundamental component of many state-of-the-art cardiac image segmentation approaches [37-39]. Its distinctive feature is the use of skip connections between encoder and decoder to recover spatial information lost in the downsampling layers, as shown in Figure 2.
Figure 2. Networks' optimized architecture. The two networks evaluated in this study, U-Net and the fully convolutional network FCNB, included a first 3D convolution layer to allow multiple cardiac frames as input. Subsequent 2D convolution layers encoded images from 48 up to 768 features. Finally, the decoder targeted three labels for segmentation in the central input frame: epicardial adipose tissue (EAT), paracardial adipose tissue (PAT), and heart ventricles (HV).
The second network investigated is the FCN developed by Bai et al. [36] (FCNB). FCNB has demonstrated excellent segmentation performance on the largest available cardiac MR dataset (UK Biobank [40]). FCNB is characterized by a decoder that consists only of the concatenation of all feature maps, upsampled to the original resolution, as shown in Figure 2. In the original papers, both networks were trained with a cross-entropy loss. However, this loss has shown limitations in addressing class imbalance. In our study, the regions of interest (ROI) were sparse compared to the background, and cross-entropy loss is ill-suited to this imbalance. The loss function was therefore based on the mean Dice coefficient between the probabilistic label map (excluding background) and the manually annotated label map.
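For illustration, a soft Dice loss of this kind can be sketched in NumPy. The sketch assumes one-hot targets and softmax probabilities of shape (H, W, C) with background in channel 0 (so C = 4 for HV, EAT, and PAT), and takes the loss as one minus the mean foreground Dice; the exact formulation and the smoothing term `eps` used in the study are not specified here. For training, the same arithmetic would be written with TensorFlow operations so that it can be differentiated.

```python
import numpy as np


def soft_dice_loss(probs: np.ndarray, onehot: np.ndarray, eps: float = 1e-6) -> float:
    """1 - mean soft Dice over foreground classes (channel 0 = background, excluded).

    probs, onehot: arrays of shape (H, W, C).
    """
    p = probs[..., 1:]
    g = onehot[..., 1:]
    axes = (0, 1)  # sum over pixels, keeping one Dice value per class
    intersection = np.sum(p * g, axis=axes)
    denom = np.sum(p, axis=axes) + np.sum(g, axis=axes)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    return float(1.0 - np.mean(dice_per_class))
```

A perfect prediction gives a loss of 0, and predictions that miss every foreground pixel give a loss close to 1, regardless of how much background the image contains.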

Training
Specifically, the optimized FCNB and U-Net were trained on three consecutive cine frames to segment the central frame, providing temporal information that experts often need to segment EAT. Input images were normalized to the range [0, 1] and brought to a fixed size (256 × 192 × 3), with zero-padding or cropping applied to images and masks when needed.
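The input preparation can be sketched as follows. Assumptions not stated in the text: the neighbours of the first and last frames wrap around the cardiac cycle, normalization is a per-stack min-max scaling, and padding or cropping is centred. The helper `pad_or_crop` applies equally to 2D label masks.

```python
import numpy as np

TARGET_H, TARGET_W = 256, 192  # fixed in-plane input size given in the text


def pad_or_crop(img: np.ndarray, th: int = TARGET_H, tw: int = TARGET_W) -> np.ndarray:
    """Centre-crop or zero-pad the first two axes of img to (th, tw)."""
    out = np.zeros((th, tw) + img.shape[2:], dtype=img.dtype)
    h, w = img.shape[:2]
    # Source offset (crop) and destination offset (pad) along each axis
    sh, dh = max((h - th) // 2, 0), max((th - h) // 2, 0)
    sw, dw = max((w - tw) // 2, 0), max((tw - w) // 2, 0)
    ch, cw = min(h, th), min(w, tw)
    out[dh:dh + ch, dw:dw + cw] = img[sh:sh + ch, sw:sw + cw]
    return out


def three_frame_input(cine: np.ndarray, t: int) -> np.ndarray:
    """Build a (256, 192, 3) input centred on frame t of an (H, W, T) cine.

    Frames t-1 and t+1 wrap around the cardiac cycle (assumption).
    """
    n_frames = cine.shape[2]
    idx = [(t - 1) % n_frames, t, (t + 1) % n_frames]
    stack = cine[:, :, idx].astype(np.float32)
    lo, hi = stack.min(), stack.max()
    # Min-max scaling to [0, 1]; a constant stack maps to zeros
    stack = (stack - lo) / (hi - lo) if hi > lo else np.zeros_like(stack)
    return pad_or_crop(stack)
```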
For each batch (N = 30), on-the-fly data augmentation was performed using rotation and/or image scaling before the images were fed to the network. Both augmentation parameters were drawn from a clipped normal distribution, spanning −30° to 30° for rotation and 0.4 to 1.6 for scaling. The Adam optimizer [41] was used to minimize the Dice loss function with a constant learning rate of 1e-3. Training either the U-Net or FCNB took approximately 35 min on a graphics processing unit (GPU) (NVIDIA Tesla K80).
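The parameter sampling can be sketched with NumPy's random generator. The ±30° and 0.4-1.6 bounds come from the text; the means and standard deviations (0° with 15°, and 1.0 with 0.3) are assumptions, and clipping is done with `np.clip`. Applying the sampled rotation and scaling to images and masks is left to an image library (for example `scipy.ndimage.rotate` and `scipy.ndimage.zoom`), and whether each transform is applied to a given image ("and/or") is a further choice not shown here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Bounds from the text; means and standard deviations are assumptions.
ROT_MEAN, ROT_SD, ROT_LIM = 0.0, 15.0, (-30.0, 30.0)    # degrees
SCALE_MEAN, SCALE_SD, SCALE_LIM = 1.0, 0.3, (0.4, 1.6)  # unitless factor


def sample_clipped_normal(mean, sd, lim, size, rng=rng):
    """Draw from N(mean, sd) and clip the draws to [lim[0], lim[1]]."""
    return np.clip(rng.normal(mean, sd, size=size), lim[0], lim[1])


def sample_batch_augmentation(batch_size: int = 30):
    """One (angle, scale) pair per image of a batch (N = 30 in the text)."""
    angles = sample_clipped_normal(ROT_MEAN, ROT_SD, ROT_LIM, batch_size)
    scales = sample_clipped_normal(SCALE_MEAN, SCALE_SD, SCALE_LIM, batch_size)
    return angles, scales
```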
The networks were implemented in Python with the TensorFlow 2 framework. The FCNB model was adapted from the original implementation [42], whereas U-Net was custom-designed. To support the proposed multi-frame approach, both 2D networks were modified to accept 2D+t inputs, treating the cardiac time dimension as a third dimension with a limited horizon. Accordingly, the first convolution layer of each network was replaced with a 3D convolution layer with valid padding. The following layers were kept identical, processing the extracted features independently of the input dimensions.
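The effect of a first 3D layer with valid padding along time can be sketched in plain NumPy: with a temporal kernel depth equal to the number of input frames (3), the time axis collapses to a single step, so every subsequent layer sees an ordinary 2D feature map. The kernel layout, the equal kernel depth, and the zero in-plane padding are assumptions; the text does not state how in-plane borders are handled.

```python
import numpy as np


def conv3d_time_valid(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """First-layer 3D convolution (cross-correlation) that collapses time.

    x: input of shape (H, W, T), e.g. T = 3 cine frames.
    w: kernel of shape (kh, kw, kt, F) with kt == T and odd kh, kw.
    b: bias of shape (F,).
    Valid padding in time (T - kt + 1 = 1 output step); zero padding in-plane
    (assumption). Returns an (H, W, F) 2D feature map.
    """
    H, W, T = x.shape
    kh, kw, kt, F = w.shape
    assert kt == T, "valid padding with kt == T leaves a single time step"
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.empty((H, W, F), dtype=np.float64)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + kh, j:j + kw, :]                # (kh, kw, T)
            out[i, j] = np.tensordot(patch, w, axes=3) + b  # (F,)
    return out
```

With a 1 × 1 kernel whose temporal weights are (0, 1, 0), the layer simply returns the central frame, which makes the "segment the central frame" design easy to check.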
To perform a robust evaluation, networks were trained using cross-validation and evaluated on an independent dataset. The database was split into five subsets of 20 subjects (500 images) each, every subset reflecting the population distribution of the database: 4 healthy controls, 13 type 2 diabetic patients, and 3 non-diabetic obese patients. One subset (500 images) was used as the test set, whereas the other 4 subsets were used for stratified cross-validation training, resulting in a 4-fold cross-validation. In each fold, a single subset was retained for validation (500 images) and the 3 others (1500 images) were used for training, ensuring that both the validation and training datasets reflected the population distribution of the database.
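The split can be sketched as follows. The group totals (20 healthy, 65 type 2 diabetic, 15 obese non-diabetic) are inferred by multiplying the per-subset counts by five; the subject IDs, the random seed, and the choice of subset 0 as the test set are hypothetical.

```python
import numpy as np

GROUPS = {"healthy": 20, "type2_diabetic": 65, "obese_nondiabetic": 15}
PER_SUBSET = {"healthy": 4, "type2_diabetic": 13, "obese_nondiabetic": 3}
N_SUBSETS = 5


def stratified_subsets(seed: int = 0):
    """Split 100 hypothetical subject IDs into 5 subsets with the same group mix."""
    rng = np.random.default_rng(seed)
    subsets = [[] for _ in range(N_SUBSETS)]
    next_id = 0
    for group, n in GROUPS.items():
        ids = rng.permutation(np.arange(next_id, next_id + n))
        next_id += n
        k = PER_SUBSET[group]
        for s in range(N_SUBSETS):
            subsets[s].extend(ids[s * k:(s + 1) * k].tolist())
    return subsets


def folds(subsets):
    """Subset 0 is held out as test; the remaining four give a 4-fold CV."""
    test, cv = subsets[0], subsets[1:]
    for v in range(len(cv)):
        train = [i for s, sub in enumerate(cv) if s != v for i in sub]
        yield train, cv[v], test
```

With 25 segmented images per subject, each fold then has 60 training subjects (1500 images), 20 validation subjects (500 images), and the same 20 test subjects (500 images).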

Evaluation Metrics
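Segmentation performance was evaluated in terms of accuracy, propinquity, and surface estimation error. The Dice similarity coefficient (DSC) measured segmentation accuracy from the overlap between the manually and automatically segmented surfaces (S_M and S_A), defined as
$$\mathrm{DSC} = \frac{2\,|S_M \cap S_A|}{|S_M| + |S_A|}$$
The mean surface distance (MSD) quantified the propinquity between segmentations as the mean distance (in mm) between the segmented contours C_M and C_A, defined as
$$\mathrm{MSD} = \frac{1}{2}\left(\frac{1}{|C_M|}\sum_{p \in C_M} d(p, C_A) + \frac{1}{|C_A|}\sum_{q \in C_A} d(q, C_M)\right)$$
where d(p, C) is the Euclidean distance from point p to the closest point of contour C. To evaluate the final clinical purpose, namely the quantitative measurement of EAT area, the absolute relative surface error (RSE) was used, defined as
$$\mathrm{RSE} = \frac{\left|\,|S_A| - |S_M|\,\right|}{|S_M|} \times 100\%$$
To further assess accuracy, the positive predictive value (PPV), an indicator of over-segmentation (PPV well below 1), was calculated on the entire database, defined as
$$\mathrm{PPV} = \frac{|S_M \cap S_A|}{|S_A|}$$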
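A minimal NumPy sketch of the four metrics on boolean masks follows. It uses the standard overlap definitions of DSC (2|S_M ∩ S_A| / (|S_M| + |S_A|)) and PPV (|S_M ∩ S_A| / |S_A|), RSE as the absolute area difference relative to the manual area in percent, and MSD as the symmetric mean of nearest-contour distances; that MSD form, the 4-neighbour contour extraction, and the isotropic `spacing_mm` argument are assumptions. The brute-force distance matrix is only meant for small masks.

```python
import numpy as np


def dsc(m: np.ndarray, a: np.ndarray) -> float:
    """Dice similarity coefficient between boolean masks S_M (m) and S_A (a)."""
    return 2.0 * np.logical_and(m, a).sum() / (m.sum() + a.sum())


def rse(m: np.ndarray, a: np.ndarray) -> float:
    """Absolute relative surface error, in percent of the manual area."""
    return 100.0 * abs(int(a.sum()) - int(m.sum())) / m.sum()


def ppv(m: np.ndarray, a: np.ndarray) -> float:
    """Fraction of automatically segmented pixels that are also manual."""
    return np.logical_and(m, a).sum() / a.sum()


def contour_points(mask: np.ndarray) -> np.ndarray:
    """(row, col) of mask pixels with at least one 4-neighbour outside the mask."""
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.argwhere(mask & ~interior)


def msd(m: np.ndarray, a: np.ndarray, spacing_mm: float = 1.0) -> float:
    """Symmetric mean contour distance in mm (brute force, small masks only)."""
    cm, ca = contour_points(m), contour_points(a)
    d = np.linalg.norm(cm[:, None, :] - ca[None, :, :], axis=-1) * spacing_mm
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

For two 4 × 4 squares offset by one column, this gives DSC = 0.75, PPV = 0.75, RSE = 0%, and MSD = 0.5 pixel, which shows how RSE can be perfect while overlap is not.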

Statistical Analysis
Statistical analysis was conducted using R (version 3.6.3) [43]. Linear regression analysis was used to study the correlation between manually evaluated EAT volume and 4Ch area. Normality of the metric distributions was assessed using the Shapiro-Wilk test. Wilcoxon signed-rank and Wilcoxon rank-sum tests were used to investigate significant differences in each metric between intra-/inter-observer agreement and the FCNs. To account for segmentation difficulty and clinical relevance [44], both of which scale with the quantity of EAT, network performance was assessed per quartile of manually segmented EAT area (Q1 < 8.22 cm² ≤ Q2 < 12.70 cm² ≤ Q3 < 15.55 cm² ≤ Q4).
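Converting a segmented mask to an area in cm² and assigning it to one of these quartiles can be sketched as below. The isotropic pixel spacing argument is an assumption consistent with the square in-plane resolutions given in the acquisition section (1.3 to 1.8 mm); since the quartiles are equally populated, the boundaries themselves could be obtained with `np.quantile` on the manual areas.

```python
import numpy as np

# Quartile boundaries of manually segmented 4Ch EAT area (cm^2), from the text.
Q_EDGES_CM2 = (8.22, 12.70, 15.55)


def eat_area_cm2(mask: np.ndarray, spacing_mm: float) -> float:
    """Area of a boolean mask in cm^2 (1 cm^2 = 100 mm^2), isotropic pixels assumed."""
    return float(mask.sum()) * spacing_mm ** 2 / 100.0


def eat_quartile(area_cm2: float) -> int:
    """Return 1..4 with Q1 < 8.22 <= Q2 < 12.70 <= Q3 < 15.55 <= Q4."""
    # side="right" places a value equal to a boundary in the upper quartile
    return int(np.searchsorted(Q_EDGES_CM2, area_cm2, side="right")) + 1
```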
EAT areas measured on 4Ch views correlated well with total EAT volume measured from the short-axis cine stack (Figure 3), with a slightly higher correlation in systole (Pearson r = 0.77) than in diastole (Pearson r = 0.74). A wide range of 4Ch EAT areas was covered, from 1.2 cm² to 37.2 cm², with a lower range for healthy subjects (2.5 to 13.7 cm²), 1.2 to 23.2 cm² for non-diabetic obese subjects, and 5.3 to 37.2 cm² for type 2 diabetic patients.
As shown in Table 2, intra- and inter-observer DSC confirmed excellent reproducibility of HV segmentation (DSC_Intra = 0.98 and DSC_Inter = 0.96, respectively). EAT and PAT segmentations differed between the two observers (DSC_Inter = 0.76 and 0.78 for EAT and PAT, respectively), whereas segmentations performed twice by the same observer proved more reproducible (DSC_Intra = 0.83 and 0.85 for EAT and PAT, respectively). Intra-observer DSC and MSD were significantly lower (p < 0.05) for EAT segmentation in the diastolic frame than in the systolic frame. For inter-observer bias, differences in DSC, MSD, or RSE between diastolic and systolic frames were not statistically significant.
Diagnostics 2022, 12, x FOR PEER REVIEW 7 of 15
Figure 3. Comparison of reference total epicardial fat volume and proposed EAT area measured on four-chamber cine. EAT area was measured in the end-systolic or end-diastolic frame across the 100-subject database. The three cohorts merged into the database are identified by marker color.
FCNB and U-Net segmentation performance, as measured by DSC, was significantly lower (p < 0.05) than intra-observer agreement for all labels (for EAT: DSC_Intra = 0.83, DSC_U-Net = 0.77, DSC_FCNB = 0.76). Both networks provided DSC, MSD, and RSE performance equivalent to inter-observer agreement for all labels (for instance, PAT: DSC_Inter = 0.78, DSC_U-Net = 0.80, DSC_FCNB = 0.78).
Across the four quartiles of data, defined by equally populated ranges of EAT area, both networks provided reliable segmentation of the heart ventricles (HV, FCNB: DSC_Q1-Q4 = 0.97-0.96; U-Net: DSC_Q1-Q4 = 0.97), as shown in Table 3. In contrast, network performance for EAT segmentation strongly depended on the quartile: U-Net DSC was significantly higher (p < 0.001) for the upper quartiles (DSC_Q4 = 0.83 > DSC_Q3 = 0.80 > DSC_Q2 = 0.76 > DSC_Q1 = 0.69), as illustrated in Figure 4. DSC and RSE revealed a gap in segmentation quality between the lower two and the upper two quartiles for both PAT and EAT (for EAT with FCNB: RSE_Q4 = 15.60, RSE_Q3 = 15.87 < RSE_Q2 = 21.91 < RSE_Q1 = 27.98). Across all quartiles, both networks had more difficulty separating PAT from EAT than identifying total pericardial fat (EAT + PAT) in the image (with U-Net, RSE_EAT+PAT << RSE_EAT or RSE_PAT for all quartiles). Over the database and for all labels, U-Net outperformed FCNB (p < 0.0001) in segmentation accuracy (DSC) and proximity to the ground truth (MSD), thus providing a more accurate area measurement (RSE).
FCNB and U-Net performed significantly better (p < 0.05) at segmenting EAT area on the systolic frame than on the diastolic frame (DSC_U-Net,diastole = 0.76 vs. DSC_U-Net,systole = 0.80). These differences were not significant for PAT (see Appendix A Figure A1).
Classification of subjects by quartile of EAT burden was assessed using confusion matrices. In Figure 5, the diagonal of the confusion matrices (in green) gives the rate of correct classification (66% for FCNB and 71% for U-Net), the first subdiagonal and superdiagonal (in yellow) give the rate of misclassification by one quartile (32% for FCNB and 27% for U-Net), and the second subdiagonal and superdiagonal (in red) give the rate of misclassification by two quartiles (2% for FCNB and 2% for U-Net). As shown by the subdiagonals of the confusion matrices and confirmed by PPV, FCNB significantly overestimated EAT area compared to U-Net (PPV_FCNB = 0.73 < PPV_U-Net = 0.75, p < 0.0001).
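The diagonal, first off-diagonal, and second off-diagonal shares of such a 4 × 4 confusion matrix can be computed as below. This is a sketch; the quartile labels used to exercise it are hypothetical and do not reproduce the study's data.

```python
import numpy as np


def quartile_agreement(true_q, pred_q):
    """Confusion matrix (rows: manual quartile, cols: predicted quartile, 1..4)
    and the fraction of subjects misclassified by 0, 1, 2, or 3 quartiles."""
    true_q, pred_q = np.asarray(true_q), np.asarray(pred_q)
    cm = np.zeros((4, 4), dtype=int)
    np.add.at(cm, (true_q - 1, pred_q - 1), 1)  # accumulate repeated index pairs
    off = np.abs(true_q - pred_q)
    shares = {k: float(np.mean(off == k)) for k in range(4)}
    return cm, shares
```

Here `shares[0]` corresponds to the green diagonal, `shares[1]` to the yellow first off-diagonals, and `shares[2]` to the red second off-diagonals described above.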

Discussion
This study aimed to provide a rapid and readily integrable evaluation of epicardial fat burden. To this end, the EAT layer was segmented automatically on four-chamber cine MRI series using deep-learning approaches.

Four-Chamber-View Intrapericardial Fat Area Is a Relevant Measure of EAT
Confirming previous literature [24,25], the correlation found in this work between EAT area and volume across a wide range of EAT volumes (from 29 to 376 cm³) supports the use of four-chamber EAT area as a rapid yet realistic measure of EAT burden. In past studies, the 2D EAT area has already been linked to left ventricular diastolic dysfunction [22,26], hypertension and the severity of insulin resistance [25], and non-alcoholic fatty liver disease [27]. Thus, the four-chamber view holds potential as a surrogate to quantify EAT in routine clinical practice. Moreover, in the four-chamber view, the pericardium beyond the apex of the heart can be visualized more reliably. However, our database gathered retrospective studies in which EAT volume had been segmented on short-axis views by different investigators over the years, which could introduce unaccounted volume imprecision. Ideally, the gold-standard EAT volume quantification by cardiac computed tomography (CCT) would have been preferred, but this examination is not commonly indicated for metabolic patients.

A Specific Database with Possible Extensions
This work leverages a unique database that combines a population spanning a large range of EAT quantities with manual segmentation of EAT on cine series. The strength of our dedicated database lies in its diversity in BMI, sex, age, and health condition across many subjects (n = 100) (Table 1). Despite this diversity, an age disparity remains between younger healthy subjects and diabetic and/or obese patients. Adding data from older healthy subjects, as well as elderly subjects (>65 years), would reinforce network training, as elderly individuals have been shown to carry significantly more EAT than younger individuals [45]. Our database could also be extended with image sets from other MRI scanner types. This is currently a single-center study and database; as a result, the trained models might not generalize well to datasets from scanners of other vendors and field strengths. Nevertheless, the database was made up of multiple protocols acquired over a decade, which already featured a variety of acquisition parameters and image quality levels. To further leverage the annotated data (2500 ground-truth images, 25 segmented images per subject), generative adversarial networks could be explored to extend beyond the proposed data augmentation [46]. Another challenge is the recurrent artifacts (aliasing, dark bands, flow artifacts) commonly observed in 3T bSSFP cine MRI, which are particularly pronounced in obese patients. These artifacts might preclude EAT segmentation and degrade network accuracy. Training networks on artifacted images is another important step toward clinic-ready models.

Challenge of EAT Segmentation
Experts and networks provided excellent results on large structures such as the heart ventricles (DSC ≥ 0.96) and pericardial fat (DSC ≥ 0.88). However, one major challenge for EAT segmentation on cine MRI is to distinguish EAT from its extrapericardial neighbor, PAT. The pericardial fascia separating these two fat compartments is about 2 mm thick [47,48], which is of the same order of magnitude as the image resolution (1.3-1.8 mm). This explains why both networks segmented the combined pericardial fat (EAT + PAT) with appreciable precision, whereas the identification of each individual fat depot was less satisfying. Nevertheless, the FCNs provided segmentation results on par with the experts' precision. Additionally, since cardiac contraction pulls on the pericardium, its visualization improves at peak systole [22], making this frame more suitable than diastole for measuring EAT (intra-observer DSC, diastole vs. systole: p = 0.0282).
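As a back-of-the-envelope check of this resolution argument (ignoring the 6 mm slice thickness and any through-plane effects), a 2 mm fascia spans only about one to one and a half pixels at the reported in-plane resolutions:

$$\frac{2\ \text{mm}}{1.8\ \text{mm/pixel}} \approx 1.1\ \text{pixels}, \qquad \frac{2\ \text{mm}}{1.3\ \text{mm/pixel}} \approx 1.5\ \text{pixels}$$

At this scale, partial-volume averaging with the adjacent fat can easily blur the fascia.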
One novelty of this work was to input multiple frames of the cardiac cycle to the networks through a first 3D convolutional layer. Future work could further enhance the temporal information, which is essential to detect the pericardial fascia; for instance, a map of cardiac deformation could complement the input images supplied to the network. It would also be interesting to investigate other network architectures, such as recurrent neural networks, which could memorize information from adjacent slices to improve inter-slice coherence [49], but these extensions fall outside the scope of this work.

Comparing FCNs Performances
Specific complementary metrics (DSC, MSD, and RSE) were chosen to evaluate EAT area segmentation and quantification. Alternatively, the Hausdorff distance is a common choice to evaluate segmentation performance [50], measuring the maximal pixel distance error between segmentations. However, because EAT is sparsely distributed around the heart, the Hausdorff distance was not considered in this work, since it can become very large even when comparing two segmentations with similar areas.
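The argument can be illustrated on a toy example with synthetic masks (not study data): adding one isolated pixel, as a sparse EAT segmentation might contain, changes the area by a single pixel yet drives the Hausdorff distance to 37 pixels, while the mean surface distance stays near 1.4 pixels. Contours are taken as mask pixels with a 4-neighbour outside the mask, and the symmetric mean form of MSD is an assumption.

```python
import numpy as np


def contour(mask):
    """Boundary pixels: mask pixels with a 4-neighbour outside the mask."""
    p = np.pad(mask, 1)
    inner = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return np.argwhere(mask & ~inner)


def distances(m, a):
    """Pairwise pixel distances between the contours of two boolean masks."""
    cm, ca = contour(m), contour(a)
    return np.linalg.norm(cm[:, None, :] - ca[None, :, :], axis=-1)


def hausdorff(m, a):
    """Largest nearest-contour distance, taken in both directions."""
    d = distances(m, a)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def mean_surface_distance(m, a):
    """Average of the two mean nearest-contour distances."""
    d = distances(m, a)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


# Synthetic example: a 4x4 fat patch, and the same patch plus one isolated
# pixel at column 45, i.e. 37 pixels from the patch's right edge.
manual = np.zeros((50, 50), dtype=bool)
manual[5:9, 5:9] = True
auto = manual.copy()
auto[5, 45] = True
```

Here the areas differ by one pixel (16 vs. 17), `hausdorff(manual, auto)` is 37.0, and `mean_surface_distance(manual, auto)` is 37/26, about 1.42 pixels.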
Based on the chosen metrics, U-Net outperformed FCNB for all labels and thus appears preferable to quantify 4Ch EAT area. Alternative semi- and fully automatic methods have been proposed for EAT quantification on cine MRI. Cristobal-Huerta et al. [51] developed an automatic pipeline composed of Laws' texture filters, snakes, and K-cosine curvature analysis to partially quantify EAT volume, albeit on only 10 subjects. In a semi-automatic approach, Fulton et al. [52] placed landmarks on short-axis images from 12 subjects to unroll the images into polar coordinates before employing a neural network to detect epicardial fat contours. We were unable to compare our results with these previous works, as segmentation metrics (e.g., DSC or Jaccard similarity index) were not reported. Recently, automatic total pericardial fat quantification has been developed for 4Ch cine MRI. Bard, Raisi-Estabragh et al. [23] obtained segmentation performance (DSC_EAT+PAT = 0.8) comparable to ours (DSC_EAT+PAT = 0.88) on their respective test sets. In their study, only the end-diastolic frame had been segmented, whereas we segmented the full 4Ch cine MRI and trained on three consecutive cine frames to leverage cine temporal information. Finally, the optimized multi-frame U-Net was integrated into an FSLeyes plugin made available to the community [53], allowing comparison with further work and providing clinicians with rapid EAT area segmentation (see Appendix A Figure A2).

Performances across Quartiles
Splitting the database into quartiles of EAT made it possible to differentiate segmentation performance by EAT area. Segmentation quality from the FCNs was degraded in group Q1, in which EAT (as well as PAT) was thin and sparse, as illustrated in Figure 4. However, EAT segmentations were on par with inter-observer manual segmentation for the three upper quartiles and remained relevant for identifying patients at risk (Q2, Q3, Q4 ≥ 8.22 cm²), measuring their EAT burden within 14% and 18% precision for U-Net and FCNB, respectively.
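Assuming that this precision is the unweighted mean EAT RSE over the three upper quartiles (an interpretation, not stated explicitly), and that the per-quartile EAT RSE values reported in the Results (15.60, 15.87, and 21.91 for Q4, Q3, and Q2) refer to FCNB, the 18% figure is reproduced:

$$\overline{\mathrm{RSE}}^{\,\mathrm{FCNB}}_{Q_2,Q_3,Q_4} = \frac{21.91 + 15.87 + 15.60}{3} = \frac{53.38}{3} \approx 17.8\%$$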

Conclusions
This study provides a methodology for fully automated segmentation of epicardial fat on multi-frame cardiac cine MRI, demonstrated across 100 subjects exhibiting low to high EAT quantities. EAT is often overlooked in diagnosis but has received increasing attention as a relevant biomarker of cardiac risk. Automatic EAT evaluation could help identify patients at risk, especially diabetic patients. The comparison with EAT volume supports the potential of four-chamber cine EAT area as a surrogate for clinical evaluation, with higher segmentation robustness in the systolic frame. Of the two FCNs investigated, the optimized U-Net was better suited to estimate EAT area, with 14.2% precision for the clinically relevant upper three quarters of the targeted EAT range. EAT evaluation on cine images, leveraging multi-frame information, could be further integrated into both retrospective and prospective cardiac studies without the need for a specific acquisition, thanks to the publicly provided automatic EAT area segmentation.