Automatized Hepatic Tumor Volume Analysis of Neuroendocrine Liver Metastases by Gd-EOB MRI—A Deep-Learning Model to Support Multidisciplinary Cancer Conference Decision-Making

Simple Summary
Quantification of liver metastases on imaging is of utmost importance in therapy response assessment, wherein gadoxetic acid (Gd-EOB)-enhanced magnetic resonance imaging (MRI) shows the highest accuracy. Common criteria for assessing therapy response simplify the measurement of liver metastases, as full 3D quantification is very time-consuming. Therefore, we trained a deep-learning model using manual 3D segmentation of liver metastases and hepatic parenchyma in 278 Gd-EOB MRI scans of 149 patients with neuroendocrine neoplasms (NEN). The clinical relevance of the model was evaluated in 33 additional consecutive patients with NEN and liver metastases by comparing the model’s segmentation of baseline and follow-up examinations with the therapy response evaluation of an expert multidisciplinary cancer conference (MCC). The model showed high accuracy in quantifying liver metastases and hepatic tumor load, and its measurements matched the response evaluation of the MCC, so that its use to support treatment decision-making would be possible.

Abstract
Background: Rapid quantification of liver metastasis for diagnosis and follow-up is an unmet medical need in patients with secondary liver malignancies. We present a 3D-quantification model of neuroendocrine liver metastases (NELM) using gadoxetic-acid (Gd-EOB)-enhanced MRI as a useful tool for multidisciplinary cancer conferences (MCC).
Methods: Manual 3D-segmentations of NELM and livers (278 Gd-EOB MRI scans of 149 patients) were used to train a neural network (U-Net architecture). Clinical usefulness was evaluated in another 33 patients who were discussed in our MCC and received a Gd-EOB MRI both at baseline and follow-up examination (n = 66) over 12 months. Model measurements (NELM volume; hepatic tumor load (HTL)) with corresponding absolute (ΔabsNELM; ΔabsHTL) and relative changes (ΔrelNELM; ΔrelHTL) between baseline and follow-up were compared to MCC decisions (therapy success/failure).
Results: Internal validation of the model’s accuracy showed a high overlap for NELM and livers (Matthews correlation coefficient (φ): 0.76/0.95, respectively) with higher φ in larger NELM volume (φ = 0.80 vs. 0.71; p = 0.003). External validation confirmed the high accuracy for NELM (φ = 0.86) and livers (φ = 0.96). MCC decisions were significantly differentiated by all response variables (ΔabsNELM; ΔabsHTL; ΔrelNELM; ΔrelHTL) (p < 0.001). ΔrelNELM and ΔrelHTL showed optimal discrimination between therapy success and failure (AUC: 1.000; p < 0.001).
Conclusion: The model shows high accuracy in 3D-quantification of NELM and HTL in Gd-EOB-MRI. The model’s measurements correlated well with the MCC’s evaluation of therapeutic response.


Introduction
The incidence of neuroendocrine neoplasms (NEN) has increased considerably over the past 30 years, while at the same time, multiple treatment options are available for this disease [1]. The radiological workload for follow-up of patients with NENs has, therefore, increased accordingly. The number of follow-up examinations is rising not only because of the increasing incidence but also because of the lower aggressiveness of NENs compared to liver metastases of other entities (e.g., colorectal carcinoma) [2][3][4][5][6][7][8][9]. Owing to the indolent clinical course of NENs, patients often present at an advanced stage at first diagnosis [4,6,9,10]. The liver represents the predominant site for metastases, and accurate calculation of the hepatic metastatic tumor burden is mandatory for therapeutic follow-up [11]. The measurement of diffuse liver lesions, which occur in 60-70% of patients, can be challenging and is, at present, time-consuming. A further challenge is that common therapeutic response criteria intended to characterize how metastases develop over time are not always suitable for each patient [9]. The Response Evaluation Criteria in Solid Tumors (RECIST, Version 1.1) are based on changes in the diameters of a few lesions, which are considered representative [12]. However, hepatic tumor load (HTL), which is neglected if only the diameter of metastases is measured, is an important prognostic marker in hepatically metastasized NEN [4,13,14,15,16]. The quantitative evaluation of the metastatic volume can potentially provide a practical method for assessing the disease's course and may show improved prognostic value.
Magnetic resonance imaging (MRI) is the most sensitive technique to detect and quantify neuroendocrine liver metastases (NELMs) compared to conventional computed tomography (CT), ultrasound (US), and somatostatin receptor imaging [16][17][18]. Gadoxetic acid-enhanced (Gd-EOB) MRI is even more sensitive than conventional extracellular gadolinium chelate-enhanced MRI [17,19,20]. In addition to the use of contrast-enhanced MRI, the use of diffusion-weighted imaging (DWI) sequences increases the sensitivity in the detection of NELM [21][22][23]. Thus, the combination of DWI and hepatobiliary phase (HBP) sequences with Gd-EOB is now the imaging modality with the highest sensitivity for NELM [24]. NELM typically demonstrate a hypervascularization pattern in dynamic contrast phases (arterial, portal-venous and transitional phase), which aids in the differentiation of NELM from other liver lesions [19,25,26]. Despite the value of dynamic contrast phases in differential diagnosis, lesion measurement, and thus response evaluation, is preferably performed in the HBP when hepatocyte-specific contrast agents are used [27].
Advances in artificial intelligence (AI) technology have produced image recognition algorithms poised to aid and improve medical imaging procedures. AI has already demonstrated strong performance in various medical applications, especially in image-based diagnoses [28]. Although several studies suggest that the performance of AI in imaging diagnosis is superior to that of human experts, the consensus is that AI should play a supporting role to radiologists and that AI tools could especially be used to save time in clinical routine [28][29][30][31][32][33]. The various fields of AI support in liver imaging include segmentation, lesion detection and classification of diffuse or focal liver diseases [34,35].
Here we provide the first data using a high-precision AI algorithm for the 3D quantification of the hepatic tumor burden of NELM and provide a useful tool for clinical decision-making, for example, in multidisciplinary cancer conferences (MCC).

[...] 5T), Siemens Healthcare, Erlangen, Germany) between January 2015 and August 2018 at our institution were retrospectively identified from our radiology database. Of these scans, 120 were not suitable for the model's training because of missing evidence of NELM (n = 112) or due to non-standard scan protocols (n = 8), resulting in a total inclusion of 278 Gd-EOB MRI datasets. Pretreatments (e.g., partial liver resection, transarterial or local ablative therapies), which may influence the morphology of the liver, were not an exclusion criterion.

MCC Cohort
In a second institutional database search, we consecutively identified 33 patients who were discussed in our MCC between January 2019 and January 2020 and who received a Gd-EOB MRI both as a baseline and as a follow-up examination (n = 66). All 33 patients had liver metastases and were selected independently of the hepatic tumor volume or their disease history. In these patients, all MCC decisions were based on the course of the metastatic liver disease. Patients in whom the MCC decision was based on extrahepatic tumor manifestations were excluded.
MCC cohort: Gd-EOB MRI scans were performed on five different institutional MRI scanners and included both 1.5 T and 3 T examinations. All examinations contained a 3D T1w GRE FS sequence during HBP. Due to the different scanners, the scan parameters (TR, TE, FA and matrix) varied between the examinations. The HBP sequence was acquired between 10 and 20 min after contrast administration. Among others, diffusion-weighted imaging (DWI) sequences were acquired in the time between Gd-EOB injection and the HBP sequence. All DWI sequences contained at least two b values (b = 0 and b = 800) [36].
All examination protocols corresponded to the ENETS consensus guidelines for the standard of care in neuroendocrine tumors [15].

Manual Segmentation
All HBP sequences of the MRI scans (AI dev and MCC cohort) were anonymized and segmented using the Medical Imaging Interaction Toolkit (MITK) [37]. Volumetry (3D segmentation) of the liver and all liver metastases was performed in the HBP 3D T1w-GRE FS sequence. There was no limit on the number of metastases segmented per patient. Segmentation was performed manually using the polygonal region of interest (ROI) tool and was based on the planimetry method. Margins of the liver metastases were defined by the signal difference between hypointense liver metastases and the contrast-enhanced liver parenchyma. Adjacent vessels and biliary ducts were excluded if reasonably possible. All segmentations were refined by a radiologist with >5 years of experience in abdominal MRI. Distribution patterns of NELM were scored according to number (singular, multiple (≤10 metastases) or diffuse (>10 metastases)) and according to distribution (unilobar (left or right) or bilobar).
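The scoring of distribution patterns described above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and reading "multiple" as 2 to 10 lesions is an assumption (the text gives only the upper bound of 10).

```python
def classify_pattern(n_metastases: int) -> str:
    """Map a lesion count to the number-based pattern named in the text.

    Assumption: 'multiple' covers 2-10 lesions; the text states only '<=10'.
    """
    if n_metastases < 1:
        raise ValueError("at least one metastasis is required")
    if n_metastases == 1:
        return "singular"
    if n_metastases <= 10:
        return "multiple"
    return "diffuse"


def classify_distribution(left_lobe: bool, right_lobe: bool) -> str:
    """Map lobe involvement to unilobar (left/right) or bilobar."""
    if left_lobe and right_lobe:
        return "bilobar"
    if left_lobe:
        return "unilobar (left)"
    if right_lobe:
        return "unilobar (right)"
    raise ValueError("at least one lobe must be involved")


if __name__ == "__main__":
    print(classify_pattern(12), classify_distribution(True, True))
```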
For subanalysis, all NELM and livers were manually segmented in DWI sequences in the MCC cohort. The segmentation process was equivalent to that previously described in the HBP sequences.

Model Training and Validation
The model was trained using the MIC-DKFZ nnU-Net (Division of Medical Image Computing, German Cancer Research Center (DKFZ), Heidelberg, Germany) deep-learning framework. nnU-Net is an open-source tool. The source code and comprehensive documentation are publicly available on GitHub [38]. nnU-Net enables 3D semantic segmentation in many biomedical imaging applications without requiring designing respective specialized solutions [39]. Out of the 278 MRI scans, 222 (80%) HBP sequences were randomly chosen for the model training.
The HBP sequences of the remaining 56 scans (20%) were used to test the model's accuracy (internal validation). External validation of the model's accuracy was performed in the MCC cohort (different scanners (1.5 T and 3 T) and sequence parameters were used compared to the model's training).
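A random 80/20 split of this kind can be sketched as follows. The function name, the seed and the use of integer scan indices are assumptions for illustration; the text does not describe how the random selection was implemented or whether scans of the same patient were kept in the same subset.

```python
import random


def split_scans(scan_ids, train_fraction=0.8, seed=0):
    """Shuffle scan identifiers and split them into training and test sets."""
    ids = list(scan_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # 278 scans -> 222 for training
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train, test = split_scans(range(278))
    print(len(train), len(test))
```

With 278 scans, truncating 278 × 0.8 = 222.4 gives the 222 training and 56 test scans reported above.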

Clinical Correlation
Our model analyzed the NELM volume and liver volume of the 33 patients with MCC decisions in the baseline (BL) scan and the follow-up (FU) scan on which the MCC decisions were based. The MCC is part of our European Neuroendocrine Tumor Society (ENETS) center of excellence and consists of specialized gastroenterologists, endocrine surgeons, pathologists, nuclear medicine specialists, radiotherapists and radiologists. Absolute and relative changes in NELM volume and HTL calculated by the model were analyzed and compared to the MCC decisions. MCC decisions were classified as therapy success (stable disease (SD) or partial response (PR)) or therapy failure (progressive disease (PD)) based on the presented images. The evaluation within the board was guided by the Response Evaluation Criteria in Solid Tumors (RECIST, Version 1.1).

Statistics
Statistical analysis was performed using SPSS Statistics (IBM, Version 25, Armonk, NY, USA). The Kolmogorov-Smirnov test showed a non-normal distribution of the data. Therefore, nonparametric testing was performed.
Descriptive data were accordingly presented as the median and interquartile range (IQR). Relative size differences in segmentations were calculated by the following formula: $(V_{\text{model}} - V_{\text{radiologist}}) / V_{\text{radiologist}}$. Matthews correlation coefficients (φ) were calculated to measure the model's segmentation accuracy as previously published [40]. MCC decisions were compared to the automatized volume evaluation of the model. HTL was calculated by the formula: $\text{HTL} = V_{\text{NELM}} / (V_{\text{liver}} - V_{\text{NELM}}) \times 100$. Absolute volume changes were calculated by the difference: $V_{\text{follow-up}} - V_{\text{baseline}}$. Relative volume changes were calculated by the formula: $(V_{\text{follow-up}} - V_{\text{baseline}}) / V_{\text{baseline}} \times 100$. The Mann-Whitney U test was used as a dominance test comparing two independent groups of quantitative data. A sign test was used to compare two related samples. Spearman's rank test was used for correlation analysis in continuous variables, and the corresponding correlation coefficients ($r_s$) were calculated. ROC analysis was performed, and Youden indexes were calculated to determine optimal cutoff values.
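The formulas in this section can be written out as a short sketch. All function names are hypothetical, and computing φ voxel-wise from flattened binary masks is an assumption; the text refers to a previously published method [40] without giving details. The volumes in the usage example are made up.

```python
import math


def relative_size_difference(model_volume, radiologist_volume):
    """(model's volume - radiologists' volume) / radiologists' volume."""
    return (model_volume - radiologist_volume) / radiologist_volume


def hepatic_tumor_load(nelm_volume, liver_volume):
    """HTL in vol.-%: NELM volume / (liver volume - NELM volume) * 100."""
    return nelm_volume / (liver_volume - nelm_volume) * 100.0


def absolute_change(baseline, follow_up):
    """Follow-up value minus baseline value."""
    return follow_up - baseline


def relative_change(baseline, follow_up):
    """Change relative to baseline, in percent."""
    return (follow_up - baseline) / baseline * 100.0


def matthews_phi(predicted, reference):
    """Matthews correlation coefficient of two flattened binary masks."""
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical volumes in cm^3, for illustration only.
    htl_bl = hepatic_tumor_load(nelm_volume=100.0, liver_volume=1600.0)
    htl_fu = hepatic_tumor_load(nelm_volume=150.0, liver_volume=1650.0)
    print(round(htl_bl, 2), round(htl_fu, 2))
    print(relative_change(100.0, 150.0), absolute_change(htl_bl, htl_fu))
```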

AI dev Cohort
Characteristics of the 149 patients with NEN are summarized in Table 1. The most common primary tumor sites were the ileum (51.0%) and the pancreas (43.0%). Confirmed (histologically or with the aid of SR imaging) liver metastases were present in 118 patients (79.2%), which were used for the model training. Out of these 118 patients, 4 patients (3.4%) had singular liver metastasis, 59 patients (50%) had multiple metastases (≤10 metastases), and 55 patients (46.6%) had a diffuse metastatic pattern (>10 metastases). Both liver lobes were involved in 91 patients (77.1%). Unilobar disease limited to a single liver lobe was found in 27 patients (22.9%) (right liver: 24 patients, left liver: 3 patients).

MCC Cohort
Characteristics of the 33 patients with MCC decisions are summarized in Table 1. As in the training cohort, the most common primary tumor sites were the pancreas and ileum (36.4% each). Therapeutic response was classified by the MCC as therapeutic success in 16 (48%) patients (SD: n = 14; PR: n = 2) and as therapeutic failure (PD) in 17 (52%) patients.

Validation of the Model
The median NELM volume in the 56 patients (internal validation) of the AI dev group was 17. [...] ume (0.1 and 0.2 cm ). The third patient showed atypical, hyperintense signal intensities of the metastases in HBP; NELM were subsequently missed by the model. Dividing the cohort by the median NELM volume (16.17 cm³) into high and low NELM volume, the model showed significantly higher φ in patients with higher NELM volume (median φ: 0.80; IQR: 0.73-0.84) than in patients with low NELM volume (median φ: 0.71; IQR: 0.64-0.78; p = 0.003). For liver segmentation, the median volume was 1639.9 cm³ (IQR: 1366. [...]

Automatized NELM Volume Analysis and Clinical Correlation (MCC Cohort)
The model's measurements of the MCC cohort are summarized in Table 2 and illustrated by an example in Figure 2.
The comparison between patients with therapy success (n = 16) and therapy failure (n = 17) showed significant differences for all absolute and relative volume changes (p < 0.001). Patients classified as therapy success by the MCC showed significantly lower values in median absolute NELM volume change (ΔabsNELM), median absolute HTL change (ΔabsHTL), median relative NELM volume change (ΔrelNELM) and median relative HTL change (ΔrelHTL) than patients with therapy failure (p < 0.001) (Figure 3).

The case-wise analysis of the 33 MCC patients is summarized in Table 3. It showed that the model correctly detected increased NELM volume in all of the 17 patients with therapeutic failure (100%). In these 17 patients, ΔabsNELM ranged from +3.02 cm³ to +864.45 cm³ and ΔabsHTL from +0.18 vol.-% to +36.41 vol.-%. ΔrelNELM ranged from +58.52% to +4513.64% and ΔrelHTL from +64.97% to +2497.20%. In patients with therapeutic success (n = 16), ΔabsNELM ranged from −394.57 cm³ to −34.75 cm³ (in PR) and from −35.70 cm³ to +61.56 cm³ (in SD), and ΔabsHTL ranged from −16.96 vol.-% to −1.73 vol.-% (in PR) and from −1.64 vol.-% to +3.48 vol.-% (in SD). ΔrelNELM ranged from −74.68% to −63.68% in PR and from −20.19% to +55.25% in SD, and ΔrelHTL ranged from −71.13% to −65.03% in PR and from −21.23% to +50.51% in SD (Figure 4).
In the total cohort, ROC analysis of MCC decision and the relative changes (ΔrelNELM and ΔrelHTL) showed an area under the curve (AUC) of 1.000 (p < 0.001) for both variables. The absolute changes showed an AUC of 0.908 for ΔabsNELM and of 0.926 for ΔabsHTL (p < 0.001).
To determine the best cutoff values for progressive disease, a Youden index was calculated. For ΔrelNELM, the highest Youden index (1.000; 100% sensitivity and 100% specificity) was found at a cutoff value of +56.88%. For ΔrelHTL, the highest Youden index (1.000; 100% sensitivity and 100% specificity) was found at a cutoff value of +57.73% (Figure 5).
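The cutoff selection can be illustrated with a minimal sketch of a Youden-index search. The function names are hypothetical, and placing candidate cutoffs at the midpoints between neighbouring observed values is an assumption about how the ROC coordinates were generated. The example uses only the ΔrelNELM range endpoints reported above (not the full case-wise data of Table 3), so it shows the mechanics rather than reproducing the analysis.

```python
def youden_best_cutoff(success_values, failure_values):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called progressive when its value exceeds the cutoff.
    Candidate cutoffs are midpoints between neighbouring observed values.
    """
    values = sorted(set(success_values) | set(failure_values))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for cutoff in candidates:
        sensitivity = sum(v > cutoff for v in failure_values) / len(failure_values)
        specificity = sum(v <= cutoff for v in success_values) / len(success_values)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


def auc(success_values, failure_values):
    """AUC as the share of (failure, success) pairs ranked correctly; ties count half."""
    pairs = [(f, s) for f in failure_values for s in success_values]
    score = sum(1.0 if f > s else 0.5 if f == s else 0.0 for f, s in pairs)
    return score / len(pairs)


if __name__ == "__main__":
    # Range endpoints of relative NELM change (%) reported in the text.
    success = [-74.68, -63.68, -20.19, 55.25]  # PR min/max, SD min/max
    failure = [58.52, 4513.64]                 # PD min/max
    print(youden_best_cutoff(success, failure), auc(success, failure))
```

Because the two groups do not overlap, any cutoff between +55.25% and +58.52% separates them completely; the midpoint (about +56.885%) is close to the reported cutoff of +56.88%.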

Comparison of 3D Quantification between HBP and DWI Sequences
In the MCC cohort, manual segmentations of NELM (rs: 0.981; p < 0.001), livers (rs: 0.966; p < 0.001) and HTL (rs: 0.956; p < 0.001) showed a high correlation between HBP and DWI sequences. However, direct comparison of the measured values for NELM and livers showed significant differences between HBP and DWI (p < 0.001; Table 4). When looking at the changes between BL and FU, a high correlation between DWI and HBP sequences was also shown for ΔabsNELM (rs: 0.919; p < 0.001), ΔrelNELM (rs: 0.960; p < 0.001), ΔabsHTL (rs: 0.883; p < 0.001) and ΔrelHTL (rs: 0.952; p < 0.001). There was no significant difference in ΔabsNELM, ΔrelNELM, ΔabsHTL and ΔrelHTL between DWI- and HBP-based measurements (p = 0.072 to 0.719; Table 4).
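A high rank correlation and a significant difference in the measured values are not contradictory, which a short sketch of Spearman's rank correlation may make concrete. The implementation below is a plain-Python illustration (not the SPSS routine used in the study), and the volumes are entirely hypothetical: the assumed 20% offset between the sequences is arbitrary in both size and direction.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical NELM volumes (cm^3); second sequence is assumed 20% lower.
    hbp = [10.0, 50.0, 120.0, 400.0]
    dwi = [v * 0.8 for v in hbp]
    print(spearman_rs(hbp, dwi))  # ranks agree perfectly despite the offset
```

In this constructed example, every paired value differs while the ordering is identical, so the rank correlation is perfect even though a paired test on the raw values would detect a systematic shift.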

Discussion
This is the most extensive study presenting AI data quantifying the total volume of hepatic tumor burden in NEN using a deep-learning model combined with Gd-EOB MRI. The model achieved high accuracy, especially in patients with higher NELM volume and delivered results corresponding to the MCC consensus decision-making regarding therapeutic success or failure.
The presented deep-learning model differs from previous studies in several aspects. First, the training data set of 278 Gd-EOB MRI examinations is the largest published to date in the automated assessment of focal liver lesions [41]. The high proportion of patients with more than ten metastases resulted in more than 2000 segmented metastases. Second, various hepatic conditions were included in the model's training. Previous liver resection, extensive pretreatment, ablation therapies or preceding intraarterial treatments (e.g., transarterial chemoembolization (TACE) or selective internal radiation therapy (SIRT)) were not exclusion criteria for training. The combination of high case numbers and various pretreatments should improve the robustness of the model in preparation for everyday clinical usage [42,43]. Owing to this broad training, the model can be used to quantify tumor burden in patients under different therapies. However, individual pitfalls must be considered. Therapy-induced hemorrhage of NELM affects the visualization of lesions in HBP sequences. Our study identified one case in which the model achieved low accuracy for this reason. In addition, in two cases with very low tumor burden, our model showed only unsatisfactory accuracy. However, these cases are of less interest for an automated volume analysis since a conventional, manual evaluation could easily be performed. The aim of our study was not to replace manual evaluation but to complement and improve it.
Our results demonstrate that accurate, automated 3D segmentation of NELM is feasible in HBP from Gd-EOB MRI examinations. Because NELM grow more slowly than metastases from other primary tumors, patients with NEN undergo numerous follow-up examinations, and we believe automated quantification is particularly valuable in this setting. Even if NELM are characterized by marked arterial hypervascularization or cystic components on imaging, these features do not affect the HBP sequence [25]. Liver metastases from a wide variety of primary tumors show the same typical imaging characteristics in the HBP sequences with marked hypointensity of the lesion compared to the surrounding liver parenchyma [44]. Therefore, by using HBP in Gd-EOB MRI, our model is not limited to the segmentation of NELM, and its use should also be investigated for liver metastases of other primary tumors.
The high value of Gd-EOB HBP sequences in the determination of NELM size has already been shown and corroborates our approach to using this sequence for 3D segmentation [45]. The high lesion-to-liver contrast also provides optimum conditions for automated segmentation [46,47]. However, besides its excellent imaging characteristics, Gd-EOB MRI has some disadvantages. These include the comparatively high costs due to the contrast agent itself and the resulting examination time, the general side effects and possible deposition of gadolinium [48]. As a non-contrast alternative with high sensitivity, DWI sequences can also be effectively used to measure NELM without the disadvantages of Gd-EOB MRI [49,50]. Currently, however, DWI sequences are used for detection rather than measurement of liver lesions. In particular, a 3D measurement may be limited by the lower axial resolution of commonly used DWI sequences. In our subanalysis, we could show that DWI-based measurements correlate strongly with those in HBP. However, the absolute measurements of NELM and livers showed significant differences between the two sequences, so that an exact 3D quantification using DWI was not possible. Nevertheless, this inaccuracy became less relevant when the measurements were compared for the evaluation of treatment response. The relative and absolute changes of NELM volume and HTL between baseline and follow-up examination showed no significant difference between HBP and DWI, so that evaluation of treatment response using 3D measurements in DWI seems feasible. Therefore, the results of our study encourage developing similar automation for non-contrast DWI MRI as well.
The limitation to lesion diameters versus volume in clinical routine can be best explained by the time required for full 3D volumetry. Up to now, 3D volumetry of liver lesions has only been carried out within the framework of studies [51,52]. Besides the volumetric assessment of tumor burden, the 3D segmentations generated by the model presented in this study could be used for further lesion analysis, such as texture analysis, radiomics or contrast-uptake used in Choi criteria [53,54]. To date, most studies concerning artificial intelligence and liver imaging focus on diffuse liver disease or the classification of liver tumors [55][56][57]. With the help of the presented model and the associated time saving by the automatized segmentation, not only 3D quantification of HTL but also more sophisticated tumor analyses could find their way into clinical routine.
Assessment of therapeutic response in liver metastases, independent of primary tumor origin, is most commonly based on the Response Evaluation Criteria in Solid Tumors (RECIST, Version 1.1). RECIST1.1 is suitable for study cohorts and facilitates response evaluation by defining a limited number (maximum two per organ) of target lesions [58]. From a practical point of view, the response criteria differ between shrinking and growing tumors. Partial response (PR) is defined as a decrease of at least 30% in the sum of the longest diameters of target lesions. By contrast, progressive disease is defined as an increase of at least 20% in the sum of diameters of target lesions or the appearance of one or more new lesions in a 2D measurement [59]. Considering this somewhat simplified approach, a purely volumetric determination of growth behavior should allow more precise measurement for therapeutic decision-making in the individual patient. The simplification of RECIST1.1 can lead to patients being assessed incorrectly or inconsistently during the course of their illness. The limitations of RECIST1.1 become even more evident when evaluating the effects of targeted molecular agents, especially in slow-growing tumors, such as NENs [60,61]. RECIST1.1 treatment response strongly depends on which target lesions were chosen at the baseline scan. Heterogeneous treatment response, which can be seen in different types of primary cancers and systemic treatments, is not represented by RECIST1.1 [62]. Additionally, volumetric measurement methods show a higher intra-observer reproducibility compared to RECIST1.1 [63]. Quantification of total HTL is not routinely performed in clinical practice, and in most cases, tumor load is visually estimated by the radiologist. However, several studies have shown that hepatic tumor burden is an important prognostic imaging marker [13,64,65].
Volumetric evaluation of the HTL, as performed by our model, provides useful information on lesion distribution and allows a more realistic quantification of hepatic tumor extent than the (2D) diameter measurements, which are commonly used [66]. In addition, the model considers all lesions, which would also allow capturing of heterogeneous treatment responses.
The new challenge in volumetric tumor mass determination will be the development of new cutoff values. If a metastasis is modeled as a sphere, a 20% increase in lesion diameter, which defines progressive disease in RECIST1.1, corresponds to a volume increase of approximately 73%. In our cohort, the MCC stated progressive disease and therapy failure when the tumor volume, as determined by the model, increased by approximately 57% or more. Furthermore, a NELM volume change of −57% correctly identified the two patients with partial response. Our results show that 3D assessment of NELM could be useful, but further studies are needed to evaluate its superiority over 2D methods regarding clinical endpoints [67].
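The sphere argument follows from the volume scaling with the cube of the diameter, which a few lines of code can confirm. The function name is hypothetical, and the spherical shape is the simplifying assumption already made in the text.

```python
def volume_change_from_diameter_change(diameter_change_percent):
    """Percent volume change of a sphere for a given percent diameter change.

    Sphere volume is proportional to diameter cubed, so the volume ratio
    is the cube of the diameter ratio.
    """
    scale = 1 + diameter_change_percent / 100
    return (scale ** 3 - 1) * 100


if __name__ == "__main__":
    # RECIST 1.1 diameter thresholds: +20% (progression), -30% (partial response).
    print(round(volume_change_from_diameter_change(20), 1))   # 72.8
    print(round(volume_change_from_diameter_change(-30), 1))  # -65.7
```

Under the same assumption, the 30% diameter decrease that defines partial response corresponds to a volume decrease of about 66%.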
MCCs are designed to optimize patient outcomes by elaborating the best treatment plans or changes in cases of therapy failure in a multidisciplinary approach [68,69]. The number of cases discussed in each MCC is steadily rising. This can be explained by the increasing acceptance of the multidisciplinary approach and the rising incidence of cancers due to improved diagnostics [70]. Our study shows that deep-learning models can assist the MCC's decisions by automating the quantification of HTL. Besides the time-saving aspect, the model could also provide decision support to physicians who have no access to a regularly held MCC.
Our study has some limitations. As mentioned above, the 3D assessment approach needs to be further evaluated in larger clinical cohorts with direct comparison to 2D measurements and with regard to its impact on clinical endpoints. Another limitation of the study is that the ground truth for accuracy is based on manual segmentation of liver metastases. Because liver metastases are sometimes extensive and include very small foci, manual segmentation is not perfect. To minimize this limitation, all segmentations were checked multiple times to capture all metastases (no limit on the number of lesions per patient) and to train the model as realistically as possible.

Conclusions
In conclusion, the deep-learning model presented shows high accuracy in 3D volumetry of NELM and determination of HTL in Gd-EOB MRI and paves the way for fully automated 3D assessment of hepatic disease. The model also provides useful (potentially prognostic) information about HTL and NELM volume and can be used to assist physicians in response evaluation and the decision-making about therapeutic success or failure comparable to the decisions of an expert multidisciplinary cancer conference.