Radiomics of Liver Metastases: A Systematic Review

Simple Summary
Patients with liver metastases can be scheduled for different therapies (e.g., chemotherapy, surgery, radiotherapy, and ablation). The choice of the most appropriate treatment should rely on adequate understanding of tumor biology and prediction of survival, but reliable biomarkers are lacking. Radiomics is an innovative approach to medical imaging: it identifies radiological patterns invisible to the human eye that can predict tumor aggressiveness and patient outcomes. We reviewed the available literature to elucidate the role of radiomics in patients with liver metastases. Thirty-two papers were analyzed, mostly (56%) concerning metastases from colorectal cancer. Although the available studies are still preliminary, radiomics provided effective prediction of response to chemotherapy and of survival, allowing more accurate and earlier prediction than standard predictors. Entropy and homogeneity were the radiomic features with the strongest clinical impact. In the next few years, radiomics is expected to make a substantial contribution to the precision medicine approach to patients with liver metastases.

Abstract
Multidisciplinary management of patients with liver metastases (LM) requires a precision medicine approach, based on adequate profiling of tumor biology and robust biomarkers. Radiomics, defined as the high-throughput identification, analysis, and translational application of radiological textural features, could fulfill this need. The present review aims to elucidate the contribution of radiomic analyses to the management of patients with LM. We performed a systematic review of the literature through the most relevant databases and web sources. English-language original articles published before June 2020 and concerning radiomics of LM extracted from CT, MRI, or PET-CT were considered. Thirty-two papers were identified.
Baseline higher entropy and lower homogeneity of LM were associated with better survival and higher chemotherapy response rates. A decrease in entropy and an increase in homogeneity after chemotherapy correlated with radiological tumor response. Entropy and homogeneity were also highly predictive of tumor regression grade. In comparison with RECIST criteria, radiomic features provided an earlier prediction of response to chemotherapy. Lastly, texture analyses could differentiate LM from other liver tumors. The commonest limitations of studies were small sample size, retrospective design, lack of validation datasets, and unavailability of univocal cut-off values of radiomic features. In conclusion, radiomics can potentially contribute to the precision medicine approach to patients with LM, but interdisciplinarity, standardization, and adequate software tools are needed to translate the anticipated potentialities into clinical practice.


Introduction
The liver is a frequent target for distant metastases from several tumors. Liver metastases (LM) are associated with poor prognosis and may occur early in gastrointestinal malignancies because of hematogenous spread through the portal venous system [1][2][3][4][5]. Selected patients with LM, mainly those with liver-only metastases, can be considered for aggressive systemic and loco-regional treatments to prolong survival expectancy and optimize quality of life. Several studies have focused on LM from colorectal cancer, for which significant progress has been achieved. Effective chemotherapy may lead to a relevant improvement in survival, exceeding 30 months in the most favorable reports [6][7][8]. Liver resection in selected patients obtained 5-year survival rates as high as 50% [9][10][11][12]; percutaneous ablation gained consensus, as it can grant effectiveness approaching that of surgery in small LM [13]. The treatment of non-colorectal LM is also evolving, but therapies other than chemotherapy are still less codified [14][15][16].
Such an aggressive policy, including several therapeutic options, requires a precision medicine approach. The selection of the appropriate course of action should rely on an adequate understanding of tumor biology and robust clinical biomarkers. However, the availability of reliable prognostic indices is currently an unmet need. Pathologic details of LM can be identified only ex-post after resection. Response of LM to chemotherapy is strongly associated with prognosis [17,18], but it is overestimated by standard imaging modalities [19][20][21]. Genetic mutations are promising biomarkers, but they are still under evaluation [22,23].
In recent decades, it has become clear that imaging contains a large amount of data, notably in the form of gray level patterns, which are invisible to the human eye [24]. These texture characteristics can be correlated with pathology data and outcomes [25], potentially allowing diagnostic and prognostic evaluation. The analysis of textural features in medical images, which relies on mathematical functions such as histogram analysis and matrices, is termed radiomics [24,26]. Recently, radiomic features have been standardized by the Image Biomarker Standardisation Initiative [27]. This technology is attractive because it could be used to extract biological data directly from the radiological images, without invasive procedures, thus sparing costs and time and avoiding any risk for the patients. It would ideally embody the concept of "virtual biopsy." For many tumors, radiomic analyses have already provided an accurate evaluation of biology, allowing the identification of indices correlated with clinical outcomes [28][29][30][31]. In LM, where multiple therapeutic options are often available, a radiomics-based approach could be used to reach the most appropriate treatment decision. Based on the available literature, the present systematic review aims to elucidate the contribution of radiomic analyses to the management of patients with LM.
Figure 1 depicts the selection process. After screening for duplicates and eligibility, 32 studies were included in the qualitative synthesis. More than half of the publications (n = 18, 56%) were published in the last eighteen months. Most papers (n = 28) described retrospective analyses, while four reported planned secondary analyses of prospectively acquired data [32][33][34][35].
Nineteen studies analyzed computed tomography (CT) [32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50], eight magnetic resonance imaging (MRI) [51][52][53][54][55][56][57][58], three positron-emission tomography (PET)/CT [59][60][61], and two multiple imaging modalities (CT and MRI; PET and MRI, respectively) [62,63]. Various software applications were used for texture analysis; a large proportion (n = 10) were custom-made. For the qualitative synthesis, we distinguished four groups of studies according to their subject: (1) radiomics of colorectal LM; (2) radiomics of non-colorectal LM; (3) capability of radiomics to perform differential diagnosis of focal liver lesions, distinguishing LM from other tumors (benign and malignant); (4) technical aspects of radiomics of LM. In the first group (radiomics of colorectal LM), we further distinguished four subgroups of studies according to their endpoints: prediction of survival, prediction of response to chemotherapy, correlation with pathological data, and miscellaneous. For details, refer to Section 4.3. Figure 2 summarizes the organization of the qualitative analysis. Most papers (n = 18) analyzed radiomics of colorectal metastases. Due to the heterogeneity of studies, some papers fit into more than one category.

Assessment of Study Quality
The average radiomic quality score (RQS) [64] was 10 ± 6.5 (range […]), roughly 25% of the maximum score (36 points). Only four studies (16%) [33,34,39,40] had a score higher than 18 (>50% of the maximum score). The main limitations in quality were the following: no cost-effectiveness analysis (32 studies, 100%); lack of open-data repositories (n = 31, 97%); no phantom calibration (n = 31, 97%); failure to include a calibration statistic (n = 30, 94%); lack of prospective design (n = 28, 87%); and missing validation cohort (n = 18, 56%). On the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) checklist [65] (31 elements), studies had an average score of 18 ± 3 points (range 14-29), i.e., 58 ± 10% of the maximum possible score. According to the quality assessment of diagnostic accuracy studies (QUADAS-2) [66], there was a high risk of patient selection bias in 34% of papers, in most cases because of the selection/inclusion criteria. One-fourth of studies had a high risk of bias related to the index test or the reference standard, while only one study (3%) had a high risk of bias in flow and timing. The RQS and TRIPOD scores of studies are reported in Table 1. Details of the QUADAS-2 assessment and a summary of its findings are presented in Table 2 and Supplementary Table S1.
Before analyzing the studies' results in detail, it is helpful to clarify terminology that is commonly used in radiomics. The definitions of the radiomic features investigated in the studies are detailed in Table 3. In addition, the region of interest (ROI) is defined as the area or volume selected on any imaging modality for the extraction of radiomic features.

Kurtosis: Distribution of voxel values. Low kurtosis: most data points are close to the mean (few outliers). High kurtosis: data are spread far from the mean (more outliers).

Second-Order Statistics: Textural features quantifying tumor heterogeneity by analyzing the spatial distribution of pixel/voxel intensities.
Gray level co-occurrence matrix (GLCM): Measures the arrangements of voxel pairs along a fixed direction (homogeneity, contrast, correlation, entropy, dissimilarity, and angular second moment/energy).
Gray level run length matrix (GLRLM): Consecutive voxels with the same intensity along fixed directions (can have long- or short-run, as well as low- and high-gray level emphasis).
Gray level size zone matrix (GLSZM): Clusters of connected pixels with the same gray value. They can have small- and large-area as well as low- or high-gray level emphasis.
Neighborhood gray tone/level difference matrix (NGTDM/NGLDM): The difference in gray level between one voxel and its 8 or 26 neighbors (in 2D or 3D, respectively). Includes rate, intensity, and frequency of intensity change.
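To make the feature definitions above more concrete, the following is a minimal NumPy sketch of how first-order descriptors (skewness, kurtosis, histogram entropy) and a few GLCM-derived descriptors could be computed. It is illustrative only and does not reproduce the software used by any of the reviewed studies. The function names, the default of 32 histogram bins, the single horizontal offset, and the choice of the inverse-difference-moment form of homogeneity are all assumptions made for this example.

```python
import numpy as np


def first_order_features(roi, n_bins=32):
    """First-order (histogram) descriptors of the voxel values in a ROI.

    Assumes the ROI is not constant (non-zero standard deviation).
    n_bins is an arbitrary choice made for this sketch.
    """
    x = np.asarray(roi, dtype=float).ravel()
    z = (x - x.mean()) / x.std()
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return {
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),  # non-excess form: 3 for a Gaussian
        "entropy": float(-np.sum(p * np.log2(p))),  # in bits
    }


def glcm(image, levels, offset=(0, 1)):
    """Symmetric, normalized gray level co-occurrence matrix of a 2D image.

    `image` must hold integers in [0, levels); `offset` is (row, col) and
    defaults to the right-hand neighbor (a single fixed direction).
    """
    img = np.asarray(image).astype(int)
    dr, dc = offset
    rows, cols = img.shape
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = img[r0:r1, c0:c1]                      # reference pixels
    nb = img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]   # their neighbors
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nb.ravel()), 1)   # count co-occurring pairs
    m = m + m.T                                  # make the matrix symmetric
    return m / m.sum()


def glcm_features(p):
    """Entropy, homogeneity, contrast, and angular second moment of a GLCM."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "entropy": float(-np.sum(nz * np.log2(nz))),
        # Inverse difference moment; other homogeneity definitions exist.
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "angular_second_moment": float(np.sum(p ** 2)),
    }
```

With this sketch, a perfectly uniform image yields a GLCM entropy of 0 and a homogeneity of 1, whereas a checkerboard of two gray levels yields lower homogeneity and higher entropy, which matches the qualitative reading of these two features in the text.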
Three studies showed an association between entropy and prognosis. Andersen et al. and Lubner et al. reported that the higher the entropy of LM, the better the overall survival (OS) (hazard ratio (HR) ranging from 0.16 to 0.63 in the Andersen et al. study, according to the filter used; HR = 0.65, 95%CI = 0.44-0.95 at coarse filter level in the Lubner et al. study) [32,43]. Conversely, Beckers et al. reported a trend toward prognostic value for the ratio between the entropy of LM and the entropy of the parenchyma (the higher the value, the shorter the OS; HR = 1.9, 95%CI = 0.95-3.78) [38].
Additional radiomic features have been reported. In the Simpson et al. study, LM correlation and contrast (combined into a single texture parameter) were associated with OS (HR = 2.35, 95%CI = 1.21-4.55) [47]. Dohan et al. analyzed imaging before and after treatment and identified three predictors of OS: a decrease in the sum of the target liver lesions, high baseline density of the dominant liver lesion, and a drop in kurtosis [33]. Those three features (combined into a texture analysis score), evaluated after two months of chemotherapy, had a strong association with OS (SPECTRA Score >0.02 vs. ≤0.02, HR = 2.82, 95%CI = 1.85-4.28 in the training dataset; HR = 2.07, 95%CI = 1.34-3.20 in the validation dataset). The radiomic score at two months had the same prognostic value as RECIST criteria after six months of chemotherapy. Shur et al. reported an association of minimal pixel value (negative prognostic factor, HR = 1.66, 95%CI = 1.28-2.16) and gray level size zone matrix (GLSZM) small area emphasis (positive prognostic factor, HR = 0.62, 95%CI = 0.47-0.83) with progression-free survival (PFS) [62]. Finally, the following features have been associated with OS: standard deviation [32], LM density at CT scan [46], future liver remnant energy and entropy combined into a single linear predictor [47], ShapeSI4 (included in a radiomic signature) [40], and area under the curve of volume histograms at PET-CT [61].
The results of studies about radiomic features associated with the prediction of survival are summarized in Table 4 and Supplementary Table S2.
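When reading the hazard ratios quoted above, it can help to convert a reported 95% confidence interval back into an approximate test statistic. The sketch below does so under the assumption that the interval is a Wald interval, symmetric on the log scale; this is an assumption of the example, not something the reviewed papers state, and rounding of the published figures makes the result approximate. The function name `hr_z_and_p` is hypothetical.

```python
import math


def hr_z_and_p(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value from an HR and its 95% CI.

    Assumes a Wald interval on the log scale:
    log(HR) +/- z_crit * SE, so SE = (log(high) - log(low)) / (2 * z_crit).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Values as reported in the text for Lubner et al. and Beckers et al.
z_lubner, p_lubner = hr_z_and_p(0.65, 0.44, 0.95)
z_beckers, p_beckers = hr_z_and_p(1.9, 0.95, 3.78)
```

Under this approximation, the Lubner et al. estimate (interval entirely below 1) gives p < 0.05, while the Beckers et al. estimate (interval including 1) gives p > 0.05, which is why the latter is best read as a trend rather than an established association.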
In the Rao et al. paper, the entropy of LM after chemotherapy decreases in responders, while uniformity increases (entropy:  [46]. In the study by Beckers et al., treatment success showed a non-significant trend toward association with higher entropy (6.65 ± 0.26 in responders vs. 6.51 ± 0.34 in non-responders, p = 0.08) [38]. The Zhang et al. analysis of T2 MRI images before chemotherapy showed that responding lesions had a higher variance and lower angular second moment (two measures of homogeneity) than non-responding ones (variance: 446.07 ± 329.60 in responders vs. 210.23 ± 183.39 in non-responders, p < 0.001; angular second moment: 0.96 ± 0.02 vs. 0.98 ± 0.01, respectively, p < 0.001) [56]. Dercle et al. built a signature, based on two measures of entropy, gray-tone difference matrix contrast, and shape, which predicted responsiveness to anti-angiogenic treatment (AUC = 0.80, 95%CI = 0.69-0.94 for patients with high imaging quality; AUC = 0.72, 95%CI = 0.59-0.83 for patients with standard imaging quality) [40]. Andersen et al. described the changes in LM after treatment with regorafenib. They observed data discordant with previous analyses (increase in entropy and decrease in uniformity), but none of the patients displayed a measurable response (85% had stable disease, while the remaining ones had progression) [32]. Considering skewness, in the study of Ahn et al., low baseline values (indicating a higher spread towards higher gray levels) were associated with response (0.02 ± 0.32 in responders vs. 0.33 ± 0.44 in non-responders, p = 0.001) [37]. One study demonstrated a skewness increase during treatment [32]. In contrast to CT and MRI, high entropy detected on 18F-FDG PET images before treatment predicted a worse response to therapy (AUC = 0.74, 95%CI = 0.52-0.97) [61].
Other features have been associated with response: high mean attenuation [37]; narrow standard deviation [37]; high baseline density of dominant liver lesion [33]; and mean values of histogram parameters for apparent diffusion coefficient maps [54].
The results of studies about radiomic features associated with the prediction of response to chemotherapy are summarized in Table 4 and Supplementary Table S3.
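Several of the results above are expressed as an area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes the AUC of a single feature as the probability that a randomly chosen responder has a higher value than a randomly chosen non-responder (the Mann-Whitney formulation, with ties counted as one half). The entropy values are synthetic and chosen only for illustration; they are not data from any reviewed study, and the function name is hypothetical.

```python
def auc_from_scores(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, following the Mann-Whitney U formulation.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Synthetic baseline entropy values (illustrative only, not study data).
responders = [6.9, 6.7, 6.6, 6.4]
non_responders = [6.8, 6.5, 6.3, 6.2]
auc = auc_from_scores(responders, non_responders)  # 12 of 16 pairs -> 0.75
```

An AUC of 0.5 corresponds to a feature with no discriminating ability and 1.0 to perfect separation, so the reported values of 0.72 to 0.80 indicate moderate discrimination.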

Prediction of Pathology Data
Three studies evaluated the association between radiomic features and pathology data [39,43,45]. Lubner et al. demonstrated that entropy, mean of positive pixels, and standard deviation are inversely associated with tumor grading (p = 0.007 for entropy, p = 0.002 for mean positive pixels, and p = 0.004 for standard deviation), while skewness and kurtosis showed a trend for an inverse association with KRAS mutation (p = 0.04 for skewness, and p = 0.058 for kurtosis) [43]. Cheng et al. reported that growth patterns of LM (desmoplastic, replacing, and pushing) can be successfully discerned on CT images by using second-order radiomic features, in particular gray level size zone matrix and gray level non-uniformity (AUC

Other Papers
Three additional papers studied the radiomic features of colorectal LM. Reimer et al. evaluated the response of LM undergoing trans-arterial radio-embolization [55]. In post-treatment MRI, higher kurtosis in arterial and portal phases and higher skewness in portal phase identified patients with progressive disease earlier than standard RECIST criteria. Li Y et al. reported a model based on radiomic features of the primary tumor and LM before resection (heterogeneity, entropy, energy of vertical wavelet, and low-gray-level run emphasis) that was able to predict the future appearance of further LM [42]. Wagner et al. analyzed CT and PET-CT imaging of primary tumor and LM [60]. They demonstrated that colon cancer and LM have different skewness and kurtosis at both imaging modalities (CT and PET-CT), while colon cancers with or without LM have similar features. Table 4 summarizes the data of studies dealing with colorectal LM.

Radiomics of Non-Colorectal LM
Four papers focused on non-colorectal LM. A single study assessed CT-based radiomic indices in LM from esophageal cancer [41]. The study found that the characteristics of pre-treatment CT related to heterogeneity and gray-level intensity, such as wavelet gray level co-occurrence matrix correlation and gray level distance zone matrix with large dependence emphasis, were predictors of response to chemotherapy. Two studies explored radiomic analyses in LM from neuroendocrine tumors (NET) [44,63]. Martini et al. analyzed a small series of patients (n = 49) but observed a number of associations: pancreatic NET had lower skewness and higher mean HU than non-pancreatic ones; entropy in the arterial phase was negatively associated with PFS in pancreatic NET and with OS in non-pancreatic NET; kurtosis was associated with lower OS in pancreatic NET, while skewness was associated with higher OS [44]. Weber et al. investigated the correlation of parameters derived from somatostatin receptor agonist (68Ga-DOTATOC) PET and from MRI with the proliferation index Ki67 [63]. Entropy and dissimilarity (from both PET and MRI) had a direct correlation with Ki67, while homogeneity had an inverse one. Moreover, it was possible to distinguish G1 and G2 LM on the basis of entropy, homogeneity, and dissimilarity (on PET data only). Finally, Trebeschi et al. reported heterogeneity-related radiomics parameters as predictors of response to immunotherapy in LM of melanoma and non-small-cell lung carcinoma [49].

Differentiation of LM from Other Hepatic Lesions
Four studies investigated whether radiomic features could discriminate LM from other hepatic lesions. Jansen et al. analyzed metastases, primary hepatic tumors, and benign lesions (adenomas, cysts, and hemangiomas) on MRI images [52]. A model using, among other features, the time to peak histograms and the sum of squared variance could distinguish different liver lesions. In the paper by Gatos et al., selected texture characteristics (inverse difference moment, sum variance, and long-run emphasis) could differentiate metastases, hepatocellular carcinoma (HCC), and benign lesions [51].
Li et al. tested a model to distinguish hemangiomas, LM, and HCC, using second-order features (gray level co-occurrence matrix, gray-level run-length matrix, and intensity-size zone matrix) [53]. In their model, no feature combination could differentiate the three types of lesion at the same time. Differential diagnosis of the two malignant entities (LM and HCC) required a more complex model, with a higher number of features, than differential diagnosis between benign and malignant lesions (LM vs. hemangiomas or HCC vs. hemangiomas). Finally, a study by Song et al. identified kurtosis, variance, and inverse difference moment as distinguishing criteria between benign and malignant hypervascular lesions [48]. Only the latter study used pathology data as the reference standard for all the analyzed patients. Figure 3 provides an overview of the potential contribution of radiomics to the management of patients with LM.

Influence of Technical Features on Radiomic Analyses
Some studies set out to investigate whether acquisition or reconstruction parameters could influence the values of texture analysis indices. Ahn et al. tested three different reconstruction modes of CT images, i.e., filtered back-projection, iterative reconstruction model, and hybrid iterative reconstruction. The reconstruction method affected numerous parameters, including entropy, homogeneity, skewness, kurtosis, and gray-level co-occurrence matrices [36]. Lubner et al. compared the effect of 2D and 3D reconstruction on radiomic parameters by performing a Bland-Altman analysis on a subset of 20 patients [43]. The results were similar for the two methods. Those results were confirmed by a further investigation by Ahn et al. [37]. The latter study also compared the influence of different CT scanners, ranging from 8 to 64 rows, on the radiomic parameters, without finding any significant difference. Similar results were reported in the MRI setting: Peerlings et al. used the concordance correlation coefficient to test the reproducibility of an array of first- and second-order radiomics parameters over time (multiple MRI) and across different MRI systems on apparent diffusion coefficient maps, finding good stability for most parameters [58]. Conversely, two studies reported that radiomic parameters derived from CT scans are affected by slice thickness setting and ROI size [34,50]. Dercle et al. demonstrated that ROI area size, metastatic site, and the individual characteristics of image acquisitions should be considered as confounding factors in the evaluation of tumor entropy [35].
Inter-observer agreement was assessed by four studies [33,52,54,60]. Although they used different indices, such as the kappa statistic, intra-class correlation, and correlation index (r-value), all studies reported a substantial or excellent agreement among different readers. Finally, one study by Chatterjee et al. devised a method to reduce the rate of false discovery when analyzing radiomic parameters in small datasets [57].
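The two reproducibility tools named above, the Bland-Altman analysis and the concordance correlation coefficient, can be sketched in a few lines. The example below is a generic illustration of Lin's concordance correlation coefficient and of Bland-Altman bias with 95% limits of agreement; it is not the code used in the cited studies, the function names are hypothetical, and the use of population variances in the coefficient is one common convention assumed here.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between paired measurements.

    Uses population (ddof=0) variances and covariance; 1.0 means perfect
    agreement, and any systematic shift between x and y lowers the value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return float(2.0 * cov / (x.var() + y.var() + (mx - my) ** 2))


def bland_altman_limits(x, y):
    """Bias (mean difference) and 95% limits of agreement for paired data."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    bias = float(d.mean())
    sd = float(d.std(ddof=1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Unlike the Pearson correlation, the concordance coefficient penalizes a constant offset: two series that differ by a fixed amount are perfectly correlated but not perfectly concordant, which is the property of interest when the same feature is extracted with two scanners or two reconstruction methods.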
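For categorical readings, inter-observer agreement of the kind mentioned above is commonly summarized with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below is a generic two-reader implementation under that assumption; the reviewed studies do not specify which kappa variant they used, and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers rating the same cases.

    Undefined (division by zero) when chance agreement equals 1, i.e.,
    when both readers assign every case to the same single category.
    """
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement from the marginal category frequencies of each reader.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates complete agreement and 0 indicates agreement no better than chance.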

Discussion
Our review identified a considerable number of research papers dealing with radiomics analyses of LM, mostly published in the last three years and with an evident increase over time. The role of texture analysis has been explored in different clinical settings, leading to innovative insights into the management of patients with LM.
In the diagnostic field of research, radiomics can distinguish different types of hepatic lesions, differentiating metastases from benign lesions and primary tumors [48,[51][52][53]. LM appear to be characterized by a high gray level entropy, heterogeneity, and variance. This phenomenon can be explained by the co-existence of different cell clones, the presence of necrosis and, more relevantly, by the unregulated sprouting of new tumor vasculature. Radiomic analyses could lead to a conclusive and reliable diagnosis after a single imaging modality, preventing the need for an additional radiological examination or an invasive biopsy.
Most studies focused on LM from colorectal cancer. Higher entropy and lower homogeneity of LM at diagnosis have been associated with a better prognosis and response to therapy [38,40,45,46,56]. Such indices could reflect good vascularization of LM, while more homogeneous tumors could reflect a tighter cellular structure or necrosis, which might imply reduced therapy effectiveness. Conversely, a decrease in entropy and an increase in homogeneity after treatment on CT have been associated with tumor response [40,45]. Similar considerations apply to skewness: low baseline values and an increase after chemotherapy were associated with response to chemotherapy [32,37]. A higher asymmetry index describes a higher prevalence of voxels with lower gray level values, which is compatible with the onset of necrosis in the target tissue. All those radiomic features are consistent with the reduction in neoangiogenesis and the onset of necrosis. It is worth noting that measures of heterogeneity could have a different meaning in 18F-FDG PET-CT [59,61], indicating therapy resistance. Further studies are needed for this imaging modality.
Some studies, such as the one by Rao et al. [45], demonstrated the superiority of radiomic features over standard biomarkers and predictors of response to chemotherapy. Such data are of major clinical relevance considering that, to date, prognosis assessment in patients with colorectal LM is still limited: it relies on morphological criteria, while tumor biology assessment by genetic factors is largely unsatisfactory [22,23]. Similarly, traditional RECIST criteria for response evaluation are associated with prognosis [17,18] but show major discrepancies with real LM modifications at the pathological level (TRG) [19,20]. Radiomic features not only demonstrated an earlier evaluation of response than standard RECIST criteria [33], but also an adequate assessment of TRG [45]. Texture analyses were also able to predict additional pathology details of LM that have a prognostic impact, such as tumor growth patterns [39]. Those data can allow a real precision medicine, planning treatment based on a reliable evaluation of the effectiveness of therapies and prognosis, that, to date, can be assessed only ex-post. However, when evaluating the radiomic data, it is of utmost importance to place the texture information into the appropriate clinical context. For instance, indices related to uniformity are negative biomarkers at baseline, but hallmarks of good response in the post-treatment setting.
Evidence was much less robust for non-colorectal LM. The predictive value of heterogeneity at baseline staging is shared by LM from colorectal and esophageal cancer, but not by NET [44]. In the latter group, the possibility of identifying the origin and grading of NET through radiomics is appealing [44], since both data drive treatment and prognosis. Nonetheless, data are still limited and must be confirmed by further studies.
The present review highlights some significant limitations of studies, as reflected by the methodological RQS assessment and the clinical TRIPOD checklist. The selected papers presented a wide variability in sample size, relying on small series in most cases. Only a few studies had a prospective design (radiomic analyses being a secondary endpoint in all of them) or a validation dataset. Even comparison with a reference standard was not adopted by all authors. There was a high heterogeneity of utilized techniques and inconsistencies in the number and order of analyzed features, as almost every institution used a different software application. Furthermore, most studies did not provide a univocal cut-off value of radiomic features or, when provided, it was not coherent among studies, precluding any broader applicability of the mentioned parameters. To date, only a few studies considered data from surgical specimens or biopsies. This is a major limitation because pathological data are the mandatory reference to definitively assess the capability of radiomics to discriminate LM from other diseases and to identify the biological characteristics of tumors.
All these limitations hinder the quantum leap of radiomics from the investigation field into clinical practice, representing a central issue of future radiomics research, and should be addressed as soon as possible. The instability of radiomic features across different devices and acquisition protocols, especially for MRI images [67], could further limit the real application of radiomics to daily clinical practice. For this purpose, a collaboration between clinicians and medical imaging experts is pivotal, as interdisciplinarity correlates with the quality of the published research [68]. Cooperation between institutions is warranted to find methods capable of countering feature variability and instability, based on the analysis of large databases [69]. Continuous standardization of radiomic features is also crucial [27]. The radiomic analysis should be performed with a validated and user-friendly software interface, as the heterogeneity of methods prevents the attainment of reproducible cut-off values [70]. Transition to clinical practice is impossible without a reliable and fast segmentation tool, which must be able to identify and isolate the target structure semi-automatically. In this setting, an adaptive threshold could be applied [71]. Machine learning methods, such as convolutional neural networks, in combination with radiomics, appear particularly promising, especially for the identification and segmentation of small lesions [26,72,73]. To date, no study has pursued this approach for LM.
These limitations notwithstanding, some data are encouraging. Independently of the adopted methodology, studies addressing similar questions came to similar conclusions. Analogously, radiomic parameters relevant for a given clinical situation were reproducible across studies. Different analysis techniques, such as 3D or 2D feature extraction, did not have a relevant impact on the obtained values. Likewise, using different scanning devices or switching the operator performing the analysis did not affect the information's reproducibility. Finally, even imaging modalities based on entirely different physical principles (magnetic resonance and CT) yielded similar results in some settings.
Some limitations of the present review should be acknowledged. The study was designed according to a wide scientific question rather than according to specific PICO questions. This is a first exploratory review of the role of radiomics in LM, for which available studies were expected not to provide high-level evidence. We aimed to give an overview of a cutting-edge topic. The extreme heterogeneity of imaging modalities and software packages used, of clinical scenarios (LM from different tumors, patients with/without chemotherapy, data before/during/after systemic or loco-regional therapies), and of clinical endpoints (diagnosis, prognosis, and effectiveness of treatment) precluded the possibility of performing a meta-analysis of data. We did not consider ultrasonography despite its wide diffusion for liver tumors. In fact, the operator-dependent origin of image data would have carried a relevant risk of bias. As mentioned, we did not include machine learning methods, but, to date, no study has used such a combination of artificial intelligence and texture analysis for LM. In such a rapidly evolving field, these limitations should serve as insights for future research perspectives.

Database Search Strategy
We performed a systematic search of the PubMed, Science Citation Index, Embase, and ClinicalTrials.gov databases and of web sources (Google Scholar) for articles relevant to radiomics of LM. The adherence of the present review to PRISMA guidelines was assessed by the PRISMA checklist (Supplementary Table S4). We decided not to formulate PICO questions because of the expected heterogeneity of studies and the low level of available evidence. The study was registered on the PROSPERO database (CRD42020193930) at the end of the analysis.
The search algorithm was constructed using the following terms: "radiomics" OR "texture analysis" OR "radiological features" OR "radiomic features" OR "textural features" AND "hepatic metastases" OR "liver metastases". Only full-text articles in English, reporting on human subjects, written and published (including those distributed as "online first") up to 31 May 2020, were considered. The search was then expanded by reviewing the reference lists of the selected articles. Two authors (F.F. and N.G.) reviewed each manuscript and eliminated those not fitting the inclusion criteria (detailed below); in cases of discordance, a consensus was reached after discussion with an independent author (L.V.).
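Because Boolean operator precedence differs between databases, the search string above is ambiguous unless the OR groups are parenthesized. The sketch below assembles the query under the assumption that the intended logic is (any technique term) AND (any target term); the text does not state this grouping explicitly, and the helper name `build_query` is hypothetical.

```python
def build_query(technique_terms, target_terms):
    """Join two groups of quoted phrases as (A OR B ...) AND (C OR D ...)."""

    def any_of(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

    return f"{any_of(technique_terms)} AND {any_of(target_terms)}"


# Terms copied from the search strategy described in the text.
query = build_query(
    ["radiomics", "texture analysis", "radiological features",
     "radiomic features", "textural features"],
    ["hepatic metastases", "liver metastases"],
)
```

Writing the grouping explicitly makes the strategy reproducible regardless of how a given search engine resolves mixed AND/OR expressions.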

Study Selection and Quality Appraisal
Studies reporting feature extraction from diagnostic images in patients affected by LM were included in the present review. Studies describing analyses of purely semantic (visual qualitative) features, such as size, lobulation, spiculation, and radiological signs of vascular invasion, were not included. Papers describing computer-assisted tumor recognition, such as convolutional neural networks, were included only if at least one textural feature was used in the process. No study was excluded because of sample size (except for case reports). All tomographic radiological and nuclear medicine modalities were allowed: this included contrast-enhanced and unenhanced CT and MRI, as well as PET/CT with any tracer. The following papers were excluded:

1. Articles not matching the field of interest of the current review.
2. Other review articles (however, these articles were screened for references).
4. Reports of single cases.
5. Reports on ultrasound imaging or other operator-dependent techniques.
6. Phantom, simulation, or small animal studies.
In the first step of the selection process, the article title and abstract were screened; articles that were ineligible according to the aforementioned criteria were omitted. In the second step, the full text of the articles was assessed to determine paper eligibility. In the case of positive evaluation, the entire reference list was manually examined to detect other potential candidate articles, which might have been left out by the search algorithm. The quality of the included studies was assessed by using the methods-related Radiomics Quality Score (RQS), as proposed by Lambin et al. [64], and the clinically-oriented TRIPOD checklist, as proposed by Park et al. [65]. The presence of relevant bias in the included studies was evaluated according to QUADAS-2 [66]. Two readers (F.F. and N.G.) evaluated the scores, with a third senior reader (L.V.) being consulted whenever a consensus was needed.

Articles and Features Classification
For each article that passed the selection process, the following data were extracted and organized in a table: basic article metrics, including name of the first author, institution of the corresponding author, journal, and year of publication. Then, information related to the study design, type of primary tumor, target of analysis (LM only or LM and healthy liver parenchyma/primary tumor), endpoints, sample size, and radiological technique were inserted. Considering the study endpoints, data about survival, response to chemotherapy, pathological details, and technical issues were collected. The term "survival" includes overall survival, progression-free survival, event-free survival, and recurrence-free survival. The radiological and the pathological assessment of response to chemotherapy were considered separately because of the discrepancy that may occur between the two [19,20]. Pathological data include tumor characteristics (e.g., grading and growth pattern), and genetic mutations. Finally, data concerning the software package used to carry out the analysis and the radiomic features extracted were collected. Textural features included descriptors of the voxel distribution curve (mean, skewness, and kurtosis), of the homogeneity of the intensity values (energy, entropy, angular second moment), of the frequency of adjacent voxels with the same values (gray level co-occurrence matrices, gray level run length matrices, and gray level size-zone matrices), and, finally, the intensity difference between voxels (neighboring gray level difference matrices). Table 3 provides an overview of the most common radiomic features.

Conclusions
A number of clinical messages can already be extrapolated. Radiomics allows non-invasive differential diagnosis of hepatic lesions. More importantly, radiomic characteristics can predict the outcome of patients and the therapeutic effectiveness of treatments, outperforming standard predictive and prognostic models (Figure 3). Altogether, radiomics has the potential to offer a significant contribution to the precision medicine approach. However, interdisciplinarity, standardization, and reproducible software applications are indispensable for the transition of radiomics into clinical practice.