Role of MRI-Derived Radiomics Features in Determining Degree of Tumor Differentiation of Hepatocellular Carcinoma

Background: To investigate radiomics ability in predicting hepatocellular carcinoma histological degree of differentiation by using volumetric MR imaging parameters. Methods: Volumetric venous enhancement and apparent diffusion coefficient were calculated on baseline MRI of 171 lesions. Ninety-five radiomics features were extracted, then random forest classification identified the performance of the texture features in classifying tumor degree of differentiation based on their histopathological features. The Gini index was used for split criterion, and the random forest was optimized to have a minimum of nine participants per leaf node. Predictor importance was estimated based on the minimal depth of the maximal subtree. Results: Out of 95 radiomics features, four top performers were apparent diffusion coefficient (ADC) features. The mean ADC and venous enhancement map alone had an overall error rate of 39.8%. The error decreased to 32.8% with the addition of the radiomics features in the multi-class model. The area under the receiver-operator curve (AUC) improved from 75.2% to 83.2% with the addition of the radiomics features for distinguishing well- from moderately/poorly differentiated HCCs in the multi-class model. Conclusions: The addition of radiomics-based texture analysis improved classification over that of ADC or venous enhancement values alone. Radiomics help us move closer to non-invasive histologic tumor grading of HCC.


Introduction
Hepatocellular carcinoma (HCC) is among the most common causes of cancer in the world, and with an increase in its incidence, it now became the second most common cause of cancer-related mortality worldwide [1]. Despite the new surgical and chemotherapeutic techniques in treating HCC tumors, treatment outcome is still suboptimal. Several studies have shown that HCC tumors' histological grade is one of the critical factors in determining treatment outcome and patients' overall survival [2]. For this reason, determining the degree of differentiation in HCC tumors at presentation can help practitioners to make an optimal treatment strategy and predict the outcomes more accurately [3]. The characteristic features of HCC on contrast-enhanced MRI are the gold standard in HCC diagnosis [4]. Lesions are typically hypervascular in the hepatic arterial phase, with washout and rim enhancement in the venous/delayed phases [4]. Liver-specific contrast agents like gadoxetate disodium (Gd-EOB-DTPA) can be added to conventional MR imaging, which helps to better visualize liver vasculature and assess data regarding hepatocytes function. These features of Gd-EOB-DTPA have resulted in improved accuracy in HCC diagnosis, as compared to other MRI techniques and dynamic contrast-enhanced computed tomography [5]. Additionally, new advancements in imaging technology have provided opportunities to investigate the microenvironment of tumors. Methods like dynamic contrast-enhanced (DCE) MRI and diffusion-weighted imaging (DWI) facilitated the accurate examination of tumors' metabolism and proliferation [6]. DWI can provide information regarding tissue cellularity, necrosis, and cell membrane integrity [7]. On the other hand, Gadoxetic acid is a liver-specific contrast agent that has been regularly used in liver MRI scans and emerged as an important tool for HCC diagnosis [8]. Published data regarding the role of DWI and DCE-MRI in prediction of tumor grade and/or microvascular environment in HCC were inconsistent. In addition, while mean apparent diffusion coefficient (ADC) values are typically used, others have suggested the use of minimum ADC or true diffusion as possible measures. Similarly, in the use of DCE-MRI, both quantitative and semi-quantitative parameters have been explored [9][10][11][12]. New developments in the field of artificial intelligence (AI) and machine learning have made it possible to use analytical algorithms to extract a large number of features from imaging data [13]. The use of radiomics may help build more accurate and reproducible results and additionally has the potential to provide new information regarding tumors' texture and other characteristics [14]. Radiomics has been performed in a variety of imaging sequences for tumor differentiation and grading, including T1-weighted images, T2-weighted images, ADC maps, DCE maps, and T1 maps [15][16][17][18]. In this study, we aimed to use radiomics on imaging data from volumetric ADC and volumetric venous enhancement maps to predict the histopathologic degree of differentiation in HCC tumors.

Study Population
This was a monocentric, retrospective study compliant with Health Insurance Portability and Accountability Act (HIPAA) policies. Informed patient consent was waived by our Institutional Review Board (IRB). Inclusion criteria are shown in Figure 1. A total of 129 HCC patients with baseline MRI and pathologic report were identified between January 2003 and June 2017. All lesions were hypervascular on hepatic arterial phase imaging, with washout on portal venous/delayed phases and rim enhancement.

Histopathology
All sample tissues were obtained following transplantation, resection, or biopsy of the liver and stained with Hematoxylin-eosin. A pathology resident evaluated all tissue samples, and the findings were confirmed by an attending pathologist. Both pathologists

Histopathology
All sample tissues were obtained following transplantation, resection, or biopsy of the liver and stained with Hematoxylin-eosin. A pathology resident evaluated all tissue samples, and the findings were confirmed by an attending pathologist. Both pathologists were blinded to the MRI findings. Tumor differentiation was defined as well, moderately, or poorly differentiated based on Edmondson-Steiner's system [19] and using the most dominant grading among the specimens.

MRI Technique and Tumor Segmentation
Our standard protocol was used to perform MR imaging, as shown in Table 1. Tumor segmentation was performed as reported in prior studies (Appendix A). Up to 4 dominant lesions in each patient were segmented using semi-automated "Random-Walker" 3D s algorithm on portal venous phase (PVP) images by a postdoctoral fellow (-) with >2 years of experience in using the software, who was blinded to pathology results. Then target tumor, histograms, and volumetric statistics were obtained for volumetric ADC and VE (vADC and vVE) parameters [20,21].
Global features included the mean, maximum, and minimum voxel intensity for both ADC and VE maps. In addition to the 6 global features above, ADC-map derived tumor solidity, surface area, and volume were also included. The texture analysis consisted of the following histogram-based features: Variance, skewness, and kurtosis. Histogram-based features, unlike all other texture features, were spatially invariant such that the arrangement of the pixels relative to one another did not affect the analysis.
The GLCM, GLRLM, GLSZM, and NGTDM matrix features included are shown in Table 2. fellow (---) with >2 years of experience in using the software, who was blinded to pathol-ogy results. Then target tumor, histograms, and volumetric statistics were obtained for volumetric ADC and VE (vADC and vVE) parameters [20,21].
Global features included the mean, maximum, and minimum voxel intensity for both ADC and VE maps. In addition to the 6 global features above, ADC-map derived tumor solidity, surface area, and volume were also included. The texture analysis consisted of the following histogram-based features: Variance, skewness, and kurtosis. Histogrambased features, unlike all other texture features, were spatially invariant such that the arrangement of the pixels relative to one another did not affect the analysis.
The GLCM, GLRLM, GLSZM, and NGTDM matrix features included are shown in Table 2.   * GLCM elements in row (i) and column (j) represent the frequency in which a given gray level of value (i) is horizontally adjacent to gray-level (j). For the purposes of this study, these calculations were performed in vertical, horizontal, 45 • , and 135 • directions, which were then averaged together to minimize directional dependence in the samples. ** Rows (i) represent the gray-levels while the columns (j) represent the run-length, or the consecutive number of pixels with a particular gray-level. Elements within the matrix represent the frequency of pixel line segments with a run-length (j) and a gray-level (i). *** Rows (i) represent the gray-levels while the columns (j) represent the 3D zone-size, or the consecutive number of 3D zones with a particular gray-level. Elements within the matrix represent the frequency of zones with a zone-size (j) and a gray-level (i). **** these features provide a histogram of the absolute gradient values in the tissue segment. In this analysis, differences in all pixel values within a tumor segment were analyzed using a 3 × 3 neighborhood GLCM features were calculated based on a symmetric matrix with rows (i) and columns (j) ranging from 0 to Ng, such that Ng is equal to the number of gray-levels within the image.

Grey Level Discretization
To investigate the dependence of texture features on the number of gray levels (Ng), texture features were extracted with resampled Ng values of 16, 32, 64, 128, and 256. Of these, 64 was the optimal value in our analysis and provided the best classification performance.

Statistical Analysis
In our analysis, we compare the ability of radiomics to classify tumor differentiation compared to the mean ADC and venous enhancement values alone. Once the features were extracted using texture analysis, statistical machine learning was used to identify the performance of the combined set of texture features in the classification of tumors into well, moderately, and poorly differentiated classes. We used both a two-class (well vs. moderate/poor) and a multi-class (well vs. moderate vs. poor) random forest classification algorithm [22] for that purpose and then tuned the algorithm to optimize the parameters. The Gini index was used for split criterion. Parameters used in the multi-class classification were: Minimum leaf node size = 9, number of variables tried at each split = 44, and number of decision trees = 500. Predictor importance was estimated based on the minimal depth of the maximal subtree. If a predictor is influential in prediction, then the variable is likely to occur nearer to the root rather than the leaf nodes. The out-of-bag error from the random forest algorithm was used as the metric to quantify classification error and assess performance ( Figure 3). algorithm [22] for that purpose and then tuned the algorithm to optimize the parameters. The Gini index was used for split criterion. Parameters used in the multi-class classification were: Minimum leaf node size = 9, number of variables tried at each split = 44, and number of decision trees = 500. Predictor importance was estimated based on the minimal depth of the maximal subtree. If a predictor is influential in prediction, then the variable is likely to occur nearer to the root rather than the leaf nodes. The out-of-bag error from the random forest algorithm was used as the metric to quantify classification error and assess performance ( Figure 3). OOB estimates measure the prediction error of random forest models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from. Bootstrap aggregating allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the base learner. The ability of the model to distinguish well from moderate/poorly differentiated tumors was quantified using the area under the receiver operating characteristic (ROC) curve (AUC) and misclassification error rates. OOB AUC's between the models were compared to assess if the differences in AUC were significant (pROC package). A chi-squared test was used to compare if the differences in the OOB misclassification rates were significant.
Stata software (Version 15, StataCorp, College Station, TX, USA) was used for the analysis of demographic and clinical parameters. Normality of these variables was assessed by Shapiro-Wilk test, and appropriate statistical tests were used for univariate associations. Kruskal-Wallis and Chi square tests were used for continuous and categorical OOB estimates measure the prediction error of random forest models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from. Bootstrap aggregating allows one to define an outof-bag estimate of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the base learner. The ability of the model to distinguish well from moderate/poorly differentiated tumors was quantified using the area under the receiver operating characteristic (ROC) curve (AUC) and misclassification error rates. OOB AUC's between the models were compared to assess if the differences in AUC were significant (pROC package). A chi-squared test was used to compare if the differences in the OOB misclassification rates were significant. Stata software (Version 15, StataCorp, College Station, TX, USA) was used for the analysis of demographic and clinical parameters. Normality of these variables was assessed by Shapiro-Wilk test, and appropriate statistical tests were used for univariate associations. Kruskal-Wallis and Chi square tests were used for continuous and categorical parameters, respectively. All p-values were considered statistically significant at p < 0.05.

Descriptive Findings
A total of 171 HCC lesions were assessed in 129 patients. The median time between MRI and biopsy was 30 days (range 12-68 days). The median time between MRI and resection/transplantation was 191 days (range 93-316 days). All demographic characteristics (Age, Sex, Race) were similar between well, moderate, and poorly differentiated groups. There was no significant difference in patients' Child scores between the three groups. There was no correlation between lesion location and its degree of differentiation. Alpha-Fetoprotein (AFP) was significantly higher in poorly differentiated HCCs as compared to well and moderately differentiated HCCs (p = 0.003). Table 3 summarizes the demographic and clinical characteristics of all patients.

Radiomics Feature Extraction
Of the total of 95 texture features that were extracted, ADC features performed better in distinguishing well differentiated from poorly differentiated HCCs. The top radiomics

Classification Ability of Radiomics-Based Model
The mean ADC and venous enhancement map alone had an out-of-bag error rate of 39.8%, the error decreased to 32.8% (p < 0.01) with the addition of the radiomics features in the multi-class model.
The AUC improved significantly from 75.2% to 83.2% (p = 0.03) with the addition of the radiomics features for distinguishing well differentiated from moderate/poorly differentiated HCCs in the multi-class model ( Figure 5).

Classification Ability of Radiomics-Based Model
The mean ADC and venous enhancement map alone had an out-of-bag error rate of 39.8%, the error decreased to 32.8% (p < 0.01) with the addition of the radiomics features in the multi-class model.
The AUC improved significantly from 75.2% to 83.2% (p = 0.03) with the addition of the radiomics features for distinguishing well differentiated from moderate/poorly differentiated HCCs in the multi-class model ( Figure 5). In the two-class problem, the addition of radiomics features decreased the overall error rate from 27.5% to 22.2% (p < 0.01) and increased the AUC from 77.9% to 81.5% (p = 0.18). (Figure 6).

Discussion
In this study, we used a radiomics framework that included texture extraction followed by statistical machine learning to analyze the role of volumetric ADC and venous enhancement maps in determining the histopathology of HCC. Using a random forest classification algorithm, we demonstrated that ADC radiomics features were among the top classifiers in variable importance ranking for classifying HCC tumors as compared to VE features. From all the features, we identified five that were superior to other features In the two-class problem, the addition of radiomics features decreased the overall error rate from 27.5% to 22.2% (p < 0.01) and increased the AUC from 77.9% to 81.5% (p = 0.18). (Figure 6). In the two-class problem, the addition of radiomics features decreased the overall error rate from 27.5% to 22.2% (p < 0.01) and increased the AUC from 77.9% to 81.5% (p = 0.18). (Figure 6).
(a) (b) Figure 6. ROC curves showing the performance of models for classifying well from moderate/poorly differentiated tumors without (a) and with (b) radiomics-derived texture features in the two-class model.

Discussion
In this study, we used a radiomics framework that included texture extraction followed by statistical machine learning to analyze the role of volumetric ADC and venous enhancement maps in determining the histopathology of HCC. Using a random forest classification algorithm, we demonstrated that ADC radiomics features were among the top classifiers in variable importance ranking for classifying HCC tumors as compared to VE features. From all the features, we identified five that were superior to other features

Discussion
In this study, we used a radiomics framework that included texture extraction followed by statistical machine learning to analyze the role of volumetric ADC and venous enhancement maps in determining the histopathology of HCC. Using a random forest classification algorithm, we demonstrated that ADC radiomics features were among the top classifiers in variable importance ranking for classifying HCC tumors as compared to VE features. From all the features, we identified five that were superior to other features in Diagnostics 2022, 12, 2386 9 of 13 tumor classification. The addition of radiomics textures improved classification compared to ADC or venous enhancement values alone, indicating a potential role for radiomics in non-invasive histologic grading of HCC tumors.
To date, despite the major improvement in surgical and chemotherapy techniques, survival of HCC patients remains poor. High recurrence rate is among the main reasons for poor survival in treated patients [23]. Poorly differentiated HCCs have been reported to have higher recurrence rates as compared to well and moderately differentiated tumors and with worse overall survival [2]. Poorly differentiated HCCs need more extensive resection in order to reduce the recurrence rate after surgery [24]. Additionally, more frequent followups might be needed in poorly differentiated HCCs for early detection of recurrence and metastasis [25]. Fine-needle aspiration (FNA) biopsy is the pre-operative gold standard in determining histopathological grading of HCC. However, it is an expensive and invasive method with an increased risk of adverse outcomes, including bleeding, tumor seeding, and sampling errors [26]. Due to the heterogeneous nature of HCC, tissue samples provided by biopsy might not be a good representative of the entire tumor. Therefore, identifying a non-invasive and accurate method to assess tissue characteristics of tumors can be of great importance.
The radiomics' potential role has been studied for predicting treatment response, recurrence, and overall survival in HCC patients [27,28]. Additionally, several studies have exploited radiomics on pre-contrast T1-W, post-contrast T1-W, T2W MRI, and also contrast-enhanced computed tomography (CE-CT) imaging to distinguish between well differentiated and moderately/poorly differentiated HCC [18,29,30]. Zhang et al. [31] used DWI radiomics features in combination with T1-and T2-weighted imaging to predict microvascular invasion (MVI) in HCC. Their results showed that radiomics features can classify MVI HCCs and non-MVI HCCs with an accuracy of 78% in the training cohort and 82% in the validation cohort.
The results of a study by Hectors et al. showed no association between ADC radiomics features and degree of tumor differentiation of HCCs [32]. In their study, they placed an ROI on the HCC tumor on a single slice of the ADC map. Therefore, the results might not be representative of the whole tumor.
Another study done by Zhou et al. used a convolutional neural network to extract deep features from log maps resulting from three-b-value images of DWI. Their results showed higher performance of their model for HCC grading as compared to ADC maps [26]. Gadolinium-based contrast agents have been used previously in differentiating HCC from benign lesions [33], and studies showed inconclusive results regarding their role in distinguishing HCC histopathological grading [34,35]. With the use of machine learning, several studies exploited contrast-enhanced imaging in predicting early and late recurrence in HCC patients [36,37].
We exploited random forest (RF) classifiers in our study as the machine learning algorithm of choice. RFs have several inherent advantages over other classifiers like statistical logistic regression techniques that are routinely used: (1) They deal well with non-linear associations as the tree method identifies several cut-points during branching, (2) the variables do not have to be normally distributed, (3) the algorithm provides added robustness to prevent overfitting by randomly choosing a subset of variables at each node and also choosing a subset of patient data for each tree, and (4) the algorithm also provides the importance of each variable used.
There were some limitations to our study. First, this study was retrospective, and patients' data were recorded for several years. The variation in study setting over time is an inherent limitation to all retrospective studies. This limitation was minimized by adhering to consistent protocols in our institute. The other limitation is the use of either biopsy or the entire tumor examination for histopathology grading. This could have potentially affected our findings as the accuracy of biopsy might be lower than tumor excision. However, we performed subgroup analysis in these two groups and also adjusted our final multivariable models for the sampling method, which demonstrated consistent results. Future prospective studies with a better control of confounders can determine other predictors of tumor grading and patients' survival.

Appendix A.2. Segmentation
From all sequences of contrast-enhanced MR imaging, DWI, ADC maps, pre-contrast T1-weighted images, and portal venous phase (PVP) images (T1-weighted images at 70 s) were selected and retrieved as digital imaging and communications in medicine (DICOM) format from our picture archiving and communicating system (PACS, Carestream, Rochester, NY, USA) and anonymized.
Primarily, the pre-contrast (P) and portal venous phase (PVP) images were transferred to a prototype software (MR Arithmetics, Siemens Healthineers, Erlangen, Germany). Then, P images were aligned to PVP images using a non-rigid (deformable) 3D registration method to improve the capture range and accuracy. Then the portal venous enhancement map was calculated as VE = (PVP − P)/P × 100 and exported in DICOM format. These results were reported as relative intensity ratio (RIR), which showed the increase in signal intensity during the portal phase relative to the pre-contrast phase and are an estimate of accumulated contrast agent in tumor cells.
Then the PVP image, the previously computed VE map, the lowest acquired b-value (b = 50 s/mm 2 ), and the ADC map were loaded into a prototype software (MR Multiparametric Analysis, Siemens Healthineers, Erlangen, Germany) for segmentation. The low-b-value images with high anatomic detail were elastically co-registered to the PVP images to ensure accurate anatomic alignment of images, and the resulting transformation was also applied to the ADC map, which was calculated based on both b-values [39,40].
Up to four dominant lesions in each patient were selected. A semi-automated "Random-Walker" 3D segmentation algorithm was used to segment the entire lesion of interest on PVP images [20] by a postdoctoral fellow (-) with >2 years of experience in using the software and reading abdominal MRI, who was blinded to pathology results. After semi-automated segmentation of the target tumor, the prototype software automatically reconstructed the tumor in three dimensions and generated the histograms and volumetric statistics as volumetric ADC and VE (vADC and vVE) parameters [21].