MRI-Based Radiomics Models to Discriminate Hepatocellular Carcinoma and Non-Hepatocellular Carcinoma in LR-M According to LI-RADS Version 2018

Noninvasively differentiating hepatocellular carcinoma (HCC) from other primary liver malignancies in Liver Imaging Reporting and Data System (LI-RADS) category M (LR-M) tumours is critical for choosing patient treatment, but visual evaluation of medical images is very challenging. This study aimed to evaluate whether magnetic resonance imaging (MRI) models based on radiomics features could further improve the ability to classify LR-M tumour subtypes. A total of 102 liver tumours were defined as LR-M by two radiologists based on LI-RADS and were confirmed to be HCC (n = 31) and non-HCC (n = 71) by surgery. A radiomics signature was constructed from reproducible features using the max-relevance and min-redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) logistic regression algorithms with tenfold cross-validation. Logistic regression modelling was applied to establish models based on T2-weighted imaging (T2WI), arterial phase (AP), and portal vein phase (PVP) images and on their combinations. These models were verified independently in the validation cohort. The areas under the curve (AUCs) of the models based on T2WI, AP, PVP, T2WI + AP, T2WI + PVP, AP + PVP, and T2WI + AP + PVP were 0.768, 0.838, 0.778, 0.880, 0.818, 0.832, and 0.884, respectively. The combined model based on T2WI + AP + PVP showed the best performance in both the training cohort and the validation cohort. The discrimination efficiency of each radiomics model was significantly better than that of the junior radiologist's visual assessment (p < 0.05; DeLong test). Therefore, the MRI-based radiomics models had a good ability to discriminate between HCC and non-HCC in LR-M tumours, providing more options to improve the accuracy of LI-RADS classification.


Introduction
The Liver Imaging Reporting and Data System (LI-RADS) was developed by the American College of Radiology to standardise the interpretation and reporting of imaging for hepatocellular carcinoma (HCC). According to the probability that a liver lesion is HCC, ranging from definitely benign to definitely HCC, LI-RADS provides five categories from LR-1 to LR-5, which play crucial roles in guiding diagnosis and clinical treatment [1]. Previous studies have suggested that the LR-5 class is associated with unfavourable pathological features of resected HCC [2] and have confirmed the potential prognostic role of LI-RADS classification, supporting hepatectomy especially for the LR-5 subclass [3]. In the field of liver transplant, although no significant differences were observed between LR-4 and LR-5 HCC probability when eligible for this study, two radiologists evaluated liver tumours that met the LR-M MRI diagnostic criteria from picture archiving and communication systems according to LI-RADS v2018 from January 2011 to January 2020. The inclusion criteria were as follows: (1) chronic hepatitis B virus infection confirmed by laboratory tests; (2) liver cirrhosis confirmed by pathological examination via liver biopsy or surgery; and (3) pathological results of the tumours obtained through puncture or surgery within one month after MR examination. Patients with liver cirrhosis who were younger than 18 years, those with congenital or vascular-related cirrhosis, and those who had received any treatment for liver tumours were excluded. For patients with multiple lesions, the target lesions that matched the surgically resected lesions, together with their pathological results, were selected for analysis. Any discrepancies in the results were resolved by consensus between the two observers. Finally, a total of 90 patients with 102 tumours were included in this study (Figure 1).

Image Analysis
Image analysis was performed independently by two abdominal radiologists (Y.Y.L. and H.P.Z., with 8 and 12 years of experience in abdominal MRI, respectively), who were blinded to the patients' clinical history and pathological diagnosis. Before this analysis, the two observers reached a specific agreement on each LR-M feature defined by LI-RADS v2018 and practised on several cases not included in the study. According to LI-RADS v2018, the LR-M criteria included targetoid or nontargetoid masses. The imaging manifestations of targetoid masses were defined as follows: targetoid dynamic enhancement (including rim arterial phase hyperenhancement (APHE), peripheral "washout," and delayed central enhancement) and targetoid appearance on DWI or hepatobiliary phase (HBP) (including targetoid restriction and targetoid HBP appearance). Nontargetoid masses were defined as tumours with one or more of the following characteristics: infiltrative appearance, marked diffusion restriction, necrosis or severe ischaemia, liver surface retraction, and adjacent biliary obstruction. Because DWI and Gd-EOB-DTPA protocols were not required in our institution's MR scanning programme, the targetoid features evaluated by the observers were mainly rim APHE, peripheral "washout," and delayed central enhancement. During the image evaluation, any discrepancies in the results were resolved by consensus between the two observers.
Visual evaluation ability was tested by another pair of abdominal radiologists, one junior and one senior (X.F.Q. and X.J.H., with 4 and 15 years of experience in abdominal MRI, respectively), who labelled each selected case as HCC or non-HCC without knowing the pathological diagnosis. Because no specific radiological diagnostic criteria exist for atypically enhancing HCC, the two radiologists could only distinguish HCC from non-HCC in LR-M based on their own clinical experience.

Histopathologic Evaluation
Histopathological information for all selected lesions was obtained through surgical resection or puncture and was confirmed by pathologists with more than 5 years of experience in pathology. For patients with multiple lesions, we used the Couinaud liver segmentation method to locate the lesions and ensure that the pathological information matched the lesions analysed.

Image Segmentation and Radiomics Feature Extraction
Radiomics analysis was performed on the axial T2WI, AP, and PVP images. Grey-level standardisation of all images was performed before the data were downloaded from the picture archiving and communication systems. In-house software (Artificial Intelligence Kit, v3.2.2; GE Healthcare) was used for image registration, and all images were resampled to a uniform voxel size of 1 × 1 × 1 mm³. Subsequently, a free open-source software package (ITK-SNAP, Version 3.6.0) was used to segment the region of interest (ROI) of the tumour layer by layer and automatically merge the ROIs from all layers into a volume of interest (VOI). The ROIs followed the boundary of the tumour as closely as possible without including the adjacent background liver tissue. The ROIs of all the tumours were delineated independently by two radiologists (Z.F. and J.Z., with 16 and 9 years of experience in MRI, respectively), and the interobserver reproducibility was evaluated by calculating the intraclass correlation coefficient (ICC).
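The interobserver reproducibility check can be illustrated with a minimal sketch. The text does not state which form of the ICC was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here, and the function name `icc_2_1` and the example values are hypothetical. The 0.8 cut-off is the one reported in the Results.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per tumour; each row holds one value per reader.
    The choice of ICC form is an assumption, not taken from the study.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for tumours (rows), readers (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical values of one feature measured on five tumours by two readers.
feature_by_reader = [[10.1, 10.4], [12.0, 11.7], [8.3, 8.9], [15.2, 15.0], [9.8, 10.1]]
icc = icc_2_1(feature_by_reader)
is_reproducible = icc > 0.8  # cut-off reported in the Results
```

In practice the function would be applied once per feature and per sequence, keeping only the features that pass the cut-off.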
The radiomics feature extraction process followed the image biomarker standardisation initiative (IBSI) [18]. First, the VOIs of T2WI, AP, and PVP were imported into AK software in batches, and a total of 851 features were then extracted automatically in each phase, including the first-order features, shape factors, the grey-level cooccurrence matrix (GLCM), run-length matrix (RLM), grey-level size zone matrix (GLSZM), neighbourhood grey-tone difference matrix (NGTDM), and transform features (including wavelet features). The complete workflow is shown in Figure 2.
Figure 2. Schematic diagram of the processing and analysis flowchart. ROIs were manually delineated over the whole tumour layer by layer on T2WI, AP, and PVP images and automatically merged into a VOI. Radiomics features were extracted automatically using Artificial Intelligence Kit software. Standardisation, max-relevance, mRMR, and LASSO analyses were used to reduce the redundancy or selection bias of the features. Finally, different models were constructed and verified. MRI, magnetic resonance imaging; ROI, region of interest; VOI, volume of interest; GLCM, grey-level cooccurrence matrix; RLM, run-length matrix; mRMR-40, max-relevance and min-redundancy; LASSO, least absolute shrinkage and selection operator.

Radiomics Features Analysis
All included tumours were randomly divided into a training cohort (n = 72) and a validation cohort (n = 30) at a 7:3 ratio for modelling and verification. Feature dimensionality reduction and selection were performed as follows. First, values lying outside the range defined by the mean and standard deviation of a feature were treated as outliers and replaced by the median of that feature vector. Standardisation was then performed to bring the data onto a common scale. Second, the max-relevance and min-redundancy (mRMR) algorithm was used to remove redundant features. Finally, least absolute shrinkage and selection operator (LASSO) analysis was performed to further reduce the redundancy of the features. The regularisation parameter (λ) of LASSO was chosen by 10-fold cross-validation, and the features with nonzero coefficients were selected (Figure 3). The radiomics score (rad-score) of each tumour was calculated as a linear combination of the selected features multiplied by their respective coefficients.
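The preprocessing and rad-score steps can be sketched as follows. This is only an illustration under stated assumptions: the width of the outlier range (`n_sd`) is not given in the text and is therefore left as an explicit argument, z-score scaling is assumed as the standardisation, the intercept term is an assumption, and all function names, feature names, and numbers are hypothetical. The mRMR and LASSO steps themselves are not reproduced here.

```python
from statistics import mean, median, pstdev

def replace_outliers(values, n_sd):
    """Replace values outside mean ± n_sd * SD by the median of the feature.

    n_sd is hypothetical: the study does not report the multiplier it used.
    """
    mu, sd, med = mean(values), pstdev(values), median(values)
    low, high = mu - n_sd * sd, mu + n_sd * sd
    return [v if low <= v <= high else med for v in values]

def standardise(values):
    """Z-score scaling (assumed form of standardisation): zero mean, unit SD."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of selected features and their coefficients."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)

# Hypothetical feature vector across five tumours, with one extreme value.
cleaned = replace_outliers([1.0, 2.0, 3.0, 4.0, 100.0], n_sd=1)
scaled = standardise(cleaned)

# Hypothetical selected features and coefficients for a single tumour.
score = rad_score({"feat_a": 2.0, "feat_b": -1.0},
                  {"feat_a": 0.5, "feat_b": 2.0},
                  intercept=0.25)
```

To avoid information leakage, the mean, standard deviation, and median would normally be estimated on the training cohort and reused unchanged for the validation cohort.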

Model Construction and Validation
Stepwise logistic regression analysis was used to construct radiomics models to discriminate HCC from non-HCC; these comprised the T2WI, AP, and PVP models and the corresponding combined models. The models were further validated in the validation cohort. In addition, receiver operating characteristic (ROC) curves were generated to evaluate the performance of the two radiologists (junior and senior) and of the individual targetoid features (rim APHE, peripheral "washout," and delayed central enhancement) in distinguishing HCC from non-HCC. The discriminative performance of the radiologists, the targetoid features, and the different models was compared using the DeLong test.
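As a minimal illustration of how discrimination is quantified, the AUC can be computed directly from scores and labels as the probability that a randomly chosen HCC receives a higher score than a randomly chosen non-HCC (the Mann-Whitney formulation). The function name and the example scores are hypothetical, and the DeLong comparison of two AUCs is not implemented in this sketch.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for HCC, 0 for non-HCC. Tied scores count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical rad-scores for two HCC and two non-HCC tumours.
example_auc = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The same function applies to a radiologist's binary call or to the presence of a targetoid feature, since a 0/1 score is simply a score with many ties.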

Statistical Analysis
The ICC was first used to test the consistency of the radiomics features extracted by the two operators. Cohen's κ statistic was used to assess the consistency of the imaging features and LI-RADS category between the two observers. The Kolmogorov-Smirnov test was used to test the normality of continuous variables. A two-sample t-test was performed for normally distributed data, and the Mann-Whitney U test was used for nonnormally distributed data. The discrimination performance of the radiomics models was quantified by the area under the ROC curve (AUC), and AUCs were compared using the DeLong test. All statistical analyses were performed using R software (version 3.6.0; http://www.R-project.org). A two-tailed p < 0.05 indicated a statistically significant difference.
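For reference, Cohen's κ compares the observed agreement between two observers with the agreement expected by chance from their marginal frequencies. The sketch below uses hypothetical readings of a single binary imaging feature; it is an illustration, not the analysis code of the study (which was run in R).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical presence (1) or absence (0) of rim APHE in ten tumours.
observer_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(observer_1, observer_2)
```

The function is undefined when both raters assign every case to the same single category, because chance agreement is then 1.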

Tumour Characteristics
A total of 90 patients (average age, 54.0 ± 11.5 years) with 102 tumours were enrolled in this study, comprising 71 tumours in the non-HCC group (53 ICCAs, 1 CHC, 14 metastases, and 3 carcinosarcomas) and 31 tumours in the HCC group (8 well-differentiated HCCs, 19 moderately differentiated HCCs, and 4 poorly differentiated HCCs). All tumours were randomly split into training (n = 72) and validation (n = 30) cohorts at a ratio of 7:3, and the positive rates of HCC were 30.6% (22/72) and 30% (9/30) in the training and validation cohorts, respectively. No statistically significant differences were found in the clinical characteristics or laboratory indicators between the groups (p > 0.05) (Table 1).
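The reported cohort sizes and HCC rates are consistent with a 7:3 split carried out separately within each class, although the text only states that the split was random. The sketch below assumes such a class-stratified split purely for illustration; the function name and the seed are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split case indices into training and validation sets within each class.

    Stratification is an assumption; the study only reports a random 7:3 split.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train += idx[:n_train]
        valid += idx[n_train:]
    return sorted(train), sorted(valid)

# 31 HCC (label 1) and 71 non-HCC (label 0) tumours, as in the study.
labels = [1] * 31 + [0] * 71
train_idx, valid_idx = stratified_split(labels)

hcc_train = sum(labels[i] for i in train_idx)
hcc_valid = sum(labels[i] for i in valid_idx)
train_rate = hcc_train / len(train_idx)
valid_rate = hcc_valid / len(valid_idx)
```

With these class sizes, rounding 70% of each class gives 22 HCC and 50 non-HCC tumours for training, which matches the 22/72 and 9/30 proportions reported above.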

Feature Selection and Radiomics Signature Construction
A total of 851 features were extracted from each of the T2WI, AP, and PVP images, and the features with an ICC > 0.8 were retained. As a result, 761, 777, and 785 features from T2WI, AP, and PVP, respectively, were entered into the subsequent analysis. After data standardisation and dimensionality reduction by max-relevance, mRMR, LASSO, and logistic regression analysis, 3 features in T2WI, 7 features in AP, 4 features in PVP, 5 features in T2WI + AP, 4 features in T2WI + PVP, 3 features in AP + PVP, and 7 features in T2WI + AP + PVP were used to establish the radiomics models (Table 2).

Performance of Radiomics Models and Verification
Seven models were established: T2WI, AP, PVP, T2WI + AP, T2WI + PVP, AP + PVP, and T2WI + AP + PVP. The ROC curves of each radiomics model in the training cohort and validation cohort are shown in Figure 4A,B. Among these models, the combined model based on T2WI + AP + PVP showed the best performance. No significant difference was detected between the training cohort and validation cohort (p > 0.05; DeLong test) (Table 3). Each tumour's rad-score in the training cohort and validation cohort is shown in Figure 5.

Classification Performance Verification of Visual Evaluation and Imaging Features
The agreement between the two observers in marking rim APHE, peripheral washout, delayed central enhancement, and LI-RADS category was substantial, with Cohen's kappa coefficients of 0.76, 0.71, 0.74, and 0.81, respectively.
The ROC curves of the junior radiologist, the senior radiologist, and each targetoid feature are shown in Figure 6A,B. The ability of the senior radiologist to distinguish HCC from non-HCC in LR-M tumours was significantly higher than that of the junior radiologist (p = 0.007; DeLong test). Single and multiple targetoid features showed moderate discrimination, and no significant differences were found between them (p > 0.05; DeLong test) (Table 4). The ability of the junior radiologist and of the targetoid features to distinguish HCC from non-HCC was significantly lower than that of the radiomics models (p < 0.05; DeLong test), and only the ability of the senior radiologist was equivalent to that of the radiomics models (p > 0.05; DeLong test).

Discussion
In the present study, we developed and validated MRI-based radiomics models to distinguish HCC from non-HCC in LR-M tumours. The radiomics models based on T2WI, AP, and PVP images achieved good results in the training and validation cohorts. All seven models achieved encouraging discrimination performance; among them, the combined model based on T2WI + AP + PVP performed best. In addition, the discrimination performance of the radiomics models was better than that of the junior radiologist's visual assessment and of evaluation based purely on LR-M imaging features.
In contrast to other reporting systems, LR-M, as a special type of LI-RADS, is defined as a malignant tumour other than HCC. However, in actual clinical work, many HCC cases are categorised as LR-M [19,20], which makes it difficult for clinicians to formulate treatment strategies. Because HCC and non-HCC differ significantly in treatment and prognosis, distinguishing them effectively in LR-M tumours is clinically significant. Relying on the knowledge and experience of radiologists to interpret MR images is the traditional way to solve this problem. In our study, the junior radiologist and senior radiologist differed in their ability to distinguish the subtypes of LR-M tumours (p = 0.007; DeLong test). The senior radiologist distinguished HCC in LR-M tumours better (AUC = 0.799) but still misclassified 9 of 31 HCCs as non-HCC, while the junior radiologist misclassified 14 HCCs as non-HCC. This result is disappointing and is likely to complicate subsequent diagnosis and treatment.
Among LR-M tumours, HCC and ICCA account for 36% and 30%, respectively [6,21,22]. CHC, metastases, and sarcomas account for only a small proportion of LR-M tumours. However, these tumours complicate diagnosis by visual evaluation [23]. Our study also confirmed that distinguishing HCC from non-HCC in LR-M tumours by relying solely on visual inspection of targetoid features is challenging. When the tumour had only a single targetoid feature, its ability to discriminate between HCC and non-HCC was moderate (AUC: 0.685-0.723). Even when the tumour had all three targetoid features concurrently, the AUC for distinguishing HCC from non-HCC was only 0.754. The imaging features of LR-M tumour subtypes overlap, which limits visual evaluation. Additionally, the lack of DWI and HBP images may partly explain the unsatisfactory visual evaluation results of the present study. A dual evaluation strategy involving both junior and senior radiologists might alleviate this dilemma, but further research is required.
Radiomics, a discipline that has emerged in the context of big data, is characterised by stable calculation, high reproducibility, and freedom from subjective human interference [14,24]. Regarding liver tumour research, radiomics has provided encouraging results in identifying benign and malignant liver tumours [25], predicting the recurrence of HCC after surgical resection [26] and prognosis after transcatheter arterial chemoembolisation [27], and predicting HCC histological grade [28] and microvascular invasion (MVI) [29,30]. Currently, most radiomics studies on LI-RADS have focused on classification and diagnosis [17,31], and their application in LR-M tumours, particularly the identification of LR-M subtypes, has not yet been reported. Theoretically, although HCC and non-HCC in LR-M have similar imaging manifestations, differences exist in their cell origin, the spatial arrangement and distribution of tissue cells, vascular heterogeneity, and other tumour characteristics. These differences cannot be distinguished by visual inspection, but radiomics is a promising approach. In our study, the single-sequence models based on AP, PVP, and T2WI and the corresponding combined models all showed good discrimination ability (AUC: 0.768-0.884), which was confirmed in the validation cohort. The discrimination ability of each radiomics model was significantly better than that of the junior radiologist and equivalent to that of the senior radiologist, indicating that radiomics can not only provide junior radiologists with a better method for the differential diagnosis of HCC and non-HCC in LR-M tumours but also serve as an important reference for senior radiologists. Additionally, among the three single-sequence models, the AP model performed better than the PVP and T2WI models, whereas a previous study found that the PVP model performed better [27]. This difference may be due to the different research subjects. That model was mainly established for HCC. In the present study, however, most of the tumours were ICCA, which enhanced only at the tumour margin in AP, a finding that differs markedly from the obvious enhancement of typical HCC.
The first-order features describe the distribution of voxel intensities in the ROI through commonly used, basic metrics but do not involve spatial information. In our study, the first-order feature Minimum was the original-image feature retained after dimensionality reduction by six of the seven models (all except Model 1). The original_firstorder_Minimum represents the minimum grey-level intensity of the original image within the ROI; subtle differences in this value cannot be recognised by visual evaluation but can be captured by radiomics. Radiomics models that include the original_firstorder_Minimum extracted from MR images have been shown to be effective in predicting the efficacy of chemoradiotherapy for advanced cervical cancer and the pathological features of rectal cancer [32,33]. In addition, Skewness (Model 4, Model 7) and Idmn (Model 2, Model 4) among the original features were also key features in model construction. Skewness, as a first-order feature, reflects the asymmetry of the grey-level distribution about its mean, describing the shape of the histogram and the direction in which its tail extends. In the field of liver tumours, a radiomics model including original_firstorder_Skewness and other features showed good efficiency in predicting the MVI of HCC (AUC: 0.858) [34]. Idmn is a second-order texture feature based on the GLCM; it is the normalised inverse difference moment of the image grey levels and reflects the homogeneity of the image texture: the larger its value, the smaller the variation between different regions of the image texture. Huang et al. [35] showed that MRI-derived GLCM_Idmn was among the key features in predictive models for recurrence-free survival of breast cancer, but the application of this feature in liver tumours needs more studies to confirm. The shape feature Flatness was also found to be an important feature in the construction of Model 7 in our study; it has so far been applied in only a few CT-based radiomics models [36], and we hope to verify it in subsequent research based on MR images.
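To make the two original-image features discussed above more concrete, the sketch below computes Skewness from a list of intensities and Idmn from a grey-level cooccurrence matrix, using the commonly cited definitions (third central moment divided by the cubed standard deviation, and the sum of p(i, j) / (1 + (i − j)² / Ng²)). It is a simplified two-dimensional illustration with a single pixel offset and hypothetical images; the software used in the study works on three-dimensional VOIs and may differ in discretisation and in how directions are aggregated.

```python
def skewness(values):
    """First-order skewness: third central moment over the cubed SD."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def glcm(image, n_levels, offset=(0, 1)):
    """Symmetric, normalised GLCM for one offset; grey levels are 0..n_levels-1."""
    counts = [[0.0] * n_levels for _ in range(n_levels)]
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def idmn(p):
    """Inverse difference moment normalised of a normalised GLCM."""
    n = len(p)
    return sum(
        p[i][j] / (1 + (i - j) ** 2 / n ** 2)
        for i in range(n) for j in range(n)
    )

# Hypothetical 2 x 2 images with four grey levels (0 to 3).
uniform = [[2, 2], [2, 2]]
checker = [[0, 3], [3, 0]]
idmn_uniform = idmn(glcm(uniform, n_levels=4))
idmn_checker = idmn(glcm(checker, n_levels=4))
right_tailed = skewness([1, 1, 1, 5])
```

The uniform image yields the maximum Idmn of 1, the high-contrast image a lower value, and the intensity list with a long right tail a positive Skewness, in line with the interpretation given in the text. Skewness is undefined for a constant intensity list.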
The wavelet transform uses a wavelet function to decompose the original image and obtain wavelet-based features. Features obtained after the wavelet transform often carry more tumour information and can reflect tumour heterogeneity more accurately [37]. In this study, most of the features retained after dimensionality reduction were wavelet features. The wavelet-transformed first-order features (Mean, 10Percentile, and Kurtosis) were important constituent features in our models and were consistent with the features used in a previous model for discriminating benign from malignant prostate lesions [38]. The wavelet-transformed first-order feature Skewness and GLCM feature Idmn were still retained in several models after feature screening, which again suggests that the grey-level distributions of HCC and non-HCC in LR-M differ in their asymmetry about the mean and that the variation and uniformity of image texture in the tumour region differ between the two groups. The GLCM feature Idn, another measure of image uniformity, normalises the difference between neighbouring intensity values by dividing it by the total number of discrete intensity values. A predictive model constructed with wavelet-transformed GLCM_Idn as one of 14 radiomics features proved to be reliable in predicting which patients would develop extrahepatic spread or vascular invasion following initial TACE monotherapy (the AUCs of the training cohort and validation cohort were 0.911 and 0.847, respectively) [39]. In addition, in our study, the wavelet-transformed first-order feature (RobustMeanAbsoluteDeviation), GLCM features (Imc1, MCC, and DependenceVariance), NGTDM feature (Strength), and GLSZM feature (LargeAreaLowGrayLevelEmphasis) participated in the establishment of some models, indicating that these features may have value in discriminating HCC from non-HCC in LR-M tumours. However, few studies have explained the correlation between these features and tumour pathophysiology, which remains a very challenging task at this stage.
Although this study was novel, it also had some limitations. First, this was a single-centre retrospective study; selection bias was inevitable, and external verification was lacking. Further validation is required in multicentre research with larger samples. Second, DWI and HBP were not required in our institution's MR scanning programme, so the corresponding targetoid features, such as targetoid restriction and targetoid HBP appearance, could not be included in the study. Third, the sample size of this study was small, and the HCC and non-HCC groups were unbalanced. Prospective studies that recruit more patients will help verify and improve the practicality of the model. Finally, HCC in LR-M does not show the typical HCC enhancement pattern, and we could not use the known HCC diagnostic criteria to identify it. Therefore, during visual evaluation, the two radiologists could only distinguish between HCC and non-HCC according to their own clinical experience, which made the results of visual evaluation subjective.

Conclusions
This study provides radiomics models based on AP, PVP, and T2WI for the noninvasive discrimination of HCC and non-HCC in LR-M tumours and verifies that the radiomics methods are superior to the junior radiologist's visual assessment. Thus, more reference methods are provided to classify HCC and non-HCC in LR-M tumours, and junior radiologists are offered useful support for the preoperative diagnosis of LR-M tumour subtypes.

Informed Consent Statement: This retrospective study was approved by our institutional review board, which waived the requirement for patients' informed consent.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.