Machine Learning Pipeline for the Automated Prediction of MicrovascularInvasion in HepatocellularCarcinomas

: Background: Microvascular invasion (MVI) is a necessary step in the metastatic evolution of hepatocellular carcinoma liver tumors. Predicting the onset of MVI in the initial stages of the tumors could improve patient survival and the quality of life. In this study, the possibility of using radiomic features to predict the presence/absence of MVI was evaluated. Methods: Multiphase contrast-enhanced computed tomography (CECT) images were collected from 49 patients, and the radiomic features were extracted from the tumor region and the zone of transition. The most-relevant features were selected; the dataset was balanced, and the presence/absence of MVI was classiﬁed. The dataset was split into training and test sets in three ways using cross-validation: the ﬁrst applied feature selection and dataset balancing outside cross-validation; the second applied dataset balancing outside and feature selection inside; the third applied the entire pipeline inside the cross-validation procedure. Results: The features from the tumor areas on CECT showed both the portal and the arterial phases to be the most predictive. The three pipelines showed receiver operating characteristic area under the curve (ROC AUC) scores of 0.89, 0.84, and 0.61, respectively. Conclusions: The results obtained conﬁrmed the efﬁciency of multiphase CECT and the ZOT in detecting MVI. The results showed a signiﬁcant difference in the performance of the three pipelines, highlighting that a non-rigorous pipeline design could lead to model performance and generalization capabilities that are too optimistic.


Introduction
Hepatocellular carcinoma (HCC) is the most-common primary hepatic malignant tumor and represents an important global health problem, currently being the third-leading cause of cancer-related deaths in the general population and the first among cirrhotic patients [1].Furthermore, its incidence has continuously increased and since 1980, and the number of cases has more than tripled.There are several treatment options for HCC; nevertheless, those that are associated with the highest 5-year survival rate, such as ablation, surgical resection, and liver transplantation (LT), can only be applied in the very early and the early stages of the disease [2], accounting for fewer than 20% of cases at presentation [3].Unfortunately, all these potentially curative treatments are burdened by a high rate of recurrence [4], which negatively impacts overall survival [5,6].For example, the cumulative tumor recurrence rates of HCCs  3 cm at three years after ablation in patients in the low-, middle-, and high-risk groups were 68.2%, 100%, and 100%, respectively [7].Five-year HCC recurrence complicates 35% of cases after LT and 70% of cases after hepatic resection [8][9][10][11].Vascular invasion, both macroinvasion and microvascular invasion (MVI), has been universally recognized as a predictor of recurrence and poor overall survival after treatment for HCC [4,12].For example, vascular invasion is currently considered to be more informative than the Milan Criteria in predicting the prognosis [5,6,13].
The pre-treatment clinical significance of the two forms of vascular infiltration is clearly different.Macrovascular infiltration can be correctly diagnosed by imaging in the pretreatment phase, thus allowing correctly classifying those patients in the advanced stage and directing them to the best treatment (systemic therapy) [2].Conversely, the presence of MVI can be confirmed only post-operatively by histopathological analysis [13][14][15].Therefore, its utility in the pre-treatment selection of patients to undergo the different types of treatment is not possible since, even with state-of-the-art treatment, its diagnosis still remains postprocedural.Moreover, in patients having the same Barcelona clinical liver cancer stage, such as patients in the very early or the early stages, who are fit for surgery or ablation and have the same tumor burden, it is important to differentiate those with the MVI of HCC in order to choose the best treatment for each patient.For example, since MVI has been proven to be one of the most-relevant prognostic factors for HCC recurrence after surgical treatment [16,17], in recent years, much effort has been made to identify reliable imaging findings in order to reach an accurate pre-procedural diagnosis of MVI in HCC patients [16,18,19].Unfortunately, to date, these criteria for a preoperative radiological diagnosis of MVI in HCC have not been widely recognized and have not been reported in the current guidelines for the management of HCC [2].In fact, despite its clinical relevance, the detection of MVI represents one of the most-difficult tasks to achieve in conventional imaging analysis since it can be obtained only indirectly, in relation to capsule disruption, irregular tumor margins, peritumoral enhancement, etc. [13].
A possible explanation for failing to evolve towards the pre-treatment radiological diagnosis of MVI is that it is merely based on direct observation by radiologists who rely on contrast differences between adjacent tissues; however, these fade in the peritumor, where the MVI is supposed to most frequently occur [17].In addition, basic knowledge, diagnostic experience, and work status are other factors that might potentially further limit the accuracy of diagnosing MVI.
However, radiological images contain more information than what is visible to the radiologist's eye.Texture and, more in general, radiomic analysis of contrast-enhanced CT (CECT) scans have already been applied by several authors in many medical image pipelines [18][19][20][21][22][23], proving their efficiency as a valuable tool in aiding the clinical diagnosis [6] and in predicting patient prognosis [5,24].
Radiomics is a form of imaging analysis that uses a series of data mining and machine learning (ML) algorithms to extract high-throughput information for the quantification of the predictive and prognostic features associated with clinical outcomes [18,25].Therefore, the prediction of MVI on CT scans using radiomic analysis is considered to be an important research topic, potentially being able to overcome the current radiologically based imaging diagnosis in favor of the pre-treatment diagnosis of MVI in HCC.In particular, various papers [18,26] have focused their attention on the peripheral area of HCC called the zone of transition (ZOT), in which the CECT attenuation is lower than the attenuation of the solid center of the tumor, but is still higher than the attenuation of the surrounding non-tumoral tissue.The use of the ZOT in HCC radiomic analysis is motivated by the potential presence of MVI in the penumbra zone [18,26].
The present study proposed radiomic feature extraction and ML pipelines to predict MVI in HCC tumor volumes of interest (VOIs).A set of radiomic features was extracted from the tumor VOI and the ZOT; three different pipelines were used to investigate the problem of data contamination.In the first pipeline, data pre-processing was carried out, namely data balancing and feature selection before the training-test split, i.e., on the entire dataset at the beginning of the analysis.In the second pipeline, the feature selection step was moved inside the cross-validation procedure and the dataset balancing outside, i.e., performing part of the pre-processing on the entire dataset and the other part on the training set only.In the last pipeline, the pre-processing step was moved inside the cross-validation procedure, i.e., after the data splitting into the training and test sets [27].The three pipelines were applied to the radiomic features extracted from the CECT data, monitoring the difference in model performance.The same pipelines were also applied to a synthetic radiomic dataset in order to additionally validate the authors' hypothesis.In this way, it was verified that the performance observed depended only on the design of the pipeline and not on specific dataset characteristics.
In the radiomic literature, it is possible to find several examples in which the analysis has been carried in a way that introduces the cross-contamination between the training and test datasets.The authors felt that this was a critical point in radiomic research, which should be addressed in order to produce more reliable and reproducible results.For this reason, in the present analysis, the importance of considering the data cross-contamination was stressed.

Patient Selection
This study involved 49 patients selected from the Department of Experimental, Diagnostic and Speciality Medicine (DIMES) of the IRCCS Policlinic Sant'Orsola Malpighi Hospital.This single-center retrospective study was approved by the Institutional Review Board (No.197/2020/Oss/AOUBo), and the requirement for informed consent was waived.All procedures involving human participants were performed in accordance with the 1975 Declaration of Helsinki and its subsequent amendments.
The patients included met the following criteria: The CECT images were acquired using the following parameters: two different Multi-Slice CTs (64 slices, GE VCT or PHILIPS Ingenuity), with keV range 100-120, and tube current modulation with a low-quality index to optimize patient dose (slice thickness range 1-2 mm); the images were reconstructed with high-resolution kernels.The same dataset was already introduced in the study of Tovoli et al. [28].
Of the patients selected, there were 38 males (77.6%), having an age distribution of 44/65/83 years (min/mean/max).The ages reported were those at the time of the CT scan and radiological analysis.Six radiologists manually identified and labeled the nodules using the ImageJ software [29] (accessed on 10 January 2021).An expert radiologist with more than ten years of experience validated the annotations.
The dataset used in this study was a subset of the data already presented by Renzulli et al. [20].

Detection of the Zone of Transition
The identification of the ZOT allowed including the peritumoral area and overcoming the limits of human segmentation.Currently, there is no standard definition of the ZOT; many studies have used different approaches for its detection [18,20].Three-dimensional morphological erosions and dilations with a circular kernel of size 3 ⇥ 3 ⇥ 3 were applied to the tumor volumes.The difference between the dilated and the eroded VOIs was defined as the ZOT.
Management and analyses of the CT scans were carried out using the SimpleITK [30] python package (data accessed 14 July 2022).All the analyses were carried out using a 64-bit workstation machine (64 GB RAM memory and 1 CPU i9-9900K Intel, with 8 cores, and a GeForce RTX 2070 SUPER NVIDIA GPU).From a computational point of view, the extraction of a single ZOT VOI required less than 1 min on the authors' server-grade machine.

Feature Extraction
A set of radiomic features was extracted from the manually identified VOIs and the ZOT derived.The radiomic features extracted consisted of the 1st-order statistics, 3D shapebased scores, 3D gray-level co-occurrence matrix statistics, gray-level run length matrix statistics, gray-level size zone matrix, neighboring gray-tone difference matrix statistics, and gray-level dependence matrix statistics.The whole set of features was extracted from the original images and from the transformed images obtained by the application of wavelet (W) and Laplacian of Gaussian (LoG) transformations.The same set of 1130 features was extracted from the tumor VOIs and the tumor ZOT on both the arterial and the portal phases.A set of 4520 features was collected for each nodule.
We used the Pyradiomics [31] python package for the extraction of the radiomic features (data accessed 14 July 2022).From a computational point of view, the extraction of radiomic features required fewer than 10 min on the authors' server-grade machine.

Pipeline Overview
The MVI prediction can be considered to be a classification problem in which the two classes are represented by MVI+ and MVI .
Each radiomic feature was scaled according to the median and the interquartile range of its distribution of values.Ridge regression was applied by selecting the 15 most-relevant features according to the absolute values of their coefficient; the minority class (MVI+) was then oversampled using the synthetic minority oversampling technique (SMOTE) [32], with the aim of balancing the dataset classes.Finally, a classification was established using support vector classification (SVC) with a linear kernel on the reduced and balanced dataset.The choice of the SVC classifier model was determined by analyses carried out during the preliminary phase of this study.For the sake of brevity, a comparison with other classifier models was excluded from this study with the aim of focusing the attention on the comparison of the pipelines developed.Due to the limited number of samples, the prediction and evaluation were carried out using the leave-one-out (LOO) cross-validation technique, and the prediction results were used to evaluate the model performance.
Three versions of this pipeline were introduced.In the first one (Pipeline 1), the pre-processing was applied outside the cross-validation, i.e., on the entire dataset.In the second one (Pipeline 2), the pre-processing step order was changed.The dataset balancing was applied to the entire dataset, namely outside the cross-validation, followed by the feature selection and classification applied after each splitting, i.e., only on the training data.In the third one (Pipeline 3), the pre-processing part was applied inside the cross-validation procedure, i.e., only on the training data.The schemes of the three proposed pipelines are reported in Figure 1.
The three pipelines were tested on a synthetic dataset created using a toy model generator provided by the scikit-learn package [33].This was carried out to verify that the authors' assumption regarding the effect of data cross-contamination in radiomic analyses also applied to more general contexts and was not an incidental effect induced by the limited number of samples in the present dataset.The synthetic data model generator creates normally distributed clusters of points on a hypercube of arbitrary dimension and assigns an equal number of clusters to each class (2 in the present case).It introduces interdependence between features and adds various types of noise to the data.The model allows setting the number of classes and informative, redundant, and repeated features.Two-thousand features for 100 samples were randomly generated using the synthetic data model generator.A number of informative features were used, realistically small as compared to the noise, i.e., 20 features out of 2000.Eighty samples out of one-hundred from Class 0 and twenty from Class 1 were generated, introducing an unbalancing between the classes.In this way, a dataset was created having characteristics similar to a radiomic dataset, i.e., low number of informative features, strongly correlated features, redundant features, and a limited number of samples.
The performance of each pipeline was estimated according to the following metrics: where TP, TN, FP, and FN are the true positive, true negatives, false positives, and false negative scores, respectively, and the AUC score is evaluated as the area under the ROC curve, i.e., true positive rate vs. false positive rate.We used the scikit-learn [33] and imbalanced-learn [34] python packages (accessed on 14 July 2022) for implementing all the ML analyses.

MVI Prediction
The radiomic analysis on the features extracted from tumor and ZOT VOIs was carried out using the schemes of the three proposed pipelines.In Table 1, the classification results are shown.For all the pipeline schemes, the features selected by the ridge and their importance in the ridge feature selection were monitored.In Figure 2a, the absolute values of the ridge coefficient for the 15 selected features are reported.In Figure 2b,c, the mean and standard deviation of the absolute value of the most-frequent 15 features selected by Pipelines 2 and 3 are reported.For each fold of the LOO cross-validation paradigm, the 15 most-informative features were computed.The mean and standard deviation of the absolute values of the ridge coefficient of the 15 most frequently selected features were computed.
Pipelines 1, 2, and 3 were applied to the synthetic dataset, and the results are reported in Table 2.

Discussion
The procedure proposed in Pipeline 3 obtained an ROC AUC score of 0.61.This result was in agreement with the hypothesis that radiomic features extracted from tumor VOIs and the ZOT allowed identifying the presence of MVI from CT images.However, the results obtained were not as good as other state-of-the-art results [18,19,21].According to the authors, this limit could have been due to the limited number of samples involved in the study.This limit was a direct consequence of the single-center nature of the present study, which reduced the possible heterogeneity of the samples and affected the generalization capabilities of the pipelines in this study.Novel techniques capable of increasing the amount of data based on synthetic image generation have already been proposed in the literature [35]; however, their use could introduce unpredictable biases in the radiomic analysis and go beyond the aim of the present study.Furthermore, the dataset used in this study involved small tumor VOIs, making the radiomic analysis more difficult [20].Moreover, many studies have considered not only radiomic features, but also clinical features [18,19,21].The authors obtained notably better results using the scheme proposed in Pipelines 1 and 2, achieving an ROC AUC score of 0.89 and 0.84, respectively.These results are in agreement with the current state-of-the-art results [20,23] obtained with equivalent pipelines.
The results obtained highlighted the criticality of the pre-processing step inside a radiomic pipeline.The application of Pipeline 1 led to results 45% higher than Pipeline 3, by simply excluding the pre-processing steps from the cross-validation procedure.In fact, Pipeline 1 carried out the feature selection and oversampling using the entire set of available data, introducing a putative source of cross-contamination of the training and the test samples.The same behavior was observed when comparing Pipelines 2 and 3. Pipeline 2 obtained results 37% higher than Pipeline 3, by excluding the oversampling step from the cross-validation.The pipeline designs that carried out the oversampling before splitting could show the contamination of the test set with the training data.In both Pipeline 1 and Pipeline 2, the presence of almost redundant data in both the training and the test sets introduced the probability of overfitting, producing classification results that were too optimistic.He et al. and Zhang et al. [19,21] assessed model performance with configurations equivalent to those presented in this study, i.e., they evaluated the pipeline performances using both the training and the test datasets.Their results showed a higher ROC AUC score for the pipeline tested on the training set, i.e., with a design equivalent to the one presented in Pipelines 1 and 2. This behavior validated the authors' hypothesis regarding cross-contamination and its effect on evaluating pipeline performance and generalization capabilities.In this study, the different pipeline designs were presented for the sake of completeness and a direct comparison of the present results with studies already published.
Pipelines 1, 2, and 3 were applied to a synthetic dataset, obtaining ROC AUC scores of 0.89, 0.89, and 0.78, respectively.Notably better results were observed using Pipeline 1 and Pipeline 2. This enforced the results obtained on the real radiomic dataset, confirming the potential cross-contamination problem in general radiomic studies.The same behavior observed on a synthetic dataset additionally validated the authors' hypothesis, confirming that cross-contamination is a methodology problem.
Pipelines 1, 2, and 3 identified the same set of features as the most relevant.It was observed that the features extracted from the VOIs (11 out of 15) were prominent, with a balancing between the portal (6) and the arterial (5) phases, depicting the importance of multi-phase CECT acquisition, which allowed considering both the arterial and the portal veins for the evaluation of the MVI lesions.This result confirmed the hypothesis proposed regarding the stronger predictive value of multi-phase analysis.The pipelines selected 4 out of 15 features from the ZOT volumes; however, their coefficient relevance suggested that the tumor penumbra could play a valuable role in MVI classification.This result was in agreement with the biomedical evidence of possible MVI presence in ZOT volumes.

Conclusions
In this study, a radiomic pipeline was introduced for the automated identification of MVI presence/absence using CT images, i.e., avoiding invasive procedures.Features extracted from CECT scans related to the arterial and the portal phases were used.
The importance of using features from multiple CECT phases to identify the presence/absence of MVI was shown.The importance of including information from the peritumoral volumes was highlighted: ZOT features were shown to be relevant for MVI classification.Introducing radiomic analysis, also of the ZOT volumes, has the poten-tial to improve the robustness of the features extracted, providing information regarding tumor proliferation.
The impact of data pre-processing on the final classification model was also highlighted.The application of feature selection and data oversampling on the entire dataset introduced cross-contamination, leading to over-estimation of the classifier's performance and generalization capabilities.The results obtained by these types of pipelines could be affected by biases, in particular for studies involving a limited number of patients.In conclusion, more attention should be given to avoiding these possible sources of overfitting during the design of radiomic pipelines.
(a) HCC diagnosis; (b) CECT (at least in arterial and portal phases) before resection; (c) MVI presence/absence confirmed by histological evaluation.

Figure 1 .
Figure 1.Scheme of the three pipelines proposed.Starting from the CECT, the manual VOI, and derived ZOT, the entire set of 4520 features was extracted.(a) Pipeline 1, feature selection and data oversampling were carried out on the entire radiomic dataset.It was then classified using the leave-one-out cross-validation, i.e., splitting the dataset into training and test sets, where the test set consisted of only one sample.(b) Pipeline 2, the second version of the pipeline in which oversampling was carried out on the entire dataset and feature selection and classification were carried out using leave-one-out cross-validation, i.e., feature selection classification was carried out only on the training dataset.(c) Pipeline 3, the third version of the pipeline, in which feature selection, oversampling, and classification were carried out using leave-one-out cross-validation, i.e., feature selection and data oversampling were carried out only on the training dataset.
Area Under the Receiver Operative Characteristic (ROC) Curve.

Figure 2 .
Figure 2. The importance of the 15 features selected according to the absolute value of the ridge coefficients.For each feature selected, it is reported whether it belonged to the original image (O), wavelet (W), or Laplacian of Gaussian (LoG).Moreover, whether they were computed from the CECT in the arterial (A) or the portal (P) phase is specified as the tumor (VOI) or zone of transition (ZOT) volumes.The features from the arterial phase, the portal phase, and the ZOT volume are highlighted in red, blue, and bold, respectively.(a) Absolute values of the ridge coefficients for the 15 selected features by Pipeline 1, i.e., computed considering the entire dataset.(b) The mean and standard deviation of the absolute values of the ridge coefficients for the 15 most-frequent features selected by Pipeline 2. (c) The mean and standard deviation of the absolute values of the ridge coefficient for the 15 most-frequent features selected by Pipeline 3.

Table 1 .
Evaluation metrics obtained by applying Pipelines 1, 2, and 3 to the radiomic dataset.For each pipeline, the sensitivity, specificity, precision, balanced accuracy, and ROC AUC scores obtained on the test sets are reported.

Table 2 .
Evaluation metrics obtained by the application of Pipelines 1, 2, and 3 on the synthetic radiomic dataset.The sensitivity, specificity, precision, balanced accuracy, and ROC AUC scores obtained on the test sets are reported for each pipeline.