Machine Learning to Predict the Response to Lenvatinib Combined with Transarterial Chemoembolization for Unresectable Hepatocellular Carcinoma

Simple Summary The objective response rate of lenvatinib combined with transarterial chemoembolization for unresectable hepatocellular carcinoma is unsatisfactory. We aimed to develop predictive models using demographic characteristics, pre-treatment serum biomarkers and tumor characteristics of unresectable hepatocellular carcinoma patients by five machine learning algorithms to predict the response under combined treatments. We identified the 10 most important predictors, including K, low-density lipoprotein, D-dimer, red blood cell, alanine aminotransferase, albumin, monocyte, tumor size, triglyceride, and age. In addition, we applied the Shapley Additive exPlanation to explain the best-performing random forest predictive model to provide a reasonable explanation of the efficacy prediction at an individualized level. The combination of machine learning and Shapley Additive exPlanation can provide valuable suggestions for clinical decision making. Abstract Background: Lenvatinib and transarterial chemoembolization (TACE) are first-line treatments for unresectable hepatocellular carcinoma (HCC), but the objective response rate (ORR) is not satisfactory. We aimed to predict the response to lenvatinib combined with TACE before treatment for unresectable HCC using machine learning (ML) algorithms based on clinical data. Methods: Patients with unresectable HCC receiving the combination therapy of lenvatinib combined with TACE from two medical centers were retrospectively collected from January 2020 to December 2021. The response to the combination therapy was evaluated over the following 4–12 weeks. Five types of ML algorithms were applied to develop the predictive models, including classification and regression tree (CART), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), random forest (RF), and support vector machine (SVM). The performance of the models was assessed by the receiver operating characteristic (ROC) curve and area under the receiver operating characteristic curve (AUC). The Shapley Additive exPlanation (SHAP) method was applied to explain the model. Results: A total of 125 unresectable HCC patients were included in the analysis after the inclusion and exclusion criteria, among which 42 (33.6%) patients showed progression disease (PD), 49 (39.2%) showed stable disease (SD), and 34 (27.2%) achieved partial response (PR). The nonresponse group (PD + SD) included 91 patients, while the response group (PR) included 34 patients. The top 40 most important features from all 64 clinical features were selected using the recursive feature elimination (RFE) algorithm to develop the predictive models. The predictive power was satisfactory, with AUCs of 0.74 to 0.91. The SVM model and RF model showed the highest accuracy (86.5%), and the RF model showed the largest AUC (0.91, 95% confidence interval (CI): 0.61–0.95). The SHAP summary plot and decision plot illustrated the impact of the top 40 features on the efficacy of the combination therapy, and the SHAP force plot successfully predicted the efficacy at the individualized level. Conclusions: A new predictive model based on clinical data was developed using ML algorithms, which showed favorable performance in predicting the response to lenvatinib combined with TACE for unresectable HCC. Combining ML with SHAP could provide an explicit explanation of the efficacy prediction.


Introduction
Hepatocellular carcinoma (HCC) is one of the most fatal malignancies and the predominant type of primary liver cancer (PLC), accounting for approximately 90% of PLC cases [1,2]. Currently, the incidence and disease burden of HCC are increasing annually, with roughly 850,000 new cases and 810,000 deaths occurring per year [3]. HCC patients have no significant symptoms at the early stage and are generally diagnosed at an advanced stage, leading to a loss of curative surgical treatment [4]. Patients with unresectable HCC have poor prognosis with limited effective treatment options [5]. Due to the limited treatment options for unresectable HCC patients, systemic therapies such as molecular target agents have become the primary treatment choices [5,6].
As a small molecule, multitarget, multireceptor tyrosine kinase inhibitor (TKI), lenvatinib has become a first-line therapy for HCC [7,8]. The previous randomized phase 3 trial by Masatoshi et al. confirmed that lenvatinib showed a higher objective response rate (ORR) and longer median overall survival (OS) and median progression-free survival (PFS) compared with sorafenib [9]. Similarly, transarterial chemoembolization (TACE) is another first-line treatment for intermediate stage and advanced HCC as well [10]. TACE could trigger tumor ischemic necrosis by infusing cytotoxic chemotherapeutic agents and delivering embolic components to the tumor artery [11]. However, TACE could potentially expose the tumor to a hypoxic state and induce the upregulation of vascular endothelial growth factor (VEGF) and fibroblast growth factor (FGF), resulting in tumor revascularization [10]. Lenvatinib appears to be capable of inhibiting the revascularization caused by TACE, which targets VEGFR1-3 and FGFR1-4 [12,13]. Lenvatinib combined with TACE has shown a superior efficacy to sorafenib combined with TACE treatment, with a similar treatment safety [14]. However, the heterogeneity and chemoresistance of HCC prevent patients from significantly responding to lenvatinib combined with TACE, leading to an unsatisfactory ORR. Therefore, it is particularly critical to define the accurate predictive factors for the response to lenvatinib combined with TACE prior to treatment.
As an integral part of artificial intelligence (AI), machine learning (ML) can identify trends and patterns by inputting and processing data, setting up different algorithms, and applying computer analysis [15]. ML provides significant advantages over traditional statistical methods for the analysis and evaluation of medical data and big data [16]. ML applications can encompass multiple fields in medical research, including distinguishing benign and malignant tumors, identifying potential biomarkers, screening patients, and disease prediction and diagnosis [17]. In particular, the prediction and diagnosis of diseases can be improved significantly by establishing and developing models with ML algorithms [18]. Since human diseases are commonly accompanied by complicated dynamic changes during progression that are virtually impossible to identify artificially, ML methods have been widely used in the diagnosis, treatment, and prognosis of various diseases [19].
There have been rare studies focusing on predicting the response of patients with unresectable HCC to lenvatinib combined with TACE using ML algorithms. Therefore, this study aimed to predict the efficacy of combination therapy for patients with unresectable HCC using ML algorithms based on clinical data.

Study Population
This study was conducted based on a real-world cohort of consecutive patients treated with lenvatinib combined with TACE from two medical centers (the First Affiliated Hospital of Wenzhou Medical University and the First Affiliated Hospital of Zhejiang Chinese Medical University). A total of 453 patients with unresectable HCC were retrospectively registered from January 2020 to December 2021. HCC was diagnosed radiologically or histologically following the Asian Pacific Association for the Study of the Liver (APSAL) [20]. The inclusion criteria were as follows: (1) HCC diagnosed by radiology or histology; (2) receiving lenvatinib combined with TACE treatment; (3) age > 18 years; (4) Eastern Cooperative Oncology Group performance status (ECOG-PS) grade 0-1; (5) Child-Pugh score ≤ 7. The exclusion criteria were as follows: (1) with other malignancies; (2) receiving other antitumor treatments; (3) unable to estimate lenvatinib combined with TACE treatment efficacy; (4) lost to follow-up; (5) incomplete clinical data. To avoid the bias between the data from the two medical centers, we performed a principal component analysis (PCA) on all 64 standardized clinical features to explore the bias. The study was conducted according to the ethical principles of the Declaration of Helsinki and was approved by the Ethics Committee of the local institutional review boards (KY2022-R095). Written informed consent was obtained from each patient at the first time of admission to hospital.

Treatment Protocol and Response Evaluation
Lenvatinib was orally administered within 7 days before or after the TACE therapy [21]. According to the guidelines, the dose of lenvatinib was 12 mg/d for patients weighing ≥60 kg or 8 mg/d for patients weighing <60 kg [8]. Lenvatinib combined with TACE therapy was discontinued when unacceptable adverse events or significant disease progression were observed, or the patient withdrew consent. The tumor response was assessed by experienced hepatobiliary surgeons C. G. (15 years of clinical experience) and Y. Z. P. (35 years of clinical experience) with radiological methods with reference to the modified RECIST (mRECIST) criteria within 4-12 weeks after combination therapy [22]. The tumor response included the following 4 categories: (1) complete response (CR)-disappearance of intratumoral arterial enhancement in typical target lesions and disappearance of atypical intrahepatic and extrahepatic target; (2) partial response (PR)-the sum of the diameters of the target lesions (including viable tumor diameters for typical intrahepatic target lesions and short axis diameters for nodal lesions) was reduced by at least 30%, referenced to the sum of the longest diameters at baseline; (3) progressive disease (PD)-the sum of the diameters of the target lesions (including viable tumor diameters for typical intrahepatic target lesions and short axis diameters for nodal lesions) was increased by at least 20% AND at least 5 mm, referenced to the nadir sum of the diameters at baseline; (4) stable disease (SD)-neither sufficient reduction to achieve PR nor sufficient increase to advance to PD. The objective response rate (ORR) and disease control rate (DCR) were significant indicators showing the antitumor effect of the combination therapy. ORR was defined as the proportion of patients showing CR and PR. DCR was defined as the proportion of patients showing CR, PR, and SD.

Data Acquisition, Preprocessing, and Feature Extraction
We combined the patients showing PD and SD into a nonresponse group (n = 91), and the patients achieving PR were listed as a response group (n = 34). We collected 64 clinical features in total, including demographic characteristics (age, gender, body mass index (BMI), smoking history, drinking history, hepatitis B virus (HBV) infection, hepatitis C virus (HCV), hypertensive history, diabetes history, heart disease history, nonalcohol fatty liver disease (NAFLD) history, and cirrhosis history), pretreatment serum biomarkers, and tumor characteristics from patients with unresectable HCC. We standardized the continuous variables to eliminate the influence of the distance between the sample data. The standardized data were calculated by subtracting the mean from the original data and dividing by the standard deviation. A random forest (RF) model was developed based on all 64 clinical features and selected top 40 features during the iterative process by recursive feature elimination (RFE). RFE is a wrapped feature selection algorithm, essentially a greedy algorithm intended to select the feature subset with the best performance. In the end, we developed the predictive model with the selected 40 features.

Machine Learning Approach
We aimed to predict whether patients with unresectable HCC would have a response to lenvatinib combined with TACE using ML algorithms. The predicted outcomes were binary classes: nonresponse and response. The clinical data were randomly split, with 70% as the training set and 30% as the testing set. The training set was used for data fitting, hyperparameter determination, and prediction models development, while the testing set was used to evaluate the performance of the prediction models. We applied the synthetic minority oversampling technique (SMOTE) algorithm for the sampling to overcome the imbalance of the samples in the training set [23]. We utilized the following five supervised ML algorithms to develop the predictive models: classification and regression tree (CART), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), RF, and support vector machine (SVM). The hyperparameters were selected in each ML model by three repetitions of 10-fold cross-validation, and a grid search was applied to determine the hyperparameters yielding the best model performance. We evaluated the predictive performance of each ML model by plotting and comparing the area under the receiver operating characteristic curve (AUC) and calculated the accuracy, precision, recall, F1-score, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Matthews correlation coefficient (MCC) from the confusion matrix of the testing set. In addition, we performed unsupervised clustering of the included 125 patients with unresectable HCC by the K-means algorithm and visualized the clustering results after the dimensionality reduction using t-distributed stochastic neighbor embedding (t-SNE). To make the black-box effects of the ML models explainable, we further used Shapley Additive exPlanation (SHAP) values based on cooperative game theory to elaborate the global model structure by multiple local explanations [24]. All ML analyses were performed using Python (version 3.

Statistical Analysis
The continuous variables were described as the mean ± standard deviation (SD) or median (interquartile range, (IQR)) based on the data distribution, and the Student's t-test or Mann-Whitney U test were further applied to compare the differences between the training and testing sets. The categorical variables were described as the frequencies (percentages) and analyzed using the chi-square test or Fisher exact test. p-Value < 0.05 was considered statistically significant. All statistical analyses were conducted using the R program (R Foundation for Statistical Computing, version 4.2.1).

Patient Characteristics and Treatment Response
A flow chart of the study was shown in Figure 1. We retrospectively collected data from 125 patients with unresectable HCC from January 2020 to December 2021. Among the 125 patients treated with lenvatinib combined with TACE, 42 (33.6%) patients showed PD, 49 (39.2%) showed SD, and 34 (27.2%) achieved PR. The ORR and DCR were 27.2% and 66.4%, respectively. We randomly divided the patients into a training set (n = 88) and testing set (n = 37) according to a ratio of 7:3 and compared all the features between the two groups. The demographic characteristics, pretreatment serum biomarkers, and tumor characteristics are shown in Table 1. The detailed raw data of 125 unresectable HCC patients were available in the Supplementary File S1.

Patient Characteristics and Treatment Response
A flow chart of the study was shown in Figure 1. We retrospectively collected data from 125 patients with unresectable HCC from January 2020 to December 2021. Among the 125 patients treated with lenvatinib combined with TACE, 42 (33.6%) patients showed PD, 49 (39.2%) showed SD, and 34 (27.2%) achieved PR. The ORR and DCR were 27.2% and 66.4%, respectively. We randomly divided the patients into a training set (n = 88) and testing set (n = 37) according to a ratio of 7:3 and compared all the features between the two groups. The demographic characteristics, pretreatment serum biomarkers, and tumor characteristics are shown in Table 1. The detailed raw data of 125 unresectable HCC patients were available in the Supplementary File S1.        (Table S1). The PCA showed no significant bias between the data from the two medical centers (R = 0.0060, p = 0.461) in Supplementary File S2 ( Figure S1).

Predictive Performance of the Machine Learning
We developed models based on the extracted 40 features with five types of ML classification algorithms, aiming to predict the efficacy of lenvatinib combined with TACE in unresectable HCC. The predictive performance matrix based on the testing set of the five models are shown in Table 2. The best hyperparameters and confusion matrix for five machine learning algorithms were listed in the Supplementary File S2 (Table S2) and Supplementary File S2 (Table S3), respectively. All of the ML models showed a moderate-to-excellent predictive performance (the AUCs ranged from 0.74 to 0.91). The SVM model and RF model showed the highest prediction accuracy (86.5%). All of the models were excellent in predicting the nonresponders to combined therapy in this study, as shown by the PPV and NPV. The AdaBoost, XGBoost, and CART predicted more false-positive patients, resulting in lower accuracy and precision. The AUCs based on the testing set of the five ML algorithms are shown in Figure 2. All of the ML models showed a moderate-to-excellent predictive performance (the AUCs ranged from 0.74 to 0.91). The SVM model and RF model showed the highest prediction accuracy (86.5%). All of the models were excellent in predicting the nonresponders to combined therapy in this study, as shown by the PPV and NPV. The AdaBoost, XGBoost, and CART predicted more false-positive patients, resulting in lower accuracy and precision. The AUCs based on the testing set of the five ML algorithms are shown in Figure 2. The RF algorithm revealed the largest AUC (0.91), and the AUCs of the SVM, XGBoost, AdaBoost, and CART were 0.86, 0.80, 0.80, and 0.74, respectively. Then, we ranked the relative importance of the 40 features ( Figure 3A) and identified the 10 most important predictors, including K, low-density lipoprotein (LDL), D-dimer (D-D), RBC, alanine aminotransferase (ALT), albumin (ALB), monocyte (Mono), tumor size, triglyceride (TG), and age. The box plots of the 10 predictor distributions between the nonresponse group and response group are shown in Figure 3B. The results before and after the unsupervised clustering of 125 patients with unresectable HCC are shown in the Supplementary File S1 ( Figure S2). The RF algorithm revealed the largest AUC (0.91), and the AUCs of the SVM, XGBoost, AdaBoost, and CART were 0.86, 0.80, 0.80, and 0.74, respectively. Then, we ranked the relative importance of the 40 features ( Figure 3A) and identified the 10 most important predictors, including K, low-density lipoprotein (LDL), D-dimer (D-D), RBC, alanine aminotransferase (ALT), albumin (ALB), monocyte (Mono), tumor size, triglyceride (TG), and age. The box plots of the 10 predictor distributions between the nonresponse group and response group are shown in Figure 3B. The results before and after the unsupervised clustering of 125 patients with unresectable HCC are shown in the Supplementary File S1 ( Figure S2).  T-BIL, total bilirubin; AST, aspartate aminotransferase; CEA, car-cinoembryonic antigen; Hb, hemoglobin; A/G, albumin/globulin; GLOB, globulin; FIB, fibrinogen; PLT, platelet; γ-GTP, γ-glutamyl transpeptidase; HDL, high-density lipoprotein; AFP, alpha fetoprotein; TP, total protein; NLR, neutrophil lymphocyte ratio; Lymph, lymphocyte; BUN, blood urea nitrogen; PLR, platelet lymphocyte ratio; CA19-9, carbohydrate antigen 19-9; Scr, serum creatinine; Neut, neutrophil; ALP, alkaline phosphatase; WBC, white blood cell; NR, nonresponse; R, response.

Explainability of the Machine Learning Models
To comprehensively explain the ML models, we used SHAP values to explain how the features affect the efficacy of lenvatinib combined with TACE. Since the SHAP algorithm could be applied to the tree-based algorithm for explanation but not for the SVM algorithm, we selected the RF model, which showed an excellent performance (AUC = 0.91), to calculate the SHAP values and draw a SHAP summary plot of the top 40 features ( Figure 4A). Each given patient in the training set was represented by a single point for each feature on the plot. The x-axis coordinate of each point was determined by the SHAP value, and the points were stacked along each feature to show the density. The features were ordered by the mean absolute value of the SHAP values for each feature. The color was used to show the original value of a feature; red and blue indicated high and low probability values, respectively. For example, a lower serum K, higher ALT, older age, and larger tumor size and BMI suggested a higher probability of response to lenvatinib combined with TACE. The SHAP decision plot provided a detailed view of the inner workings of the RF model ( Figure 4B). The gray, vertical line in the decision plot represents the base value of the model, while each colored line represented the prediction for each patient in the training set. Each prediction line showed how the SHAP values were accumulated from the base values at the bottom of the plot to the top of the plot to obtain the final score. The red line indicated a higher probability of response to lenvatinib combined with TACE, while the blue line indicated a lower probability. The SHAP decision plot provided a global explanation of the internal framework for the RF model. We then randomly selected one patient who had a response to lenvatinib combined with TACE ( Figure 4C) and one patient who did not have a response ( Figure 4D) in the testing set to illustrate the explainability of the model using a SHAP force plot. The arrows represented the effect of each feature on the predicted result, with the red arrows indicating an increased probability of response to the combined treatment and the blue arrows indicating a decreased probability. The predicted values were 0.88 for the first patient and 0.18 for the second patient after integrating the effects for each feature, which were in accordance with the true situation.

Discussion
HCC remains a serious challenge around the world, with a predicted case number of over one million by 2025, which is the third leading cause of cancer-related deaths [25]. ML has been shown to be effective in diagnosing and treating liver diseases for the purpose of precision therapy [26]. We performed a retrospective study based on the real world data from two hospitals. Then, we constructed a valuable clinical data-based model using ML methods to predict the response to lenvatinib combined with TACE for patients with unresectable HCC, which may provide new insights into risk prediction and clinical decision making.
Previous studies have shown that lenvatinib combined with TACE can significantly improve the OS and PFS compared with TACE monotherapy for patients with unresectable HCC [27,28]. Another study based on propensity score matching (PSM) confirmed the beneficial effect of alternating lenvatinib and TACE therapy compared with lenvatinib monotherapy on the prognosis of patients with intermediate stage HCC [29]. Similarly, a recent phase III randomized clinical trial in China demonstrated a longer OS and PFS and a higher ORR with lenvatinib combined with TACE therapy compared to lenvatinib monotherapy [30]. However, the chemotherapy resistance, complicated heterogeneity, and genetic aberrations of unresectable HCC result in a lower ORR and hinder the formulation of treatment options [31,32]. Identifying new clinical biomarkers is of great significance for clinical decision making and individual treatment.
Therefore, we conducted this retrospective study to explore the predictors of the response to lenvatinib combined with TACE using ML algorithms. We constructed five types of ML models, and all of the models showed excellent predictive performance, with AUCs of 0.74 to 0.91. The SVM and RF algorithms achieved the highest accuracy rate of 86.5%. The SVM and RF are both highly complicated models, and we could input data and obtain predictions but have no idea how they worked internally. Although the SVM showed the best predictive performance, explaining the model through hyperplanes is not conducive to clinical practice, because the kernel selected for performing the tuning is "rbf", which is a nonlinear kernel, and the model could not provide the weights assigned to the features. Tree ensemble algorithms could provide the advantage of being more accurate and stable predictions compared to a single decision tree and the availability of ranking the importance of features. AdaBoost, XGBoost, and CART predicted more patients who did not have a response to lenvatinib combined with TACE as having a response in the testing set, resulting in an increase in false positives. The optimal performing tree ensemble algorithm in this study was the RF. However, it would be too complicated to apply the RF model to clinical practice directly. Since the SHAP algorithm could be applied to the tree-based algorithm for explanation but not to the SVM algorithm. We further calculated the SHAP values based on the RF model to present global and local explanations of the efficacy to understand the internal framework.
The association of serum K, age, BMI, and tumor size with the efficacy of lenvatinib combined with TACE for unresectable HCC has not been clearly evaluated in previous studies. In this study, the global explanation of the SHAP model demonstrated the trend that patients with lower serum K, older age, larger BMI, and larger tumor size were more likely to be responsive to combination therapy. Previous studies have demonstrated that ALB was associated with the efficacy and prognosis of lenvatinib treatment, which was considered a predictive marker [10,33]. Similarly, the platelet-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), and alpha fetoprotein (AFP) were also related to the survival of patients with unresectable HCC treated with lenvatinib, which were considered early predictors of objective response [34][35][36]. In our study, we obtained a similar result. In addition, we used the SHAP model to predict patient efficacy at the individualized level by force plot to support clinical decision making. A comparison of the output value and base value can be applied to determine whether patients tend to have a response to lenvatinib combined with TACE therapy.
To our knowledge, this is the first study to predict the efficacy of lenvatinib combined with TACE therapy for unresectable HCC using ML algorithms and SHAP methods. However, this study has some limitations. Firstly, the sample size included in the study was moderate, which is similar to previous studies [10,14]. However, ML algorithms potentially perform better on larger sample size datasets, as well as specializing in detecting a realistic relationship between features and outcomes. Secondly, there was a lack of an independent external validation cohort to test the accuracy and robustness of the models. Thirdly, some concealed relationships between features were likely to be neglected by the ML algorithms, and other neglected features that are considered clinically unimportant may be associated with the efficacy of lenvatinib combined with TACE. In addition, we need to collect multidimensional features for analysis to improve the predictive performance, including lifestyle habits, environmental factors, imaging markers, and pathological examination-related features.

Conclusions
A valuable model was developed using ML algorithms based on clinical features to predict the response to lenvatinib combined with TACE for patients with unresectable HCC. Combining ML with SHAP can provide a reasonable explanation of the efficacy prediction at the individualized level, which is important for clinical decision making.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/cancers15030625/s1, File S1: The raw data of 125 patients with unresectable HCC; File S2: Figure S1: PCA plot of the two medical centers based on 64 standardized clinical features; Figure S2: The scatter plot before (A) and after (B) Kmeans clustering; Table S1: Demographic characteristics, pre-treatment serum biomarkers and tumor characteristics of nonresponse group and response group; Table S2: The best hyperparameters for five machine learning algorithms; Table S3: The confusion matrix for five machine learning algorithms.