1. Introduction
Hepatocellular carcinoma (HCC) is one of the most common cancers and the third leading cause of cancer-related death worldwide [1]. It is also a highly aggressive liver tumor containing cancer stem cells (CSCs), which contribute to tumor growth, resistance to conventional therapies, and tumor recurrence [2]. While liver resection is considered the first-line treatment for patients with early-stage HCC and well-preserved liver function, it is often unsuitable for those with advanced-stage HCC, for whom systemic therapies are recommended instead. Sorafenib, a multikinase inhibitor, is an FDA-approved first-line therapy recommended by the American Association for the Study of Liver Diseases (AASLD) and has been shown to extend the survival of patients with advanced HCC by 2 to 3 months [3,4]. However, resistance to sorafenib is common among HCC patients and substantially limits its clinical efficacy, often leading to treatment failure [4]. Therefore, identifying patients who may benefit from sorafenib treatment without developing resistance is crucial for making individualized treatment decisions in HCC.
The transcription factor SRY-box 9 (SOX9) is a key regulator involved in various diseases, including cancers. It plays a crucial role in physiological and pathological processes such as the growth, apoptosis, invasion, and metastasis of tumor cells [5,6]. In HCC, SOX9 is essential for the self-renewal and proliferation of liver CSCs, which contribute to tumor progression and drug resistance. Additionally, SOX9 has been identified as a potential CSC marker and an independent prognostic factor for HCC [2,7,8]. Recent studies have shown that SOX9 is overexpressed in many solid tumors, including HCC [7,9,10]. Furthermore, high SOX9 expression correlates with tumor aggressiveness in liver cancer and enhances sorafenib resistance by modulating the expression of ATP-binding cassette sub-family G member 2 (ABCG2) [11]. Therefore, predicting SOX9 expression in advanced HCC could help identify patients at risk of sorafenib resistance, enabling the timely use of alternative therapies. However, SOX9 status is typically assessed through immunohistochemistry, which requires tumor tissue obtained invasively via surgery or biopsy. This procedure may introduce sampling bias and increase patient morbidity. Thus, a non-invasive and efficient method for assessing SOX9 expression is urgently needed to enable personalized treatment strategies for HCC patients. This need motivated us to propose a non-invasive method for predicting SOX9 expression, aiming to reduce the physical and economic burden on patients.
Recently, deep learning (DL) has been widely applied in various fields, including computer vision [12], natural language processing [13], DNA sequence analysis [14], and medical image classification [15]. Computed tomography (CT) images, a widely used form of medical imaging, contain high-dimensional features that are closely related to a tumor’s microenvironment and molecular status. Many studies have applied DL models to CT images to predict disease grade, gene expression, and immune checkpoint status, including the expression of SOX9 [16,17,18]. However, not all regions of a CT image are equally informative about a patient’s SOX9 status. Although DL models handle noisy data well and capture high-dimensional features, it remains challenging for a neural network to identify, without guidance, the regions relevant to SOX9 expression.
The novelty of this study lies in incorporating reinforcement learning (RL) into a DL model to guide it toward the most relevant regions in CT images, thereby reducing the effect of background noise on the predictive model and improving its performance. RL trains an agent to take actions in an environment so as to maximize its cumulative reward; by optimizing its policy over a series of decisions, the agent can adjust key parameters to improve performance [19,20,21]. In our work, RL helps identify the regions in CT images that are most indicative of SOX9 expression status. To the best of our knowledge, this is the first study to use a DL model enhanced by RL to predict SOX9 expression status from CT images. The method not only provides a potentially high-precision, non-invasive approach to gene expression prediction but also introduces a technique that uses deep RL to enhance regions of interest while mitigating background noise interference, substantially improving the performance of the predictive model.
In summary, we developed and validated a non-invasive deep learning model that predicts SOX9 expression status in patients with advanced HCC using only preoperative contrast-enhanced CT images. We also examined the relationship between SOX9 expression and prognosis in HCC patients treated with sorafenib after surgery. Our experiments demonstrate that the proposed model outperforms previous methods by effectively focusing on relevant regions, indicating its potential to inform personalized treatment strategies for patients with advanced HCC. Furthermore, our findings suggest a strong association between SOX9 expression status and deep features extracted from medical images, opening new avenues for research in HCC treatment.
2. Predictive Model
Features visible in CT images may reflect the expression of many genes in HCC patients, including SOX9. Unlike radiomic analyses, deep learning models can automatically identify CT features related to SOX9 expression rather than being limited to known, predefined biomarkers, which allows them to learn and predict more holistically. However, despite their substantially better classification performance compared with traditional machine learning methods, deep learning models are still affected by noise, especially from features in regions weakly correlated with the target. Figure 1 presents the heatmap of a SOX9-positive prediction made by a ResNet, in which deeper colors indicate regions receiving more attention from the model. The model tends to focus on noisy areas, which substantially degrades its performance.
In this study, we developed a novel model that uses reinforcement learning (RL) to guide its attention toward regions closely associated with SOX9 expression, using only preoperative contrast-enhanced CT images to predict SOX9 status in patients with advanced HCC. Compared with traditional methods, our model automatically identifies and strengthens key regions while minimizing interference from noisy areas, thereby improving prediction accuracy and robustness. Beyond improving classification performance, the model provides more precise support for personalized treatment planning. The workflow of the proposed method is illustrated in Figure 2. The top two boxes show the CT examination and immunohistochemistry procedures that supply the model’s inputs and labels. The box at the bottom outlines the framework of the proposed model. All components of the model are parameterized by neural networks whose layers remain trainable throughout training. The upper half of the framework corresponds to the classification model, and the lower half represents the proposed reinforcement learning method.
We formulate the prediction task as a binary classification problem: distinguishing SOX9-positive from SOX9-negative cases. To improve classification performance and minimize interference from background noise, the proposed framework consists of two key components: a classification model and a reinforcement learning model. The classification model is built upon a residual network [12], with an attention layer added to improve classification performance and enable long-range feature modeling. The reinforcement learning model consists of a generator, comprising an encoder and a decoder, that acts as the RL agent. The generator produces a weight matrix that lets the model focus selectively on high-weight regions of interest while filtering out irrelevant areas.
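To make the role of the weight matrix concrete, the sketch below shows one plausible way a generator output could reweight a feature map. It assumes that the generator emits raw logits, that a sigmoid maps each logit to a weight in (0, 1), and that the weights multiply the feature map element-wise; the text does not specify how the weight matrix is combined with the features, and the function name `apply_region_weights` is hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def apply_region_weights(feature_map, logits):
    """Hypothetical sketch: reweight a 2D feature map with generator logits.

    Each logit becomes a weight in (0, 1); the feature at the same spatial
    location is scaled by that weight, so low-weight (background) locations
    contribute less to the downstream classifier.
    """
    if len(feature_map) != len(logits) or any(
        len(f_row) != len(l_row) for f_row, l_row in zip(feature_map, logits)
    ):
        raise ValueError("feature_map and logits must have the same shape")
    return [[f * _sigmoid(z) for f, z in zip(f_row, l_row)]
            for f_row, l_row in zip(feature_map, logits)]

# A zero logit halves a feature; a strongly negative logit suppresses it.
print(apply_region_weights([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, -50.0]]))
```

Element-wise gating keeps the feature map’s shape unchanged, so such a weighting could in principle be inserted between existing layers without altering the rest of the network.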
2.1. Classification Model
The classification model is based on an enhanced residual network that includes an attention layer to capture long-range feature dependencies. The attention layer uses a self-attention mechanism [13,22,23], enabling the model to learn global dependencies across the input features. Attention weights are computed from the correlations between a Query (Q), Key (K), and Value (V) and used to dynamically adjust the importance of features, enhancing the model’s ability to understand and use global information. The attention layer is integrated into the residual blocks in a modular manner, ensuring efficient feature propagation throughout the network. Written in the standard scaled dot-product form, this process can be described as follows:

$$
Q = XW_Q,\quad K = XW_K,\quad V = XW_V,\quad A = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right),\quad \mathrm{Attention}(X) = AV \tag{1}
$$

Here, Q, K, and V are obtained by linearly transforming the input features X with the learnable matrices W_Q, W_K, and W_V, d_k is the dimension of the keys, and A represents the attention weights.
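The following dependency-free sketch illustrates self-attention over a handful of feature vectors. It assumes the standard scaled dot-product form, softmax(QK^T / sqrt(d_k)) V; the layer used in the model may differ in details such as scaling, normalization, or the number of heads, and all function names here are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]

def softmax_row(row):
    m = max(row)                           # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the rows (feature vectors) of x.

    Returns the attended features and the attention-weight matrix A,
    whose rows each sum to 1.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    k_t = [list(col) for col in zip(*k)]   # transpose of K
    scores = matmul(q, k_t)                # n x n pairwise similarities
    a = [softmax_row([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(a, v), a

eye = [[1.0, 0.0], [0.0, 1.0]]             # identity projections, for illustration only
out, attn = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], eye, eye, eye)
print(attn)
```

Because every output row is a weighted average over all input rows, each position can draw on information from every other position, which is the "long-range" dependency the text refers to.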
The cross-entropy loss function [24] is widely used to train classification models, as shown in Equation (2). By measuring the difference between the predicted results and the true labels, it penalizes predictions that deviate substantially from the true values, guiding the model to adjust its parameters so as to minimize its error and improve the accuracy of its gene expression prediction. In this study, we apply a softmax transformation to the network’s outputs to obtain a probability distribution over the classes before computing the loss, which makes the outputs easier to interpret.

$$
\mathcal{L}_{\mathrm{CE}} = -\sum_{i} y_i \log p_i, \qquad p_i = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)} \tag{2}
$$

where y_i represents the experimentally validated SOX9 expression status of a patient (one-hot encoded), z_i is the model’s output for category i, p_i is the corresponding softmax probability, and i indexes the possible categories.
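For a single sample, the softmax-plus-cross-entropy computation described above reduces to the negative log-probability of the true class. The sketch below assumes integer labels with 0 = SOX9-negative and 1 = SOX9-positive; that encoding and the function names are illustrative, not taken from the text.

```python
import math

def softmax(logits):
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample with an integer class label.

    With a one-hot target y, -sum_i y_i * log(p_i) reduces to -log(p_label).
    """
    return -math.log(softmax(logits)[label])

print(cross_entropy([0.0, 0.0], 1))    # uninformative logits: loss equals log(2)
print(cross_entropy([2.0, -1.0], 0))   # confident and correct: small loss
print(cross_entropy([2.0, -1.0], 1))   # confident but wrong: large loss
```

The comparison between the last two calls shows the penalizing behavior described in the text: the further the predicted distribution is from the true label, the larger the loss.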
2.2. Reinforcement Learning Model
The training and updating of the reinforcement learning (RL) model can be viewed as an interaction between the generator and the classification model. From the generator’s perspective, the feature map output by the final residual block of the classification model is treated as the state, which is fed into the generator as a request. The generator responds by producing a new weight distribution, which it feeds back to the classification model as an action. This process repeats iteratively, with the generator making new decisions based on the signals it receives, so that it adaptively generates weight distributions focused on the CT regions most indicative of SOX9 expression. In this process, the generator functions as the RL agent. By modeling the classification model as the environment and classification accuracy as the reward, the overall process can be framed as an RL problem. Let the tuple (S, A, T, r) represent the agent’s decision-making process, where S denotes the observation space, A is the action space, T is the transition function that determines the next state from the current state–action pair, and r is the reward function. The goal is to learn a policy π that maximizes the expected cumulative reward. The reward function r uses a benchmark accuracy of 0.9 and evaluates an action by comparing the resulting prediction accuracy with this benchmark: if the accuracy exceeds 0.9, a positive reward is given, signaling a desirable outcome; if it falls below 0.9, a negative reward is assigned, indicating that the action did not meet the performance threshold. This feedback guides the agent toward actions that improve prediction accuracy.
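The text specifies only the sign of the reward relative to the 0.9 benchmark. The sketch below uses the simplest function consistent with that description, the signed gap between accuracy and the benchmark; this linear form and the function name are assumptions, not necessarily the authors’ exact reward.

```python
def reward(accuracy, benchmark=0.9):
    """Hypothetical reward: signed gap between batch accuracy and the benchmark.

    Positive when the weighted prediction beats the 0.9 benchmark, negative
    when it falls short, and zero exactly at the benchmark.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return accuracy - benchmark

print(reward(0.95), reward(0.80))
```

A graded reward of this kind also tells the agent how far an action missed the threshold, which is usually a more informative learning signal than a fixed plus-or-minus one.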
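Proximal Policy Optimization (PPO) [25] is a widely used policy gradient method for reinforcement learning. It alternates between sampling data through interaction with the RL environment and optimizing a surrogate objective function using stochastic gradient ascent. In this study, the agent is trained with the clipped variant, Proximal Policy Optimization-Clip (PPO-Clip), which introduces a similarity constraint that limits how far the policy may deviate during importance sampling. The objective with this constraint is summarized as follows:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(\rho_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(\rho_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right], \qquad \rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} \tag{3}
$$

Let a_t and s_t denote the model’s action and state at step t, respectively. Here, π_θ represents the output policy, while π_θold refers to the target policy against which the update is constrained, and Â_t is the estimated advantage at step t. The hyperparameter ε bounds the distance between π_θ and π_θold by clipping the probability ratio ρ_t(θ) to the interval [1 − ε, 1 + ε].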
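The clipped surrogate of PPO-Clip can be computed per sample from log-probabilities and advantage estimates. The sketch below follows the standard PPO-Clip objective from the PPO paper; the default ε = 0.2 is the value commonly used there, not one reported in this study, and how advantages are estimated here is left open.

```python
import math

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean PPO-Clip surrogate over a batch of (state, action) samples.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    The objective is maximized (in practice, its negative is minimized).
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum removes any incentive to push the ratio past the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Ratio 1.5 with positive advantage is capped at 1.2; ratio 0.5 with negative
# advantage is scored at the more pessimistic clipped value 0.8 * (-1).
print(ppo_clip_objective([math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]))
```

The pessimistic minimum is what makes the constraint one-sided: the objective never rewards moving the policy further than ε from the old policy, but it still penalizes moves that made things worse.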
2.3. Alternating Training
The classification model and the reinforcement learning model are trained alternately to promote stable convergence. When the classification model is trained, its parameters are updated using the cross-entropy loss while the reinforcement learning model remains fixed. Conversely, when the reinforcement learning model is trained, only its own parameters are updated. This alternating strategy lets the two models influence each other while retaining a degree of independence, which stabilizes the overall training dynamics and allows each model to learn effectively without overwhelming the other. Algorithm 1 describes this process in pseudocode.
Algorithm 1 Pseudocode of the alternating training.
1: Inputs: training data, data
2: Outputs: the classification model M and the generator G
3: Initialize the parameters of both models
4: for each epoch in epochs do
5:     for each batch data_i in data do
6:         Freeze the parameters of the generator G
7:         Calculate the loss value using the cross-entropy loss (Equation (2))
8:         Update the parameters of the classification model M
9:     end for
10:    for each batch data_i in data do
11:        Freeze the parameters of the classification model M
12:        Calculate the reward
13:        Update the parameters of the generator G using PPO-Clip (Equation (3))
14:    end for
15: end for
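The alternating schedule of Algorithm 1 can be illustrated with a deliberately tiny, dependency-free toy. Everything in it is a hypothetical stand-in: a two-feature logistic regression plays the classification model M (its sigmoid binary cross-entropy is the two-class special case of the softmax cross-entropy), a per-feature Bernoulli "keep" policy plays the encoder-decoder generator G, the reward is assumed to be batch accuracy minus the 0.9 benchmark, and, to keep the sketch short, a plain REINFORCE policy-gradient step replaces PPO-Clip. Only the structure (freeze one model, update the other) mirrors the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, rng):
    """Hypothetical toy data: feature 0 carries the label, feature 1 is noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append(([(2 * y - 1) + rng.gauss(0.0, 0.3), rng.gauss(0.0, 1.0)], y))
    return data

def predict(clf, x, weights):
    """Toy classifier M: logistic regression on region-weighted features."""
    w, b = clf
    return sigmoid(sum(wj * xj * gj for wj, xj, gj in zip(w, x, weights)) + b)

def accuracy(clf, batch, weights):
    hits = sum((predict(clf, x, weights) >= 0.5) == (y == 1) for x, y in batch)
    return hits / len(batch)

def alternating_training(data, epochs=5, batch_size=16, clf_lr=0.5, gen_lr=0.5,
                         benchmark=0.9, seed=0):
    rng = random.Random(seed)
    clf = ([0.0, 0.0], 0.0)   # parameters of the toy classification model M
    phi = [0.0, 0.0]          # toy generator G: one keep-logit per "region"
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for _ in range(epochs):
        # Algorithm 1, steps 5-9: G frozen, M updated by gradient descent on cross-entropy.
        probs = [sigmoid(p) for p in phi]
        for batch in batches:
            (w, b), n = clf, len(batch)
            grad_w, grad_b = [0.0, 0.0], 0.0
            for x, y in batch:
                err = predict(clf, x, probs) - y   # dCE/dlogit for a sigmoid output
                for j in range(2):
                    grad_w[j] += err * x[j] * probs[j]
                grad_b += err
            clf = ([w[j] - clf_lr * grad_w[j] / n for j in range(2)],
                   b - clf_lr * grad_b / n)
        # Algorithm 1, steps 10-14: M frozen, G updated from the accuracy-based reward.
        for batch in batches:
            probs = [sigmoid(p) for p in phi]
            mask = [1.0 if rng.random() < p else 0.0 for p in probs]   # sampled action
            reward = accuracy(clf, batch, mask) - benchmark
            for j in range(2):
                # REINFORCE: d/dphi log Bernoulli(mask_j; sigmoid(phi_j)) = mask_j - p_j
                phi[j] += gen_lr * reward * (mask[j] - probs[j])
    return clf, [sigmoid(p) for p in phi]

data = make_data(400, random.Random(42))
clf, keep_probs = alternating_training(data)
print("keep probabilities:", keep_probs)
print("training accuracy:", accuracy(clf, data, keep_probs))
```

Dropping the informative feature drives batch accuracy far below the benchmark, so the reward pushes that feature’s keep-probability upward; this is the toy analogue of steering the weights toward SOX9-relevant regions.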
3. Patient Cohort and Data Collection
The Institutional Review Board of West China Hospital approved this retrospective study and waived the requirement for informed consent owing to its retrospective nature. From an initial cohort of 179 patients with histologically confirmed HCC who received systemic sorafenib treatment (800 mg daily, given as 400 mg twice daily) after surgery between July 2011 and June 2019, 101 patients were included. The exclusion criteria were as follows: (1) an interval between CT and surgery of more than four weeks; (2) poor-quality or incomplete CT imaging; (3) interruption of sorafenib treatment for more than 48 h between the start of administration and the first follow-up; and (4) a lack of follow-up data. The patient recruitment and data allocation processes are illustrated in Figure 3. In total, 78 patients were excluded because of these data reliability issues. The remaining patients were then split into training and validation cohorts, with equal numbers of SOX9-positive and SOX9-negative patients in the validation cohort to ensure a balanced evaluation.
The median age of the enrolled patients was 51 years (IQR: 42–60); 90 were male (89.11%) and 11 were female (10.89%). The median overall survival time was 29.23 months (IQR: 14.56–47.7 months), and the median recurrence time was 13.27 months (IQR: 4.77–24.7 months). Immunohistochemistry showed that 51 patients were SOX9-positive, with a median age of 52 years (IQR: 42–62), and the remaining 50 patients were SOX9-negative, with a median age of 50 years (IQR: 43–59). No significant difference in age distribution was observed between the SOX9-positive and SOX9-negative groups. In addition, 4011 high-quality CT images were collected, of which 2041 (50.9%) came from the SOX9-positive group, so the two groups contributed similar numbers of images.
Finally, patients were randomly assigned to either the training cohort or the validation cohort using the classic hold-out strategy [26], with 20 patients allocated for validation: 10 (50%) SOX9-positive and 10 (50%) SOX9-negative. The training cohort consisted of 72 males and 9 females, with a median age of 52 years (IQR: 43–61), whereas the validation cohort included 18 males and 2 females, with a median age of 49 years (IQR: 40–60). No significant differences in sex or age distribution were observed between the training and validation cohorts. A summary of the cohorts’ characteristics is presented in Table 1. The clinical variables summarized include PLT (platelets), neutrophils, lymphocytes, TBIL (total bilirubin), ALT (alanine aminotransferase), AST (aspartate aminotransferase), ALB (albumin), GGT (γ-glutamyl transpeptidase), AFP (alpha-fetoprotein), and CEA (carcinoembryonic antigen).
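A hold-out split with a class-balanced validation set, as described above, can be sketched as follows. The patient identifiers, the random seed, and the function name are hypothetical; only the counts (51 SOX9-positive, 50 SOX9-negative, 10 of each held out) come from the text.

```python
import random

def balanced_holdout(patients, n_val_per_class=10, seed=0):
    """Split (patient_id, label) pairs into training and validation lists.

    The validation set receives exactly n_val_per_class randomly drawn
    patients from each label; all remaining patients go to training.
    """
    rng = random.Random(seed)
    by_label = {}
    for pid, label in patients:
        by_label.setdefault(label, []).append(pid)
    val_ids = set()
    for label, ids in sorted(by_label.items()):
        if len(ids) < n_val_per_class:
            raise ValueError(f"label {label!r} has too few patients")
        val_ids.update(rng.sample(ids, n_val_per_class))
    train = [(pid, lab) for pid, lab in patients if pid not in val_ids]
    val = [(pid, lab) for pid, lab in patients if pid in val_ids]
    return train, val

# Hypothetical IDs; label 1 = SOX9-positive (51 patients), 0 = SOX9-negative (50).
cohort = [(f"P{i:03d}", 1 if i < 51 else 0) for i in range(101)]
train, val = balanced_holdout(cohort)
print(len(train), len(val))
```

Splitting at the patient level, rather than at the image level, keeps all CT slices from one patient in a single cohort and avoids leakage between training and validation.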
3.1. Follow-Up Surveillance
All patients were followed up every 3 to 6 months after curative liver resection with α-fetoprotein testing and imaging examinations such as ultrasound, CT, or MRI. During follow-up, we recorded the time of disease-specific progression (local recurrence or distant organ metastasis) or death. Recurrence-free survival (RFS) was defined as the time from surgery to the date of relapse. Overall survival (OS) was calculated as the interval from the date of surgery to the date of death or the most recent follow-up. Data from patients who were still alive at the last follow-up were censored.
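The RFS and OS definitions above translate into a (time, event) pair per patient. The sketch below assumes that times are reported in months using an average month length of 365.25 / 12 days and that patients without relapse are censored for RFS at their last follow-up; the text states neither convention, and the dates are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumed conversion; the text does not state one

def time_to_event(start, last_follow_up, event_date=None):
    """Return (months, event) measured from the date of surgery.

    event_date is the relapse date (RFS) or the death date (OS). If it is
    None, the patient is censored at last_follow_up and event is 0.
    """
    end, event = (event_date, 1) if event_date is not None else (last_follow_up, 0)
    if end < start:
        raise ValueError("end date precedes surgery")
    return (end - start).days / DAYS_PER_MONTH, event

# Hypothetical patient: surgery 2015-01-01, relapse 2015-07-01, alive on 2017-01-01.
rfs = time_to_event(date(2015, 1, 1), date(2017, 1, 1), event_date=date(2015, 7, 1))
os_ = time_to_event(date(2015, 1, 1), date(2017, 1, 1))
print(rfs, os_)
```

The event flag is what survival methods such as Kaplan-Meier estimation need in order to treat censored follow-up time correctly rather than as an observed event.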
3.2. Immunohistochemistry
Surgically resected, paraffin-embedded specimens were cut into 4 μm thick sections, which were dewaxed, hydrated, and subjected to antigen retrieval. The slides were incubated overnight at 4 °C with a primary rabbit monoclonal antibody (1:200; HuaBio, ET1611-56), followed by incubation with a secondary antibody (Dako, cat. # K5007). SOX9 staining was visualized with 3,3′-diaminobenzidine, and the sections were counterstained with hematoxylin. Two senior pathologists, blinded to all radiological and clinical data, independently assessed the slides. For each slide, they selected five non-overlapping, non-contiguous fields and calculated the mean. SOX9-positive cells were counted at 400× magnification (0.0484 mm² per field), and SOX9 expression was determined from the proportion of SOX9-positive tumor cells. A threshold of 5% for the ratio of SOX9-positive tumor cells to total tumor cells was used: samples above this threshold were considered SOX9-positive and those below it SOX9-negative. If a result fell within 5% of the threshold, the slide was re-assessed until a consensus was reached.
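The scoring rule above can be expressed as a small function. This sketch makes three assumptions that the text does not settle: a score exactly at 5% counts as negative, "within 5% of the threshold" is read as a relative margin (5% of 5%, i.e. ±0.25 percentage points), and the five field values are already positive-cell fractions. The function name is hypothetical, and the two-pathologist consensus step is not modeled.

```python
from statistics import mean

def sox9_status(field_fractions, threshold=0.05, borderline_margin=0.05):
    """Classify SOX9 status from the positive-cell fraction in five fields.

    field_fractions: SOX9-positive / total tumor cells for each field (0-1).
    Returns (status, needs_reassessment). borderline_margin is interpreted
    as a relative margin around the threshold (an assumption).
    """
    if len(field_fractions) != 5:
        raise ValueError("expected five fields")
    score = mean(field_fractions)
    status = "positive" if score > threshold else "negative"
    needs_reassessment = abs(score - threshold) <= threshold * borderline_margin
    return status, needs_reassessment

print(sox9_status([0.10, 0.12, 0.08, 0.09, 0.11]))  # clearly above 5%
print(sox9_status([0.05, 0.05, 0.05, 0.05, 0.051]))  # borderline case
```

Under the alternative reading (an absolute margin of 5 percentage points), almost every negative sample would be flagged, which is why the relative interpretation was chosen for this sketch.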
3.3. CT Image Acquisition and Processing
Contrast-enhanced CT slices were acquired with three types of multi-detector CT (MDCT) scanners: the Somatom Definition Flash (used for 45% of the included patients), the Brilliance64 (used for 20% of the included patients), and the Somatom Definition AS+. The high-resolution scanning protocol was as follows: tube voltage, 90–120 kVp; tube current, 200–20 mA; rotation time, 0.5–0.75 s; pitch, 0.8–1.0; and slice thickness, 2 mm. Non-ionic intravenous contrast material (Omnipaque, 350 mg/mL; GE Healthcare, Chicago, IL, USA) was administered at a dose of 1.5–2.0 mL/kg via a power injector at a rate of 3 mL/s. Three-phase scanning was triggered when attenuation in the aorta reached the 100 HU threshold.
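As a worked example of the weight-based contrast protocol, the sketch below computes the injected volume, injection duration, and iodine load for a hypothetical 70 kg patient. The function name is illustrative, and the iodine load assumes that 350 mg/mL refers to the iodine concentration of Omnipaque 350.

```python
def contrast_plan(weight_kg, dose_ml_per_kg=(1.5, 2.0), rate_ml_per_s=3.0,
                  iodine_mg_per_ml=350):
    """Ranges of contrast volume (mL), injection time (s), and iodine load (g)."""
    lo_ml, hi_ml = (weight_kg * d for d in dose_ml_per_kg)
    return {
        "volume_ml": (lo_ml, hi_ml),
        "injection_s": (lo_ml / rate_ml_per_s, hi_ml / rate_ml_per_s),
        "iodine_g": (lo_ml * iodine_mg_per_ml / 1000, hi_ml * iodine_mg_per_ml / 1000),
    }

print(contrast_plan(70))  # hypothetical 70 kg patient
```

For 70 kg this gives 105–140 mL of contrast delivered over roughly 35–47 s, corresponding to about 36.75–49 g of iodine.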
All CT images were retrospectively reviewed by two radiologists with more than eight years of experience in liver imaging, who were blinded to the clinicopathological data. Tumor segmentation was performed on the initial portal venous phase (PVP) CT images using SEVB-Net, a modified version of V-Net [27] developed by United Imaging Intelligence [28]. Unlike U-Net [29], SEVB-Net performs three-dimensional segmentation rather than operating only on two-dimensional images. The Dice score for this segmentation task was 0.855. After the network segmented all tumors, the radiologists reviewed and verified the results; where the network’s segmentation was inaccurate, the tumor contours were manually delineated using ITK-SNAP software (version 3.6).
Finally, each CT image was cropped to a window centered on the tumor, extending outward from the segmentation result until the window reached 128 × 128 pixels. If the tumor extended beyond 128 pixels, the window was enlarged so that the full tumor boundary was preserved. To meet the network’s input requirements, all crops were then resized to a uniform 128 × 128. In addition, several data augmentation techniques, including Random Rotation, Random Cropping, Random Horizontal Flipping, Random Vertical Flipping, and ColorJitter [30], were applied to increase data diversity and improve the model’s generalization ability. To mitigate the impact of class imbalance on training, we oversampled the minority class, which improved training efficiency and stabilized the training dynamics.
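The tumor-centered cropping rule can be sketched as a computation of the crop window from the tumor’s bounding box. The sketch assumes an inclusive (row_min, col_min, row_max, col_max) bounding box taken from the segmentation, a square window, and that the window is shifted rather than shrunk when it would cross the image border; the function name is hypothetical, and the subsequent resize to 128 × 128 is not shown.

```python
def crop_window(tumor_bbox, image_shape, min_size=128):
    """Square crop window around a tumor bounding box.

    tumor_bbox: (row_min, col_min, row_max, col_max), inclusive pixel indices.
    image_shape: (height, width). The window is min_size wide unless the
    tumor is larger, in which case it grows so the whole tumor is kept.
    Returns (row_start, col_start, row_end, col_end) with exclusive ends.
    """
    r0, c0, r1, c1 = tumor_bbox
    height, width = image_shape
    size = max(min_size, r1 - r0 + 1, c1 - c0 + 1)

    def axis_window(lo, hi, limit):
        extent = hi - lo + 1
        start = lo - (size - extent) // 2          # center the tumor in the window
        start = max(0, min(start, limit - size))   # shift to stay inside the image
        return start, min(start + size, limit)

    rs, re = axis_window(r0, r1, height)
    cs, ce = axis_window(c0, c1, width)
    return rs, cs, re, ce

print(crop_window((200, 200, 239, 259), (512, 512)))  # small tumor: 128 x 128 window
print(crop_window((100, 100, 299, 249), (512, 512)))  # large tumor: window grows to 200
```

Growing the window instead of cutting the tumor, then resizing, matches the described priority of preserving the full tumor boundary over keeping a fixed pixel scale.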
5. Discussion
Several studies have explored the feasibility of using machine learning or deep learning to predict gene expression in cancer patients, with some progress. Suleyman et al. [35] applied five machine learning techniques (Random Forest, Support Vector Machine (SVM), Naive Bayes, C4.5, and K-Nearest Neighbors) to somatic mutation data from breast cancer patients and achieved an accuracy of 0.70 with Random Forest. Bhalla et al. [36] applied SVM- and Random Forest (RF)-based models to 523 cases of clear cell renal cell carcinoma (ccRCC), aiming to identify the minimum number of biomarker genes that could distinguish early-stage from late-stage ccRCC for accurate cancer staging; they reported a final accuracy of 70.19%. Matsubara et al. [37] combined convolutional neural networks (CNNs) with spectral clustering information to classify lung cancer using protein interaction network data and gene expression data from 639 samples, reporting an accuracy, recall, precision, and specificity of 0.81, 0.88, 0.78, and 0.74, respectively. Guillermo et al. [38] combined CNNs with transfer learning (TL) for lung cancer prediction; using TCGA data, which cover 33 cancer types, and testing on the lung cancer dataset, they achieved an accuracy of 68%. The highest accuracy among these studies is only 0.81, suggesting that existing methods have reached a performance bottleneck. This is partly because their feature extraction relies heavily on expert experience and partly because they fail to extract deep features effectively, particularly the relationships between features. Research on gene prediction therefore needs to further improve feature extraction.
Moreover, current gene prediction technologies rely primarily on genetic sequence data, overlooking the potential of medical imaging (e.g., CT scans) in disease prediction. Genetic sequences provide valuable information, but this approach has limitations. First, relying solely on genetic sequences may fail to capture the full complexity of a disease, particularly the spatial heterogeneity of tumors and the influence of the tumor microenvironment, whereas medical imaging directly depicts the tumor’s shape and size and changes in the surrounding tissues, offering more timely and dynamic information. Second, genetic sequencing usually requires additional testing steps, increasing the financial and time burden on patients, whereas medical imaging is more straightforward and can be repeated to track disease progression. Finally, the accuracy of sequence-based predictions is often affected by factors such as genetic heterogeneity, making it difficult to meet clinical needs, while medical imaging provides more intuitive clinical evidence. Therefore, compared with relying solely on genetic sequences, medical imaging may more effectively support disease diagnosis and prediction, especially for complex diseases such as cancer.
Recently, SOX9, an important transcription factor, has attracted considerable attention because of its associations with cancer progression and drug resistance [2,7,8]. In HCC, a recent study showed that high SOX9 expression indicates tumor aggressiveness and enhances sorafenib resistance by modulating the expression of ATP-binding cassette sub-family G member 2 (ABCG2) [11]. Because sorafenib is a first-line systemic treatment option for patients with advanced HCC, identifying a patient’s SOX9 status can help determine the risk of sorafenib resistance, which is essential for constructing personalized treatment strategies, including the choice of alternative therapies such as immune checkpoint inhibitors (ICIs). However, immunohistochemistry (IHC), the standard method for detecting SOX9 expression, requires invasive biopsy or surgery, which carries risks of sampling bias and morbidity. Demonstrating an association between SOX9 and medical imaging could therefore open new research opportunities in medical image analysis while providing a non-invasive, preoperative SOX9 detection strategy that supports precision treatment for HCC.
Representations derived from CT images have been found to be informative for disease grading, gene expression, and the status of the immune checkpoint pathway. A series of previous studies [39,40,41,42] used radiomics combined with qualitative and quantitative analyses to predict gene expression or phosphorylation status through manual feature extraction. Although these proof-of-principle studies demonstrated that radiomic features provide a comprehensive overview of tumor pathological status, feature extraction that relies on human expertise still faces challenges such as feature bias, information loss, and limited reproducibility. Convolutional neural networks (CNNs), a class of deep learning techniques, offer significant potential for feature extraction and diagnosis and have led to breakthroughs in many domains [12,13,14,15]. They naturally integrate features and classifiers in an end-to-end, multi-layer fashion, and the levels of features they capture can be enriched by increasing network depth [12]. This technology has been applied to prediction tasks based on medical images [16,17,18]. Although CNNs have shown promising performance in feature extraction and prediction tasks, they still struggle to recognize the regions that are actually discriminative for the prediction target. In our task of predicting SOX9 expression, only certain parts of a CT image are relevant, which suggests that a model’s diagnostic performance can be improved by enhancing those specific regions. However, helping CNNs identify these regions remains challenging.
Recently, several studies have integrated deep learning with reinforcement learning to enhance the feature extraction capabilities of convolutional neural networks, with promising results. Joseph Stember et al. [43] used reinforcement learning to classify 2D brain MRI images with a very small training dataset and achieved remarkable accuracy. However, the dataset in their study was too limited to validate the model’s generalization ability. Moreover, that study employed reinforcement learning in a multi-step image classification strategy aimed at building an end-to-end learning process, but it did not help the model selectively extract features from regions of interest. Similarly, Emma Slade et al. [44] proposed an active learning framework based on deep reinforcement learning that improves the efficiency of medical image classification by selecting the subset of images that most enhances model performance. In addition, Mingyuan Jiu et al. [45] introduced an adaptive active learning method that combines deep reinforcement learning with active learning; using the Deep Deterministic Policy Gradient (DDPG) algorithm, their framework dynamically optimizes sample selection strategies across different learning environments, thereby improving model performance. In both studies, reinforcement learning was used for sample selection, to make the samples chosen for active learning more representative.
Although some studies have explored integrating deep learning with reinforcement learning for computer vision tasks, particularly in medical image processing and analysis, current research mainly uses reinforcement learning for multi-step feature extraction and active learning decision-making. Research on using reinforcement learning to help models focus on regions of interest, and thus reduce the impact of background noise, remains lacking. Experimental results have shown that background noise substantially affects the prediction of SOX9 expression from CT images in HCC patients. The core focus of this study is therefore to use reinforcement learning to enhance the ability of deep learning models to extract key features while avoiding interference from background noise.
In this retrospective study, we developed and validated a reinforcement learning (RL)-based deep learning (DL) model that can non-invasively identify the SOX9 status of HCC patients preoperatively, providing support for personalized treatment. A total of 101 patients with histologically confirmed HCC were enrolled at West China Hospital between 2011 and 2019 and divided into a training cohort, used to develop and optimize the model, and a validation cohort, used to assess its performance and generalizability. The model achieved AUCs of 94.42% (95% CI, 92.23–96.39%) and 91.00% (95% CI, 88.64–93.15%) for the training and validation cohorts, respectively. These results suggest that using deep reinforcement learning to extract latent features from CT images for the non-invasive prediction of SOX9 expression in HCC patients holds significant potential for clinical application. The model meets the standards required for clinical research, strikes a good balance between sensitivity and specificity, and shows strong generalization capability. Compared with existing deep learning models, our algorithm performs substantially better, which we attribute to the use of reinforcement learning to focus feature extraction and learning on regions of interest, mitigating the impact of background noise on model performance. Interpreting the results with class activation maps (CAMs) further supported this conclusion. As shown in Figure 7, correctly predicted samples had high-activation regions consistently concentrated around the tumor, whereas the activation regions in misclassified samples lay predominantly in the background. We also observed another interesting phenomenon: in the SOX9-positive group, the regions with the highest activation values were often located in both the tumor and peritumoral areas, whereas in the SOX9-negative group the activated regions were limited to the tumor itself. Although this phenomenon was observed in only a few cases, it merits investigation in future studies.
Previous studies have established SOX9 as a significant biomarker of recurrence-free survival (RFS) and poor prognosis in HCC [46,47]. Another study identified SOX9 as an independent risk factor for both RFS and overall survival (OS) in HCC patients treated with sorafenib, as it enhances sorafenib resistance through the modulation of ABCG2 [11]. In our experiments, patients in the SOX9-positive group had worse outcomes than those in the SOX9-negative group when treated with sorafenib after surgery, with significantly shorter RFS and OS. Notably, HCC recurrence after hepatectomy is typically classified as early or late recurrence, with 12 months as the cut-off [48]. Our analysis of RFS within 12 months showed that the SOX9-positive group had a significantly lower RFS rate than the SOX9-negative group. We attribute this to the more aggressive histological behavior often exhibited by SOX9-positive HCCs, such as more pronounced venous invasion and more advanced TNM stages. These findings support the view that SOX9 status is an independent predictor of OS and RFS in HCC after sorafenib treatment.
These results suggest that the proposed method can accurately identify patients with advanced HCC who are most likely to benefit from sorafenib treatment, supporting personalized treatment strategies, particularly the decision of whether to administer sorafenib. With this approach, we can not only predict patients’ treatment responses from their SOX9 status but also tailor treatment plans to individual patients to improve efficacy and avoid unnecessary drug side effects. More importantly, such personalized treatment has the potential to improve patients’ quality of life and survival, ultimately optimizing treatment outcomes. Precise treatment planning of this kind can help physicians make better decisions in complex cancer therapy.
Despite its promise, this method has several limitations. First, the reinforcement learning network is relatively complex to train and prone to overfitting, especially when the sample size is small. To improve training efficiency and avoid overfitting, we used two different learning rates during training, with a smaller learning rate for the RL-based network to prevent gradient explosion and ensure training stability. Second, the model’s specificity was consistently higher than its sensitivity in our experiments, which may be related to the small sample size: with limited data, the model may have become more conservative in avoiding false positives, yielding higher specificity at the expense of sensitivity. Future studies could address this limitation by increasing the sample size to improve the model’s performance and generalization ability. Clinical variables could also be incorporated into the model to enhance its overall diagnostic performance, particularly for early-stage or rare subtypes of HCC. Finally, the data in this study came from a single institution; although internal validation supports the reliability of the model, external validation on independent cohorts from other institutions is necessary. Such multi-center validation would help reduce potential biases and establish the model’s applicability and utility in broader clinical settings.