Identification and Validation of Cuproptosis-Related LncRNA Signatures in the Prognosis and Immunotherapy of Clear Cell Renal Cell Carcinoma Using Machine Learning

(1) Objective: We aimed to mine cuproptosis-related LncRNAs with prognostic value and construct a corresponding prognostic model using machine learning. External validation of the model was performed in the ICGC database and in multiple renal cancer cell lines via qPCR. (2) Methods: TCGA and ICGC cohorts of clear cell renal cell carcinoma (ccRCC) were included. GO and KEGG analyses were conducted to determine the biological significance of differentially expressed cuproptosis-related LncRNAs (CRLRs). Machine learning (LASSO), Kaplan–Meier, and Cox analyses were conducted to determine the prognostic genes. The tumor microenvironment and tumor mutation burden (TMB) were further studied. TIDE and IC50 were used to evaluate the response to immunotherapy. A risk model of LncRNAs related to the cuproptosis genes was established, and the ability of this model was verified in an external independent ICGC cohort. LncRNAs were identified in normal HK-2 cells and verified in four renal cell lines via qPCR. (3) Results: We obtained 280 CRLRs and identified 66 LncRNAs included in the TCGA-KIRC cohort. Then, three hub LncRNAs (AC026401.3, FOXD2-AS1, and LASTR), which were over-expressed in the four ccRCC cell lines compared with the human renal cortex proximal tubule epithelial cell line HK-2, were identified. In the ICGC database, the expression of FOXD2-AS1 and LASTR was consistent with the qPCR and TCGA-KIRC results. The results also indicated that patients with low-risk ccRCC, stratified by tumor-node-metastasis stage, sex, and tumor grade, had significantly better overall survival than those with high-risk ccRCC. The predictive algorithm showed that, according to the three CRLR models, the low-risk group was more sensitive to nine target drugs (A.443654, A.770041, ABT.888, AG.014699, AMG.706, ATRA, AP.24534, axitinib, and AZ628), based on the estimated half-maximal inhibitory concentrations. In contrast, the high-risk group was more sensitive to ABT.263, AKT inhibitor VIII, and AS601245. Using the CRLR models, the correlation between the tumor immune microenvironment and cancer immunotherapy response revealed that high-risk patients are more likely to respond to immunotherapy than low-risk patients. In terms of immune marker levels, there were significant differences between the high- and low-risk groups. A high TMB score in the high-risk CRLR group was associated with worse survival, which could be a prognostic factor for KIRC. (4) Conclusions: This study elucidates the core cuproptosis-related LncRNAs, FOXD2-AS1, AC026401.3, and LASTR, in terms of potential predictive value, immunotherapeutic strategy, and outcome of ccRCC.


Introduction
Renal cell carcinoma (RCC) is a malignant tumor originating from the renal tubular epithelium, accounting for 80-90% of malignant renal tumors. Annually, approximately

ccRCC Data Collection
ccRCC data (72 normal and 539 ccRCC samples) were downloaded from the TCGA database. We collected clinical information on patients with ccRCC, including age, sex, stage, tumor-node-metastasis (TNM) stage, grade, survival status, and follow-up time. The ICGC (International Cancer Genome Consortium) cohort includes 45 normal tissues adjacent to primary tumors and 91 primary tumor solid tissues.

Construction of the CRLR Prognostic Model by a Machine Learning Algorithm
LASSO is a regression-based machine learning algorithm. A regularization (penalty) term is added to the regression model to penalize over-fitting, and the regression coefficients are shrunk so that unnecessary or insignificant covariates are automatically removed, leaving a refined set of model variables [10,13,14].
Using the R package glmnet, LASSO Cox regression was applied to select CRLRs associated with the overall survival (OS) of patients with ccRCC. Univariate and multivariate Cox regression analyses were combined to identify the CRLRs that were significantly associated with prognosis. The risk score was calculated as: risk score = (β1 × CRLR-1) + (β2 × CRLR-2) + ... + (βn × CRLR-n), where βi is the regression coefficient of the i-th CRLR and CRLR-i is its expression level. By combining the CRLR prognostic signature with independent TCGA-KIRC factors, we developed a hybrid nomogram. Patients with ccRCC were divided into high- and low-risk groups according to their CRLR risk scores. Based on the clinical variables and the CRLRs in the hybrid nomogram, ROC analyses were conducted to estimate the accuracy of the 1-, 3-, and 5-year OS predictions.
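The risk score formula above can be illustrated with a short Python sketch. This is not the authors' pipeline (which used R and glmnet): the coefficient values, the expression values, and the median cutoff used to split patients are all hypothetical assumptions chosen only to make the arithmetic easy to follow.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Compute risk score = sum(beta_i * expression_i) for each patient.

    expression:   {patient_id: {lncRNA_name: expression_value}}
    coefficients: {lncRNA_name: beta}
    """
    return {
        pid: sum(coefficients[g] * expr[g] for g in coefficients)
        for pid, expr in expression.items()
    }


def split_by_median(scores):
    """Label patients 'high' if their score exceeds the cohort median.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and expression values (not from the study).
coefficients = {"AC026401.3": 0.5, "FOXD2-AS1": 0.25, "LASTR": 1.0}
expression = {
    "P1": {"AC026401.3": 2.0, "FOXD2-AS1": 4.0, "LASTR": 1.0},
    "P2": {"AC026401.3": 0.0, "FOXD2-AS1": 0.0, "LASTR": 0.5},
    "P3": {"AC026401.3": 4.0, "FOXD2-AS1": 8.0, "LASTR": 2.0},
}

scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
print(scores)
print(groups)
```

In practice the coefficients would come from the fitted multivariate Cox model, and the same coefficients and cutoff would be applied unchanged to any test or external cohort.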

Kaplan-Meier (K-M) Survival Analysis and Principal Component Analysis (PCA)
We compared patients with ccRCC stratified by the CRLR signature using K-M survival curves and PCA. The TIDE model was used to predict whether immunotherapy would succeed, and half-maximal inhibitory concentrations (IC50) derived from the GDSC web pages were used to estimate the therapeutic response to targeted drugs.
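For readers unfamiliar with the K-M estimator, the following minimal Python sketch shows how the survival probability is updated at each observed event time. It is an illustration only; the function name and the toy follow-up data are assumptions, and the study itself relied on R packages.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Toy data (hypothetical): five patients, two of them censored.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in km_curve:
    print(t, round(s, 3))
```

Computing one such curve per risk group and comparing them (for example with a log-rank test) is the usual way a high- versus low-risk stratification is assessed.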

qPCR and RNA Isolation
Total RNA was isolated using RNAiso Plus Reagent (9108, Shanghai, China), according to the manufacturer's instructions. Total RNA was used for cDNA synthesis using the PrimeScript RT Reagent Kit (RR037A, Shanghai, China). Gene expression was quantified using TB Green Premix Ex TaqII (RR820A, Shanghai, China). Sangon Biotech Co., Ltd. (Shanghai, China) synthesized all primers for qPCR (see Table 1). The PCR procedure was as follows: 40 cycles of 98 °C for 30 s, 98 °C for 5 s, and 60 °C for 5 s. β-Actin served as the internal reference for normalization. The expression levels were calculated using the 2^(−ΔΔCt) method.
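The 2^(−ΔΔCt) calculation can be sketched in a few lines of Python. The function name and the Ct values below are hypothetical and serve only to show the arithmetic, with β-actin as the reference gene and HK-2 as the control sample.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per sample
    ddCt = dCt(test sample) - dCt(control sample)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: a ccRCC cell line versus HK-2, normalized to beta-actin.
fc = fold_change(
    ct_target_sample=24.0, ct_ref_sample=18.0,    # dCt = 6 in the tumor line
    ct_target_control=26.0, ct_ref_control=18.0,  # dCt = 8 in HK-2
)
print(fc)  # ddCt = -2, so the target is 4-fold higher than in the control
```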

Table 1. Primers used for qPCR (columns: Gene, Forward Primer, Reverse Primer).

Statistical Analysis
We used the R statistical package (version 4.0.2) for our analyses. A Wilcoxon test was conducted to compare the proportion of tumor-infiltrating immune cells. Chi-square tests were used to analyze the differences in the proportions of clinical characteristics. PCR data were analyzed with an independent sample t-test using the GraphPad Prism 8.0 software. Statistical significance was defined as p < 0.05.

Identification of the Prognostic CRLR Signature in ccRCC
According to TCGA and previously published literature, 280 CRLR genes were identified (252 up-regulated and 28 down-regulated). Univariate Cox analysis, together with multivariate Cox analysis, revealed 66 significant CRLRs (Figure 1). Three LncRNAs (AC026401.3, FOXD2-AS1, and LASTR) were found by LASSO to be independent prognostic factors in ccRCC (Figure 2A,B). Figure 2C shows the network diagram for CRGs and CRLRs. Figure 2D shows the correlation heatmap of CRLRs and CRGs. Molecular correlation of the three CRLRs in TCGA-KIRC showed a significantly positive correlation (Figure 3A-C), whereas the KM prognostic curve (Figure 3D-F), overall survival event (Figure 3G), progression-free interval (Figure 3H), disease-specific survival (Figure 3I), and TNM stage (Figure 3J-L) ... (Figure 4B). Figure 4C shows the associations between CRLRs and CRGs. Based on the Oncomine database, we found that the expression of real hub genes was significantly elevated in renal carcinoma compared with normal tissues. Moreover, immunohistochemistry staining obtained from The Human Protein Atlas database also demonstrated the de-regulation of real hub gene expression (Figure 4D-I).

Construction of the Hybrid Nomogram and GO Analysis
A GO enrichment analysis revealed the involvement of many immune-related biological processes (Figure 5A). The novel CRLRs were involved in the production of immune-response molecular mediators, the defense response to bacteria, the humoral immune response, the immunoglobulin complex, the external side of the plasma membrane, antigen binding, and receptor-ligand activity. In Figure 5B, we present the correlations between CRLR features and clinical variables. Figure 6A illustrates the distribution of risk scores in the low- and high-risk groups. The survival status and survival times of the patients in the two risk groups are shown in Figure 6B. For each patient, Figure 6C shows the relative expression levels of the three CRLRs. Figure 6D shows that OS was longer in the low-risk group than in the high-risk group (p < 0.001). To test the prognostic capability of the established model, we calculated risk scores with the same formula for every patient in the test set and in the entire data set. Figure 7A-D illustrate the risk scores, survival times, survival status, and CRLR expression within the testing set, and Figure 7E-H show the same for the entire KIRC sample. The prognostic signature of the three CRLRs and the independent factors were combined to construct a hybrid nomogram for ccRCC (Figure 8A). The area under the curve (AUC) for OS prediction was 0.741 at 1 year, 0.68 at 3 years, and 0.70 at 5 years (see Figure 8B). Figure 8C shows the calibration plot of the nomogram. Additionally, the CRLR signature outperformed traditional clinical variables in predicting the prognosis of ccRCC patients (Figure 8D). As depicted in Figure 8E, the concordance index showed that the risk model performed better than the other clinical factors.
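The concordance index mentioned above can be made concrete with a small Python sketch of Harrell's C for right-censored data. This is an illustrative implementation under simplifying assumptions (ties in survival time are ignored), not the routine used in the study; the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    and a shorter follow-up time than patient j. The pair is concordant
    when patient i also has the higher risk score; tied risk scores
    count as half. Ties in survival time are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk=[0.9, 0.7, 0.8, 0.1],
)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk, which is how the risk model and the clinical factors can be compared on a common scale.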

... (Figure 9D). The three CRLR models can be seen to be excellent tools for distinguishing high- from low-risk patients with ccRCC.
As shown in Figure 10A-N, by stratifying patients by tumor-node metastasis stage, sex, age, and grade, the K-M curve indicated that low-risk patients had a significantly better OS than high-risk patients (p < 0.001).

TIDE Algorithm and IC50 for Assessing Therapeutic Response
A predictive algorithm was used to assess potential drug targeting for ccRCC using the three CRLR models. The estimated IC50 values of 12 compounds differed significantly between the risk groups, and these 12 compounds (Figure 11A-L) can thus be used to further analyze patients with ccRCC. The low-risk group was more sensitive to nine target drugs (A.770041, AG.014699, AMG.706, ATRA, AP.24534, axitinib, AZ628, ABT.888, and A.443654), based on the estimated IC50, and the high-risk group was more sensitive to ABT.263, AKT inhibitor VIII, and AS601245.
In TCGA-KIRC, we estimated the immunotherapy response based on the CRLR model. High-risk patients were predicted to respond better to immunotherapy than low-risk patients, indicating that the cuproptosis-based classifier index may be useful in predicting tumor immune dysfunction and exclusion (Figure 12A). As shown in Figure 12B, the high- and low-risk groups expressed immune indicators differently. Maftools was used to analyze and summarize the mutation data; mutations were categorized according to the variant effect predictor. Figure 12C,D show the top 20 driver genes that were altered most frequently in the high- and low-risk sub-groups, and TMB scores were calculated based on the TCGA somatic mutation data. TMB did not differ between the high- and low-risk groups, indicating that the CRLR classifier index did not correlate well with TMB (Figure 12E). High TMB in KIRC was associated with a lower survival rate in high-risk patients (Figure 12F). Therefore, the results show that the CRLR model may be more predictive than the TMB status.

Validation of CRLRs by qPCR and ICGC Database
We evaluated the expression levels of the three core CRLRs using qPCR. Compared with the proximal tubular cell line HK-2, the AC026401.3 level was significantly higher in the ccRCC cell lines UO31 and Caki-1 (p < 0.01), whereas there was no significant change in the 786-O and SN12C cells (p > 0.05). The FOXD2-AS1 level was also significantly higher in the UO31 and Caki-1 cell lines (p < 0.01), but again, there was no significant change in the 786-O and SN12C cells (p > 0.05). In particular, the LASTR level was very high in all four ccRCC cell lines (p < 0.01), ranging from 3.6 to 24.5 times that in the control cells. We further evaluated the expression levels of the CRLRs using the ICGC (RECA-EU) cohort. The expression of FOXD2-AS1 and LASTR was consistent with the PCR and TCGA-KIRC results (Figure 13).

Discussion
With its rapid popularization, artificial intelligence (AI) technology has shown strong application prospects in medicine and is gradually reshaping the field. In recent years, AI has also garnered considerable interest in the field of tumor data processing [15,16]. The latest developments in biological sequencing technologies provide opportunities for large-scale data mining in cancer research. However, because of the sheer volume of clinical data, it is difficult to carry out tumor research using traditional statistical analysis methods alone, and how to make better use of these data is a current focus of scientific research. AI based on machine learning allows data features to be extracted from massive quantities of data and risk stratification models for tumor patients to be constructed more accurately, thereby assisting physicians in clinical decision making. With the help of machine learning, collecting and mining the available tumor data to find internal connections and rules has therefore brought unprecedented opportunities to tumor research and diagnosis [9-12,17,18].
Machine learning is an interdisciplinary subject, and its use in the statistical analysis of clinical data has become a research hotspot. LASSO is often regarded as a branch of regression analysis in machine learning, although in this application it is more appropriately described as a feature screening algorithm. After screening variables, we retained the three most meaningful LncRNAs to build a prediction model. The expression of the three cuproptosis-related LncRNAs was verified in TCGA, ICGC, and several tumor cell lines, and the expression observed in tumor cells was consistent with the model prediction and with the databases. According to the ROC curves for ccRCC in the TCGA database, the AUC of the model at 1, 3, and 5 years was 0.74, 0.68, and 0.70, respectively. In addition, compared with traditional tumor risk assessment indicators, such as TNM stage and age, the risk model in this study had the best predictive ability (AUC = 0.741).
LASTR is an lncRNA associated with the regulation of splicing by SART3. It plays an essential role in regulating plant metabolisms. However, research has demonstrated that LASTR modulates the activity of the U4/U6 recycling factor SART3 to boost cancer fitness [30], and also modulates the activity of the miR-137/TGFA/PI3K/AKT axis to accelerate lung cancer progression [31]. No study has yet reported a direct correlation between LASTR and immunity. Interestingly, LASTR has also been used as a ferroptosis-related marker in stomach adenocarcinoma [32]. This suggests that there may be connections between cuproptosis and ferroptosis, and that LASTR could be involved in the cross-talk between them, thus playing an essential role in the immunological therapy of cuproptosis and ferroptosis.
According to previous studies, AC026401.3 is a glycolysis-based lncRNA predictor for prognosis in kidney [33] and liver cancer [34]. AC026401.3 regulates the immune response, as tumor-cell-induced glucose deprivation inhibits T-cell glycolysis and immunogenic functions [35]. Although we confirmed that AC026401.3 expression was elevated in ccRCC cell lines, its function remains largely unknown at present.
Using the three CRLR models, drug sensitivity in ccRCC was estimated from IC50 values: the two risk groups differed significantly for 12 target drugs, with low-risk patients being more sensitive to nine of them. In contrast, the TIDE-based analysis of the tumor immune microenvironment and immunotherapy response showed that immunotherapy was more likely to work for high-risk patients than for low-risk patients, suggesting that the cuproptosis-based classifier index could be used to predict immune response. Using somatic mutation data from TCGA, we calculated TMB scores. TMB did not differ significantly between the low- and high-risk groups, suggesting poor correlation with the CRLR classifier index. In ccRCC, a high TMB score combined with high risk was associated with a worse outcome and could be used as a prognostic marker. These findings suggest that the CRLR model has a higher prognostic value than TMB status. Therefore, AC026401.3, FOXD2-AS1, and LASTR might be useful indicators for investigating different drug treatments at different TNM stages and mutation burdens, providing a foundation for the precise treatment of ccRCC. It should be emphasized that the specific functions and mechanisms of these three molecules (i.e., FOXD2-AS1, AC026401.3, and LASTR) in renal cancer still require further experimental studies.
According to the three CRLR models, the low-risk group was more sensitive to nine target drugs (A.770041, AG.014699, AMG.706, ATRA, AP.24534, axitinib, AZ628, ABT.888, and A.443654) based on the estimated IC50, whereas the high-risk group was more sensitive to ABT.263, AKT inhibitor VIII, and AS601245. The results showed that ccRCC patients with different expression levels of the three CRLRs presented different sensitivities to different target drugs, reflecting individual differences and tumor heterogeneity in kidney cancer patients. Considering the heterogeneity in immune response, it is suggested that targeted therapy combined with immunotherapy can provide precision treatment in ccRCC patients, based on different levels of the LncRNAs FOXD2-AS1 and AC026401.3. Thus, this study can provide a reference paradigm for various tumors, facilitating the mining of cuproptosis-related LncRNAs based on TCGA data. We must point out that although AC026401.3, FOXD2-AS1, and LASTR were confirmed to be highly expressed in ccRCC cell lines, most of the results in this study were based only on our CRLR models. Hence, more experimental studies are needed, especially regarding how the three CRLRs regulate cuproptosis-related genes. It is important to determine how the signal transduction pathways are regulated by AC026401.3, FOXD2-AS1, and LASTR, as well as the relationships between CRLRs and apoptosis, necroptosis, pyroptosis, ferroptosis, and autophagy. In addition, whether there are synergistic effects between each of the 12 targeted drugs and immunotherapy drugs should be investigated. Moreover, the roles of these three CRLRs in the diagnosis, treatment, and prognosis of ccRCC need to be studied in multiple clinical centers and with a large number of samples.
There are also some limitations to this study. First of all, all the analyses are based on data from public databases, and all the samples used in this study were retrospectively obtained. Therefore, the inherent bias in case selection may have affected the outcome. It is necessary to conduct large-scale prospective studies as well as additional experiments in vitro and in vivo in order to confirm our findings. Moreover, many datasets did not include key clinical variables such as surgery, radiotherapy, and neoadjuvant chemotherapy, which may have affected accuracy.

Conclusions
We identified cuproptosis-related LncRNAs using a machine learning approach and investigated their potential value in ccRCC immunotherapy. We obtained 280 CRLRs and identified 66 significant CRLRs according to multiple analysis models. The enrichment results demonstrated that CRLRs are involved in the production of immune-response molecular mediators, the defense response to bacteria, the humoral immune response, the immunoglobulin complex, and receptor-ligand activity. Subsequently, we revealed that three LncRNAs (FOXD2-AS1, AC026401.3, and LASTR) are prognostic predictors in TCGA-KIRC. The three molecules were further validated via qPCR and were found to be over-expressed in ccRCC cell lines. The ICGC includes 89 projects in 17 administrative regions of Asia, Australia, Europe, North America, and South America, including 25,000 cancer genomes. FOXD2-AS1 and LASTR were validated in the ICGC (RECA-EU) cohort. The results of the above two verification studies