Adjusted CT Image-Based Radiomic Features Combined with Immune Genomic Expression Achieve Accurate Prognostic Classification and Identification of Therapeutic Targets in Stage III Colorectal Cancer

Simple Summary
Using the covariate-adjusted tensor classification in the high-dimension (CATCH) model, we integrated adjusted radiomics-based CT images into RNA immune genomic expression data to achieve the accurate classification of recurrent CRC. The correlation between radiomic features and immune gene expression identifies potential therapeutic targets in CRC. We provide individualized cancer therapeutic strategies based on adjusted radiomic features in recurrent stage III CRC.

Abstract
To evaluate whether adjusted computed tomography (CT) scan image-based radiomics combined with immune genomic expression can achieve accurate stratification of cancer recurrence and identify potential therapeutic targets in stage III colorectal cancer (CRC), this cohort study enrolled 71 patients with postoperative stage III CRC. Based on preoperative CT scans, radiomic features were extracted and selected to build pixel image data using the covariate-adjusted tensor classification in the high-dimension (CATCH) model. Differentially expressed RNA genes, used as radiomic covariates, were identified according to cancer recurrence status. Predictive models were built using the pixel image and immune genomic expression factors, and the area under the curve (AUC) and F1 score were used to evaluate their performance. Significantly adjusted radiomic features were selected to predict recurrence. The association between the significantly adjusted radiomic features and immune gene expression was also investigated. Overall, 1037 radiomic features were converted into 33 × 32-pixel image data. Thirty differentially expressed genes were identified. We performed 100 iterations of 3-fold cross-validation to evaluate the performance of the CATCH model, which showed a high sensitivity of 0.66 and an F1 score of 0.69. The AUC was 0.56. Overall, ten adjusted radiomic features were significantly associated with cancer recurrence in the CATCH model. All of these features are texture-associated. Compared with non-adjusted radiomics, 7 out of 10 adjusted radiomic features influenced recurrence-free survival. The adjusted radiomic features were positively associated with PECAM1, PRDM1, AIF1, IL10, ISG20, and TLR8 expression. We provide individualized cancer therapeutic strategies based on adjusted radiomic features in recurrent stage III CRC. Adjusted CT scan image-based radiomics with immune genomic expression covariates using the CATCH model can efficiently predict cancer recurrence. The correlation between adjusted radiomic features and immune genomic expression can provide biological relevance and individualized therapeutic targets.


Introduction
Colorectal cancer (CRC) is a common malignancy that results in significant morbidity and mortality. Abdominal computed tomography (CT) scans of the primary tumor are valuable in planning surgery for patients with stage II-III CRC because they can demonstrate the regional extension of the tumor, adenopathy, and distant metastases. However, the role of non-invasive CT scan imaging with respect to the tumor microenvironment (TME) remains unclear. CT scan-based radiomics can be used to extract high-dimensional imaging radiomic features. Radiomics has shown great potential as an excellent method for predicting recurrence in various types of cancer [1]. Some studies have applied CT-based radiomic features for the clinical evaluation, such as staging, recurrence, or lymph node metastasis prediction of patients with CRC [2][3][4][5]. A previous study calculated the clinical factors and radiomic scores to predict the recurrence risk in patients with stage II CRC [1].
In contrast to traditional imaging features, radiomics has been proposed to reveal the characteristics of the TME and genetic features. The TME is heterogeneous and consists of tumor, stromal, and immune cells. Tumor cell types and their environments affect cancer growth and metastasis. Almost every immune cell-containing cancer can be imaged using CT. Tumor image-derived texture features are associated with the cancer immune cell infiltration status [6]. Several potential molecular and TME-based immune predictors of recurrence risk and immunotherapy response have been recently investigated [7]. RNA sequencing methods can process a mixture of immune cells, averaging out the underlying differences in immune cell type-specific transcriptomes. RNA gene expressions of multiple tumor tissue immune cells that affect radiomic parameters are also present in imaging and radiomic features.
However, few studies have considered the covariates of texture radiomic features and integrated TME-based immune RNA expression into radiomic signatures to predict colon cancer prognosis. Another major challenge in studies of cancer recurrence prediction by radiomic features is that they are often composed of a relatively small number of patient samples and a large number of radiomic features. These types of data present a problem of high dimensionality. A good strategy is to reduce the number of dimensions using feature selection. This study aimed to evaluate the usefulness of adjusted CT scan image-based radiomics combined with immune genomic expression for the accurate stratification of cancer recurrence and identification of potential therapeutic targets in stage II-III CRC. We hypothesized that RNA gene expression affects radiomic features through the tumor microenvironment. Towards this goal, we used the covariate-adjusted tensor classification in the high-dimensional (CATCH) algorithm.
The main aim of the CATCH algorithm is to construct an interpretable discriminant analysis model for achieving variable selection and prediction consistency, even when the number of variables of interest is much larger than the sample size. Taking advantage of the CATCH algorithm, RNA expression was merged as a covariate into the radiomics-based model, offering a more comprehensive prediction of tumor recurrence. Based on the successful application of radiomics analyses in precision oncology, we constructed a relationship between RNA expression and radiomics to provide an accessible method for identifying potential therapeutic targets.

Patient Selection
This cohort study initially enrolled 99 patients with stage II-III CRC who underwent surgery, followed by adjuvant chemotherapy with leucovorin (folinic acid), fluorouracil, and oxaliplatin (FOLFOX), at the National Cheng Kung University Hospital (NCKUH) between January 2015 and January 2017. Eligible patients were aged ≥20 years and had an Eastern Cooperative Oncology Group performance status (ECOG PS) of 0-1 and adequate organ function. Follow-up continued through January 2019. Primary tumor tissues for RNA immune response gene analysis and CT scans were collected from all subjects for research purposes. To study the impact of immune response-associated gene expression and radiomics on recurrence, tumor samples from 99 high-risk patients were collected. After quality control, samples from 71 patients were retained for further analysis. Among the 71 patients, 21 patients (29.5%) had tumor recurrence. Tumor recurrence was defined as any tumor-related lesion, including local/regional or distant metastasis, first detected after the curative operation.
This study was approved by the Institutional Review Board of NCKUH (A-ER-103-395, A-ER-104-153, and B-ER-109-154) and was conducted according to the tenets of the Helsinki Declaration. All participants provided written informed consent.

Image Acquisition and Imaging Texture Analysis
Seventy-one consecutive CRC patients underwent pre-treatment abdominal/pelvic CT scans with and without intravenous (IV) contrast enhancement with an injected volume of 80 mL (iohexol, 350 mgI/mL or iopamide, 370 mgI/mL) scanned at the portal-to-delayed phase. All patients were studied, including 28 patients using 16-row CT scanners (Sensation 16, Siemens Medical Solution, and BrightSpeed Series CT systems, General Electric (GE) Healthcare, Milwaukee, WI, USA), 15 patients using 64-row CT scanners (Optima CT660 and LightSpeed, GE, and Sensation 64 and SOMATOM Definition AS, Siemens), and 28 patients using 128-row CT scanners (SOMATOM Definition FLASH, Siemens Healthineers, Forchheim, Germany). For tumor segmentation, all portal venous phase CT images in the Digital Imaging and Communications in Medicine (DICOM) format were retrieved from the picture archiving and communication system (PACS) at NCKUH. The volume of interest (VOI) of the target tumors in serial slices was manually labeled by two senior board-certified radiologists using a self-developed image-labeling tool running on "INFINITE" PACS 3.0. The DICOM images were saved in the Neuroimaging Informatics Technology Initiative (NIfTI) file format, and the mask drawn by polygonal annotation was saved in the nearly raw raster data (NRRD) file format. PyRadiomics uses SimpleITK for image loading and handling. Features were calculated using several built-in filters, including wavelet and Laplacian of Gaussian (LoG) with sigma filters. In total, 1037 radiomic features were selected, and are shown in Table S1. To test the reproducibility of the image features, we randomly selected 30 patients whose tumors were labeled by two radiologists who were both blinded to the clinicopathological and outcome details. We assessed the inter-radiologist reproducibility of double segmentation in CT scans. The median intraclass correlation coefficient (ICC) was 0.93.
To ensure the accuracy of the tumor labeling, we selected the radiomic features based on the two radiologists' consensus. Quantitative imaging features were subsequently extracted from the previously identified VOIs. The definitions of the radiomic features were derived from the PyRadiomics library (version 3.0). The PyRadiomics community maintains copyright for the definitions mentioned (http://github.com/radiomics/pyradiomics, accessed on 22 December 2021).
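The reproducibility check described above can be illustrated with a short sketch. The text does not state which ICC form was used, so the example below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater); the function name `icc_2_1` and the toy ratings are hypothetical. In practice, one ICC would be computed per radiomic feature across the 30 double-segmented patients, and the median taken over features.

```python
from statistics import median


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one value per rater.
    Assumption: this ICC form is illustrative; the paper does not name one.
    """
    n = len(ratings)      # number of subjects (e.g. re-segmented patients)
    k = len(ratings[0])   # number of raters (e.g. two radiologists)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy data: two features, each measured on three patients by two raters.
feature_a = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # perfect agreement
feature_b = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]   # constant rater offset
median_icc = median([icc_2_1(feature_a), icc_2_1(feature_b)])
```

A constant offset between raters lowers ICC(2,1) because this form penalizes absolute disagreement, not only inconsistent ranking.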

Tumor Microenvironment-Based RNA Immune Response Gene Sequencing
Cancer tissues with immune response gene expression profile data were obtained from 71 CRC patients. RNA was extracted from formalin-fixed paraffin-embedded tissue using the RecoverAll Total Nucleic Acid Isolation Kit (Thermo Fisher Scientific). The 398 RNA immune response genes were constructed into libraries using the Ion AmpliSeq Kit for Chef DL8 with the Ion Chef System. Raw gene expression data were preprocessed using Torrent Suite, followed by further normalization.

Statistical Analysis for Clinical Data
The chi-square test and Fisher's exact test were used to assess the differences between the tumor recurrence and non-tumor recurrence groups. The tumor recurrence was defined as any tumor-related lesion, including local/regional or distant metastasis, first detected after the curative operation. Kaplan-Meier curves were used to evaluate recurrence-free survival (RFS), which was defined as the time between surgery and cancer recurrence. A p-value < 0.05 was considered statistically significant.

CATCH Model
To integrate radiomic feature information and immune response-associated gene expression profile data, the CATCH model proposed by Pan et al. [9] was used to predict recurrence in CRC patients. The CATCH model is based on Bayes' rule. In this model, Ŷ is the predictor for the categorical variable Y with two levels (1 for recurrent patients and 2 for non-recurrent patients), X represents the 33 × 32-pixel image data, and U represents the immune response-associated gene expression profile data. The parameters {γ_k, α, B_k} of this model are useful for clinical judgment. The coefficient γ_k represents the direct effect of the immune response-associated gene expression profile (U) on recurrence (Y). The coefficient α represents the relationship between the radiomic features (X) and the immune response-associated gene expression profile (U). The coefficient B_k represents the effect of X after adjusting for the covariate U, and X_adj = X − α ×_(M+1) U represents the adjusted radiomic features. The relationships among the coefficients {γ_k, α, B_k}, which are critical and can guide clinicians in interpreting the results obtained from the CATCH model, are shown in Figure 1. We describe the method for using the coefficient B_k and the adjusted radiomic features X_adj to build an individualized cancer therapeutic strategy.
Figure 1. Relationships among the coefficients of the CATCH model, including the radiomics parameters indicative of the indirect effect of immune gene expression on cancer recurrence and the direct effect of radiomic features on recurrence in the tensor discriminant analysis.
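For reference, a classification rule consistent with the parameter definitions in this section can be written as below. This is a sketch based on the general form of the CATCH model in [9]; the class-specific intercept a_k is an assumption, since it is not defined in this section, and the notation for the mode-(M+1) product follows the expression for the adjusted radiomic features given above.

```latex
\hat{Y} = \arg\max_{k \in \{1, 2\}}
  \left\{ a_k + \gamma_k^{\top} U
  + \left\langle B_k,\; X - \alpha \,\bar{\times}_{(M+1)}\, U \right\rangle \right\},
\qquad
X_{\mathrm{adj}} = X - \alpha \,\bar{\times}_{(M+1)}\, U
```

Read term by term, the rule adds a direct contribution of gene expression (through γ_k) to a contribution of the image after the part explained by gene expression (through α) has been removed and weighted by B_k.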

Patient Characteristics
To predict cancer recurrence based on radiomic features, we adjusted for the covariates of the TME-based RNA gene expression. First, differentially expressed genes (DEGs) that affected cancer recurrence (γ_k) were identified [10]. Second, we demonstrated that the performance of the CATCH model based on radiomic features and RNA expression (B_k) affects clinical outcomes. Third, a correlation between radiomic features and RNA gene expression (α) was established to identify a potential therapeutic target (Figure 1). The results show that this model can successfully predict cancer recurrence. The baseline characteristics of the patients are presented in Table S3. In total, 49.3% were men, and the median patient age was 58 years. The distribution of gender was almost the same between patients with and without cancer recurrence (Table S3). Overall, 71.8% of CRC patients were aged <65 years. The primary tumors were most commonly located in the left colon (78.9%). Most patients had a high tumor invasive stage (T3-T4) (87.3%) and low tumor nodal stage (N0-N1) (70.4%). There was no significant difference in clinical characteristics between patients with and without cancer recurrence (Table S3). Regarding the genetic features of colorectal cancers, there was no significant difference in mismatch repair (MMR), KRAS, and BRAF status between the recurrence and no-recurrence groups. No prognostic risk factors were identified among the clinicopathological and genetic features. In our dataset, the percentages of local/regional and distant recurrence were 23.8% and 76.2%, respectively. Among the 21 patients with tumor recurrence, one (4.8%) patient had local recurrence, and four (19.0%) patients had regional recurrences. Sixteen (76.2%) patients had distant metastases. Seven (33.3%) patients had lung-only metastasis. Three (14.3%) patients had liver-only metastasis (Table S4).

Identification of Genes Influencing Recurrence and Performance of the CATCH Model
To determine the correlation between RNA immune gene expression and clinical outcome, a differential gene expression analysis was performed using the DESeq2 R package [11] to compare immune gene expression between colorectal cancer patients with and without tumor recurrence. As a result, 30 significant differentially expressed genes (DEGs) were identified among the 398 RNA genes, and their expression differences across samples are displayed in a heatmap in Figure S1. The 30 DEGs were incorporated as covariates into the CATCH model. Data were divided into a training set (67%) and a test set (33%), with the training set maintaining the original disease recurrence rate of 0.29. Performance measures were calculated on the test set over 100 iterations. The performance of the proposed method was then compared with that of random forest (RF) and linear discriminant analysis (LDA), two of the most common machine learning algorithms for classification (Table 1). The average AUCs for the CATCH, RF, and LDA models were 0.56, 0.46, and 0.46, respectively. The CATCH model had a high sensitivity of 0.66 and an F1 score of 0.69 in the test set. These results indicate that the CATCH model can integrate transformed radiomics-based CT images into RNA immune genomic expression data to achieve an accurate classification of recurrent CRC.
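The evaluation procedure above (a stratified 67%/33% split that preserves the recurrence rate, followed by sensitivity, F1, and AUC on the test set) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names are hypothetical, the classifier itself is omitted, and labels follow the coding used in this paper (1 for recurrent, 2 for non-recurrent).

```python
import random


def stratified_split(labels, test_frac=0.33, rng=None):
    """Split sample indices so each class keeps roughly the same proportion."""
    rng = rng or random.Random(0)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def sensitivity_f1(y_true, y_pred, positive=1):
    """Sensitivity (recall) and F1 score for the positive (recurrent) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return sens, f1


def auc(y_true, scores, positive=1):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort-sized example: 21 recurrent (1) and 50 non-recurrent (2) patients.
labels = [1] * 21 + [2] * 50
train_idx, test_idx = stratified_split(labels)
train_rate = sum(1 for i in train_idx if labels[i] == 1) / len(train_idx)
```

Repeating the split with different seeds and averaging the three metrics over the repetitions would mirror the 100 iterations reported above.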

Adjusted Radiomic Features Obtained from CATCH Model for Cancer Recurrence
The ten most significant adjusted radiomic features were selected based on the variable selection algorithm in the CATCH model to investigate their usefulness for predicting cancer recurrence (Table S5 and Figure 2). The adjusted radiomic features could clearly distinguish recurrent CRC from nonrecurrent CRC. The coefficient B_k ranged from −5.71 to 6.57 (Table 2). A positive coefficient indicated that a higher value of the variable was associated with a higher probability of CRC recurrence, and a larger absolute value of the coefficient indicated a stronger influence on recurrence. There were distinct radiomic profiles of the recurrent cancer patterns. Recurrence was positively correlated with wavelet.LHH_glcm_Idmn and wavelet.LHH_glcm_Idn. LHH_glcm_Idmn is the normalized inverse difference moment (Idmn), a measure of the local homogeneity of an image that normalizes the squared difference between neighboring intensity values by dividing by the square of the total number of discrete intensity values. Higher local homogeneity as measured by Idmn was associated with cancer recurrence. LHH_glcm_Idn, the inverse difference normalized (Idn), is another measure of the local homogeneity of an image. Idn normalizes the difference between neighboring intensity values by dividing by the total number of discrete intensity values. Higher local homogeneity as measured by Idn was also associated with cancer recurrence. Low gray-level zone emphasis (LGLZE), as seen with tumor necrosis or mucinous lesions in CT scan images, was associated with cancer nonrecurrence. LHH_ngtdm_Contrast measures the spatial intensity change but also depends on the overall gray-level dynamic range. In this study, the contrast was high when both the dynamic range and spatial change rate were high. Contrast was associated with cancer nonrecurrence (Table S6). These findings may provide significant adjusted radiomic features for predicting recurrence in stage III CRC.
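To make the two homogeneity measures concrete, the sketch below computes Idmn and Idn from a normalized gray-level co-occurrence matrix (GLCM) using the definitions given in the PyRadiomics documentation. It is an illustration only: the helper `glcm_horizontal` is hypothetical, it considers a single horizontal neighbor offset on a tiny pre-discretized 2D image, and it does not reproduce the wavelet filtering or 3D handling of the actual pipeline.

```python
def glcm_horizontal(image, levels):
    """Symmetric, normalized GLCM for horizontally adjacent pixel pairs.

    image: list of rows of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1   # count each pair in both directions
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def glcm_idmn_idn(p):
    """Idmn and Idn of a normalized GLCM p with Ng gray levels.

    Idmn = sum p(i,j) / (1 + |i-j|^2 / Ng^2)
    Idn  = sum p(i,j) / (1 + |i-j|   / Ng)
    """
    ng = len(p)
    cells = [(i, j) for i in range(ng) for j in range(ng)]
    idmn = sum(p[i][j] / (1 + (i - j) ** 2 / ng ** 2) for i, j in cells)
    idn = sum(p[i][j] / (1 + abs(i - j) / ng) for i, j in cells)
    return idmn, idn


flat = [[0, 0], [0, 0]]      # perfectly homogeneous patch
checker = [[0, 1], [1, 0]]   # every neighbor differs by one gray level
idmn_flat, idn_flat = glcm_idmn_idn(glcm_horizontal(flat, 2))
idmn_chk, idn_chk = glcm_idmn_idn(glcm_horizontal(checker, 2))
```

Both measures reach their maximum of 1 on the homogeneous patch and drop on the checkerboard, which matches their description as local homogeneity measures.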

Adjusted Radiomic Features Impact Clinical Outcome
After comparison of the clinical outcomes of adjusted and non-adjusted radiomic features, only two non-adjusted radiomic features, namely LHL_glcm_InverseVariance (IV) and HHH_gldm_DependenceVariance (DV), were correlated with cancer recurrence (Figure S2). After adjusting radiomic features using the CATCH model, we obtained 10 significantly adjusted radiomic features correlated with cancer recurrence. Boxplots and Kaplan-Meier survival curves comparing the risk of cancer recurrence and recurrence-free survival (RFS) among patients with adjusted or non-adjusted radiomic features are shown in Figure S2. Figure 3A (non-adjusted, p-value = 0.123) and Figure 3B (adjusted, p-value < 0.001) show the boxplots of wavelet.LHH_glcm_Idmn. Figure 3C (non-adjusted, p-value = 0.299) and Figure 3D (adjusted, p-value = 0.001) show the boxplots of wavelet.LHH_glszm_LGLZE.
Figure 3. In the Kaplan-Meier curves, the blue curve represents the overall population, the green curve represents the patients with radiomic data above the median, and the red curve represents the patients with radiomic data below the median.
In addition to recurrence prediction, the survival impact of the adjusted radiomic features was compared with that of non-adjusted radiomic features. After adjusting the radiomic features using the CATCH model, seven significantly adjusted radiomic features (Figure S2) that correlated with RFS were obtained. The non-adjusted LHH_glcm_Idmn (p-value = 0.201) (Figure 3E) was not a significant prognostic factor for RFS. In contrast, the adjusted LHH_glcm_Idmn was a significant prognostic factor for RFS (Figure 3F, p-value < 0.001). The non-adjusted LHH_glszm_LGLZE (p-value = 0.381) (Figure 3G) was not a significant prognostic factor for RFS, while the adjusted LHH_glszm_LGLZE was a significant prognostic factor (Figure 3H, p-value < 0.001). These two adjusted radiomic features were associated with cancer recurrence and RFS. These results demonstrate the advantages of the CATCH model.
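The survival comparison above groups patients by whether a radiomic feature lies above or below the median and then estimates RFS in each group. A minimal sketch of that grouping and of the Kaplan-Meier product-limit estimate is given below; the function names and toy data are hypothetical, and the significance test behind the reported p-values is not reproduced here.

```python
from statistics import median


def split_by_median(values):
    """Indices of patients above the median and at or below it."""
    m = median(values)
    high = [i for i, v in enumerate(values) if v > m]
    low = [i for i, v in enumerate(values) if v <= m]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    times: follow-up time per patient (e.g. months from surgery).
    events: 1 if recurrence was observed at that time, 0 if censored.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        n_here = sum(1 for x in times if x == t)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if d:
            surv *= 1 - d / at_risk   # product-limit update at event times
            curve.append((t, surv))
        at_risk -= n_here             # events and censored leave the risk set
    return curve


# Toy example: four patients, one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
high, low = split_by_median([0.1, 0.4, 0.6, 0.9])
```

Running `kaplan_meier` separately on the `high` and `low` groups would give the two curves that correspond to the green and red curves described for Figure 3.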

Correlation between Adjusted Radiomic Features and Immune Gene Expression
The correlation between adjusted radiomic features of CT image data and tumor microenvironment-based immune gene expression in cancer tissues was investigated to identify potential therapeutic targets in high-risk stage III CRC patients. As shown in Figure 4 and Table S7, the spectrum of RNA expression was identified in the 10 significantly adjusted radiomic features. The adjusted radiomic features were positively associated with the expression of PECAM1, PRDM1, AIF1, IL10, ISG20, and TLR8. For recurrence, the highest positive correlation was found between the adjusted radiomic features LHH_glcm_Idmn, LHH_glcm_Idn, and LLH_glcm_Idn and the expression of the immune genes PECAM1, PRDM1, and AIF1. For non-recurrence, the highest positive correlation was found between the adjusted radiomic features LHH_glszm_LGLZE and LHH_ngtdm_Contrast and the expression of the immune genes IL10, ISG20, and TLR8. PECAM1 and PRDM1 immune gene expression was negatively related to the non-recurrence-associated adjusted radiomic features LHH_glszm_LGLZE and LHH_ngtdm_Contrast. These results indicate that potential therapeutic targets, such as PECAM1, PRDM1, and AIF1, can be identified by adjusting the radiomic features that impact cancer recurrence.
The first patient (Figure 5A) was a 61-year-old woman with pathological stage III left-sided colon cancer at the initial diagnosis. The patient underwent standard surgical resection followed by adjuvant chemotherapy with modified FOLFOX (mFOLFOX7) for 12 cycles (Figure 5A,B). Multiple peritoneal metastases were detected on CT at 24 months postoperatively (Figure 5C). The feature profile showed higher LHH_glcm_Idmn, LHH_glcm_Idn, LHL_glcm_IV, and HHH_gldm_DV values, indicating a higher risk of recurrence (Figure 5D). These adjusted features correlated with PECAM1 and SNAI2 RNA expression (Figure 5M). Therefore, it may be a therapeutic target.
The second patient (Figure 5E) was a 60-year-old woman with pathological stage III sigmoid colon cancer at the time of the initial diagnosis. The patient underwent standard surgical resection followed by adjuvant chemotherapy with mFOLFOX7 for 12 cycles (Figure 5F). Lymph node recurrence was detected by CT at 12 months postoperatively (Figure 5G). The feature profile showed higher LHH_glcm_Idmn, LHH_glcm_Idn, and LHL_glcm_IV values, indicating a higher risk of recurrence (Figure 5H). These adjusted features correlated with PECAM1 RNA expression.
The third patient (Figure 5I) was a 76-year-old woman with pathological stage III right-sided colon cancer at the time of initial diagnosis. The patient underwent standard surgical resection followed by adjuvant chemotherapy with mFOLFOX7 for 12 cycles (Figure 5I,J). A single lung metastasis was detected on CT at 12 months postoperatively (Figure 5K). This recurrence was not correlated with the adjusted radiomic feature profile, which indicated a borderline risk of recurrence (Figure 5L).

Discussion
CRC is an etiologically heterogeneous disease that involves several distinct biological pathways and CT scan presentations. This study used diagnostic CT images, with gene expression as covariates, in the CATCH model to predict recurrence in CRC patients. The results show that the model predicts recurrence by adjusting radiomic features and identifies potential therapeutic targets in CRC. Our results highlight the following important points. First, the CATCH model efficiently integrates high-dimensional radiomic features and covariates of immune gene expression to predict cancer recurrence in small datasets. Second, 10 texture-associated adjusted radiomic features were selected for cancer recurrence, with 7 of these adjusted radiomic features being associated with RFS. Finally, we established a correlation between radiomic features and immune gene expression, providing biological relevance and individualized therapeutic targets for patients with recurrence.
CT-based radiomic signatures are potential biomarkers for predicting CRC recurrence [1]. Prediction based on informative DEGs also has higher predictive accuracy for CRC prognosis. However, few studies have attempted to analyze immune gene expression as a covariate for high-dimensional image information. Further, the sample size is often limited in studies that collect both CT scan image data and gene expression levels because of the high intrinsic cost of data collection involving human participants. Under such conditions, machine learning algorithms may have poor accuracy because they do not have enough data to learn from.
The CATCH model addresses high dimensionality and the integration of features from disparate sources and unbalanced datasets. Furthermore, it incorporates a feature-selection algorithm into the classification model. Thus, the CATCH model can achieve acceptable accuracy with limited data and covariate adjustments. We used this advantage in the current study. To establish a model to manage high-dimensional image data, the 1037 radiomic features were converted into 33 × 32-pixel image data, representing a transformed image, using a feature extraction algorithm. Then, we integrated 30 DEGs and the 33 × 32-pixel image data to accurately predict cancer recurrence. In the traditional LDA and RF methods, the high number of features (1037 high-dimensional features plus 30 DEGs) and the unbalanced datasets made it difficult to select the essential features. These machine-learning models struggled to integrate data from disparate sources and unbalanced datasets. The LDA and RF models also had poorer sensitivity. Because the sample size is small, these machine-learning models were utilized without tuning processes to prevent overfitting. Although the performance of these machine-learning methods could be improved, they cannot offer interpretable discriminant analysis results. The study goal was to utilize the CATCH model to predict the risk of cancer recurrence and provide therapeutic strategies in our colorectal cancer patients. Using radiomic biomarkers and RNA expression data, we have provided a useful clinical model that will help physicians to make better-informed decisions regarding short-interval CT scan follow-ups and potential drug targets. Collectively, these results indicate that the CATCH model can efficiently integrate high-dimensional radiomic features and covariates to predict cancer recurrence in small datasets.
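The two data-handling steps discussed above, packing the 1037 features into a 33 × 32 grid and removing the part of each pixel explained by gene expression, can be sketched as follows. The text does not specify how the features were arranged, so the row-major layout with zero padding of the 19 unused cells is purely an assumption, as are the function names; the adjustment follows X_adj = X − α ×_(M+1) U, with α holding one coefficient per pixel and gene.

```python
def to_image(features, rows=33, cols=32, pad=0.0):
    """Pack a flat feature vector into a rows x cols grid.

    Assumption: row-major order with trailing padding; the paper's actual
    feature-to-pixel mapping is not described.
    """
    if len(features) > rows * cols:
        raise ValueError("too many features for the requested grid")
    padded = list(features) + [pad] * (rows * cols - len(features))
    return [padded[r * cols:(r + 1) * cols] for r in range(rows)]


def adjust(image, alpha, u):
    """Covariate-adjusted image: X_adj[r][c] = X[r][c] - sum_g alpha[r][c][g] * u[g].

    image: rows x cols grid of radiomic values (X).
    alpha: rows x cols x genes coefficients linking genes to pixels.
    u: gene expression vector for one patient (U).
    """
    return [
        [x - sum(a * ug for a, ug in zip(alpha_rc, u))
         for x, alpha_rc in zip(row, alpha_row)]
        for row, alpha_row in zip(image, alpha)
    ]


# 1037 placeholder feature values packed into the 33 x 32 grid.
img = to_image([float(i) for i in range(1037)])

# Tiny adjustment example: a 1 x 2 "image" with two genes.
x_adj = adjust([[1.0, 2.0]], [[[0.5, 0.0], [0.0, 1.0]]], [2.0, 3.0])
```

With 30 DEGs, `alpha` would have shape 33 × 32 × 30 and would be estimated by the CATCH model rather than supplied by hand.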
Immune gene expression is related to both clinical outcomes and radiomic features. Previous studies have explored radiomic signatures and specific gene expression. CT scan-based radiomic features have been found to be significantly associated with KRAS or BRAF mutations [12,13]. A study in France demonstrated an association between gene expression and radiomics; for example, ABCC2 expression was correlated with LGLZE and SZLGE [14]. In addition, radiomic features and gene expression of ABCC2 have been identified as prognostic factors for survival [14]. Our model adopted gene expression as an additional covariate to improve the predictive accuracy for clinical outcomes. The performance of the CATCH model showed that radiomic features adjusted by immune gene expression are good prognostic factors for cancer recurrence. No prognostic risk factors were identified among the clinicopathological and genetic features (Table S3). Therefore, we did not apply the clinicopathological and genetic features for risk modeling. Among all 71 CRC patients, there was no significant difference in genetic profiling, including MMR, KRAS, and BRAF status, between patients with and without cancer recurrence. The median age of these patients was 58 years. We did not identify clinically important factors such as age or pathological stage affecting cancer recurrence by statistical analysis (Tables S3 and S4).
Tumor heterogeneity, which is closely reflected in imaging data, is an important indicator of tumor growth and metastasis. In this study, 2 of the 10 non-adjusted radiomic features were associated with cancer recurrence. After adjusting for the covariates from the TME-associated immune gene expression, 10 adjusted radiomic features were identified as being associated with recurrence. None of these radiomic features were first-order statistics; most were textural features. In a study of a radiomic prognostic vector in ovarian cancer, the authors discovered and validated the prognostic value of imaging and associated these findings with stromal biological factors [15]. Similarly, our results indicate that stromal heterogeneity of images plays a prognostic role in cancer recurrence. Seven adjusted radiomic features were related not only to recurrence risk but also to survival. Textural wavelet decomposition was found to affect the prognosis of stage III CRC.
By integrating clinical and radiomic features, a clinical radiomics-based model accurately predicted recurrence in patients with stage II CRC [5,14], supporting the idea that clinical and radiomic signatures can serve as markers for survival stratification. Our study balanced the clinical features of patients with recurrent and nonrecurrent stage III cancer. The clinicopathological risk factors, including tumor invasion stage and lymph node stage, lack the accuracy needed to identify patients at high risk of recurrence. Multivariate survival analysis showed that the clinical factors were not significantly associated with survival. Several previous studies have attempted to develop prognostic tools based on radiomics-only models. However, these prognostic models are challenging to apply in routine clinical practice because they lack an association between image features and molecular biology. Our model provides further guidance for individualized treatment according to the associated immune gene expression for patients at high risk of recurrence. PECAM1, platelet endothelial cell adhesion molecule 1 (CD31), is involved in tumor angiogenesis and endothelial cell migration [16]. The protein encoded by the PECAM1 gene is an endothelial cell marker used to evaluate tumor microvessels and vascular density. Previous studies have demonstrated that a high tumor microvessel count and density predict CRC recurrence and overall survival [17–19]. CRC patients with high CD31 expression have a poor prognosis [19]. Given that PECAM1 contributes to colorectal peritoneal metastasis, it may be a potential therapeutic target for CRC [20,21].
There are some limitations in our study. First, in a clinical setting, we often have only a small dataset to work with. For example, RNA sequencing is still expensive and time-consuming, and it is not easy for cancer patients to have both CT scan and RNA sequencing datasets. Second, the AUC in our study is rather low. Our prediction model was designed to assist healthcare professionals and patients with decisions about short-interval CT scan surveillance in clinical practice. The main clinical risk of misclassification is increased radiation exposure from additional CT scans in cancer patients. Following the National Comprehensive Cancer Network (NCCN) guidelines [22], the stage III CRC patients received standard CT scan follow-ups every 6-12 months. Short-interval CT scan follow-ups (e.g., a CT scan interval of < 6 months for early detection of cancer metastasis) could improve cancer outcomes through the early diagnosis of resectable lung or liver metastasis from CRC. Third, many algorithms, such as the Synthetic Minority Oversampling Technique (SMOTE) [23], could also improve the prediction accuracy for imbalanced and small datasets. The ComBat harmonization method has been adapted to neuroimaging studies with data heterogeneity [24]. However, only the CATCH model could produce interpretable discriminant results that can be used to identify potential therapeutic targets. It is worth noting that the CATCH model was developed under the assumption of homogeneity in the data. In the future, it will be interesting to develop a modified CATCH model for heterogeneous data.
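As an illustration of the oversampling idea mentioned among the limitations, SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified version written with the Python standard library only; it was not used in this study, it is not the reference implementation, and the function name `smote_like_oversample` and the toy data are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from minority-class points.

    Simplified SMOTE-style sketch (an assumption-laden illustration):
    each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four minority-class samples with two features.
recurrent = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_like_oversample(recurrent, n_new=5)
print(len(new_samples))  # 5
```

Because the synthetic points are interpolations in feature space, they do not correspond to real patients, which is one reason such a method would not by itself yield the interpretable discriminant results discussed above.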

Conclusions
Our CATCH model efficiently adjusted high-dimensional radiomic features with covariates of immune gene expression to predict cancer recurrence. The correlation between radiomic features and immune gene expression may be biologically relevant. The adjusted radiomic features associated with recurrence in our model provide a basis for individualized treatment in stage III CRC.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers14081895/s1. Figure S1: Heatmap of significant differentially expressed genes (DEGs) and clinical outcome; Figure S2: Boxplot and Kaplan-Meier survival curves comparing the risk of cancer recurrence and recurrence-free survival (RFS) among patients with adjusted radiomic features or without radiomic features; Table S1: Wavelets and LoG features per patient; Table S2: The spectrum of 1037 radiomic features; Table S3: Patients' characteristics in recurrent and non-recurrent groups; Table S4: The clinical and genetic features in cancer patients; Table S5: The adjusted radiomic features in cancer patients; Table S6: The clinical impact of adjusted radiomic features; Table S7: The correlation of adjusted radiomics and RNA expression.

Data Availability Statement: The datasets used and analyzed during the current study are available from the corresponding author on reasonable request, and supplementary information files are available for this manuscript.

Conflicts of Interest: The authors declare no conflict of interest.