Prediction of Neoadjuvant Chemoradiotherapy Response in Rectal Cancer with Metric Learning Using Pretreatment 18F-Fluorodeoxyglucose Positron Emission Tomography

Simple Summary
Neoadjuvant chemoradiotherapy (NCRT) before surgery is the mainstay of treatment for patients with locally advanced rectal cancer. Based on baseline 18F-fluorodeoxyglucose ([18F]FDG)-positron emission tomography (PET)/computed tomography (CT), a new artificial intelligence model was introduced to predict responses to NCRT. The model employed metric learning combined with the Uniform Manifold Approximation and Projection for dimensionality reduction. The treatment response was scored by Dworak tumor regression grade (TRG); TRG3 and TRG4 indicated favorable responses. Using this model, the area under the receiver operating characteristic curve was 0.96 for predicting a favorable response. The sensitivity, specificity, and accuracy were 98.3%, 96.5%, and 97.5%, respectively. After further external validation, oncologists may use the proposed model to advise patients on the relative suitability of treatment options, including the therapeutic decision between NCRT and neoadjuvant chemotherapy. Integrating this approach would have a notable effect on counseling patients about treatment alternatives or prognoses.

Abstract
Objectives: Neoadjuvant chemoradiotherapy (NCRT) followed by surgery is the mainstay of treatment for patients with locally advanced rectal cancer. Based on baseline 18F-fluorodeoxyglucose ([18F]FDG)-positron emission tomography (PET)/computed tomography (CT), a new artificial intelligence model using metric learning (ML) was introduced to predict responses to NCRT. Patients and Methods: This study used the data of 236 patients with newly diagnosed rectal cancer; the data of 202 and 34 patients were used for training and validation, respectively. All patients received pretreatment [18F]FDG-PET/CT, NCRT, and surgery. The treatment response was scored by Dworak tumor regression grade (TRG); TRG3 and TRG4 indicated favorable responses. The model employed ML combined with the Uniform Manifold Approximation and Projection for dimensionality reduction. A receiver operating characteristic (ROC) curve analysis was performed to assess the model's predictive performance. Results: In the training cohort, 115 patients (57%) achieved TRG3 or TRG4 responses. The area under the ROC curve was 0.96 for the prediction of a favorable response. The sensitivity, specificity, and accuracy were 98.3%, 96.5%, and 97.5%, respectively. The sensitivity, specificity, and accuracy for the validation cohort were 95.0%, 100%, and 98.8%, respectively. Conclusions: The new ML model presented herein was used to determine that baseline [18F]FDG-PET/CT images could predict a favorable response to NCRT in patients with rectal cancer. External validation is required to verify the model's predictive value.


Introduction
Neoadjuvant chemoradiotherapy (NCRT) before total mesorectal excision (TME) has become a mainstay of treatment for patients with locally advanced rectal carcinoma [1,2]. Responses to NCRT vary, with 15-27% of patients exhibiting a pathological complete response, 54-75% of patients exhibiting a partial response, and others exhibiting no response [3]. A phase three trial demonstrated that neoadjuvant chemotherapy with intravenous fluorouracil, leucovorin, and oxaliplatin without radiation achieved noninferiority in three-year disease-free survival relative to fluorouracil with radiation [4]. Determining whether a patient can achieve a favorable therapeutic response is crucial for counseling them on their treatment options and their decision on whether to undergo NCRT or neoadjuvant chemotherapy. Predicting the tumor response before NCRT is selected could therefore help maximize the therapeutic benefit of the approach.
Among the imaging modalities used for clinical staging in patients with rectal cancer, 18F-fluorodeoxyglucose ([18F]FDG)-positron emission tomography (PET)/computed tomography (CT) imaging has been widely employed to assess patients' pathological responses to NCRT [5-9]. The use of FDG-PET-derived radiomics for predicting favorable responses has also been investigated [10,11]. Artificial intelligence (AI) allows for novel image analysis techniques and may be key to the advancement of precision medicine. The authors of this study previously investigated the performance of a combination of baseline [18F]FDG-PET/CT radiomics and random forests in predicting pathological complete response in the same patient setting [12]. Compared with human-engineered radiomic methods, which strongly depend on segmentation methods and quantification of extracted features, a deep learning (DL) algorithm works by learning relevant features directly from image databases. Little is known regarding predictive performance when baseline [18F]FDG-PET/CT images are used in the absence of handcrafted features. Imaging features in [18F]FDG-PET/CT were hypothesized to be capable of directly predicting responses to NCRT, serving as potential imaging biomarkers. In this study, a novel metric learning (ML) model with a data processing strategy was employed to circumvent the limitations of training on a cohort with a low data volume [13,14].

Study Design and Patient Population
Between January 2009 and July 2018, 361 patients were screened for this retrospective study. They were newly diagnosed with rectal cancer and were scheduled to undergo curative NCRT followed by TME at our institute. All patients had biopsy-confirmed adenocarcinoma and received pretreatment [18F]FDG-PET/CT. No patients with mucinous or signet ring carcinomas were included. To minimize bias, patients who received TME more than 12 weeks after receiving NCRT were excluded. The structure of the proposed model for the classification of responses to NCRT is illustrated in Figure 1. The model was categorized as a supervised ML. The PET and CT images were processed and convoluted with ML separately. After the two sets of features were concatenated, dimensionality reduction was performed using a Uniform Manifold Approximation and Projection (UMAP). The treatment responses were classified using a support vector machine (SVM) according to the distribution of the visualized features. A receiver operating characteristic (ROC) curve analysis was performed to calculate the classification performance. This study was approved by China Medical University and Hospital Research Ethics Committee [certificate numbers: DMR99-IRB-010(CR-11) and CMUH106-REC3-119(CR-3)].

NCRT
The drugs used in the NCRT regimens comprised capecitabine, uracil-tegafur, and intravenous 5-fluorouracil. All patients were irradiated with intensity-modulated radiotherapy to reduce treatment-related toxicity without compromising the response rates [15]. After a prescribed dose of 45 Gy to the pelvis in 25 fractions over five weeks, a dose of 5.4 Gy in three fractions was administered as a boost to the gross tumor and metastatic pelvic lymph nodes.

Pathological Assessment
After the patients underwent TME, their pathological responses were scored according to the Dworak tumor regression grade (TRG) [16]. TRG3 or TRG4 responses were regarded as favorable, whereas TRG0, TRG1, and TRG2 responses were regarded as nonfavorable.

PET/CT Image Acquisition
The patients underwent [18F]FDG-PET/CT for baseline staging before NCRT. Before imaging, the patients fasted for at least 4 h to reduce the effect of serum glucose [12]. Approximately 60 min after 370 MBq of [18F]FDG was administered to the patients, images were taken using a PET/CT scanner (Discovery STE 16-Slice PET/CT Scanner, GE Healthcare, Milwaukee, WI, USA). The patients were required to rest during the uptake period. A CT topogram was used to label the axial scan range. After the CT was performed, PET images were obtained in the three-dimensional acquisition mode at 2 min per field of view (FOV) with an 11-slice overlap at the borders of the FOV. The CT performed was a low-dose non-contrast CT. The [18F]FDG-PET data were saved in Advantage Workstation (Ver. 4.4, GE Healthcare). Two nuclear medicine physicians reviewed the images and located the target lesions. The PET/CT workstation quantified [18F]FDG uptake automatically.

Data Pre-Processing
The initial CT and PET images were reconstructed on 512 × 512 and 128 × 128 matrices, respectively. To fit the size of the corresponding PET images, the matrix of the CT images was converted to 128 voxels × 128 voxels × length of the region of interest (ROI). The geometric center of the tumors and the ROI of the lesions were defined based on the CT images. Through this approach, the training model converged more efficiently in the classification of responses to NCRT.
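The matrix conversion described above can be illustrated with a minimal sketch. The text does not state which resampling method was used, so the block averaging shown here is only an assumption chosen for simplicity; the function name `downsample_slice` and the synthetic slice are hypothetical.

```python
def downsample_slice(pixels, factor=4):
    """Block-average a square 2D slice (list of rows) by an integer factor.

    Assumption: simple block averaging; the resampling method actually
    used in the study is not specified in the text.
    """
    size = len(pixels)
    if size % factor != 0 or any(len(row) != size for row in pixels):
        raise ValueError("slice must be square with a side divisible by factor")
    out_size = size // factor
    area = factor * factor
    out = []
    for r in range(out_size):
        out_row = []
        for c in range(out_size):
            total = 0.0
            # Sum the factor x factor block that maps onto output voxel (r, c).
            for dr in range(factor):
                row = pixels[r * factor + dr]
                for dc in range(factor):
                    total += row[c * factor + dc]
            out_row.append(total / area)
        out.append(out_row)
    return out


# Synthetic 512 x 512 "CT slice" (hypothetical values, not patient data).
ct_slice = [[float((r + c) % 7) for c in range(512)] for r in range(512)]
ct_small = downsample_slice(ct_slice, factor=4)
print(len(ct_small), len(ct_small[0]))
```

Applying the same operation to each axial slice within the ROI would yield a CT volume whose in-plane matrix matches that of the PET images.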

Metric Learning
ML is an AI method based on a distance metric that determines similarity or dissimilarity between objects [13,14,17]. This approach decreases the distance between similar objects and increases the distance between dissimilar objects. In this study, two deep residual learning frameworks were used to analyze the PET and CT images, respectively. Batch normalization and rectified linear unit activation were performed before each block to minimize the possibility of overfitting (Figure 1). Furthermore, triplet loss was utilized as the loss function for the ML algorithms. The distance from the baseline (anchor) input to the positive input was therefore minimized, and the distance to the negative input was maximized. Consequently, the data were transformed into a new representation to facilitate classification training.
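The triplet loss mentioned above can be sketched as follows. This is a generic formulation, not the authors' implementation: the Euclidean distance, the margin value of 1.0, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The loss is zero once the negative sample is farther from the anchor
    than the positive sample by at least `margin` (an assumed value).
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Toy embeddings (hypothetical): the anchor and positive share a response class.
anchor = [0.0, 0.0]
well_separated = triplet_loss(anchor, positive=[0.0, 1.0], negative=[3.0, 4.0])
poorly_separated = triplet_loss(anchor, positive=[3.0, 4.0], negative=[0.0, 1.0])
print(well_separated, poorly_separated)
```

Minimizing this quantity over many triplets pulls embeddings of the same response class together and pushes those of different classes apart, which is the behavior the paragraph describes.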

Uniform Manifold Approximation and Projection for Dimensionality Reduction
Dimensionality reduction plays a key role in data science. UMAP is a nonlinear dimensionality reduction technique that can be used for various data distributions through a combination of Riemannian geometry and algebraic topology [18,19]. UMAP has already been widely implemented in bioinformatics, materials science, and machine learning [18]. To improve visualization in the training model, a UMAP was used to reduce the dimensionality of the data by mapping it from high-dimensional to two-dimensional space. Through the use of this approach, the possibility of data overfitting or oversensitivity was minimized [20].

Support Vector Machine
An SVM is a machine learning algorithm that can efficiently perform linear or nonlinear classification. In addition, SVMs are capable of categorizing low-volume data sets. This study utilized an SVM to classify the preprocessed two-dimensional visualized features into two groups, namely, responses below TRG3 and responses of TRG3 or above.

Statistical Analysis
An ROC curve analysis was performed to calculate the classification performance. The area under the ROC curve (AUC) was used to evaluate the predictive performance of the model. The predictive indices included sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and accuracy. The analysis was performed using commercial software (SPSS Statistics 26.00).
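The predictive indices listed above follow directly from a 2 × 2 confusion matrix, and the AUC can be computed as the probability that a responder receives a higher score than a nonresponder. The sketch below is illustrative only. The confusion matrix is not reported in the text; the counts used here (115 true positives, 2 false negatives, 82 true negatives, 3 false positives) are an assumption, chosen because they are consistent with the reported training-cohort percentages and with 117 responders among 202 patients.

```python
def classification_indices(tp, fn, tn, fp):
    """Return SE, SP, PPV, NPV, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Assumed counts (not reported in the text), consistent with the training results.
indices = classification_indices(tp=115, fn=2, tn=82, fp=3)
for name, value in indices.items():
    print(f"{name}: {value:.1%}")

# Toy classifier scores (hypothetical) to show the AUC calculation.
toy_auc = auc_from_scores([0.9, 0.8], [0.1, 0.85])
```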

Patient Characteristics
According to the patients' treatment periods, their images, and the available clinical data, 236 patients were included in this study and divided into two cohorts (202 and 34 patients in the training and validation cohorts, respectively), as indicated in Appendix A. The patients in the training cohort were treated between January 2009 and June 2017, whereas the patients in the validation cohort were treated in July 2017 or later. The same PET/CT scanner and treatment scheme were used throughout the inclusion period. In the training cohort, the patients' tumors were located in the upper third or rectosigmoid junction (16 patients), the middle third (103 patients), or the lower third (83 patients), as summarized in Table 1. The median age was 58 years (31-86 years); 140 patients were men, and 63 were women. The median interval from the end of NCRT to the TME was 56 days. In four patients with metastatic liver tumors before NCRT, wedge resection of the liver tumors was carried out simultaneously during the TME. In total, 117 patients (58%) achieved TRG3 or TRG4 responses.

Partitioning of Patients in the Training Cohort
To ensure that training was not conducted on a fixed data set alone, this study applied K-fold cross-validation to validate the strength of the model. The patients were randomly divided into five groups, each containing a comparable proportion of TRG3 and TRG4 responders. Each set was labeled as a test set only once, and the remaining sets were combined to construct the training set for the modeling. During the training process, all sets included in the training cohort were split into training and test sets at a ratio of 4:1.
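The partitioning described above can be sketched as a stratified five-fold split. This is a generic illustration rather than the authors' code: the function names and the random seed are hypothetical, and the class sizes (117 responders and 85 nonresponders) are taken from the Patient Characteristics section.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds with similar class proportions."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_idx, test_idx); each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train_idx, test_idx


# 1 = TRG3/TRG4 responder, 0 = nonresponder (class sizes from the text).
labels = [1] * 117 + [0] * 85
splits = list(cross_validation_splits(labels, k=5))
for train_idx, test_idx in splits:
    responders = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), responders)
```

With five folds, each iteration trains on roughly four fifths of the cohort and tests on the remaining fifth, which corresponds to the 4:1 ratio stated above.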

Image-Based Prediction
The classification indices for the prediction of TRG3 or TRG4 responses for all tumors in the five sets of the training cohort are summarized in Table 2. The AUC was 0.96 [95% confidence interval (CI) 0.951-0.993] (Figure 2A), and the predictive SE, SP, PPV, NPV, and accuracy were 98.3% (95% CI 0.962-1.000), 96.5% (95% CI 0.936-0.993), 97.5% (95% CI 0.958-0.993), 97.6% (95% CI 0.953-1.000), and 97.5% (95% CI 0.960-0.991), respectively. Figure 3 illustrates the distribution of the visualized features for the training model before and after the implementation of ML and UMAP, respectively. Figure 4 displays the overall classification performance of SVM for TRG3 and TRG4 responses, following the dimensionality reduction of the features. The proposed model could provide enhanced discrimination of the visualized two-dimensional features associated with a patient's response to NCRT.

Validation and Comparison
The proposed model had the following predictive performance when applied to the PET/CT images of the 34 patients in the validation cohort. The AUC was 0.962 (95% CI 0.935-0.999) (Figure 2B), and the SE, SP, and accuracy were 95.0% (95% CI 0.910-0.990), 100% (95% CI 1.000-1.000), and 98.2% (95% CI 0.969-0.997), respectively. The classification performance of the proposed model and that of the traditional DL approach are compared in Appendix B. The AUC value of the DL without the integration of ML or UMAP was 0.618, which was significantly inferior to that of the proposed model (p = 0.002).

Heat Map
A heat map was utilized to visually identify the discriminative regions targeted by the proposed model and to detect events in the imaging set. The rectum is adjacent to many organs, including the uterus, bladder, and prostate. These anatomical structures, which also may exhibit an increased uptake of FDG, might disrupt the visualization and cause inaccuracies. Therefore, the heat map indicated the activated area in the imaging sets in the last layer of the ML model (Figure 5). The heat map demonstrated that the proposed model was capable of distinguishing the rectum from the adjacent organs. In addition, the characteristics of the target events were based on critical areas in the rectal tumors.

Discussion
Precision medicine for cancer treatment involves the identification of biological or imaging markers to predict therapeutic outcomes early. A patient's response to NCRT is crucial because it directly affects their prognosis [3]. In addition, patients with rectal cancer, especially those with low-lying tumors, who exhibit a favorable response to NCRT could benefit from sphincter-saving procedures [1]. Furthermore, a response prediction is valuable in guiding a patient's therapeutic decision between NCRT and neoadjuvant chemotherapy. By employing a novel combination of ML and UMAP, this study demonstrated that baseline [18F]FDG-PET imaging could be used to classify a patient's NCRT response with high accuracy. Although the gold standard for measurement of a tumoral response to NCRT is postoperative histopathological analysis, the proposed model can provide an innovative platform for future studies related to individualized treatments.
Despite the lack of a universal algorithm for extracting radiomic features from [18F]FDG-PET imaging of rectal cancer, Bang et al. [11] investigated a set of radiomic features in 74 patients with rectal cancer. The authors reported that the kurtosis of the absolute gradient was related to tumor recurrence. However, no significant associations existed between radiomics and TRG. Lovinfosse et al. [10] conducted a study involving 66 patients and discovered that total lesion glycolysis of a tumor is a significant predictor of a TRG3 or TRG4 response to NCRT. The aforementioned predictive value of FDG-PET-based radiomics for NCRT might imply that the imaging features of rectal tumors in FDG-PET are associated with a particular phenotypic response to NCRT. Although the biological mechanism underlying the imaging of tumor heterogeneity remains unclear, implementation of AI-based models may enable oncologists to identify a particular tumor phenotype common to patients predicted to respond favorably to NCRT.
Two studies have demonstrated that DL combined with features derived from magnetic resonance imaging (MRI) before NCRT exhibits superior performance in the prediction of patients' treatment responses [21,22]. In a prospective study by Zhang et al. [21], 383 participants (290 in the training cohort, 93 in the test cohort) were evaluated using a DL model based on diffusion kurtosis MRI. The AUC was 0.99 for the prediction of TRG4 responses among participants in the test cohort but 0.79 for the prediction of downstaging of the primary tumors. Fu et al. conducted a study involving a cohort of 43 patients receiving NCRT and TME [22]. All of the patients underwent pretreatment diffusion-weighted imaging (DWI). The researchers found that DL of the features extracted from the DWI achieved significantly better classification of NCRT responses than that derived from handcrafted features. Therefore, to maximize predictive accuracy, future studies should integrate multiple sources of imaging information into the proposed model.
To achieve an accurate assessment of the proposed model's clinical utility, this study employed an innovative AI method combining ML and UMAP to facilitate the measurement of training performance. DL does not always work well when training is conducted on a low-volume data set. In addition, the use of the algorithms to process the data might be time-consuming. DL and ML have been combined in deep metric learning [14]. This model is mainly based on the principle of similarity or connection between samples. Using this approach, the data can be transformed into a new feature space with high discriminative power. As indicated in Figure 3, UMAP can be implemented to reorganize the layout of the data distribution in a low-dimensional space to reduce the cross-entropy between the original and the low-dimensional topological representations [18]. Consequently, the features of treatment responses were effectively discriminated by the SVM according to the two-dimensional distribution. In the future, the performance of the proposed AI approach should be examined using other cancers or image settings to verify its reproducibility.
This study had several limitations. First, although the performance of the proposed model was validated using a validation cohort, validation with independent external data sets is still necessary to establish the model's clinical utility because this study was conducted at a single institute. To optimize the role of the imaging phenotype, a model able to accurately predict the TRG4 response would be more beneficial. However, because a positive correlation was observed between predictive value and predicted events and because patients exhibiting TRG4 responses were a minority, the use of the proposed model to assess TRG4 responses might be challenging. Therefore, the authors of this study intend to increase the sample size to extend the predictive range of the proposed model. Moreover, the overall predictive performance of the model can be strengthened by integrating information from other predictive models [21-24]. For example, the integration of MRI-derived features extracted before or after NCRT would be valuable because MRI-derived features are potentially associated with specific phenotypic categories observed in DWI and dynamic contrast-enhanced imaging [25]. Finally, future research should investigate disease-free or overall survival of patients to maximize the prognostic benefits of the imaging phenotypes. Nonetheless, this study's findings represent a crucial step toward enabling customization of neoadjuvant therapy for patients with rectal cancer using AI. After further validation, oncologists may use the proposed model to advise patients on the relative suitability of treatment options, including the therapeutic decision between NCRT and neoadjuvant chemotherapy. Integrating this approach would have a notable effect on counseling patients about treatment alternatives or prognoses.

Conclusions
Using a novel ML model, this study demonstrated that baseline [18F]FDG-PET/CT images could be used to directly predict favorable responses in patients with rectal cancer who had received NCRT. Prior to its clinical application in personalizing patients' treatment options, the proposed model requires further validation with more extensive clinical studies.

Appendix A
Figure A1. Flowchart of study design.

Appendix B
Comparison of classification performance between proposed model and traditional deep learning approach.