Prediction of Neoadjuvant Chemoradiotherapy Response in Rectal Cancer Patients Using Harmonized Radiomics of Multicenter 18F-FDG PET Images

Simple Summary
Neoadjuvant chemoradiotherapy is the standard treatment for locally advanced rectal cancer. Preoperative chemoradiotherapy yields clinically significant tumor regression, but responses vary widely: some patients exhibit a minimal response, while others achieve a complete pathologic response. We developed deep learning and machine learning models to predict chemoradiotherapy response and evaluated them in external tests using multicenter data. The machine learning model, which used harmonized image features extracted from 18F-FDG PET, outperformed the deep learning models trained directly on 18F-FDG PET images and demonstrated reproducibility in external tests. Our study highlights the feasibility of predicting the chemoradiotherapy response of individual patients using non-invasive and reliable image features.

Abstract
We developed machine learning and deep learning models to predict chemoradiotherapy response in rectal cancer using 18F-FDG PET images and harmonized image features extracted from 18F-FDG PET/CT images. Patients diagnosed with pathologic T-stage III rectal cancer with a tumor size > 2 cm were treated with neoadjuvant chemoradiotherapy. The patients with rectal cancer were divided into an internal dataset (n = 116) and an external dataset obtained from a separate institution (n = 40), which were used to build and evaluate the models. The area under the receiver operating characteristic curve (AUC) was calculated to select image features associated with chemoradiotherapy response. In the external test, the machine learning signature built from 18F-FDG PET image features achieved the highest performance, with an accuracy of 0.875 and an AUC of 0.896. The harmonized first-order radiomics model, with an accuracy and AUC of 0.771, outperformed the second-order model in the external test. The deep learning model using the balanced dataset showed an accuracy of 0.867 in the internal test but only 0.557 in the external test. Deep learning models using 18F-FDG PET images must therefore be harmonized to demonstrate reproducibility on external data. Harmonized 18F-FDG PET image features used as machine learning inputs could help predict chemoradiotherapy responses reproducibly in external tests.


Introduction
More than 100,000 individuals worldwide are diagnosed with rectal cancer annually [1]. Rectal cancer is generally treated with neoadjuvant chemoradiotherapy, and tumor responses to therapy are diverse, with 54-75% of patients experiencing tumor downstaging [2]. The reasons for these differences in treatment response are poorly understood, and there is no exact method for predicting the treatment response [3]. Only 15-27% of patients show no residual viable tumors on pathological examination, i.e., a pathological complete response (pCR), after chemoradiotherapy and surgery [4]. An accurate imaging biomarker for predicting and evaluating chemoradiotherapy response could enable the early classification of patients into different prognostic groups and personalized treatment approaches. Early detection of patients who might respond poorly to chemoradiotherapy can provide them the opportunity to undergo surgery and receive enhanced treatments to maximize treatment response.
Medical imaging can be used to noninvasively evaluate therapeutic responses to chemoradiotherapy. Jang et al. developed an MRI-based deep learning model for predicting chemoradiotherapy response in rectal cancer and reported an area under the receiver operating characteristic curve (AUC) of 0.76 and an accuracy of 0.85. 18F-FDG PET/CT has also been widely used to monitor treatment response across many types, stages, and diagnoses of malignancy. 18F-FDG PET can help detect glucose metabolism and reveal tumor characteristics. As the anatomical data obtained from CT in rectal cancer patients can help distinguish between physiological and pathological intestinal uptake [5], 18F-FDG PET/CT is generally considered a standard tool for predicting the response to chemoradiotherapy in rectal cancer. The radiomics features of 18F-FDG PET/CT can also facilitate the prediction of chemoradiotherapy response. Taking this into consideration, researchers are increasingly exploring the potential of incorporating radiomic features from 18F-FDG PET/CT scans into predictive models to enhance the accuracy and reliability of forecasting responses to chemoradiotherapy.
Recently, the use of machine learning techniques for large and complex biological data analysis has increased. Deep learning techniques are considered among the most powerful tools and are frequently used in bioinformatics because they allow the analysis of vast amounts of data. Many radiomics studies utilize features extracted by manual methods, which are significantly influenced by the knowledge and experience of individual researchers [6]. Consequently, deep learning techniques, which compute task-adaptive feature representations by learning layers of complex features directly from medical images, are considered suitable tools for predicting prognosis. Deep learning techniques that can automatically learn representative information from raw image data to decode the radiomic phenotype of tumors can assist in disease diagnosis, prognostic evaluation, and treatment sensitivity prediction [7]. The performance of models with deeper hidden layers for pattern recognition has recently begun to surpass that of classical methods in different fields. One of the most popular deep neural networks is the convolutional neural network (CNN). Random forest (RF) technology, which comprises an ensemble of decision trees and naturally integrates feature selection and interaction during learning, is a popular choice in personalized medicine. It is nonparametric, efficient, and has high predictive accuracy for many types of data. The RF model is increasingly being adopted because of its advantages in dealing with small sample sizes, high-dimensional feature spaces, and complex data structures [8].
In oncology research, particularly when assessing rectal cancer responses to therapy, the role of SUVmax and SUVmean values derived from 18F-FDG PET/CT scans has been under critical evaluation, as illustrated by several independent studies. Two independent studies showed that SUVmax predicted chemoradiotherapy response with a specificity and overall accuracy of only 35% and 44%, respectively [9,10]. SUVmean, dissimilarity, and contrast from the neighborhood intensity-difference matrix (NGTDM contrast) were significantly and independently associated with overall survival (OS) [11]. A decrease in metabolic tumor volume (MTV) and total lesion glycolysis (TLG) values was suggested to be an indicator of a positive response to chemoradiotherapy [12]. Chemoradiotherapy response predictions using 18F-FDG PET/CT are not sufficiently accurate to distinguish patients showing a treatment response from those who respond poorly to the treatment [13]. Several studies have reported that radiomic features were scanner- or protocol-sensitive, highlighting the importance of harmonizing image features to reduce multicenter variability before pooling data from multiple sites [14,15].
In the present study, we evaluated the use of machine learning to predict chemoradiotherapy responses using radiomics harmonization and demonstrated the reproducibility and repeatability of the findings through rigorous external testing. Our aim is not only to address the limitations of current methodologies but also to contribute to the development of a more robust and universally applicable predictive model for chemoradiotherapy responses in cancer treatment.

Patient Cohort
All patients were diagnosed with pathologic T-stage III rectal cancer, with tumor growth into the outer lining of the bowel wall without breaching its integrity. Patients with a tumor size > 2 cm were treated with neoadjuvant chemoradiotherapy before surgery. The internal and external cohorts comprised 116 patients from the internal institution (Korea Institute of Radiological and Medical Sciences) and 40 patients from an independent institution (Soonchunhyang University Bucheon Hospital), respectively. The internal cohort comprised 21 patients diagnosed with pCR and 95 patients diagnosed with non-pCR. The external cohort consisted of six patients diagnosed with pCR and 31 patients diagnosed with non-pCR. The rectal cancer region was cropped from the 18F-FDG PET images (Figure 1).


Image Feature Extraction
We utilized LIFEx (Local Image Features Extraction, version 4.90) software to calculate image features from the 18F-FDG PET/CT images of rectal cancer patients. In total, 55 image features were extracted. The region of interest (ROI) was delineated manually with an SUV threshold of 2.0 (Figure 2). Tumor lesions were identified in the area of 18F-FDG uptake that was pathologically increased and contrasted with the CT images. To predict chemoradiotherapy response in rectal cancer, first- and second-order features were used separately to compare intensity-based and GLCM-based image characteristics. The AUC was calculated to select image features from the first- and second-order features using R (version 4.2.2) software (R Foundation for Statistical Computing, Vienna, Austria).

Harmonization Methodology
Harmonization of the image features from the internal and external 18F-FDG PET/CT datasets was performed. The training and test sets were harmonized separately. The harmonization (ComBat) method was applied with an online application (https://forlhac.shinyapps.io/Shiny_ComBat/, accessed on 28 November 2023). ComBat is a batch-matching technology initially proposed for gene expression microarrays [16] and has been widely used in the field of imaging. The ComBat model is given by

y_ij = α + γ_i + δ_i ε_ij,

where j indicates the specific measurement of image feature y; i indicates the setting of the scanner, protocol effect, or even observer effect (called the site effect); α represents the average value of the image feature y; γ_i signifies the additive batch effect on the measurement; δ_i represents the multiplicative batch effect; and ε_ij is an error term. Batch i represents the experimental settings employed for the measurement of y, including a possible scanner effect. The site effects γ_i and δ_i can be estimated using conditional posterior means and subsequently corrected using

y_ij^ComBat = (y_ij − α̂ − γ̂_i) / δ̂_i + α̂,

where α̂, γ̂_i, and δ̂_i are estimators of α, γ_i, and δ_i. y_ij^ComBat is the converted measurement y_ij, devoid of the effect of site i.
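The correction above can be sketched in a few lines of numpy. This is a simplified, moment-based illustration of the ComBat idea: the Shiny application used in the study estimates γ_i and δ_i with empirical-Bayes conditional posterior means, which this sketch replaces with simple per-site sample moments.

```python
import numpy as np

def combat_correct(y, site):
    """Minimal ComBat-style correction for a single image feature.

    y    : 1-D array of feature values y_ij
    site : 1-D array of site labels i (the batch effect)

    Models y_ij = alpha + gamma_i + delta_i * eps_ij and removes the
    additive (gamma_i) and multiplicative (delta_i) site effects.
    """
    y = np.asarray(y, dtype=float)
    alpha = y.mean()                        # grand mean of the feature
    y_corr = np.empty_like(y)
    for s in np.unique(site):
        mask = site == s
        gamma_i = y[mask].mean() - alpha    # additive site effect
        delta_i = y[mask].std(ddof=1)       # multiplicative site effect
        y_corr[mask] = (y[mask] - alpha - gamma_i) / delta_i + alpha
    return y_corr

# Example: one feature measured at two sites with different scanner settings.
rng = np.random.default_rng(0)
feature = np.concatenate([rng.normal(5, 1, 200), rng.normal(9, 2, 200)])
site = np.array([0] * 200 + [1] * 200)
corrected = combat_correct(feature, site)
```

After correction, both sites share the same mean and unit spread, so pooled internal/external feature tables no longer carry the site shift.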


Deep Learning and Machine Learning
The CNN structure consisted of input, convolutional, batch normalization, ReLU, max-pooling, linear, dropout, and output layers. The CNN hyperparameters comprised the optimizer, learning rate, and number of epochs, which were set to Adam, 0.0002, and 200, respectively. Two convolutional layers were used. The CNN was trained on two-dimensional input slices taken from each patient. The chemoradiotherapy prediction performance of the RF model was internally and externally evaluated using the scikit-learn library (version 1.2.0) in Python (version 3.10.11).
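A minimal PyTorch sketch of such a two-convolutional-layer network is shown below. The input slice size (1 × 64 × 64) and the channel widths are illustrative assumptions, not the authors' exact configuration; only the layer types and the Adam/0.0002 settings come from the text.

```python
import torch
import torch.nn as nn

class PETCNN(nn.Module):
    """Two conv blocks (conv -> batch norm -> ReLU -> max pool) followed by
    dropout and a linear output layer, as described in the text."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16),
            nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32),
            nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.5),
            nn.Linear(32 * 16 * 16, n_classes),  # 64x64 input halved twice -> 16x16
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = PETCNN()
# Optimizer settings from the text: Adam with a learning rate of 0.0002.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)
logits = model(torch.randn(4, 1, 64, 64))  # a batch of 4 two-dimensional PET slices
```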
Augmentation techniques were employed to resolve the data imbalance between pCR and non-pCR. The RandomRotation function of the PyTorch library in Python was used to randomly rotate input images by a certain angle to increase the diversity of the training dataset. The RandomResizedCrop function of the PyTorch library was employed to randomly select a portion of the input image and subsequently resize it, augmenting the training dataset and enhancing its variety. The synthetic minority oversampling technique (SMOTE) was applied to the training dataset for machine learning to mitigate data imbalance.
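The SMOTE interpolation step can be illustrated as follows. This is a simplified numpy sketch (the imbalanced-learn library provides the full implementation); the cohort-sized example at the bottom, oversampling 21 pCR patients to match 95 non-pCR, is our illustrative usage, not the study's actual preprocessing.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling sketch: synthesize n_new minority-class
    samples by interpolating a picked sample toward one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from sample i to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation weight in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# e.g. 21 pCR patients with 5 (hypothetical) features, oversampled by 74
minority = np.random.default_rng(7).normal(size=(21, 5))
synthetic = smote_oversample(minority, n_new=74, k=5, rng=7)
```

Because each synthetic point lies on a segment between two real minority samples, every feature stays within the observed minority range.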
After splitting the internal dataset at a 7:3 ratio, internal tests were performed for both models by evaluating the AUC, accuracy, precision, and sensitivity. External tests were performed using the independent institution's dataset. Confusion matrix-based evaluation metrics, including accuracy, sensitivity, and precision, were estimated, and the threshold probability was set to the value that maximizes Youden's index.
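Threshold selection by Youden's index can be sketched with scikit-learn's ROC utilities; the toy scores below are purely illustrative.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_score):
    """Return the probability threshold maximizing Youden's J statistic,
    J = sensitivity + specificity - 1 (equivalently TPR - FPR)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[np.argmax(tpr - fpr)]

# Toy scores: pCR (1) cases tend to score higher than non-pCR (0).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.35, 0.8, 0.7, 0.9, 0.4, 0.6, 0.25])
thr = youden_threshold(y_true, y_score)
y_pred = (y_score >= thr).astype(int)  # confusion-matrix metrics follow from y_pred
```

At the chosen threshold, sensitivity and specificity are balanced optimally, which is why confusion-matrix metrics in the tables depend on this operating point.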

Patient Cohort
18F-FDG PET/CT images from 116 internal and 40 external datasets were used for model estimation. The average ages in the internal and external datasets were 61.85 and 59.88 years, respectively. The internal cohort comprised 75 males (64.66%) and 41 females (35.34%). The external cohort comprised 27 males (67.5%) and 13 females (32.5%). A summary of the demographic characteristics and pathological TNM stages is presented in Table 1. The patient cohort included patients who developed lymph node or distant organ metastases.

Evaluation of Deep Learning Model
The CNN model for predicting rectal cancer chemoradiotherapy response was developed using 18F-FDG PET images. The number of pCR data points from the internal and external data was increased through augmentation to 84 and 24, respectively. To equalize the amounts of pCR and non-pCR data, the data from the internal and external cohorts were balanced by random sampling. The deep learning model showed accuracies of 0.867 and 0.789 in the internal test (Table 2). However, in the external test, the deep learning signature showed accuracies of 0.557 and 0.355 (Table 3). The deep learning models thus performed better in the internal test than in the external test.

Image Feature Extraction and Harmonization
A total of 55 image features were quantitatively calculated from the 18F-FDG PET and CT images. The image features were separated into first-order features, including conventional indices, shapes, and histogram-based intensity values (n = 23). The image texture features were assigned as second-order features, including gray-level co-occurrence matrix (GLCM), neighborhood gray-level difference matrix (NGLDM), gray-level run-length matrix (GLRLM), and gray-level zone length matrix (GLZLM) features (n = 22) (Figure 2). The AUC was calculated to determine the image features capable of distinguishing between pCR and non-pCR cases. Subsequently, image features from the internal dataset were selected and used for machine learning. First-order features extracted from 18F-FDG PET and CT with an AUC over 0.65 and 0.55, respectively, were used for machine learning (Table 4). Second-order features extracted from 18F-FDG PET and CT with an AUC over 0.7 and 0.6, respectively, were used for machine learning (Table 5). Image feature values from the internal and external institutions were harmonized to reduce multicenter variations. GLZLM GLNU, which had the largest change in the distribution of values before and after harmonization, is visualized in Figure 3.
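The univariate AUC filter can be sketched as follows. Flipping AUCs below 0.5 is our assumption about direction-independent discrimination, and the feature names and cutoff usage here are hypothetical illustrations.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def select_features_by_auc(X, y, names, cutoff):
    """Keep features whose univariate AUC for pCR (1) vs non-pCR (0) meets
    the cutoff. AUCs below 0.5 are flipped, since a feature that ranks
    non-pCR higher is equally discriminative with the sign reversed."""
    selected = []
    for j, name in enumerate(names):
        auc = roc_auc_score(y, X[:, j])
        auc = max(auc, 1.0 - auc)   # direction-independent AUC
        if auc >= cutoff:
            selected.append((name, round(auc, 3)))
    return selected

# Toy example: "SUVmax" carries signal, "Skewness" is pure noise.
rng = np.random.default_rng(3)
y = np.array([0] * 30 + [1] * 10)                      # 10 pCR, 30 non-pCR
X = np.column_stack([y * 2.0 + rng.normal(size=40),    # discriminative feature
                     rng.normal(size=40)])             # noise feature
kept = select_features_by_auc(X, y, ["SUVmax", "Skewness"], cutoff=0.65)
```

Applying cutoffs of 0.65/0.55 (first-order) and 0.7/0.6 (second-order) for PET/CT, as in the text, would simply mean calling the filter once per feature group.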

Evaluation of Machine Learning Model
The extracted first- and second-order features were used as variables for the RF model, and each model was evaluated using internal and external tests. The RF model using harmonized first-order features showed an accuracy and AUC of 0.771 each, higher than before harmonization in the external test. The RF model using second-order features exhibited an accuracy and AUC of 0.675 and 0.603, respectively, in the external test after harmonization, lower than those without harmonization. The first-order features showed higher accuracy and AUC for the external datasets than the second-order features. In the external test set, the machine learning signature built from 18F-FDG PET image features achieved the highest performance, with an accuracy of 0.875 and an AUC of 0.896 (95% confidence interval 0.562-1) (Table 6).
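The train-internally/test-externally RF workflow can be sketched as below. The synthetic feature table merely mimics the cohort sizes (116 internal with 21 pCR; 40 external with 6 pCR) and an assumed 8 selected features; it is not the study's data, and the injected signal strength is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(42)

def make_cohort(n, n_pcr, n_features=8):
    """Synthetic stand-in for a harmonized first-order feature table."""
    y = np.array([1] * n_pcr + [0] * (n - n_pcr))
    X = rng.normal(size=(n, n_features)) + y[:, None] * 1.5  # pCR shifts every feature
    return X, y

X_train, y_train = make_cohort(116, 21)   # internal cohort: training
X_test, y_test = make_cohort(40, 6)       # external cohort: independent test

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
prob = rf.predict_proba(X_test)[:, 1]
acc = accuracy_score(y_test, (prob >= 0.5).astype(int))
auc = roc_auc_score(y_test, prob)
```

In the study itself the 0.5 threshold is replaced by the Youden-optimal one, and the AUC confidence interval would be estimated (e.g., by bootstrapping) rather than reported as a point value.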


Discussion
The performance of the machine learning models in predicting chemoradiotherapy response using imaging features extracted from 18F-FDG PET images was estimated using an external test. Conducting multicenter studies is one of the main objectives of clinical applications. However, medical images acquired from different institutions may introduce biases due to variations in imaging devices, data acquisition methods, and protocols [17,18]. Because radiomics is sensitive to such variations, feature values may differ even when the same feature is extracted from multiple organs. Large-scale radiomic data analysis is required to verify the reproducibility of radiomics, and radiomic features extracted from images acquired at different centers must be integrated. In this study, radiomics harmonization was performed to reduce batch effects. Our results indicate that the harmonization of image features extracted from multiple datasets is essential for a reliable predictor.
In several cancer-related studies, the RF model has shown high potential for predicting clinical outcomes [19-22]. The RF model demonstrated reproducibility and repeatability in external tests when utilizing the features extracted from 18F-FDG PET images. Because the RF model generates predictions by aggregating randomly constructed decision trees, it mitigates the risk of overfitting. As it traverses the decision trees, it learns the image features that best capture the discriminatory factors distinguishing tumor characteristics. Moreover, it is expected to yield superior outcomes because it employs an optimal cut-off value for discriminating between pCR and non-pCR patients based on image features. These attributes of the RF model appear to have further enhanced its predictive accuracy and AUC in the context of chemoradiotherapy prognosis.
Medical imaging offers vital insights into the progress of patients with rectal cancer, and AI holds promise for developing quantitative treatment decision support tools. Some studies have shown that tumor metabolic changes on 18F-FDG PET were more predictive than tumor morphological modifications on CT [23-25]. In our study, image features extracted from 18F-FDG PET images yielded higher machine learning performance than those extracted from CT images. The CT imaging features in the external tests showed an accuracy and AUC of 0.425 and 0.593, whereas those extracted from 18F-FDG PET showed an accuracy and AUC of 0.875 and 0.896. Our study indicates that the radiomics of 18F-FDG PET are more informative than those of CT in predicting the pCR of rectal cancer. 18F-FDG PET imaging is crucial for monitoring alterations in tumor metabolic activity and plays a vital role in prognostic predictions for patients undergoing concurrent chemoradiotherapy. Although CT imaging provides comprehensive details of the tumor's size and shape, excelling in anatomical delineation, it falls short in effectively predicting tumor responses to chemoradiotherapy. This discrepancy highlights a potential limitation in its prognostic utility for this specific therapeutic context. We also observed that integrating radiomic features extracted from both 18F-FDG PET and CT into predictive models can lead to a decrement in performance, suggesting a paradoxical reduction in the model's efficacy despite the amalgamation of data from both imaging techniques. This underscores the need for careful consideration when combining features from different modalities to enhance the accuracy of treatment response predictions.
The first- and second-order features selected by AUC value encompassed those previously identified as having prognostic significance in other investigations. The significance of the SUVmax, SUVmean, and uniformity image feature values has been demonstrated in previous studies. Second-order features based on the GLRLM and NGLDM were incorporated as important variables in the chemoradiotherapy prediction model. These feature values have demonstrated their predictive utility in various cancers. When the chemoradiotherapy response was predicted using harmonized first-order features, the model showed higher performance than with second-order features. The first-order features were derived from histograms, whereas the second-order features were based on the GLCM. As the first-order values exhibited significant alterations following harmonization, the impact of harmonization is noteworthy. Conversely, the second-order values displayed negligible changes after harmonization. Consequently, the model utilizing first-order features exhibited superior performance in predicting rectal cancer chemoradiotherapy outcomes.
Several 18F-FDG PET/CT radiomics have been reported as predictive of pCR to chemoradiotherapy, including visual response, maximum standardized uptake value (SUVmax), percentage SUVmax reduction, TLG, and MTV [26-29]. Lovinfosse et al. revealed that SUVmean, dissimilarity, and NGTDM contrast were significantly and independently associated with OS in patients with rectal cancer. Jean-Emmanuel et al. predicted a complete response after rectal chemoradiotherapy using a deep neural network with 80% accuracy in a multicenter cohort using radiomics extracted from CT. Xiaolu M et al. built an RF model for the degree of differentiation, T-stage, and N-stage using radiomics from MRI (AUC, 0.746; 95% CI, 0.622-0.872; sensitivity, 79.3%; specificity, 72.2%). Giannini et al. evaluated a logistic regression model using six texture features (five from PET and one from T2w MRI) to determine chemoradiotherapy outcomes (AUC = 0.86; sensitivity = 86%; specificity = 83%).
We estimated the performance of the deep learning model in predicting neoadjuvant treatment response using multicenter 18F-FDG PET images. However, the model's performance proved insufficient in external tests conducted with datasets from independent institutions. Deep learning demonstrated subpar performance in external tests owing to the omission of dataset harmonization, which failed to account for potential biases between the internal and external datasets. In the case of machine learning, the difference between the internal and external datasets was drastically reduced through harmonization of the image feature values computed within the ROI; thus, the reproducibility of the machine learning predictor was confirmed. Batch effects can be mitigated by preprocessing the images employed in deep learning, using techniques such as distortion correction, bias field correction, and intensity normalization, which help standardize the data [30,31]. Reducing batch effects through harmonization at the image level is expected to yield high performance in predicting chemoradiotherapy response, even in external tests.
Our study has some limitations. Deep learning exhibited lower performance in external tests than in internal tests. This outcome may be attributed to the absence of harmonization between the internal and external datasets. Because the CNN model makes predictions from the image itself, it is necessary to harmonize the images themselves. The number of patients in the presently registered external data is relatively limited, which may have led to suboptimal performance in external tests. Deep learning techniques in medical image analysis are also challenged by their black-box characteristics, which pose issues for interpretability.

Conclusions
Our research underscores the critical significance of image harmonization in multicenter studies for accurate chemoradiotherapy response prediction while also highlighting the potential of noninvasive radiomics-based machine learning models in predicting neoadjuvant chemoradiotherapy response in rectal cancer. A machine learning model predicting chemoradiotherapy outcomes for rectal cancer using harmonized 18F-FDG PET imaging features was confirmed to be reproducible and repeatable in external testing using multicenter data. A deep learning model using 18F-FDG PET images without the harmonization process performed poorly in predicting neoadjuvant chemoradiotherapy response, demonstrating the importance of image harmonization in multicenter studies. We confirmed the possibility of using a machine learning model to predict the chemoradiotherapy response of rectal cancer before treatment using radiomics, which can be obtained noninvasively.

Figure 1.
The cropping process of the rectal cancer region from an 18F-FDG PET image.

Figure 3.
Distribution of the GLZLM GLNU value before and after harmonization: (a) distribution of GLZLM GLNU extracted from all T-stage patients before harmonization; (b) distribution of GLZLM GLNU extracted from all T-stage patients after harmonization.


Table 1.
Characteristics of the study cohort.

Table 2.
Internal test of the CNN model using 18F-FDG PET images.
pCR: pathological complete response; AUC: area under receiver operating characteristic curve; CI: Confidence interval.

Table 3.
External test of the CNN model using 18F-FDG PET images.
pCR: pathological complete response; AUC: area under receiver operating characteristic curve; CI: Confidence interval.

Table 4.
Extraction of first-order image features by AUC cut-off value.

Table 5.
Extraction of second-order image features by AUC cut-off value.

Table 6.
Internal and external tests of the RF model.
AUC: area under receiver operating characteristic curve; CI: Confidence interval.
