Evaluating the Effectiveness of 2D and 3D CT Image Features for Predicting Tumor Response to Chemotherapy

Background and Objective: 2D and 3D tumor features are widely used in a variety of medical image analysis tasks. However, for chemotherapy response prediction, the effectiveness between different kinds of 2D and 3D features are not comprehensively assessed, especially in ovarian-cancer-related applications. This investigation aims to accomplish such a comprehensive evaluation. Methods: For this purpose, CT images were collected retrospectively from 188 advanced-stage ovarian cancer patients. All the metastatic tumors that occurred in each patient were segmented and then processed by a set of six filters. Next, three categories of features, namely geometric, density, and texture features, were calculated from both the filtered results and the original segmented tumors, generating a total of 1403 and 1595 features for the 2D and 3D tumors, respectively. In addition to the conventional single-slice 2D and full-volume 3D tumor features, we also computed the incomplete-3D tumor features, which were achieved by sequentially adding one individual CT slice and calculating the corresponding features. Support vector machine (SVM)-based prediction models were developed and optimized for each feature set. Five-fold cross-validation was used to assess the performance of each individual model. Results: The results show that the 2D feature-based model achieved an AUC (area under the ROC curve (receiver operating characteristic)) of 0.84 ± 0.02. When adding more slices, the AUC first increased to reach the maximum and then gradually decreased to 0.86 ± 0.02. The maximum AUC was yielded when adding two adjacent slices, with a value of 0.91 ± 0.01. Conclusions: This initial result provides meaningful information for optimizing machine learning-based decision-making support tools in the future.


INTRODUCTION
Ovarian cancer is the most aggressive malignancy in gynecologic oncology [1].Given the difficulty on early stage diagnosis of ovarian carcinoma, most patients are diagnosed with ovarian cancer at advanced stages [2], which can lead to the formation of metastatic tumors on various organs.To control these metastatic tumors, chemotherapy is the only effective treatment after the primary cytoreduction [3].However, due to the nature of metastasis heterogeneity, the patients' response to treatment varies largely.Thus, an important challenge in current clinical practice is to select the appropriate chemotherapy before administrating the treatment.To address this, many studies have been conducted to identify biomarkers from either histologic or/and imaging data that might be associated with patient response to chemotherapy [4][5][6][7].Among different technologies, imaging approaches have the advantage on evaluating the organ/tissue noninvasively with adequate spatial resolution.CT is a popular choice in therapy response assessment, due to its wide availability, low cost, robustness, and ease of use [8].However, the current criteria requires both pre-therapy and 6-8 week follow-up examinations, and the diameter-based evaluation causes low association with the clinical outcome [9].
Meanwhile, radiomics is an emerging technology which extracts a large amount of features from tumors segmented from various medical images [10][11][12][13].These radiomics features are able to quantify the complex textures, shapes, and other details of a tumor that are highly associated with carcinogenesis.Therefore this information can be beneficial in determining treatment options and predicting the tumor's response to therapy [14].For this technique, identifying effective radiomics features is critically important: 2D features are easier and faster to calculate but provide less information, while the 3D features use the entire 3D tumor volume and intuitively contain more comprehensive information.To this end, many studies have been conducted to explore the effectiveness of 2D and 3D features in different applications.The prognostic performance of 2D and 3D radiomics features in CT images of non-small cell lung cancer (NSCLC) was investigated in [15], which demonstrated that 2D features exhibited better performance, although both feature types had strong prognostic capability.One study compared the 2D and 3D radiomics features in characterizing gastric cancer [16] and revealed that models constructed with 2D radiomics features had comparable performance with those constructed with 3D features.In another study, Xu et al. [17] evaluated the prediction performance of 2D and 3D radiomics features in a multi-organ, multimodality cancer study, and the authors concluded that 3D radiomics features were more effective.Lee et al. [18] compared the 2D and 3D texture features for discriminating between gastric cancer and normal gastric mucosa on CT images, and they revealed that 3D texture features were more effective than 2D features.Despite their clinical usefulness, these studies only compare features from the single slice based 2D tumor and the full volume 3D tumor.Very few studies have been focused on the effectiveness of incomplete-3D tumors, which contain only a portion of the image slices.
In this investigation, we hypothesized that the inclusion of multiple tumor slices would lead to a substantial enhancement in the predictive accuracy of our model, but it may also introduce more uncertainties that may adversely impact model accuracy.To verify this hypothesis, we extracted 2D radiomics features from single slice of the tumors and then generated incomplete-3D features by sequentially adding the adjacent CT slices.Based on each individual feature pool, we optimized the same kind of predictive model and evaluated their performance respectively.The details are presented in the following sections.

Database
In our study, we utilized a dataset of CT images involving 188 ovarian cancer patients.The CT images in the dataset were retrospectively collected from the University of Oklahoma Health Science Center (OUHSC).Subjects for this study included the following criteria: (1) they had recurrent ovarian/peritoneal/tubal carcinoma of high-grade histology (e.g., serous, endometrioid, and undifferentiated); (2) systemic chemotherapy treatment after primary cytoreduction was used; and (3) for each patient in the dataset there were between 1 to 5 tumors where pre-and post-therapy CT imaging scans were available for each tumor.Post-therapy CT scans were taken 6-8 weeks after initiation of therapy.The clinical information of the patients was also collected, including their RECIST evaluation results and 6-month progression-free survival (PFS).The RECIST and PFS data was used to assess the proposed quantitative image marker established in this study.6month PFS is a common assessment index established by the US Food and Drug Administration and the European Medicines Agency as a tool that assesses the efficacy of testing new cancer chemotherapies in clinical trials [19].The baseline demographic and clinicopathological characteristics of our patients are illustrated in Table 1.The clinical data indicate that 130 patients are responders with a 6-month PFS of 'Yes,' whereas 58 patients are non-responders with little response to the treatments.ln this research, a standard protocol was established utilizing the same scanner settings as prior studies, as recommended by RECIST 1.1 [20].The acquisition of images was conducted in line with a conventional CT scanning image acquisition technique.Images were captured utilizing either a GE Light Speed VCT 64 or a GE Discovery 600 16-row detector machine, a tube voltage of 120 kVp, and currents that ranged from 100 to 600 mA depending on patient body size.

Radiomics Feature Extraction
In this study, the tumor features were obtained from 2D and 3D pre-therapy CT slices.Each segmented tumor is depicted on a number of CT slices, while the central slice has the largest 2D area and was identified by radiologists.Accordingly, we first extracted the quantitative features only from the central slice to generate a 2D feature pool.Next, based on the central slice, we added one neighboring slice to have one incomplete-3D tumor, for which one 3D tumor feature pool was created.Thus, we sequentially added the adjacent CT slices and generated the corresponding incomplete-3D feature pool until the tumor disappeared.Before the feature computation, we developed a computer aided image analysis scheme to segment each individual tumor from the CT images.The tumor segmentation procedure started at the central slice where the tumor was marked by a radiologist according to RECIST criteria.The segmentation process was performed on the slices adjacent to the central slice sequentially until the tumor was completely accounted for by the algorithm.For each slice, the tumor was segmented by a hybrid algorithm, which consists of two core algorithms: a multilayer topographic region growth with adaptive thresholds [21] and a dynamic contour-seeker or edge tracking method [22,23].The performance of this algorithm was assessed and validated by a number of our previous studies [24][25][26][27].Given that the metastatic tumors occurred on a number of human organs with high heterogeneity, the automated segmentation may not be able to generate the tumor contour with satisfactory accuracy.Thus, these segmentation results were visually evaluated by experienced researchers and were manually corrected if needed.The entire scheme was developed as an ImageJ plugin [28], which is equipped with a user friendly graphical user interface (GUI) to visualize the 2D and 3D tumors.Based on the segmented tumors, we generated the 2D and 3D feature pool using the following methods [12,13].The feature computation scheme is based on pyradiomics, an opensource platform which calculated features in accordance with the definitions in the Imaging Biomarker Standardization Initiative (IBSI) [29].The general flowchart of the calculation is described in Figure 1.In this module, the segmented tumor was first processed by a number of operations including exponential, Gradient Magnitude, Local Binary Pattern (LBP), Logarithm (Log), Square, Square Root, and Wavelet (Coif1) transformations or filters.Next, the radiomics features were then calculated on the processed images as well as the original image.The features can be divided into 3 categories, i.e., geometric, density and texture features, which describe the tumor in various aspects [30].Table 2 shows a breakdown of the number of features calculated for each image class, with the number of features per class ranging from 93 to 741.We calculated a total of 1595 and 1403 features for the 3D and 2D feature pools, respectively.To address the potential presence of multiple metastatic tumors within each patient, we calculated a single feature vector for each patient by taking the minimum value of the feature vectors obtained from the tumors present in that patient.

Develop a Machine Learning based Model to Predict Ovarian Cancer Response to Chemotherapy
The Pearson correlation coefficient (PCC) and LASSO (Least Absolute Shrinkage and Selection Operator) approaches were used to reduce the dimensionality of our feature space.Firstly, the PCC was calculated and a heatmap was generated to investigate the feature dependency.After the computation, all the feature pairs with a PCC above a pre-set threshold of 0.99 were discarded [31] to remove highly correlated features and minimize the inclusion of redundant information.
Next, LASSO was applied on the rest of the feature pool to create an optimal feature cluster.LASSO is a regularization and feature selection regression approach [32].Essentially, this method employs a variation of the least squares regression method that employs L1 regularization to produce sparse variable coefficients.Accordingly, LASSO constrains the sum of the absolute values of the regression weights to be smaller than a fixed threshold.LASSO has the capability to handle high dimensional data with a limited number of observations [33].Therefore, it improves model interpretability by removing variables irrelevant to the response variable and reduces the possibility of overfitting [34].The LASSO objective function is defined in Eq. 1.
In the above error function,  is the upper bound threshold for the sum of the absolute values of feature weights , and  is a non-negative tuning parameter that regulates the degree of the penalty.We employed the LASSO-Cross Validation (LASSO-CV) approach to tune the LASSO parameters [35].After applying LASSO, features with non-zero weights after shrinkage were chosen to be included in the model.Next, a support vector machine (SVM) was used to predict the tumor responses.SVM has been widely validated as an effective classifier in various medical imaging tasks [36][37][38][39][40][41].The method constructs one or more hyperplanes to classify the feature vectors into two classes with minimal generalization error [42].Mathematically, the SVM model is established by solving the following optimization problem: min 0. In the above formulars, (  ,   ) is one pair of feature vector and output value,  and  are the normal vector and intercept of the discrimination hyperplane respectively,   is the slack variable,  is the adjusting coefficients for optimization errors, e is the unit vector,  is the Lagrange coefficient, and  represents kernel function (  ,   ) = exp (− ) [43].The model's parameters (i.e.,  and ) will be tuned to achieve optimal values during the training process.Additionally , the synthetic minority oversampling technique (SMOTE) was also employed to oversample the imbalanced data and minimize the training bias [44,45].
Each classifier was then trained and tested using a 5-fold cross-validation approach [46,47].The original group was randomly split into 5 equal sized subgroups using 5-fold crossvalidation.A single subgroup was preserved as testing data, while the remaining four were utilized for model optimization.The cross-validation procedure was then repeated five times, so each of the five subgroups served as testing data once.To evaluate the model performance, the receiver operating characteristic (ROC) curve was used, and the area under the ROC curve (AUC) was estimated for each model.

RESULTS
A sample tumor from our database is shown in Figure 2, which shows the tumor depicted on a total of 7 consecutive CT slices.The tumor has the largest area in the central slice, and the area gradually decreases until the tumor eventually disappears on the 4th neighboring slices of both sides (Figure 2 (a)).Accordingly, the segmentation procedure starts with the central slice, where the tumor was identified by a radiologist or oncologist.Then the segmentation continues on the adjacent slices until the tumor disappears (Figure 2 and (c) represent the heatmap of all the features extracted from 2D, incomplete-3D, and 3D tumor, respectively.The maps were produced by a total of 188 observations.Each entry in the map indicates the colorized Pearson correlation coefficient ranging from 0 (Blue) to 1 (red).The maps illustrate that most of the features extracted from 2D tumor masks have relatively low correlation.The similar pattern can be observed in the features obtained from incomplete-3D tumor masks, but the 3D features set has stronger dependencies in the exponential, logarithm, square, and square root features than other feature groups.Furthermore, Figure 3(d), (e), and (f) demonstrate the histograms of all feature categories for 2D, incomplete-3D, and 3D features.According to the histograms, more than 80% of correlation coefficient values for 2D and incomplete-3D features are less than 0.5, whereas a higher percentage of 3D features have correlation coefficients greater than 0.5.These results imply the feature pools of 2D, and incomplete-3D tumor features cover comprehensive tumor characteristics with little information redundancy.
After applying the PCC and LASSO feature selection methods, we identified 115 2D features as the most effective feature cluster for further analysis.Similarly, 56-126 features were identified for different incomplete-3D tumor features pools, and 73 features were identified from the full 3D feature pool.As illustrated in Table 3, a minimum of 78% of the selected features can be categorized into to texture-based features, while the remainder fall into the categories of shape or density features.When dividing the features based on the filters, we observe that the largest group of features is extracted from the wavelet transform-based filtered images, with the features calculated from LBP-filtered images forming the second most prominent group.Our models used these features as inputs to generate the prediction score.Figure 4 illustrates the ROC curve achieved by the models trained with 1 to 9 number of slices of tumor mask and whole slices, while Table 4 summarizes the corresponding values of AUC and overall accuracy values.The results demonstrate that adding more slices to the central slice can increase the prediction performance, but only for a certain range of additional slices.The highest performance was achieved when the model was constructed with one central slice and two adjacent slices, with the model yielding an AUC of 0.91 ± 0.01 (95% confidence interval [0.88, 0.94]) and an overall predicting accuracy of 0.84 ± 0.03 (95% confidence interval [0.78, 0.88]).The 3D model with 3 and 5 adjacent slices were the second and third best performing models, which achieved an AUC values of 0.89 ± 0.01 and 0.86 ± 0.02, respectively.Model_3D with AUC of 0.86 ± 0.02 (95% confidence interval [0.82, 0.89]) performed better than Model_2D with AUC of 0.84 ± 0.02 (95% confidence interval [0.78, 0.88]).All these trends are also illustrated in Figure 4(b).

DISCUSSION
In this study, we conducted a comprehensive evaluation of various radiomic feature extraction methods for predicting early response of ovarian cancer to chemotherapy.Our investigation focused on 2D, incomplete-3D, and 3D radiomic features and their ability to represent and discriminate the underlying tumor characteristics.Our findings demonstrate that the incomplete-3D features extracted from the central tumor slice and its two adjacent slices yielded better prediction accuracy and might be recommended as the preferred approach.
As compared to the previous studies, the most unique characteristic of our investigation is that we explored the performance of the features computed from incomplete-3D tumors, which contain a portion of the tumor slices.The comparison between the 2D or 3D features have been investigated by many research groups.In general, the 2D features are calculated from the single slice containing the maximal projection within the entire tumor.Accordingly, the tumor segmentation and annotation are relatively easier, but some useful information are lost.For instance, it is difficult to accurately extract tumor volume, surface area, spherical disproportion, and other features that reflect geometric irregularities from a 2D tumor.On the other end, despite the comprehensive information, the 3D feature suffers from more information uncertainties.Given that it is very difficult to clearly define the boundary between the tumor region and its surround healthy tissues, there is no segmentation algorithm which is able to perfectly extract the tumor contour [48].Moreover, the tumor heterogeneity may only be inside several central slices.The marginal slices may only contain the suspicious surrounding tissue, and they may not have any information associated with the needed treatment responses.As a result, more uncertainties will be added on the segmented 3D tumors, which may adversely affect the performance of prediction models.Therefore, there is a trade-off between adding more useful information and uncertainties when sequentially increasing the adjacent slices.In this study, we first demonstrated that using the central and its two neighboring slices may achieve better performance than other situations.To the best of our knowledge, no similar studies have been performed on this topic before.
Moreover, our study suggests that the number of slices used for CT image analysis can have an impact on the correlation structure of extracted radiomics features.As shown in Figure 3, the correlation heatmap of features extracted from 2D and incomplete-3D tumors showed relatively low correlation values.However, the correlation structure of the features extracted from whole 3D tumor masks was notably different, particularly in logarithm-based, exponential, square, and square root-based features, exhibiting higher correlation among different classes.The high correlation may be attributed to the enhancement of certain 3D structures when applying these filters, which leads to consistent value changes on some categories of the features.When investigating the contributions of separate feature groups after feature selection, we observe that texture-based features extracted from tumor images were more informative than shape-based and density-based features in predicting tumor response to chemotherapy.This could be attributed by the fact that texture-based features are able to capture the heterogeneity and complexity of the tumor microenvironment, which can provide insights into tumor biology and response to treatment.Additionally, our findings also suggest that wavelet-based filtered images provided the most informative features for treatment response prediction, when comparing with other filters such as logarithm, exponential, gradient, and LBP.This may be caused by the fact that wavelet transform is able to initially concentrate the tumor edge and texture information, which are highly related to the treatment effectiveness.Since the other unrelated information are discarded on the input images, the extracted features may therefore exhibit more discriminative powers.Overall, our study highlights the importance of texture-based features and wavelet filtering in radiomics analysis for predicting treatment response in cancer patients.
Despite the promising results of our study, this study has the following limitations.First, the dataset used in our study only consisted of 188 patients, which are collected from a single institute.This single institute dataset may not be able to comprehensively represent the diversified population within the entire country, therefore the robustness of the results in this investigation should be further verified by a larger multi-institute patient cohort.Second, although the segmentation scheme was validated in our previous studies, the tumor segmentation accuracy was not considered.The segmentation error may also potentially introduce noise into our 3D features and consequently reduce the predictive performance [49].Third, only radiomics features were used in the feature performance comparison.Other types of features, such as deep learning features, were not investigated.It would be worthwhile to explore the combination of deep learning features and radiomics features in future studies.In conclusion, our study may provide valuable information for the development of radiomics-based prediction models in the future.

Figure 2 .
Figure 2. One example of the tumor segmentation process, in which the central slice is outlined with red color.a) Identified the tumor on the CT slices.b) Tumors outlined using our developed CAD software.(c) Segmented tumor masks.

Figure 3 (
Figure3(a), (b), and (c) represent the heatmap of all the features extracted from 2D, incomplete-3D, and 3D tumor, respectively.The maps were produced by a total of 188 observations.Each entry in the map indicates the colorized Pearson correlation coefficient ranging from 0 (Blue) to 1 (red).The maps illustrate that most of the features extracted from 2D tumor masks have relatively low correlation.The similar pattern can be observed in the features obtained from incomplete-3D tumor masks, but the 3D features set has stronger dependencies in the exponential, logarithm, square, and square root features than other feature groups.Furthermore, Figure3(d), (e), and (f) demonstrate the histograms of all feature categories for 2D, incomplete-3D, and 3D features.According to the histograms, more than 80% of correlation coefficient values for 2D and incomplete-3D features are less than 0.5, whereas a higher percentage of 3D features have correlation coefficients greater than 0.5.These results imply the feature pools of 2D, and incomplete-3D tumor features cover comprehensive tumor characteristics with little information redundancy.

Figure 3 .
Figure 3. Feature correlation analysis after eliminating constant features a) Heatmap of the 1276 selected features from 2D tumor masks and b) Heatmap of the 1432 selected features from incomplete-3D tumor masks and c) Heatmap of the 1496 selected features from 3D tumor masks, and the corresponding histograms were shown in (d) for 2D features, (e) for incomplete-3D features and (f) for 3D features, respectively.

Figure 4 .
Figure 4. Performance comparison of the models for 6-month PFS prediction a) ROC curves of the prediction models.b) The corresponding AUC values of ten prediction models (with 95%CI).

Table 1 .
Information of the patients in our dataset

Table 2 .
Number of features calculated from original image and other processed versions of the image.

Table 3 .
Breakdown of selected features by radiomics class type and filtering method

Table 4 .
Performance comparison of the prediction models