Article

Impact of Aggregation Methods for Texture Features on Their Robustness Performance: Application to Nasopharyngeal 18F-FDG PET/CT

1 School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
2 Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China
3 Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
4 Department of Electronic Engineering, Information School, Yunnan University, Kunming 650504, China
5 Pazhou Laboratory, Guangzhou 510330, China
* Author to whom correspondence should be addressed.
Cancers 2023, 15(3), 932; https://doi.org/10.3390/cancers15030932
Submission received: 27 December 2022 / Revised: 27 January 2023 / Accepted: 30 January 2023 / Published: 1 February 2023


Simple Summary

This study investigates the impact of the aggregation methods used to generate texture features on the robustness of these features in nasopharyngeal carcinoma (NPC), based on 18F-FDG PET/CT images. A total of 128 NPC patients were enrolled, and 95 texture features from six feature families were extracted for each patient under different aggregation methods. For GLCM and GLRLM features, six aggregation methods were considered; for GLSZM, GLDZM, NGTDM and NGLDM features, three aggregation methods were considered. Feature robustness was assessed by the intra-class correlation coefficient (ICC). Features of different dimensions computed with the same aggregation method showed poorer robustness than features of the same dimension computed with different aggregation methods. Different discretization levels and partial volume correction (PVC) algorithms had a negligible effect on the percentage of texture features in each ICC category.

Abstract

Purpose: This study aims to investigate the impact of the aggregation methods used to generate texture features on the robustness of these features in nasopharyngeal carcinoma (NPC), based on 18F-FDG PET/CT images. Methods: A total of 128 NPC patients were enrolled, and 95 texture features from six feature families were extracted for each patient under different aggregation methods. For GLCM and GLRLM features, six aggregation methods were considered; for GLSZM, GLDZM, NGTDM and NGLDM features, three aggregation methods were considered. The robustness of the features with respect to aggregation method was assessed by the pair-wise intra-class correlation coefficient (ICC). Furthermore, the effects of discretization and partial volume correction (PVC) on the percentage of texture features in each ICC category were evaluated using the overall ICC instead of the pair-wise ICC. Results: Twelve features showed excellent pair-wise ICCs across aggregation methods, namely joint average, sum average, autocorrelation, long run emphasis, high grey level run emphasis, short run high grey level emphasis, long run high grey level emphasis, run length variance, SZM high grey level emphasis, DZM high grey level emphasis, high grey level count emphasis and dependence count percentage. For GLCM and GLRLM features, 19/25 and 14/16 features, respectively, showed excellent pair-wise ICCs across aggregation methods (averaged and merged) of the same dimension (2D, 2.5D or 3D). Different discretization levels and partial volume corrections yielded consistent robustness of texture features with respect to aggregation methods. Conclusion: Features of different dimensions computed with the same aggregation method showed poorer robustness than features of the same dimension computed with different aggregation methods. Different discretization levels and PVC algorithms had a negligible effect on the percentage of texture features in each ICC category.

1. Introduction

Positron emission tomography (PET) has been established as a powerful quantitative imaging technique [1]. For PET imaging, standardized uptake value (SUV)-based parameters (such as SUVmean and SUVmax) are the most commonly adopted image biomarkers in routine clinical practice for diagnostic and prognostic assessment [2]. Owing to their good reproducibility and robustness, SUV parameters have been used extensively in multicenter studies [1,3], but they provide only a limited description of tumor heterogeneity [4]. With the development of radiomics, image biomarkers have been extended beyond SUV to a large set of radiomic features, including statistical, shape-based and textural features computed with second- and higher-order methods of increasing complexity [5,6,7,8]. The reproducibility and robustness of radiomic features are critical for constructing radiomic models that support clinical translation [9].
Previous studies have extensively investigated the reproducibility and robustness of radiomic features with respect to various factors [10,11]. These factors can be divided into three classes: image acquisition factors, radiomic- or texture-specific image processing, and feature computation. Studies on PET image acquisition factors include patient status (e.g., motion, respiratory motion) [12,13,14], image acquisition modes (e.g., scanner differences, test-retest repeatability) [15], image reconstruction [16,17,18] and tumor delineation [19,20]. Studies on radiomic- or texture-specific image processing include image interpolation or voxel harmonization [21] and discretization methods or levels [22,23]. Although these studies offered recommendations for optimizing feature robustness (e.g., removing unstable features, using feature-wise preprocessing, defining large tumor volumes), the factors they investigated remain subject to variability, and they did not address the fundamental computation of the features [24].
The computation of radiomic features, even those with the same names, may be implemented differently across radiomic studies [25]. It depends not only on the parameter settings (such as symmetry, averaging strategy and distance) of the texture matrices (e.g., grey level co-occurrence matrix, GLCM; neighborhood grey tone difference matrix, NGTDM; and grey level size zone matrix, GLSZM), but also strongly on the aggregation method of the matrices. Aggregation describes the process of combining multiple matrices (e.g., those extracted from individual slices) into one matrix that comprehensively represents the 3D tumor volume. Different parameter settings and aggregation methods can lead to different texture feature distributions and clinical characterization [26]. Hatt et al. first compared the strategy of averaging features computed from 13 independent matrices, one for each of the 13 directions, with that of calculating features directly from a single matrix considering all 13 directions simultaneously [27]. Our previous study comprehensively investigated the impact of the parameters of 3D textural matrix design (including symmetry, averaging strategy, and neighborhood extent or window size) on the robustness and diagnostic performance of the resulting features [28]. Although it yielded many useful recommendations, that study focused only on the parameter settings used to generate texture matrices.
In fact, textural features can be extracted from the largest cross-sectional (axial) slice of the tumor (2D textures) or from the entire tumor volume (3D textures). As reviewed by Reiazi et al. [29], although most clinical studies (about 62.2%) used 3D radiomic features, 17.8% of studies applied 2D features, and 20% of studies did not report the dimension of the textures at all. In [30], the authors reported that 3D radiomic features quantified tumor heterogeneity better than 2D features in predicting clinical outcomes. A recent study [31] not only extracted radiomic features from a 2D slice and from the 3D volume, but also averaged features computed separately from each slice within the 3D tumor volume, namely a 2.5D aggregation strategy. Their results showed that the overall performance of features under the 2.5D aggregation strategy did not exceed that of features based on a single 2D slice. Thus, there remains a long-standing disagreement about whether to use 2D or 3D annotations in radiomics research.
The Image Biomarker Standardization Initiative (IBSI) sought to standardize radiomic analysis for improved clinical translation and provided guidelines standardizing the definition and computation of 174 radiomic features, including 95 texture features from six feature families under different aggregation methods [24]. It specifies standardized parameter settings and aggregation methods for the matrices. For the grey level co-occurrence matrix (GLCM) and grey level run length matrix (GLRLM), the IBSI provides six aggregation methods for features of different dimensions (2D, 2.5D and 3D). For the grey level size zone matrix (GLSZM), grey level distance zone matrix (GLDZM), neighborhood grey tone difference matrix (NGTDM) and neighborhood grey level dependence matrix (NGLDM), it provides three aggregation methods (2D, 2.5D and 3D). However, the impact of different aggregation methods on the computation of texture features has not been thoroughly evaluated.
The objective of the present work was therefore to evaluate the impact of aggregation methods on the computation of texture features in [18F]FDG PET imaging of nasopharyngeal carcinoma patients. The consensus of the delineations by two expert physicians was used as the reference. This study comprehensively investigates the impact of a range of aggregation methods used to generate 95 texture features. Robustness was assessed by the pair-wise intra-class correlation coefficient (ICC). The effects of discretization level and partial volume correction on these results were further evaluated.

2. Materials and Methods

2.1. Patients and PET/CT Imaging

A total of 128 NPC patients (103 men and 25 women; mean age, 47.7 ± 13.2 years) with pathologically confirmed disease (scan dates from January 2012 to August 2016) were enrolled in this study. Their characteristics are summarized in Table 1. Following the European Association of Nuclear Medicine (EANM) procedure guidelines [1], patients underwent pre-treatment whole-body [18F]FDG PET/CT on a Biograph-128 mCT scanner (Siemens Healthineers, Shenzhen, Guangzhou). All patients also underwent additional local tumor imaging. To minimize motion blur, patients were encouraged to lie still during the examination. Patients fasted for 6 h before radiotracer injection, and 306–468 MBq (8.27–12.65 mCi) of [18F]FDG (~150 μCi/kg of body weight) was administered intravenously; imaging started approximately 60 min after injection (mean: 58 ± 5 min, range: 52–66 min). CT scans (80 mA, 120 kVp) were used for attenuation correction. PET images were reconstructed using standard ordered-subset expectation maximization (OSEM) with three iterations and 21 subsets, a matrix size of 200 × 200 and a voxel size of 4.07 × 4.07 × 5 mm³. The PET images were then cubic-interpolated to the CT matrix size of 512 × 512 and voxel size of 0.98 × 0.98 × 3 mm³ for image registration and fusion. The interpolated images were used for tumor delineation and radiomic feature extraction. Body-weight SUV images were calculated from the original PET images as follows (Formula (1)):
$$\mathrm{SUV}\ (\mathrm{g/mL}) = \frac{\text{tissue activity}\ (\mathrm{Bq/mL})}{\text{injected dose}\ (\mathrm{Bq}) \,/\, \text{body weight}\ (\mathrm{g})} \qquad (1)$$
where the tissue activity was decay-corrected to account for the time elapsed between injection and acquisition.
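As an illustration of Formula (1), the minimal Python sketch below computes a body-weight SUV for a single voxel. It assumes that the voxel value has not already been decay-corrected by the scanner and that body weight is given in kg (converted to g); the function name `suv_bw` and its parameters are hypothetical, and the 18F half-life of 109.77 min is a standard physical constant rather than a value reported in this study.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 (minutes)


def suv_bw(tissue_bq_per_ml, injected_bq, weight_kg, minutes_since_injection=0.0):
    """Body-weight SUV in g/mL following Formula (1).

    The tissue activity is decay-corrected back to injection time. This sketch
    assumes the image values are not already decay-corrected.
    """
    decay_factor = 2.0 ** (minutes_since_injection / F18_HALF_LIFE_MIN)
    corrected_activity = tissue_bq_per_ml * decay_factor  # Bq/mL at injection time
    weight_g = weight_kg * 1000.0
    return corrected_activity / (injected_bq / weight_g)


# 5000 Bq/mL in a 74 kg patient injected with 370 MBq, no elapsed time:
print(suv_bw(5000.0, 370e6, 74.0))  # → 1.0
```

Decay-correcting the tissue activity forward is equivalent to decay-correcting the injected dose to acquisition time; either convention gives the same SUV.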

2.2. Tumor Delineation, Partial Volume Correction and Intensity Discretization

For each patient, the PET/CT fusion images were displayed in ITK-SNAP software v.3.8 (http://www.itksnap.org (accessed on 29 January 2023)) in axial, coronal and sagittal views. Two radiologists with 3 and 10 years of experience then independently delineated the primary tumors. The two 3D delineations of the primary tumors were highly consistent, with a median Dice similarity coefficient (DSC) of 0.87. Leijenaar et al. first showed that features were more robust to inter-observer variability than to test-retest variability [32]. Our previous study also showed that features were more robust to different segmentation methods than to discretization [23]. Therefore, the intersection of the two manual delineations was used for radiomic analysis.
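The agreement measure and the consensus volume described above can be sketched as follows. This is a minimal NumPy illustration, assuming both delineations are available as boolean voxel masks on the same grid; the function names are hypothetical, and in the study the DSC would be computed per patient and the median taken across patients.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:  # both masks empty: treated as perfect agreement by convention
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def consensus_mask(mask_a, mask_b):
    """Voxels delineated by both observers (intersection), used as the analysis volume."""
    return np.logical_and(np.asarray(mask_a, dtype=bool), np.asarray(mask_b, dtype=bool))
```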
To evaluate the effect of partial volume correction on the robustness of aggregation methods, the Van Cittert (VC) deconvolution algorithm was applied to the original SUV images [33]. The Van Cittert iteration for estimating the SUV is given as follows (Formula (2)):
$$\mathrm{SUV}^{(i)} = \mathrm{SUV}^{(i-1)} + \alpha\left[\mathrm{SUV}^{(0)} - \sigma \otimes \mathrm{SUV}^{(i-1)}\right], \qquad \mathrm{SUV}^{(i)} \ge 0 \qquad (2)$$
where $\mathrm{SUV}^{(i)}$ is the $i$th estimate of the SUV image, $\mathrm{SUV}^{(0)}$ is the original SUV image, $\alpha$ is a parameter of order 1 that affects the convergence rate, $\sigma$ is a normalized point spread function (PSF), and $\otimes$ is the three-dimensional convolution operator. The only constraint on $\mathrm{SUV}^{(i)}$ is that each voxel value must be non-negative. VC deconvolution was applied to the SUV images with a fixed 10 iterations, while the width (FWHM) of the PSF $\sigma$ was varied from 1 to 5.
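A minimal NumPy sketch of the Van Cittert iteration in Formula (2) is given below. It assumes a separable Gaussian PSF whose FWHM is expressed in voxels (the units are not stated here), α = 1, edge padding at the image borders, and clipping to enforce the non-negativity constraint; these choices and all function names are assumptions for illustration, not the study's implementation.

```python
import numpy as np


def gaussian_kernel_1d(fwhm_vox):
    """Normalized 1D Gaussian kernel (sums to 1) for a PSF with the given FWHM in voxels."""
    sigma = fwhm_vox / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def convolve_separable(image, kernel):
    """N-D convolution with a separable kernel, applied axis by axis with edge padding."""
    out = np.asarray(image, dtype=float)
    r = len(kernel) // 2
    for axis in range(out.ndim):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        # 'valid' on the padded line returns exactly the original length along this axis
        out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), axis, padded)
    return out


def van_cittert(suv0, fwhm_vox, n_iter=10, alpha=1.0):
    """Van Cittert deconvolution: SUV_i = SUV_{i-1} + alpha * (SUV_0 - PSF * SUV_{i-1})."""
    kernel = gaussian_kernel_1d(fwhm_vox)
    suv0 = np.asarray(suv0, dtype=float)
    estimate = suv0.copy()
    for _ in range(n_iter):
        estimate = estimate + alpha * (suv0 - convolve_separable(estimate, kernel))
        estimate = np.clip(estimate, 0.0, None)  # non-negativity constraint
    return estimate
```

Under these assumptions, the five PVC images of the study would correspond to calling `van_cittert` with `fwhm_vox` set to 1, 2, 3, 4 and 5. A uniform image is a fixed point of the iteration, while a blurred point source regains part of its peak.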
The SUVs within each volume of interest (VOI) were then discretized into D bins [22] as follows (Formula (3)):
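$$\mathrm{SUV}_D(x) = \begin{cases} 1, & \mathrm{SUV}(x) = \mathrm{SUV}_{\min} \\ \left\lceil D \times \dfrac{\mathrm{SUV}(x) - \mathrm{SUV}_{\min}}{\mathrm{SUV}_{\max} - \mathrm{SUV}_{\min}} \right\rceil, & \text{otherwise} \end{cases} \qquad (3)$$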
where $\mathrm{SUV}(x)$ is the SUV of voxel $x$ and $\mathrm{SUV}_D(x)$ is its discretized value (bin index). The SUV resolution (bin width) equals $(\mathrm{SUV}_{\max} - \mathrm{SUV}_{\min})/D$. Discretization was performed with a fixed number of bins (D = 8, 16, 32, 64 and 128), referred to as fixed bin number (FBN) discretization. Discretization is required to generate texture matrices, and the number of bins determines both the intensity resolution and the computational complexity of the texture matrices.
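Formula (3) can be sketched in NumPy as follows, applied to the SUVs of the voxels inside one VOI. The function name is hypothetical, and mapping a VOI with a single intensity value entirely to bin 1 is an assumption, since the formula is undefined in that case.

```python
import numpy as np


def discretize_fbn(suv, n_bins):
    """Fixed bin number (FBN) discretization of VOI SUVs into bins 1..n_bins.

    Voxels at SUVmin go to bin 1; all others to ceil(D * (SUV - SUVmin) / (SUVmax - SUVmin)).
    """
    suv = np.asarray(suv, dtype=float)
    lo, hi = suv.min(), suv.max()
    if hi == lo:  # degenerate VOI with a single intensity (assumption: one bin)
        return np.ones(suv.shape, dtype=int)
    bins = np.ceil(n_bins * (suv - lo) / (hi - lo)).astype(int)
    bins[suv == lo] = 1
    return bins


print(discretize_fbn([0.0, 2.5, 2.6, 10.0], 4).tolist())  # → [1, 1, 2, 4]
```

For example, with D = 128 and a VOI spanning SUV 2 to 18, each bin covers (18 - 2)/128 = 0.125 SUV units.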

2.3. Matrix Construction with Different Aggregation Methods

The flowchart of the study design is shown in Figure 1, and a schematic example of feature aggregation is shown in Figure 1b. According to the IBSI guideline [24], the grey level co-occurrence matrix (GLCM) is generated by counting the co-occurrences of two voxels with intensities i and j separated by a fixed distance of 1 voxel in direction θ. To improve rotational invariance, GLCM features are computed by aggregating information from the different underlying directional matrices. The six aggregation methods for GLCM and GLRLM are defined as follows:
  • Features are computed from each 2D directional matrix and then averaged over 2D directions and slices, namely 2Daveraged;
  • Features are computed from a single matrix after merging the 2D directional matrices per slice, and then averaged over slices, namely 2Ds_merged;
  • Features are computed from a single matrix after merging the 2D directional matrices per direction (across slices), and then averaged over directions, namely 2.5Dd_merged;
  • The feature is computed from a single matrix after merging all 2D directional matrices, namely 2.5Dmerged;
  • Features are computed from each 3D directional matrix and then averaged over the 3D directions, namely 3Daveraged;
  • The feature is computed from a single matrix after merging all 3D directional matrices, namely 3Dmerged.
The three aggregation methods for calculating GLSZM, GLDZM, NGTDM and NGLDM features are defined as follows:
  • Features are computed from 2D matrices and averaged over slices, namely 2D;
  • The feature is computed from a single matrix after merging all 2D matrices, namely 2.5D;
  • The feature is computed from a 3D matrix, namely 3D.
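To make the difference between averaging and merging concrete, the simplified Python sketch below builds symmetric 2D GLCMs (distance 1, four in-plane directions) for each slice of a discretized VOI and computes one example feature, GLCM contrast, under the four 2D and 2.5D aggregation methods, using the method names that appear in the Results. The 3D methods are omitted for brevity, grey levels are assumed to be integers 1..N with 0 marking voxels outside the VOI, and all function names are hypothetical; this is not the IBSI reference implementation.

```python
import numpy as np

# Four unique in-plane directions as (row, col) offsets at distance 1.
DIRS_2D = [(0, 1), (1, 1), (1, 0), (1, -1)]


def glcm_2d(slice_, offset, n_levels):
    """Symmetric GLCM counts for one slice and one direction (0 = outside VOI, ignored)."""
    dr, dc = offset
    counts = np.zeros((n_levels, n_levels))
    rows, cols = slice_.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = slice_[r, c], slice_[r2, c2]
                if a > 0 and b > 0:
                    counts[a - 1, b - 1] += 1
                    counts[b - 1, a - 1] += 1  # count both orderings (symmetric)
    return counts


def contrast(counts):
    """GLCM contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    p = counts / counts.sum()
    levels = np.arange(1, p.shape[0] + 1)
    return float((((levels[:, None] - levels[None, :]) ** 2) * p).sum())


def _nonempty(m):
    """Skip slices/directions that contain no voxel pairs."""
    return m.sum() > 0


def aggregate_2d_and_25d(volume, n_levels, feature=contrast):
    """Feature value under four of the six GLCM aggregation methods."""
    mats = [[glcm_2d(s, d, n_levels) for d in DIRS_2D] for s in volume]  # [slice][direction]
    per_slice = [sum(row) for row in mats]  # merge directions within each slice
    per_dir = [sum(mats[s][k] for s in range(len(mats))) for k in range(len(DIRS_2D))]  # merge slices per direction
    return {
        "2Daveraged": float(np.mean([feature(m) for row in mats for m in row if _nonempty(m)])),
        "2Ds_merged": float(np.mean([feature(m) for m in per_slice if _nonempty(m)])),
        "2.5Dd_merged": float(np.mean([feature(m) for m in per_dir if _nonempty(m)])),
        "2.5Dmerged": feature(sum(per_dir)),
    }
```

Because merging sums counts before normalization, slices or directions with more voxel pairs carry more weight in the merged variants, which is one reason averaged and merged values can differ. When every slice is identical, 2Daveraged equals 2.5Dd_merged and 2Ds_merged equals 2.5Dmerged.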

2.4. Feature Extraction

After the matrices were constructed with the aggregation methods listed in Table 2, 25, 16, 16, 16, 5 and 17 features were extracted from the GLCM, GLRLM, GLSZM, GLDZM, NGTDM and NGLDM, respectively, under each aggregation method (95 texture features in total).
Table 1. Clinical characteristics of the NPC patients.
Characteristic | All patients
Patient no. | 128
Age (years), mean ± SD | 47.7 ± 13.2
Sex, no. (%) |
  Male | 103 (80.5%)
  Female | 25 (19.5%)
AJCC stage, no. (%) |
  I | 4 (3.1%)
  II | 11 (8.6%)
  III | 49 (38.3%)
  IV | 64 (50.0%)
MATV | 50.9 ± 86.4
SUVmax | 15.4 ± 7.77
SUVmean | 7.95 ± 3.75
CTmean | 50.4 ± 55.1


2.5. Robustness Evaluation by Intra-Class Correlation Coefficient (ICC)

To analyze the robustness of feature values calculated with different aggregation methods, the intra-class correlation coefficient (ICC) [34] was adopted (Formula (4)):
$$\mathrm{ICC} = \frac{\mathrm{BMS} - \mathrm{WMS}}{\mathrm{BMS} + \mathrm{WMS}} \qquad (4)$$
where BMS and WMS are the between-subjects and within-subjects mean squares, obtained via Kruskal-Wallis one-way ANOVA. ICC values generally range from 0 to 1 (strictly, the estimator can become negative when WMS exceeds BMS). The higher the ICC, the more robust the feature; an ICC of one indicates perfect robustness (i.e., identical feature values). Textural features were categorized as having poor (ICC < 0.5), moderate (0.5 ≤ ICC < 0.75), good (0.75 ≤ ICC < 0.9) or excellent (ICC ≥ 0.9) robustness.
We first used the pair-wise ICC to investigate the impact of the aggregation methods used to generate texture features on their robustness. For this analysis, features were extracted from the original PET images with a fixed discretization level (FBN = 128). The overall ICC, calculated by considering all aggregation methods for each feature simultaneously, was then used to examine whether our results were affected by different discretization levels (FBN = 8, 16, 32, 64, 128) and by partial volume correction (with varying normalized point spread functions).
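The ICC computation and categorization can be sketched as below. This sketch swaps in the mean squares of an ordinary parametric one-way ANOVA (not the Kruskal-Wallis procedure named above) and uses the general one-way form ICC = (BMS - WMS)/(BMS + (k - 1)WMS), which reduces to Formula (4) for a pair of methods (k = 2); with k > 2 columns it gives one possible form of the overall ICC across all aggregation methods. The function names are hypothetical.

```python
import numpy as np


def icc_oneway(values):
    """One-way ICC from an (n_subjects x k_methods) array of feature values."""
    x = np.asarray(values, dtype=float)
    n, k = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    bms = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)  # between-subjects mean square
    wms = ((x - subject_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-subjects mean square
    denom = bms + (k - 1) * wms
    if denom == 0:  # feature constant across all subjects and methods: ICC undefined
        return float("nan")
    return float((bms - wms) / denom)


def icc_category(icc):
    """Robustness category using the cutoffs of Section 2.5."""
    if icc >= 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


# Three patients, one feature under two aggregation methods (a constant offset of 1):
icc = icc_oneway([[1, 2], [2, 3], [3, 4]])
print(icc, icc_category(icc))
```

Note that this one-way ICC penalizes systematic offsets between methods: the example above has perfectly correlated columns yet yields ICC = 0.6 ("moderate").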
Furthermore, under different discretization levels and partial volume corrections, the relationships between features with excellent robustness and the metabolically active tumor volume (MATV) were studied to assess the potential complementarity of texture features to MATV. Because such relationships may be nonlinear and these parameters are frequently not normally distributed, Spearman's rank correlation coefficient (rs) was used.
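For reference, Spearman's rs is the Pearson correlation computed on ranks, with tied values assigned their average rank. A minimal NumPy sketch follows; the function names are hypothetical, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.

```python
import numpy as np


def average_ranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    x = np.asarray(values, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rs(feature_values, matv_values):
    """Spearman's rs: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(feature_values), average_ranks(matv_values))[0, 1])


# A monotonic but nonlinear relation gives rs = 1 (up to floating-point rounding).
print(spearman_rs([1, 2, 3, 4], [1, 8, 27, 64]))
```

Taking `abs(spearman_rs(...))` gives the absolute correlations reported in Section 3.5.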

3. Results

3.1. Robustness of GLCM Features Affected by Aggregation Methods

With six aggregation methods for GLCM, there are C(6,2) = 15 possible pairs of aggregation methods. Because pairs that differ in both feature matrix dimension and averaged/merged strategy are not meaningful to compare, we evaluated only the nine pairs of aggregation methods that share either the same feature matrix dimension or the same averaged/merged strategy. Figure 2 illustrates the pair-wise ICC values of GLCM texture features extracted with FBN = 128 for these nine pairs. For the three same-dimension pairs (2Daveraged-2Ds_merged, 2.5Dd_merged-2.5Dmerged and 3Daveraged-3Dmerged), twenty-one features showed ICC values larger than 0.75 (19 of these 21 had ICC values of nearly 1), while four features showed ICC values lower than 0.75 (Figure 2a). Thus, the effect of aggregation method on features of the same dimension (2D, 2.5D or 3D) was negligible except for four features (joint entropy, angular second moment, information correlation 1 and information correlation 2). For the three pairs 2Daveraged-2.5Dd_merged, 2Daveraged-3Daveraged and 2.5Dd_merged-3Daveraged, thirteen features showed ICC values larger than 0.75, while twelve features had some or all ICC values lower than 0.75 (Figure 2b). Thus, for similar aggregation strategies (d_merged can be considered an averaged strategy), the effect of feature matrix dimension was not negligible. The same conclusion can be drawn from Figure 2c.
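The selection of nine out of fifteen pairs can be reproduced with a short Python sketch. Labeling each method by (dimension, strategy), with d_merged treated as averaged, follows the text; assigning 2Ds_merged, 2.5Dmerged and 3Dmerged to the merged group is an assumption of this illustration.

```python
from itertools import combinations

# (matrix dimension, strategy) for each GLCM/GLRLM aggregation method.
METHODS = {
    "2Daveraged": ("2D", "averaged"),
    "2Ds_merged": ("2D", "merged"),      # assumption: counted as merged
    "2.5Dd_merged": ("2.5D", "averaged"),  # d_merged treated as averaged (see text)
    "2.5Dmerged": ("2.5D", "merged"),
    "3Daveraged": ("3D", "averaged"),
    "3Dmerged": ("3D", "merged"),
}

all_pairs = list(combinations(METHODS, 2))
same_dimension = [(a, b) for a, b in all_pairs if METHODS[a][0] == METHODS[b][0]]
same_strategy = [(a, b) for a, b in all_pairs if METHODS[a][1] == METHODS[b][1]]
evaluated = same_dimension + same_strategy  # the two groups cannot overlap

print(len(all_pairs), len(same_dimension), len(same_strategy), len(evaluated))  # → 15 3 6 9
```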
Figure 3 illustrates the ICC heat map of GLCM features. For features extracted with different aggregation methods in the same dimension (Figure 3, left), the effect of aggregation method on feature values was negligible for most features (excellent robustness), except for four features (joint entropy, angular second moment, information correlation 1 and information correlation 2). For features extracted with the same aggregation method (d_merged again being considered an averaged strategy) in different dimensions (Figure 3, middle and right), 3/25 features (joint average, sum average and autocorrelation) showed excellent robustness, and 10/25 features (difference average, difference variance, difference entropy, contrast, dissimilarity, inverse difference, inverse difference moment, inverse difference moment normalized, inverse variance) showed good or excellent robustness.

3.2. Robustness of GLRLM Features Affected by Aggregation Methods

Figure 4 illustrates the pair-wise ICC values of GLRLM texture features extracted with FBN = 128 for the nine pairs of aggregation methods. Comparing aggregation methods in the same dimension (2Daveraged vs. 2Ds_merged, 2.5Dd_merged vs. 2.5Dmerged, and 3Daveraged vs. 3Dmerged), 13/16 features showed high robustness, with ICC values close to one (Figure 4a). In particular, all 16 features had nearly identical values under 2.5Dd_merged and 2.5Dmerged, indicating a negligible effect of aggregation method. For the three pairs 2Daveraged vs. 2.5Dd_merged, 2Daveraged vs. 3Daveraged and 2.5Dd_merged vs. 3Daveraged (Figure 4b), seven features showed ICC values larger than 0.75, while 9/16 features showed moderate robustness, with ICC values lower than 0.75. These results suggest that computing textural matrices in different dimensions leads to relatively large differences in feature values. Furthermore, the robustness of the GLRLM features was similar in Figure 4b and Figure 4c.
Figure 5 illustrates the ICC heat map of GLRLM features. For features extracted with different aggregation methods in the same dimension (Figure 5, left), the effect of aggregation method on features of the same dimension (2D, 2.5D or 3D) was negligible; only 3/16 features showed poor or moderate robustness. For features extracted with the same aggregation method (d_merged being considered an averaged strategy) in different dimensions (Figure 5, middle and right), 5/16 features (long run emphasis, high grey level run emphasis, short run high grey level emphasis, long run high grey level emphasis and run length variance) showed excellent robustness, and 3/16 features (short run emphasis, long run low grey level emphasis and run length non-uniformity normalized) showed good or excellent robustness.

3.3. Robustness of GLSZM, GLDZM, NGLDM, NGTDM Features Affected by Aggregation Methods

Figure 6 shows the pair-wise ICC values of GLSZM (top-left), GLDZM (top-right), NGLDM (bottom-left) and NGTDM (bottom-right) features extracted with three aggregation strategies (2D, 2.5D and 3D), for the pairs denoted 2D-2.5D, 2D-3D and 2.5D-3D. For GLSZM (Figure 6, top-left), 9/16 features showed ICC values larger than 0.75 for the 2D-2.5D pair and 9/16 for the 2.5D-3D pair, but only 3/16 for the 2D-3D pair. This indicates that the effect of matrix dimension is larger between 2D and 3D than between 2D and 2.5D or between 2.5D and 3D. For GLDZM (Figure 6, top-right), 9/16, 10/16 and only 3/16 features showed ICC values larger than 0.75 for the 2D-2.5D, 2.5D-3D and 2D-3D pairs, respectively, leading to the same conclusion. For NGLDM (Figure 6, bottom-left) and NGTDM (Figure 6, bottom-right), only two features and no features, respectively, showed ICC values larger than 0.9.
Figure 7 illustrates the ICC heat maps of GLSZM (top-left), GLDZM (bottom-left), NGTDM (top-right) and NGLDM (bottom-right) features. Based on their ICC values, the textural features can be divided into sub-groups with excellent, good, moderate and poor robustness. Only five features (high grey level emphasis from GLSZM; high grey level emphasis and small distance high grey level emphasis from GLDZM; high grey level count emphasis and dependence count percentage from NGLDM) showed excellent robustness. Six features showed excellent or good robustness: three GLSZM features (small zone emphasis, small zone high grey level emphasis and zone size non-uniformity normalized), two GLDZM features (small distance emphasis and zone distance non-uniformity normalized) and one NGLDM feature (low grey level count emphasis).

3.4. Effect of Discretization and Partial Volume Correction on Feature Robustness

To evaluate the effects of discretization and partial volume correction on the percentage of texture features in each ICC category, the overall ICC was computed instead of the pair-wise ICC to simplify the computation. As shown in Figure 8a, different discretization levels yielded consistent robustness of textural features with respect to aggregation methods. Similarly, different FWHM values for PVC hardly changed feature robustness. These results show that different discretization levels and PVC algorithms have a negligible effect on the percentage of texture features in each ICC category.
The overall ICC results for each feature family are shown in Figure 9. For all feature families, discretization level had a larger effect than partial volume correction, producing larger changes in the relative distribution of overall ICC values across categories. For GLCM- and GLRLM-based features, most features showed excellent or good agreement between aggregation methods as FBN increased from 8 to 128. For GLSZM, GLDZM, NGTDM and NGLDM, most features showed moderate or poor agreement between aggregation methods as FBN increased from 8 to 128. For all feature families, the agreement of most features between aggregation methods was unchanged on PVC images (FWHM from 1 to 5), and only a few GLRLM features showed decreased excellent agreement with increasing FWHM.

3.5. Effect of Discretization and Partial Volume Correction on Correlation between Features with Excellent Robustness and MATV

To avoid confusion, absolute values of Spearman's rank correlation are reported; the directions of the correlations can be found in Figure 10. First, important details of the grey-level distribution are preserved when FBN is larger than 64. Discretization had a considerable impact on the correlation between MATV and four features (run length variance, long run emphasis, long run high grey level emphasis and short run high grey level emphasis), with the correlation decreasing as FBN increased. The correlations of seven features (joint average, sum average, autocorrelation, high grey level run emphasis, SZM high grey level emphasis, DZM high grey level emphasis and high grey level count emphasis) with MATV were insensitive to the discretization level. Only the correlation between small distance high grey level emphasis and MATV increased with increasing FBN. The correlations between features and MATV ranged from 0.04 to 0.82, suggesting that some of these texture features may carry substantial information complementary to MATV.
As shown in Figure 11, high-uptake regions are enhanced as the FWHM used for PVC increases. However, PVC had a negligible effect on the correlation between all features and MATV, with rs varying only slightly as the FWHM increased.

4. Discussion

The present study focused on an internal factor in the construction of texture feature matrices, namely the aggregation method. We assessed the impact of the aggregation methods used to generate texture features on feature robustness in nasopharyngeal PET/CT. In addition, the effects of discretization and partial volume correction on the distribution of ICC values were evaluated.

4.1. The Research Value of This Study

Different software tools can produce noticeably different values for the same image biomarker, which makes it difficult to compare radiomic features extracted with different software implementations and settings for consistency and standardization. The IBSI has made great progress in minimizing such differences, so that variance due to software should become much less important [24]. As reviewed in [29], one remaining ambiguity in radiomics analysis is the aggregation method used in feature computation. Although the IBSI defines specific aggregation methods for each feature family, one unanswered question is how the aggregation strategy for texture features affects their robustness. The present study therefore investigated the impact of the aggregation methods used to generate texture features on their robustness in nasopharyngeal carcinoma (NPC), based on 18F-FDG PET/CT images. The objective was two-fold. First, various features have been selected for specific tasks in previous studies [35,36,37]; however, the same features may take different values in different studies, and the present results can guide the choice of robust features under different aggregation methods. Second, features with excellent or good robustness could be used for stable model construction, which is one of the open research topics in radiomics [38].

4.2. The Effect of Different Aggregation Methods and Feature Dimension on Feature Robustness

For GLCM and GLRLM, the effect of different aggregation methods (averaged and merged) on features of the same dimension (2D, 2.5D or 3D) was negligible, except for six features (joint entropy, angular second moment, information correlation 1, information correlation 2, grey level non-uniformity and run length non-uniformity) that were not robust. This largely agrees with our previous study [28], in which the effect of combination strategies on 3D features was negligible, i.e., 3D features were robust to different combination strategies, with high ICCs. This suggests that, apart from the aforementioned six features, most GLCM- and GLRLM-based features of the same dimension can be used regardless of the aggregation method.
When the same aggregation strategy (averaged or merged) was applied to features of different dimensions (2D-2.5D, 2D-3D, 2.5D-3D), similar robustness patterns were observed. Clearly, features of different dimensions computed with the same aggregation method showed poorer robustness than features of the same dimension computed with different aggregation methods. This conclusion was first derived from GLCM- and GLRLM-based features and then confirmed for the other feature families. In the present work, the clinical usefulness of the features was not evaluated, because the choice of clinically useful features should depend on the specific task (e.g., diagnosis or prognosis), as in our previous study [28]. The aim of this study is to provide a reference for researchers choosing stable, robust features when aggregation methods vary.

4.3. Statistical Metric and Cutoff Values

Feature robustness was assessed by the intra-class correlation coefficient (ICC). The ICC is appropriate when strong agreement is expected within a given class (here, the values of one patient's feature under two aggregation methods) but weak correlation between classes. However, whether a feature is deemed robust depends on the ICC cutoff used to separate stable from unstable features. Different studies use different thresholds to define a feature as highly reproducible, such as ICC ≥ 0.9 [18,39] and ICC ≥ 0.8 in our previous studies [23,28]. As a result, the set of features labelled robust differs between studies, and there is no universal consensus. In the present study, multiple thresholds were applied simultaneously, and features were categorized as having poor (ICC < 0.5), moderate (0.5 ≤ ICC < 0.75), good (0.75 ≤ ICC < 0.9), or excellent (ICC ≥ 0.9) robustness, as adopted in [40]. This approach provides the distribution of feature robustness from poor to excellent, rather than simply splitting features into robust and non-robust at a single threshold.
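The text does not specify which ICC form was used for the pair-wise comparisons. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement ICC(2,1) of Shrout and Fleiss, and reports the two-way consistency form ICC(3,1) alongside it, with patients as rows and the two aggregation methods being compared as columns. The robustness categories follow the cutoffs adopted above; the function names are hypothetical.

```python
import numpy as np


def icc_2way(x):
    """Return (ICC(2,1), ICC(3,1)) for an n_patients x k_methods array.

    ICC(2,1): two-way random effects, absolute agreement, single measurement.
    ICC(3,1): two-way mixed effects, consistency, single measurement.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between patients
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between aggregation methods
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols                 # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc2 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc3 = (msr - mse) / (msr + (k - 1) * mse)
    return icc2, icc3


def icc_category(icc):
    """Robustness category using the cutoffs adopted in the text."""
    if icc >= 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    feat_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical feature values, method A
    feat_b = feat_a + 1.0                          # method B adds a constant offset
    icc2, icc3 = icc_2way(np.column_stack([feat_a, feat_b]))
    print(icc2, icc_category(icc2), icc3, icc_category(icc3))
```

The choice of ICC form matters: in this toy example a constant offset between the two methods leaves ICC(3,1) at 1 ("excellent") but lowers ICC(2,1) to 5/6 ("good"), so the same feature can fall into different categories depending on whether absolute agreement or consistency is measured.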

4.4. The Effect of Discretization and Partial Volume Corrections on the Percent of ICC Categories of All Texture Features

Discretization reduces the practically continuous range of intensity values to a finite set of grey levels and effectively reduces image noise [41]. Two discretization methods are commonly used, namely fixed bin number (FBN) and fixed bin size (FBS) [42]. FBN divides the SUV range of each ROI into a fixed number of bins, which results in discretized images whose bin size varies with the SUV range of the ROI. FBS divides the SUV range into bins of uniform size and thus maintains a constant intensity resolution across all tumor images [22]. In the present study, FBN = 128 was adopted as in [43], which is sufficient to describe the grey levels of a tumor while keeping the computational complexity acceptable. We additionally varied FBN from 8 to 128 to evaluate the effect of discretization on the distribution of all texture features across ICC categories. The results showed that the discretization level had a negligible effect on this distribution. Within each feature family, there were small differences in ICC categories between discretization levels; the main reason is that FBN ≤ 32 is not sufficient to describe the grey level distribution, which ultimately affects feature computation [44].
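The difference between the two schemes can be sketched in Python using the IBSI-style formulas; this is an illustration rather than the pipeline used in the study. The FBS lower bound `x_min` (set to 0 here, a common choice for SUV), the convention that 0 marks voxels outside the ROI, and the function names are assumptions.

```python
import numpy as np


def discretize_fbn(suv, mask, n_bins):
    """Fixed bin number: the ROI's own SUV range is split into n_bins equal bins."""
    vals = suv[mask]
    lo, hi = vals.min(), vals.max()
    out = np.zeros(suv.shape, dtype=int)  # 0 = outside ROI
    if hi == lo:  # uniform ROI: everything falls into the first bin
        out[mask] = 1
        return out
    levels = np.floor(n_bins * (vals - lo) / (hi - lo)).astype(int) + 1
    out[mask] = np.minimum(levels, n_bins)  # the maximum SUV belongs to the top bin
    return out


def discretize_fbs(suv, mask, bin_width, x_min=0.0):
    """Fixed bin size: bins of constant SUV width, so the number of levels varies per ROI."""
    out = np.zeros(suv.shape, dtype=int)
    out[mask] = np.floor((suv[mask] - x_min) / bin_width).astype(int) + 1
    return out


if __name__ == "__main__":
    suv = np.array([0.0, 1.0, 2.0, 4.0])
    roi = np.ones(suv.shape, dtype=bool)
    for scale in (1.0, 10.0):
        print(discretize_fbn(suv * scale, roi, 4), discretize_fbs(suv * scale, roi, 1.0))
```

Scaling every SUV in the ROI by 10 leaves the FBN levels unchanged ([1, 2, 3, 4]) but spreads the FBS levels up to bin 41, which illustrates why FBN yields a fixed number of grey levels per ROI while FBS keeps a constant intensity resolution across tumors.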
Accurate quantification of metabolic volumes in PET is hampered by partial volume effects, which lead to underestimation of the standardized uptake value (SUV) [45]. Various partial volume correction (PVC) methods have been proposed [46], mainly based either on enhancing spatial resolution during reconstruction or on post-reconstruction image restoration. In the present study, the classic Van Cittert (VC) deconvolution technique was adopted for partial volume correction [33]; compared with reconstruction-based methods, it is relatively effective and simple to apply, with only one convolution kernel parameter. We varied this kernel parameter from 1 to 5, which simultaneously increases the recovered SUV and the noise level, to evaluate the effect of partial volume correction on the distribution of all texture features across ICC categories. It is worth noting that the Lucy–Richardson (LR) method is as easy to implement as the VC method: the only difference is the iteration step, which is multiplicative in the LR method and additive in the VC method [47]. In our previous study, we also found no significant difference between these two methods [48]. Furthermore, some methods incorporate a regularization or denoising term into LR or VC correction to control noise, such as wavelet-based denoising [49], HYPR denoising [50], MR-guided regularization [51,52], and parallel level set regularization [53]. However, these methods introduce additional regularization parameters, which are relatively difficult to optimize in clinical application.
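The additive VC update and the multiplicative LR update can be contrasted in a short Python sketch. It assumes a shift-invariant isotropic Gaussian point spread function parametrized by its FWHM in voxels (truncated at 3 sigma, zero-padded at the borders), a relaxation factor of 1 for VC, a fixed number of iterations, and clipping of negative VC values to zero. These settings and the function names are assumptions for illustration, not the parameters of [33] or of this study.

```python
import numpy as np

# Convert a Gaussian FWHM to its standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_kernel_1d(fwhm):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    sigma = fwhm * FWHM_TO_SIGMA
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img, fwhm):
    """Separable convolution with the assumed Gaussian PSF (zero padding at the borders)."""
    k = gaussian_kernel_1d(fwhm)
    out = np.asarray(img, dtype=float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, k, mode="same")
    return out


def van_cittert(g, fwhm, n_iter=10, alpha=1.0):
    """Additive update: f <- f + alpha * (g - h*f), starting from the observed image g."""
    f = np.asarray(g, dtype=float).copy()
    for _ in range(n_iter):
        f = f + alpha * (g - blur(f, fwhm))
        f = np.clip(f, 0.0, None)  # SUV cannot be negative
    return f


def lucy_richardson(g, fwhm, n_iter=10, eps=1e-12):
    """Multiplicative update: f <- f * (h^T * (g / (h*f))); the Gaussian h is symmetric, so h^T = h."""
    f = np.asarray(g, dtype=float).copy()
    for _ in range(n_iter):
        ratio = g / (blur(f, fwhm) + eps)
        f = f * blur(ratio, fwhm)
    return f


if __name__ == "__main__":
    truth = np.zeros((25, 25))
    truth[10:15, 10:15] = 10.0  # small hot lesion
    observed = blur(truth, fwhm=4.0)
    print(observed.max(), van_cittert(observed, 4.0).max(), lucy_richardson(observed, 4.0).max())
```

Both corrections raise the peak value of the blurred lesion toward its true uptake. In this sketch a larger `fwhm` assumes a broader PSF and therefore restores more contrast, at the cost of amplifying noise when noise is present.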

4.5. Robustness and Discriminability of Features

In the present study, we investigated how the aggregation methods used to generate texture features affect the robustness of those features in 18F-FDG PET/CT images of nasopharyngeal carcinoma (NPC). Thus, like [15,16,17,18], this study focused on the robustness or stability of features rather than on their discriminability. Robustness was assessed by the intra-class correlation coefficient (ICC), whereas the discriminability of a feature depends on the specific clinical task and on the machine learning pipeline, including the feature selection method and the classifier. In our previous study, we investigated 42 machine learning methods, cross-combined from six feature selection methods and seven classifiers, for radiomics-based differentiation of local recurrence versus inflammation in post-treatment NPC PET/CT images [54]. Three combinations, the Fisher score with random forests (RF), k-nearest neighbors (kNN), and support vector machine (SVM), performed comparatively better in distinguishing recurrence from inflammation. As a univariate selection method, however, the Fisher score can assess feature relevance but cannot detect feature redundancy. We also note heat transfer search (HTS), an optimization algorithm proposed in [55] that has been applied to feature selection for hybrid perovskite thin-film morphology identification [56]. In that work, four machine learning methods combined with HTS were evaluated and compared with a cuckoo search algorithm, and the features selected by HTS were effective in identifying thin-film morphological images with machine learning models. Although HTS was applied in a different context, it may have useful applications in radiomics.
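The remark that the Fisher score measures relevance but not redundancy follows from its definition: each feature is scored on its own, as the ratio of between-class to within-class scatter. The sketch below uses one common per-feature form of the score (an assumption; the variant used in [54] may differ) and shows that an exact duplicate of a feature receives the same score, so ranking by Fisher score alone cannot flag it as redundant. The function name and toy data are hypothetical.

```python
import numpy as np


def fisher_score(X, y):
    """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - mu) ** 2
        within += n_c * Xc.var(axis=0)  # population variance within class c
    return between / (within + 1e-12)  # small constant guards against zero variance


if __name__ == "__main__":
    X = np.array([[1.0, 1.0], [1.2, 3.0], [3.0, 1.1], [3.2, 2.9]])
    X = np.column_stack([X, X[:, 0]])  # third column duplicates the first
    y = np.array([0, 0, 1, 1])         # e.g., inflammation vs. recurrence
    print(fisher_score(X, y))
```

The informative first feature and its duplicate receive identical high scores, while the uninformative second feature scores near zero. A selection pipeline built on the Fisher score therefore typically needs a separate redundancy step, for example discarding one feature from each highly correlated pair.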
In future work, we will evaluate the robustness and discriminability of features simultaneously for a specific clinical task, and incorporate heat transfer search-based feature selection into model construction.

4.6. Limitations

Our study had some limitations. First, we included 128 NPC patients from a single center, a relatively small data set for validating the robustness of the features. Ideally, a large multi-center data set should be used to validate the reliability of the results. In the present study, we used multiple preprocessing settings, including different discretization levels and partial volume corrections, to check the consistency of our results. Second, we only varied the aggregation methods, and all other parameters were set to the defaults suggested by the IBSI, which should make the conclusions more generalizable for most researchers.

5. Conclusions

In the present study, we enrolled 128 NPC patients and extracted a total of 95 texture features from six feature families under different aggregation methods. Robustness was assessed by the pair-wise intra-class correlation coefficient (ICC). Only 12 of the 95 texture features showed excellent robustness across aggregation methods. Features of different dimensions computed with the same aggregation method showed worse robustness than features of the same dimension computed with different aggregation methods. Different discretization levels and partial volume correction (PVC) algorithms had a negligible effect on the distribution of all texture features across ICC categories.

Author Contributions

Conceptualization, L.P. and W.C.; methodology, L.P.; software, L.P.; validation, H.X. and L.P.; formal analysis, L.P. and H.X.; investigation, L.P.; resources, W.L. and H.X.; data curation, L.P.; writing—original draft preparation, L.P.; writing—review and editing, L.P., L.L. and W.L.; visualization, L.P. and W.L.; supervision, L.L. and W.C.; project administration, L.L.; funding acquisition, W.L. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Basic and Applied Basic Research Foundation under grants 2019A1515011104, 2020A1515110683, 2021A1515011676, and the Science and Technology Program of Guangdong Province 2022A0505050039.

Institutional Review Board Statement

No ethical committee approval was required, given the retrospective nature of this study of previously anonymized data.

Informed Consent Statement

Patient informed consent was waived.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank Quanshi Wang and Qingyu Yuan for delineating the NPC tumors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Boellaard, R.; Delgado-Bolton, R.; Oyen, W.J.; Giammarile, F.; Tatsch, K.; Eschner, W.; Verzijlbergen, F.J.; Barrington, S.F.; Pike, L.C.; Weber, W.A. FDG PET/CT: EANM procedure guidelines for tumour imaging: Version 2.0. Eur. J. Nucl. Med. Mol. Imaging 2015, 42, 328–354.
  2. Lodge, M.A. Repeatability of SUV in oncologic 18F-FDG PET. J. Nucl. Med. 2017, 58, 523–532.
  3. Boellaard, R. Standards for PET image acquisition and quantitative data analysis. J. Nucl. Med. 2009, 50, 11S–20S.
  4. Tomaszewski, M.R.; Gillies, R.J. The biological meaning of radiomic features. Radiology 2021, 298, 505–516.
  5. Kumar, V.; Gu, Y.; Basu, S.; Berglund, A.; Eschrich, S.A.; Schabath, M.B.; Forster, K.; Aerts, H.J.; Dekker, A.; Fenstermacher, D. Radiomics: The process and the challenges. Magn. Reson. Imaging 2012, 30, 1234–1248.
  6. Lambin, P.; Rios-Velazquez, E.; Leijenaar, R.; Carvalho, S.; Van Stiphout, R.G.; Granton, P.; Zegers, C.M.; Gillies, R.; Boellard, R.; Dekker, A. Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 2012, 48, 441–446.
  7. Aerts, H.J.; Velazquez, E.R.; Leijenaar, R.T.; Parmar, C.; Grossmann, P.; Carvalho, S.; Bussink, J.; Monshouwer, R.; Haibe-Kains, B.; Rietveld, D. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 2014, 5, 4006.
  8. Mayerhoefer, M.E.; Materka, A.; Langs, G.; Häggström, I.; Szczypiński, P.; Gibbs, P.; Cook, G. Introduction to radiomics. J. Nucl. Med. 2020, 61, 488–495.
  9. Welch, M.L.; Mcintosh, C.; Haibe-Kains, B.; Milosevic, M.F.; Wee, L.; Dekker, A.; Huang, S.H.; Purdie, T.G.; O’Sullivan, B.; Aerts, H.J. Vulnerabilities of radiomic signature development: The need for safeguards. Radiother. Oncol. 2019, 130, 2–9.
  10. Zwanenburg, A. Radiomics in nuclear medicine: Robustness, reproducibility, standardization, and how to avoid data analysis traps and replication crisis. Eur. J. Nucl. Med. Mol. Imaging 2019, 46, 2638–2655.
  11. Hatt, M.; Krizsan, A.K.; Rahmim, A.; Bradshaw, T.J.; Costa, P.F.; Forgacs, A.; Seifert, R.; Zwanenburg, A.; El Naqa, I.; Kinahan, P. Joint EANM/SNMMI Guideline on Radiomics in Nuclear Medicine. Eur. J. Nucl. Med. Mol. Imaging 2022, 50, 352–373.
  12. Xu, H.; Lv, W.; Zhang, H.; Ma, J.; Zhao, P.; Lu, L. Evaluation and optimization of radiomics features stability to respiratory motion in 18F-FDG 3D PET imaging. Med. Phys. 2021, 48, 5165–5178.
  13. Hosseini, S.A.; Shiri, I.; Hajianfar, G.; Bahadorzadeh, B.; Ghafarian, P.; Zaidi, H.; Ay, M.R. Synergistic impact of motion and acquisition/reconstruction parameters on 18F-FDG PET radiomic features in non-small cell lung cancer: Phantom and clinical studies. Med. Phys. 2022, 49, 3783–3796.
  14. Carles, M.; Torres-Espallardo, I.; Alberich-Bayarri, A.; Olivas, C.; Bello, P.; Nestle, U.; Martí-Bonmatí, L. Evaluation of PET texture features with heterogeneous phantoms: Complementarity and effect of motion and segmentation method. Phys. Med. Biol. 2016, 62, 652–668.
  15. Galavis, P.E.; Hollensen, C.; Jallow, N.; Paliwal, B.; Jeraj, R. Variability of textural features in FDG PET images due to different acquisition modes and reconstruction parameters. Acta Oncol. 2010, 49, 1012–1016.
  16. Yan, J.; Chu-Shern, J.L.; Loi, H.Y.; Khor, L.K.; Sinha, A.K.; Quek, S.T.; Tham, I.W.K.; Townsend, D. Impact of image reconstruction settings on texture features in 18F-FDG PET. J. Nucl. Med. 2015, 56, 1667–1673.
  17. Van Velden, F.H.; Kramer, G.M.; Frings, V.; Nissen, I.A.; Mulder, E.R.; de Langen, A.J.; Hoekstra, O.S.; Smit, E.F.; Boellaard, R. Repeatability of radiomic features in non-small-cell lung cancer [18F] FDG-PET/CT studies: Impact of reconstruction and delineation. Mol. Imaging Biol. 2016, 18, 788–795.
  18. Shiri, I.; Rahmim, A.; Ghaffarian, P.; Geramifar, P.; Abdollahi, H.; Bitarafan-Rajabi, A. The impact of image reconstruction settings on 18F-FDG PET radiomic features: Multi-scanner phantom and patient studies. Eur. Radiol. 2017, 27, 4498–4509.
  19. Hatt, M.; Tixier, F.; Cheze Le Rest, C.; Pradier, O.; Visvikis, D. Robustness of intratumour 18F-FDG PET uptake heterogeneity quantification for therapy response prediction in oesophageal carcinoma. Eur. J. Nucl. Med. Mol. Imaging 2013, 40, 1662–1671.
  20. Pfaehler, E.; Beukinga, R.J.; de Jong, J.R.; Slart, R.H.; Slump, C.H.; Dierckx, R.A.; Boellaard, R. Repeatability of 18F-FDG PET radiomic features: A phantom study to explore sensitivity to image reconstruction settings, noise, and delineation method. Med. Phys. 2019, 46, 665–678.
  21. Yip, S.S.; Parmar, C.; Kim, J.; Huynh, E.; Mak, R.H.; Aerts, H.J. Impact of experimental design on PET radiomics in predicting somatic mutation status. Eur. J. Radiol. 2017, 97, 8–15.
  22. Leijenaar, R.T.; Nalbantov, G.; Carvalho, S.; Van Elmpt, W.J.; Troost, E.G.; Boellaard, R.; Aerts, H.J.; Gillies, R.J.; Lambin, P. The effect of SUV discretization in quantitative FDG-PET Radiomics: The need for standardized methodology in tumor texture analysis. Sci. Rep. 2015, 5, 11075.
  23. Lu, L.; Lv, W.; Jiang, J.; Ma, J.; Feng, Q.; Rahmim, A.; Chen, W. Robustness of radiomic features in [11C] choline and [18F] FDG PET/CT imaging of nasopharyngeal carcinoma: Impact of segmentation and discretization. Mol. Imaging Biol. 2016, 18, 935–945.
  24. Zwanenburg, A.; Vallières, M.; Abdalah, M.A.; Aerts, H.J.; Andrearczyk, V.; Apte, A.; Ashrafinia, S.; Bakas, S.; Beukinga, R.J.; Boellaard, R. The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology 2020, 295, 328–338.
  25. Yip, S.S.; Aerts, H.J. Applications and limitations of radiomics. Phys. Med. Biol. 2016, 61, R150.
  26. Hatt, M.; Tixier, F.; Pierce, L.; Kinahan, P.E.; Le Rest, C.C.; Visvikis, D. Characterization of PET/CT images using texture analysis: The past, the present… any future? Eur. J. Nucl. Med. Mol. Imaging 2017, 44, 151–165.
  27. Hatt, M.; Majdoub, M.; Vallières, M.; Tixier, F.; Le Rest, C.C.; Groheux, D.; Hindié, E.; Martineau, A.; Pradier, O.; Hustinx, R. 18F-FDG PET uptake characterization through texture analysis: Investigating the complementary nature of heterogeneity and functional tumor volume in a multi–cancer site patient cohort. J. Nucl. Med. 2015, 56, 38–44.
  28. Lv, W.; Yuan, Q.; Wang, Q.; Ma, J.; Jiang, J.; Yang, W.; Feng, Q.; Chen, W.; Rahmim, A.; Lu, L. Robustness versus disease differentiation when varying parameter settings in radiomics features: Application to nasopharyngeal PET/CT. Eur. Radiol. 2018, 28, 3245–3254.
  29. Reiazi, R.; Abbas, E.; Famiyeh, P.; Rezaie, A.; Kwan, J.Y.; Patel, T.; Bratman, S.V.; Tadic, T.; Liu, F.; Haibe-Kains, B. The impact of the variation of imaging parameters on the robustness of computed tomography radiomic features: A review. Comput. Biol. Med. 2021, 133, 104400.
  30. Zhao, B.; Tan, Y.; Tsai, W.; Qi, J.; Xie, C.; Lu, L.; Schwartz, L.H. Reproducibility of radiomics for deciphering tumor phenotype with imaging. Sci. Rep. 2016, 6, 1–7.
  31. Meng, L.; Dong, D.; Chen, X.; Fang, M.; Wang, R.; Li, J.; Liu, Z.; Tian, J. 2D and 3D CT radiomic features performance comparison in characterization of gastric cancer: A multi-center study. IEEE J. Biomed. Health Inform. 2020, 25, 755–763.
  32. Leijenaar, R.T.; Carvalho, S.; Velazquez, E.R.; Van Elmpt, W.J.; Parmar, C.; Hoekstra, O.S.; Hoekstra, C.J.; Boellaard, R.; Dekker, A.L.; Gillies, R.J. Stability of FDG-PET radiomics features: An integrated analysis of test-retest and inter-observer variability. Acta Oncol. 2013, 52, 1391–1397.
  33. Teo, B.; Seo, Y.; Bacharach, S.L.; Carrasquillo, J.A.; Libutti, S.K.; Shukla, H.; Hasegawa, B.H.; Hawkins, R.A.; Franc, B.L. Partial-volume correction in PET: Validation of an iterative postreconstruction method with phantom and patient data. J. Nucl. Med. 2007, 48, 802–810.
  34. Bartko, J.J. The intraclass correlation coefficient as a measure of reliability. Psychol. Rep. 1966, 19, 3–11.
  35. Foy, J.J.; Robinson, K.R.; Li, H.; Giger, M.L.; Al-Hallaq, H.; Armato, S.G. Variation in algorithm implementation across radiomics software. J. Med. Imaging 2018, 5, 44505.
  36. Suarez-Ibarrola, R.; Basulto-Martinez, M.; Heinze, A.; Gratzke, C.; Miernik, A. Radiomics applications in renal tumor assessment: A comprehensive review of the literature. Cancers 2020, 12, 1387.
  37. Sollini, M.; Antunovic, L.; Chiti, A.; Kirienko, M. Towards clinical application of image mining: A systematic review on artificial intelligence and radiomics. Eur. J. Nucl. Med. Mol. Imaging 2019, 46, 2656–2672.
  38. Khorrami, M.; Bera, K.; Thawani, R.; Rajiah, P.; Gupta, A.; Fu, P.; Linden, P.; Pennell, N.; Jacono, F.; Gilkeson, R.C. Distinguishing granulomas from adenocarcinomas by integrating stable and discriminating radiomic features on non-contrast computed tomography scans. Eur. J. Cancer 2021, 148, 146–158.
  39. Bogowicz, M.; Riesterer, O.; Bundschuh, R.A.; Veit-Haibach, P.; Hüllner, M.; Studer, G.; Stieb, S.; Glatz, S.; Pruschy, M.; Guckenberger, M. Stability of radiomic features in CT perfusion maps. Phys. Med. Biol. 2016, 61, 8736.
  40. Somasundaram, A.; García, D.V.; Pfaehler, E.; Jauw, Y.W.; Zijlstra, J.M.; van Dongen, G.A.; Huisman, M.C.; de Vries, E.G.; Boellaard, R. Noise sensitivity of 89Zr-Immuno-PET radiomics based on count-reduced clinical images. EJNMMI Phys. 2022, 9, 16.
  41. Tixier, F.; Hatt, M.; Le Rest, C.C.; Le Pogam, A.; Corcos, L.; Visvikis, D. Reproducibility of tumor uptake heterogeneity characterization through textural feature analysis in 18F-FDG PET. J. Nucl. Med. 2012, 53, 693–700.
  42. Lovinfosse, P.; Visvikis, D.; Hustinx, R.; Hatt, M. FDG PET radiomics: A review of the methodological aspects. Clin. Transl. Imaging 2018, 6, 379–391.
  43. Lv, W.; Yuan, Q.; Wang, Q.; Ma, J.; Feng, Q.; Chen, W.; Rahmim, A.; Lu, L. Radiomics analysis of PET and CT components of PET/CT imaging integrated with clinical parameters: Application to prognosis for nasopharyngeal carcinoma. Mol. Imaging Biol. 2019, 21, 954–964.
  44. Crandall, J.P.; Fraum, T.J.; Lee, M.; Jiang, L.; Grigsby, P.; Wahl, R.L. Repeatability of 18F-FDG PET radiomic features in cervical cancer. J. Nucl. Med. 2021, 62, 707–715.
  45. Cysouw, M.C.; Kramer, G.M.; Schoonmade, L.J.; Boellaard, R.; de Vet, H.C.; Hoekstra, O.S. Impact of partial-volume correction in oncological PET studies: A systematic review and meta-analysis. Eur. J. Nucl. Med. Mol. Imaging 2017, 44, 2105–2116.
  46. Erlandsson, K.; Buvat, I.; Pretorius, P.H.; Thomas, B.A.; Hutton, B.F. A review of partial volume correction techniques for emission tomography and their applications in neurology, cardiology and oncology. Phys. Med. Biol. 2012, 57, R119–R159.
  47. Tohka, J.; Reilhac, A. Deconvolution-based partial volume correction in Raclopride-PET and Monte Carlo comparison to MR-based method. Neuroimage 2008, 39, 1570–1584.
  48. Lu, L.; Hu, D.; Han, Y.; Gu, C.; Rahmim, A.; Ma, J.; Chen, W. Partial volume correction in small animal PET imaging incorporating total variation regularization. J. Nucl. Med. 2014, 55 (Suppl. 1), 374.
  49. Boussion, N.; Cheze Le Rest, C.; Hatt, M.; Visvikis, D. Incorporation of wavelet-based denoising in iterative deconvolution for partial volume correction in whole-body PET imaging. Eur. J. Nucl. Med. Mol. Imaging 2009, 36, 1064–1075.
  50. Golla, S.S.V.; Lubberink, M.; van Berckel, B.N.M.; Lammertsma, A.A.; Boellard, R. Partial volume correction of brain PET studies using iterative deconvolution in combination with HYPR denoising. EJNMMI Res. 2015, 7, 36.
  51. Yan, J.; Chu-Shern, J.L.; Townsend, D. MRI-guided brain PET image filtering and partial volume correction. Phys. Med. Biol. 2015, 60, 961–976.
  52. Gao, Y.; Zhu, Y.; Bilgel, M.; Ashrafinia, S.; Lu, L.; Rahmim, A. Voxel-based partial volume correction of PET images via subtle MRI guided non-local means regularization. Phys. Med. 2021, 89, 129–139.
  53. Zhu, Y.; Bilgel, M.; Gao, Y.; Rousset, O.G.; Resnick, S.M.; Wong, D.F.; Rahmim, A. Deconvolution-based partial volume correction of PET images with parallel level set regularization. Phys. Med. Biol. 2021, 66, 145003.
  54. Du, D.; Feng, H.; Lv, W.; Ashrafinia, S.; Yuan, Q.; Wang, Q.; Yang, W.; Feng, Q.; Chen, W.; Rahmim, A.; et al. Machine learning methods for optimal radiomics-based differentiation between recurrence and inflammation: Application to nasopharyngeal carcinoma post-therapy PET/CT images. Mol. Imaging Biol. 2020, 22, 730–738.
  55. Patel, V.K.; Savsani, V.J. Heat transfer search (HTS): A novel optimization algorithm. Inf. Sci. 2015, 324, 217–246.
  56. Vakharia, V.; Shah, M.; Suthar, V.; Patel, V.K.; Solanki, A. Hybrid Perovskites Thin Films Morphology Identification by adapting Multiscale-SinGAN architecture, Heat Transfer Search optimized feature selection and Machine Learning Algorithms. Phys. Scr. 2023, 98, 025203.
Figure 1. Flowchart of the present study, covering PET imaging and processing, aggregation methods, and the robustness analysis of features. (a) Illustration of PET/CT imaging of nasopharyngeal carcinoma (NPC) patients and the corresponding segmentation. (b) Aggregation methods for textural matrices: (b1) six aggregation methods for GLCM and GLRLM, composed of three classes (2D slice-, 2.5D direction-, and 3D-averaged & merged), each of which includes two aggregation methods (features averaged and matrices merged); (b2) three aggregation methods for GLSZM, GLDZM, NGTDM and NGLDM. (c) Robustness analysis of features, including pair-wise ICC and the ICC heat map.
Figure 2. Pairwise ICC for each GLCM feature extracted from: (a) 2Daveraged-2Ds_merged, 2.5Dd_merged-2.5Dmerged, 3Daveraged-3Dmerged, (b) 2Daveraged-2.5Dd_merged, 2Daveraged-2.5Dmerged, 2.5Dd_merged-3Daveraged, (c) 2Ds_merged-2.5Dmerged, 2Ds_merged-3Dmerged, and 2.5Dmerged-3Dmerged.
Figure 3. ICC heat map of GLCM features; columns: pair-wise aggregation methods corresponding to Figure 2; rows: GLCM features. Blue = ICC ≥ 0.9 (excellent), light blue = 0.75 ≤ ICC < 0.9 (good), red = 0.5 ≤ ICC < 0.75 (moderate), yellow = ICC < 0.5 (poor).
Figure 4. Pairwise ICC for each GLRLM feature extracted from: (a) 2Daveraged-2Ds_merged, 2.5Dd_merged-2.5Dmerged, and 3Daveraged-3Dmerged, (b) 2Daveraged-2.5Dd_merged, 2Daveraged-2.5Dmerged, and 2.5Dd_merged-3Daveraged, and (c) 2Ds_merged-2.5Dmerged, 2Ds_merged-3Dmerged, and 2.5Dmerged-3Dmerged.
Figure 5. ICC heat map of GLRLM features; columns: pair-wise aggregation methods corresponding to Figure 4; rows: GLRLM features. Blue = ICC ≥ 0.9 (excellent), light blue = 0.75 ≤ ICC < 0.9 (good), red = 0.5 ≤ ICC < 0.75 (moderate), yellow = ICC < 0.5 (poor).
Figure 6. Pairwise ICC for each of GLSZM (top-left), GLDZM (top-right), NGLDM (bottom-left) and NGTDM (bottom-right) features extracted from three aggregation strategies (2D, 2.5D, 3D), denoted as 2D-2.5D, 2D-3D, and 2.5D-3D.
Figure 7. ICC heat map of GLSZM (top-left), GLDZM (bottom-left), NGTDM (top-right) and NGLDM (bottom-right) features; columns: pair-wise aggregation methods corresponding to Figure 6; rows: features of each family. Blue = ICC ≥ 0.9 (excellent), light blue = 0.75 ≤ ICC < 0.9 (good), red = 0.5 ≤ ICC < 0.75 (moderate), yellow = ICC < 0.5 (poor).
Figure 8. Agreement of texture features between different aggregation methods for: (a) the original SUV image discretized with FBN from 8 to 128, and (b) the original SUV and PVC images (with FWHM from 1 to 5) discretized with FBN = 128. The relative distribution of the overall ICC values of texture features per ICC category is shown (ICC ≥ 0.9: excellent, 0.75 ≤ ICC < 0.9: good, 0.5 ≤ ICC < 0.75: moderate, ICC < 0.5: poor).
Figure 9. Agreement of each feature family between different aggregation methods for: (a) the original SUV image discretized with FBN from 8 to 128, and (b) the original SUV and PVC images (with FWHM from 1 to 5) discretized with FBN = 128. The relative distribution of the overall ICC values of texture features per ICC category is shown (ICC ≥ 0.9: excellent, 0.75 ≤ ICC < 0.9: good, 0.5 ≤ ICC < 0.75: moderate, ICC < 0.5: poor).
Figure 10. Correlation between features with excellent robustness and MATV for the original SUV image discretized with FBN from 8 to 128.
Figure 11. Correlation between features with excellent robustness and MATV for the original SUV and PVC images (with FWHM from 1 to 5) discretized with FBN = 128.
Table 2. Feature families and corresponding feature number and aggregation methods.
Feature Family    Count    Aggregation Methods
GLCM              25       6
GLRLM             16       6
GLSZM             16       3
GLDZM             16       3
NGTDM             5        3
NGLDM             17       3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Peng, L.; Xu, H.; Lv, W.; Lu, L.; Chen, W. Impact of Aggregation Methods for Texture Features on Their Robustness Performance: Application to Nasopharyngeal 18F-FDG PET/CT. Cancers 2023, 15, 932. https://doi.org/10.3390/cancers15030932

AMA Style

Peng L, Xu H, Lv W, Lu L, Chen W. Impact of Aggregation Methods for Texture Features on Their Robustness Performance: Application to Nasopharyngeal 18F-FDG PET/CT. Cancers. 2023; 15(3):932. https://doi.org/10.3390/cancers15030932

Chicago/Turabian Style

Peng, Lihong, Hui Xu, Wenbing Lv, Lijun Lu, and Wufan Chen. 2023. "Impact of Aggregation Methods for Texture Features on Their Robustness Performance: Application to Nasopharyngeal 18F-FDG PET/CT" Cancers 15, no. 3: 932. https://doi.org/10.3390/cancers15030932

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
