Ultrasound Radiomics Nomogram to Diagnose Sub-Centimeter Thyroid Nodules Based on ACR TI-RADS

Simple Summary
Fine-needle aspiration or surgical resection is used to determine whether thyroid nodules (TNs) are benign or malignant, but both carry a risk of overtreatment. Guidelines such as the American College of Radiology thyroid imaging reporting and data system recommend follow-up for TNs < 1 cm, even those with high-risk ultrasound features; therefore, an ultrasound radiomics nomogram was developed for the noninvasive assessment of benign and malignant TNs of this size. In addition, to assess the model's diagnostic capabilities, the present study tested the model using an external validation group. The test results suggest that ultrasound radiomics is an effective, noninvasive method by which to diagnose TNs < 1 cm.

Abstract
The aim of the present study was to develop a radiomics nomogram to assess whether thyroid nodules (TNs) < 1 cm are benign or malignant. Data were collected retrospectively from 156 patients admitted to the Affiliated Hospital of Nantong University from March 2021 to March 2022 and from 116 patients admitted to the Jiangsu Provincial Hospital of Integrated Traditional Chinese and Western Medicine from September 2017 to March 2022. These patients formed a training group and an external test group, respectively. A radiomics nomogram was established by multivariate logistic regression analysis of the radiomics score and clinical data, including the ultrasound feature scoring terms from the thyroid imaging reporting and data system (TI-RADS). The radiomics nomogram incorporated the correlated predictors and, compared with the clinical model (training set AUC: 0.795; test set AUC: 0.783) and the radiomics model (training set AUC: 0.774; test set AUC: 0.740), had better discrimination and calibration in both the training set (AUC: 0.866) and the test set (AUC: 0.866). Both the decision curve analysis and the clinical impact curve showed that the nomogram had a high clinical application value. The nomogram constructed from TI-RADS and radiomics features performed well in distinguishing benign from malignant TNs < 1 cm.


Introduction
The occurrence of thyroid nodules (TNs) has increased in recent years in several countries throughout the world; however, this increase could be attributed to the availability of more accurate diagnostic and detection techniques [1]. The prevalence of TNs is 30-67%, comprising both malignant and benign nodules [2], with ~7-15% being thyroid cancer. Except for individuals with higher risk factors, screening for thyroid cancer is not recommended. Routine work-up for malignancy, such as fine-needle aspiration (FNA), is likewise not recommended, especially for TNs < 1 cm, which are managed by conservative observation [3]; however, 52.8% of patients already had cervical lymph node metastasis when papillary thyroid carcinoma (PTC) was diagnosed [4]. TNs < 1 cm are now the most common incidental nodules [5]. Although papillary thyroid microcarcinoma (PTMC) < 1 cm usually progresses slowly, cervical lymph node metastases may occur in 24-63% of patients [6]. Other studies have found that a nodule > 7 mm is an independent risk factor for lymph node metastasis (LNM) or distant metastasis [7]; therefore, it is necessary to distinguish benign from malignant TNs accurately at an early stage and to formulate corresponding treatment measures.
Ultrasound plays an essential role as a screening tool in TN management [8]. Ultrasonography is the most common, useful, safe, and cost-effective method of thyroid imaging [9]. The widespread use of high-frequency ultrasound machines is the main reason that the prevalence of thyroid cancer appears to be increasing [1]. In 2017, the American College of Radiology (ACR) published a white paper on the thyroid imaging reporting and data system (TI-RADS), which proposed different risk levels for TNs and helped radiologists to standardize the description of lesions. TNs are graded using scores for the following sonographic features: composition, shape, echogenicity, margins, and calcifications [10]. Ultrasound diagnosis is influenced by experience, and senior doctors have higher diagnostic accuracy than junior doctors [11]. For TNs < 1 cm, diagnosis from ultrasound images is even more dependent on the doctor's experience. ACR guidelines state that FNA evaluation is not recommended for TNs < 1 cm, regardless of the ultrasound characteristics of the lesions [10]. There are studies on the diagnosis of TNs < 1 cm using features from conventional and contrast-enhanced ultrasound [12]. Some researchers have suggested that nodule size affects the differentiation of benign and malignant nodules; they found that the incidence of malignancy decreased as diameter increased, that nodule size is inversely related to the risk of malignancy, and that nodule size should not be regarded as an independent risk factor [13,14]. Therefore, a better noninvasive diagnostic approach is needed to assess the status of TNs < 1 cm and avoid missing important diagnostic information.
Radiomics is a medical imaging analysis tool that has emerged in recent years. It is applied in the clinic to improve the accuracy of diagnosis, prognosis, and prediction by extracting quantitative features from images. Radiomics has become increasingly important in cancer research and is a bridge between imaging and personalized medicine [15]. Many studies have combined radiomics with clinical characteristics to predict lymph node status, tumor heterogeneity, or patient prognosis [16-18]. Some studies have suggested that the diagnosis of TNs using a "Rad-score" is better than that of junior doctors using ACR TI-RADS [19]. Studies on whether ultrasound radiomics can help distinguish benign from malignant TNs < 1 cm are limited.
The aim of the present study was to establish a model-visualized nomogram comprising radiomics combined with ACR TI-RADS to improve the diagnostic accuracy of TNs < 1 cm.

Patients and Groups
The present study comprised TN patients admitted to the Affiliated Hospital of Nantong University (training set) from March 2021 to March 2022 and to the Jiangsu Provincial Hospital of Integrated Traditional Chinese and Western Medicine (test set) from September 2017 to March 2022. The training set comprised 156 patients and the external test set 116 patients. This retrospective study was approved by the hospital review board, which waived the requirement for informed consent. The study inclusion criteria were as follows: (1) the maximum diameter of the target nodule was <1 cm, (2) pathological results were obtained by FNA or surgical resection, (3) ultrasonography was conducted within 2 weeks of FNA or surgery, and (4) the clinical and ultrasound data on the patients were complete. The exclusion criteria were as follows: (1) biopsy or resection had been conducted before ultrasonography, (2) thyroid inflammation, (3) other neoplastic diseases, (4) poor quality of ultrasound images, or (5) unclear pathological results. Figure 1 shows the pathway by which the two hospitals recruited 272 patients according to these criteria. From the Affiliated Hospital of Nantong University, 122 women and 34 men were assigned to the training set. Their mean age was 45.64 ± 12.00 years (range, 18 to 76 years). From the Jiangsu Provincial Hospital of Integrated Traditional Chinese and Western Medicine, 92 women and 24 men were assigned to the test set. Their mean age was 47.47 ± 11.20 years (range, 20 to 77 years). More detailed selection steps are shown in Figure S2 in the Supplementary Materials.

Clinical Data and Ultrasound Images
Baseline clinicopathological information comprised sex, age, ultrasonographic appearance (largest nodule diameter, location), and pathological results. The ultrasound images were acquired using Esaote Mylab Class_C (Esaote Group, Genoa, Italy; superficial probe) and HITACHI EUB-8500 (Hitachi, Tokyo, Japan; superficial probe) ultrasound equipment. The dynamic images and the maximum-diameter sectional images of the nodules obtained while scanning the lesions were stored in the picture archiving and communication system workstation for later measurement and evaluation by the doctors. The TI-RADS scoring items comprise composition, echogenicity, shape, margin, and calcification. At each hospital, two senior radiologists with >8 years of work experience measured and scored the lesions in all patients. The four doctors were blinded to each other's assessments and to the pathological results, and any differences were resolved by consensus to ensure the reliability of the data. Histological and pathological findings of unclear significance were discarded, and only accurate and undisputed reports were used.

Region of Interest Segmentation and Radiomics Feature Extraction
The ultrasound images of all patients from the two hospitals were converted to a uniform format. Two radiologists (Doctor A and Doctor B) independently reviewed the ultrasound images. For each patient, a two-dimensional ultrasound image showing the maximum diameter of the nodule was selected, and the two radiologists confirmed that the selection was consistent. The open-source software ITK-SNAP (version 3.8.0; http://www.itksnap.org, accessed on 1 January 2022) was used to delineate and save the region of interest (ROI) of the target nodule on the selected ultrasound image. Image features, such as first-order, morphological, gray-level histogram, and wavelet-transform features, were extracted using the SlicerRadiomics extension of the open-source software 3D-Slicer (version 4.13.0; https://www.slicer.org, accessed on 20 March 2022), and the data were then stored. Intra-observer and inter-observer intraclass correlation coefficients (ICCs) were used to assess the reproducibility of the extracted radiomics features. The ultrasound images of the nodules of 50 randomly selected patients were used. Doctors A and B independently delineated the ROIs on the selected images, and Doctor A delineated the same images again 2 weeks later. ICC analysis was conducted on the three sets of data to assess the intra-observer and inter-observer reproducibility of the extracted features. Features with ICCs > 0.8 were considered to have excellent reliability.
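The reproducibility check described above can be sketched as follows. The text does not state which ICC form was used, so this minimal example assumes ICC(2,1) (two-way random effects, absolute agreement, single rater); the function name `icc_2_1` and the sample ratings are illustrative and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject (nodule) and one column
    per rater or delineation session. The ICC form is an assumption.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters / sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical values of one feature measured on six nodules by four raters
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
feature_is_reliable = icc_2_1(example) > 0.8  # retention rule from the text
```

In practice the function would be applied once per feature, with two columns for the intra-observer comparison (Doctor A, two sessions) and two for the inter-observer comparison (Doctor A versus Doctor B).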

Feature Selection and Rad-Score Building
Redundancy analysis was applied to the feature data extracted from the training set. First, a normality test was conducted; Pearson correlation analysis was used for normally distributed features and Spearman correlation analysis for non-normally distributed features. Redundant features with correlation coefficients > 0.9 were deleted. The least absolute shrinkage and selection operator (LASSO) logistic regression algorithm was applied to the remaining features, and five-fold cross-validation was used to select the optimal penalty coefficient lambda. The radiomics features with non-zero coefficients in the training set were retained and weighted by these coefficients to form a Rad-score formula. This formula was used to calculate the Rad-score for each patient in the training and test sets, and the association between the Rad-score and TN status was assessed using the Mann-Whitney U-test.
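The redundancy step can be illustrated with a short sketch. It is simplified in two ways that are assumptions rather than the study's procedure: it uses Pearson correlation for every feature (the study switches to Spearman for non-normal features), and when two features correlate above the threshold it keeps whichever was encountered first. The names `pearson` and `drop_redundant` and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(features, threshold=0.9):
    """Return the names of features to keep.

    features: dict mapping feature name -> list of values (one per patient).
    A feature is kept only if its absolute correlation with every feature
    already kept is at most `threshold`.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is almost a multiple of "a" and should be dropped
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_redundant(toy)
```

The surviving features would then be passed to the cross-validated LASSO step.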

Constructing Ultrasound Radiomics Nomogram
With p < 0.05 as the retention criterion, a univariate analysis of the clinical factors (including the ACR TI-RADS score items) was conducted to identify risk factors related to benign and malignant nodules, using a t-test for continuous variables and a chi-squared test for categorical variables. A multivariate logistic regression analysis was then conducted on the Rad-score and the retained risk factors to determine the significant variables for the evaluation of TNs. A radiomics nomogram was established from the variables selected by multivariate logistic regression analysis of the training set. Additional predictive models based on the clinical factors alone and on the radiomics score alone were established to compare performance between models.
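A nomogram is a graphical form of the fitted logistic regression model: each predictor contributes points in proportion to its coefficient, and the total maps to a predicted probability. The sketch below shows that mapping only. The coefficient values, the intercept, and the predictor coding are placeholders invented for illustration; the fitted values are those reported in the paper's tables and are not reproduced here.

```python
import math


def predicted_probability(intercept, coefficients, patient):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(
        coefficients[name] * patient[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder coefficients (NOT the study's fitted values)
hypothetical_coefficients = {
    "margin": 0.8,
    "shape": 1.1,
    "calcification": 0.6,
    "rad_score": 2.0,
}
hypothetical_intercept = -2.5

# Placeholder patient with TI-RADS item scores and a Rad-score
patient = {"margin": 2, "shape": 3, "calcification": 1, "rad_score": 0.4}
probability = predicted_probability(
    hypothetical_intercept, hypothetical_coefficients, patient
)
```

The clinical-only and Rad-score-only comparison models have the same form with a subset of the predictors.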

The Performance of Ultrasound Radiomics Nomogram
The nomogram was used to calculate a total score (Nomo-score) for each patient from the scores of the corresponding independent risk factors, and the optimal Nomo-score cutoff value was determined by maximizing the Youden index. The calibration curve and the Hosmer-Lemeshow test were used to evaluate the calibration of the nomogram in the training and test sets. The discriminative performance of the nomogram was evaluated using receiver operating characteristic (ROC) curves. Differences in the area under the curve (AUC) among the three models were compared using the DeLong test. A decision curve analysis (DCA) and a clinical impact curve (CIC) were used to evaluate the clinical application value of the nomogram at different thresholds.
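Two of the quantities named above, the AUC and the Youden-index cutoff, can be computed directly from predicted scores and pathology labels. The following is a minimal sketch on made-up data; the function names are illustrative, and treating a score equal to the cutoff as a positive prediction is an assumption.

```python
def auc(scores, labels):
    """AUC as the probability that a malignant case outscores a benign one.

    Ties count as half. labels: 1 = malignant, 0 = benign.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted malignant when score >= cutoff (assumed convention).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = t, j
    return best_cutoff, best_j


# Made-up scores and labels for illustration only
demo_scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.8, 0.9]
demo_labels = [0, 0, 1, 0, 1, 1, 1]
demo_auc = auc(demo_scores, demo_labels)
demo_cutoff, demo_j = youden_cutoff(demo_scores, demo_labels)
```

The DeLong comparison of AUCs and the Hosmer-Lemeshow test are not sketched here; they are available in standard statistical packages.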

Statistical Analyses
R version 4.0.5 and SPSS version 24.0 (IBM Inc., Armonk, NY, USA) were used for statistical analyses. MedCalc version 20.1 was used to draw and analyze the ROC curves. All statistical tests were two-sided, with p-values < 0.05 considered significant, and 95% confidence intervals (CIs) are reported. The R packages and the associated algorithm steps are provided in Table S1.

Clinical Information
The radiomics flow chart is shown in Figure 2. Clinical information on the training set and the external independent validation set is summarized in Table 1. Shape, margin, and echogenicity differed significantly between the two datasets; the remaining clinicopathological features did not. The pathological results of the training and test datasets were not significantly different (p = 0.651). In the training set, univariate analysis of each clinical factor showed that four factors differed significantly between the benign and malignant groups: margin, shape, calcification, and TI-RADS classification (Table 2). Multivariate logistic regression analysis was therefore used to construct a clinical model for predicting benign or malignant nodules based on these four risk factors. TI-RADS classification was subsequently excluded because it was not significant in the regression analysis (p = 0.1298).

Figure 2. Radiomics flow chart. One image from each of the two hospitals was selected as a demonstration. Region of interest (ROI) segmentation was conducted on the largest diameter of the nodule. Radiomics features were extracted from ROIs, including features such as shape, grayscale, texture, and wavelets. The optimal penalty coefficient lambda in the least absolute shrinkage and selection operator (LASSO) model was generated using five-fold cross-validation and a minimum criteria procedure. Ultrasound image features were selected using LASSO.

Establishment of Radiomics Score
Radiomics features were extracted from the two-dimensional images of each patient using 3D-Slicer, yielding 464 features in each group. After random sampling for ICC analysis, 92.6% (430) of features had intra-observer ICCs ≥ 0.8 and 89.6% (416) had inter-observer ICCs ≥ 0.8, indicating good intra-observer and inter-observer repeatability. As Figure S1 shows, ten features with non-zero coefficients were selected by LASSO regression on the training set for the construction of the radiomics score. The relevant features, weighting coefficients, and specific formulas are provided in the Supplementary Materials. The Rad-score of the malignant nodule group was significantly higher than that of the benign group in both the training and test sets (Table 2).

Constructing and Evaluating Nomogram
Multivariate logistic regression analysis showed that four factors were significant independent risk predictors for TN patients: margin, calcification, shape, and Rad-score (Table 3). A radiomics nomogram was constructed with these four predictors (Figure 3a). The calibration curves show the calibration of the nomogram in the training and test sets for predicting benign and malignant TNs (Figure 3b), and the Hosmer-Lemeshow test gave non-significant p-values for both calibration curves (training set: 0.588; test set: 0.435). Table 4 shows the performance of the radiomics nomogram in predicting benign and malignant TNs. The optimal nomogram score threshold for distinguishing benign from malignant TNs, determined by maximizing the Youden index, was 0.486 (Figure 4). The AUC of the nomogram was 0.866 (95% confidence interval (CI), 0.802-0.915) in the training set and 0.866 (95% CI, 0.790-0.922) in the test set, both of which indicate that the nomogram has good discrimination ability. The clinical and Rad-score models were evaluated separately for comparison (Figure 4). In the training set (Figure 4c), the nomogram and clinical model AUCs were 0.866 and 0.795 (p = 0.0019), respectively, the nomogram and Rad-score model AUCs were 0.866 and 0.774 (p = 0.0053), respectively, and the Rad-score and clinical model AUCs were 0.774 and 0.795 (p = 0.6748), respectively. In the test set (Figure 4d), the nomogram and clinical model AUCs were 0.866 and 0.783 (p = 0.0099), respectively, the nomogram and Rad-score model AUCs were 0.866 and 0.740 (p = 0.0006), respectively, and the Rad-score and clinical model AUCs were 0.740 and 0.783 (p = 0.4835), respectively. The radiomics nomogram therefore had better discrimination than either the clinical model or the Rad-score model (Table 5).
In addition, a waterfall plot was drawn using the threshold score (0.486), showing the prediction for each patient in the test set (Figure 5). Figure 6a shows the DCA curves of all patients under the different models. At threshold probabilities between 0.2 and 0.8, using the nomogram to diagnose TNs provided a greater net benefit than using the clinical model or the Rad-score alone. Figure 6b shows the CIC of the radiomics nomogram and demonstrates its high clinical efficacy. When the risk threshold probability exceeded 70%, the number of patients classified as high risk for malignancy closely matched the number of patients with actual malignant nodules.
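For readers unfamiliar with decision curve analysis, the quantity plotted on a DCA curve is the net benefit at each threshold probability, using the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold. The sketch below uses that definition on made-up data; the function names and example values are illustrative and are not taken from the study.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold), with 0 < threshold < 1.
    labels: 1 = malignant, 0 = benign.
    """
    n = len(labels)
    tp = sum(1 for p, l in zip(probabilities, labels) if p >= threshold and l == 1)
    fp = sum(1 for p, l in zip(probabilities, labels) if p >= threshold and l == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference curve: every patient is treated as if malignant."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up predicted probabilities and outcomes for illustration only
demo_probs = [0.9, 0.8, 0.3, 0.1]
demo_outcomes = [1, 0, 1, 0]
nb_model = net_benefit(demo_probs, demo_outcomes, 0.2)
nb_all = net_benefit_treat_all(demo_outcomes, 0.2)
```

A model is considered useful over the range of thresholds where its curve lies above both the treat-all curve and the treat-none line (net benefit of zero).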

Discussion
The present study focused mainly on the identification of benign and malignant TNs < 1 cm. In the training set, univariate analysis was conducted on clinical factors, including TI-RADS score items, to identify significant variables, which were then combined with the radiomics features selected by LASSO regression. These variables were subjected to multivariate logistic regression to obtain a radiomics nomogram, which performed very well in predicting benign and malignant TNs < 1 cm.
Imaging is crucial for early cancer detection, and radiomics helps to build powerful classifier models to improve cancer screening and early detection [20]. Constructing a Rad-score by screening relevant radiomics features using LASSO has been widely used in ultrasound radiomics research [21-23]. In the present study, LASSO selected 10 features with non-zero coefficients related to TNs < 1 cm. The sphericity feature had the largest weight coefficient. This finding is consistent with the results of Dorota et al. [24], who evaluated the relationship between TN shape and malignancy.
In the training dataset, univariate analyses identified four clinical features (margins, shape, calcification, and TI-RADS grade) as meaningful in distinguishing between benign and malignant TNs < 1 cm (p < 0.001); however, the subsequent multivariate logistic regression analysis showed that TI-RADS classification was not significant (p = 0.1298), so this factor was excluded. The nomogram constructed from clinical features and the radiomics score performed well in both the training and test sets. Several studies have combined radiomics features with clinical factors to construct a model for judging disease status [25,26]. Such models not only assess the image features of the patients but also take other factors into account, making diagnosis and treatment more individualized and comprehensive and helping to resolve clinical issues [27].
Some studies on ACR TI-RADS have indicated that diagnostic accuracy for TNs depends strongly on the experience of the radiologist [28]. Although guidelines recommend follow-up visits for TNs < 1 cm, 72% of PTMC patients choose surgery rather than follow-up [29]. In addition, a small number of PTMC cases exhibit more biologically aggressive features, such as lymph node involvement and early metastases [30]. Ye et al. [31] have shown that ~39.1% of patients with PTMC developed central lymph node metastases. Other studies have used 5 mm as the cutoff value to examine the relationship between PTMC size and pathological features [32]. The results of the above studies may concern many patients and prompt intervention in micronodules with high-risk features.
Zhang et al. [33] have shown that the sensitivity of diagnosing TNs with ultrasound alone was 77.14%, the specificity was 65.79%, and the AUC was 0.728. Ultrasound-guided FNA, studied by Gao et al. [34] for the diagnosis of TNs < 1 cm, showed excellent diagnostic performance (sensitivity, specificity, and AUC: 98.8%, 90.5%, and 0.947, respectively); however, it is an invasive method. In the present study, the sensitivity, specificity, and AUC of the nomogram constructed using clinical factors combined with radiomics were 78.87%, 85.88%, and 0.866, respectively. These results indicate that the nomogram may be a better choice than ultrasonography alone and, being noninvasive, a useful alternative to invasive FNA examination. In addition, DCA and CIC curves are often used to evaluate the clinical value of models [35,36]; here they supported the clinical reliability of the nomogram and showed an improvement over the clinical model and the Rad-score. The cutoff value corresponding to the maximum Youden index can also help in clinical evaluation. When the Nomo-score is >0.486, the TN < 1 cm is at high risk, and the patient can be given additional examination or treatment.
This study had some limitations. First, it was a retrospective study; therefore, there were some selection biases and imbalances in the data. The study included only TNs with pathological results, all graded above TR2, because FNA or surgical resection could not be performed for nodules with a benign appearance, which may have affected the results. More prospective studies are needed to verify the feasibility of using the nomogram. Second, radiomics features were extracted from two-dimensional ultrasound images. Compared with three-dimensional images, two-dimensional images may miss some disease information. In the future, software and hardware optimization may be required to acquire three-dimensional ultrasound images and obtain more information on diseases. Third, the ultrasound examination and the delineation of the radiomics ROI are manual, which inevitably introduces some error; therefore, two experienced doctors performed the delineation, and consistency tests were conducted to limit adverse effects as much as possible.

Conclusions
The present study proposed an ultrasound radiomics nomogram based on ACR TI-RADS for noninvasively predicting benign and malignant TNs < 1 cm, which proved to be superior to using the radiomics model alone or ACR TI-RADS.

Informed Consent Statement:
The ethics board waived informed consent because the data were obtained from preexisting institutional data.

Data Availability Statement:
The original contributions presented in the study are contained in the article/Supplementary Materials. For further inquiries, the corresponding authors may be contacted directly.