Comparison of Four Ultrasonography-Based Risk Stratification Systems in Thyroid Nodules with Nondiagnostic/Unsatisfactory Cytology: A Real-World Study

Simple Summary
Although ultrasound-based risk stratification systems (RSSs) including Thyroid Imaging, Reporting and Data Systems (TIRADSs) may play an important role in triaging nodules with nondiagnostic/unsatisfactory cytology, no previous studies have compared ultrasound-based RSSs for these nodules. In this retrospective, longitudinal, real-world study in Korea including 1143 thyroid aspirations with nondiagnostic/unsatisfactory results from 1125 patients, further diagnostic evaluations, including repeat fine-needle aspiration, were conducted more commonly as the categories of ultrasound-based RSSs increased. The American Thyroid Association (ATA) guidelines, Korean (K)-TIRADS, and American College of Radiology (ACR) TIRADS were more competent in predicting malignancy from nondiagnostic/unsatisfactory nodules. The EU-TIRADS, although it was also helpful, demonstrated less effective diagnostic performance in predicting malignancy for nondiagnostic/unsatisfactory nodules in Korea, where iodine intake is more than adequate. These findings have implications for developing and verifying universal guidelines for the ultrasound-based stratification of thyroid nodules and applying these guidelines to nondiagnostic/unsatisfactory nodules.

Abstract
We compared American Thyroid Association (ATA) guidelines, Korean (K)-Thyroid Imaging, Reporting and Data Systems (TIRADS), EU-TIRADS, and American College of Radiology (ACR) TIRADS in diagnosing malignancy for thyroid nodules with nondiagnostic/unsatisfactory cytology. Among 1143 nondiagnostic/unsatisfactory aspirations from April 2011 to March 2016, malignancy was detected in 39 of 89 excised nodules. The minimum malignancy rate was 7.82% in EU-TIRADS 5 and 1.87–3.00% in EU-TIRADS 3–4. In the other systems, the minimum malignancy rate was 14.29–16.19% in category 5 and ≤3% in the remaining categories. In every system, the cutoff of category ≥ 5 yielded the highest positive likelihood ratio (LR), but this value was only 2.214 for the EU-TIRADS, compared with >5 for the other systems. Receiver operating characteristic (ROC) curves of all systems to predict malignancy lay significantly above the diagonal nondiscrimination line (P for ROC curve: EU-TIRADS, 0.0022; all others, 0.0001). The areas under the ROC curve (AUCs) were not significantly different among the four systems. The ATA guidelines, K-TIRADS, and ACR TIRADS may be useful to guide management for nondiagnostic/unsatisfactory nodules. The EU-TIRADS, although also useful, exhibited inferior performance in predicting malignancy for nondiagnostic/unsatisfactory nodules in Korea, an iodine-sufficient area.


Introduction
For individuals with thyroid nodules, ultrasound (US) is a primary diagnostic modality to evaluate the risk of malignancy (ROM) and to inform decisions regarding the application of fine-needle aspiration (FNA) [1]. For the effective management of thyroid nodules, systems to guide US practitioners in recommending FNA based on US features have been proposed by professional societies or investigators [1][2][3][4][5]. For some of these US-based risk stratification systems (RSSs), the terminology of the Thyroid Imaging, Reporting and Data System (TIRADS) has been used [3]. These US-based RSSs include the nodule sonographic pattern system proposed by the 2015 revised American Thyroid Association (ATA) guidelines [4], the Korean TIRADS (K-TIRADS) by the Korean Thyroid Association (KTA)/Korean Society of Thyroid Radiology (KSThR) in 2016 [1,2], the European (EU)-TIRADS by the European Thyroid Association (ETA) in 2017 [5], and the American College of Radiology (ACR) TIRADS in 2017 [3]. These US-based RSSs have been widely utilized in different areas of the world, and several studies have compared their diagnostic performance [6][7][8][9][10][11][12]. Although a meta-analysis reported better performance for the ACR TIRADS than the ATA nodule sonographic pattern system or K-TIRADS in selecting nodules for FNA, comparisons across the commonly used systems were constrained by limited data availability [8].
FNA represents the standard tool to triage patients with thyroid nodules for excision or clinical observation [13,14]. Thyroid cytopathology reports have been standardized according to the Bethesda system for reporting thyroid cytopathology (TBSRTC), consisting of six diagnostic tiers with certain ROM ranges and management guidelines [15,16]. Of these six categories, the nondiagnostic/unsatisfactory (ND/UNS) category is the main limitation [17,18]. Although TBSRTC suggests that ND/UNS results should ideally be limited to <10% of all cytopathologic specimens [15], a much higher percentage of ND/UNS cases (up to 23.6%) was reported in a meta-analysis [14]. Although the experience of aspirators and cytopathologists and onsite adequacy assessments may affect the inadequacy rate, the characteristics of the nodule itself are also important determinants [18]. In patients with technically challenging nodules, it can be difficult to obtain an adequate sample even with repeat aspirations [14,18,19], a difficulty that is linked to the relatively large proportion of ND/UNS specimens in large academic/reference centers. For proper management, the ROM of nodules with inadequate specimens should be predicted as accurately as possible from the available information, including US findings. Therefore, US-based RSSs may play a particularly important role in triaging ND/UNS nodules [17,20,21]. It is necessary to assess the US-based RSSs of different guidelines and to compare their diagnostic performance for ND/UNS nodules to establish effective evidence-based recommendations for nodules with inadequate specimens and to facilitate communication between radiologists, endocrinologists, surgeons, and pathologists regarding the management of these diagnostically challenging nodules. However, to the best of our knowledge, no studies have compared various US-based RSSs for ND/UNS nodules.
Therefore, we determined and compared the utility of US-based RSSs (the ATA nodule sonographic pattern system, K-TIRADS, EU-TIRADS, and ACR TIRADS) in diagnosing malignancy for ND/UNS nodules.

Case Selection
This study was conducted at Samsung Medical Center, a hospital-based tertiary referral center in Korea. This study was approved by the Institutional Review Board (IRB) of Samsung Medical Center (SMC 2020-12-069). The IRB waived the requirement for informed consent because all data were deidentified. TBSRTC was adopted by our pathology department in April 2011 [13,18]. From a total of 16,321 thyroid aspiration cases retrieved from the medical records of the Pathology Department, from April 2011 to March 2016, only ND/UNS specimens were selected after excluding those with adequate FNA results according to the steps described in Figure 1. Finally, the remaining 1143 thyroid aspiration samples from 1125 patients were enrolled.

Specimen Preparation and US Examinations
Methods of US examinations and specimen preparation have been described in our previous studies [13,18].

Review of US Images According to the US-Based RSSs
We retrospectively reviewed all US images obtained on the same day as aspirations and analyzed the sonographic features of the nodules while blinded to the final diagnosis. These features were interpreted following the 2015 revised ATA [4], 2016 KTA/KSThR [2], 2017 ETA [5], and 2017 ACR [3] guidelines. All nodules were categorized according to the ATA nodule sonographic pattern system [4], K-TIRADS [2], EU-TIRADS [5], and ACR TIRADS [3]. For effective communication, five groups of sonographic patterns categorized by the ATA guidelines were assigned to five category levels (category 5 for high suspicion of malignancy; category 4 for intermediate suspicion of malignancy; category 3 for low suspicion of malignancy; category 2 for very low suspicion of malignancy; and category 1 for benign). For the K-TIRADS, EU-TIRADS, and ACR TIRADS, their own TIRADS categories were used.

Figure 1. Steps for case selection. TBSRTC = the Bethesda system for reporting thyroid cytopathology [15,16].

Histological Follow-Up and Determination of Final Diagnoses
Participants were followed up until March 2020, and whether surgical resection had been performed was confirmed. The final diagnosis of malignancy was based on the histopathological findings from surgical resection. We calculated the malignancy rate in two ways (maximum and minimum) [13,18,22]. The maximum malignancy rate, estimated only for resected nodules, was derived by dividing the number of surgically confirmed malignant nodules by the number of nodules selected for surgical resection. The minimum malignancy rate was determined by dividing the number of histopathologically confirmed malignant nodules by the total number of nodules aspirated, regardless of whether surgical resection was performed. Noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTPs) were not regarded as malignancies [23].
For the final diagnosis of benign nodules, at least one of the following criteria had to be met [24]: (1) benign histopathological findings after surgical resection; (2) benign pathological results from follow-up core needle biopsy (CNB); (3) benign cytological results from follow-up FNA procedures repeated at least twice; and (4) benign findings of FNA with a stable size on follow-up. Based on this definition, the benign rate was also calculated in two ways (maximum and minimum). The maximum benign rate was the percentage of benign cases calculated from the total number of nodules with cytological or histopathological follow-up data obtained by repeat FNA, CNB, and/or surgical resection. The minimum benign rate was the percentage of benign cases calculated from the total number of aspirated nodules.
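The two-way rate definitions above can be illustrated with a minimal sketch. It uses the overall counts reported in the Results of this study; the helper name `rate_percent` and the variable names are hypothetical and chosen only for this illustration.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to two decimals."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * numerator / denominator, 2)


# Overall counts taken from the Results of this study.
aspirated = 1143     # all ND/UNS nodules aspirated
resected = 89        # nodules that underwent surgical resection
malignant = 39       # histopathologically confirmed malignant nodules
followed_up = 206    # nodules with cytological or histopathological follow-up
benign = 115         # nodules meeting the benign final-diagnosis criteria

# Maximum rates use only nodules with follow-up; minimum rates use all aspirated nodules.
max_malignancy = rate_percent(malignant, resected)
min_malignancy = rate_percent(malignant, aspirated)
max_benign = rate_percent(benign, followed_up)
min_benign = rate_percent(benign, aspirated)

print(max_malignancy, min_malignancy, max_benign, min_benign)
```

The gap between each maximum and minimum rate reflects how few nodules had a confirmed final diagnosis: the denominator changes, while the numerator stays the same.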

Statistical Analyses
The baseline characteristics of the patients and nodules were analyzed with respect to all cases and cases treated with surgical resection. Continuous variables are shown as the median and interquartile range, while categorical variables are expressed as the frequency and percentage. The maximum and minimum malignancy/benign rates were calculated according to the categories of US-based RSSs. Using surgically-resected cases only, the positive and negative likelihood ratios (LRs) for category cutoffs of US-based RSSs in diagnosing malignancy were calculated according to the established method [25,26]. Because evaluating the performance of a diagnostic test with LRs requires a gold standard, the diagnosis based on surgical pathology was used, which restricted this analysis to surgically-resected nodules. The further the LR rises above 1.0, the greater the increase in the probability of malignancy [25]. LRs from 2 to 5 correspond to small increases in the posttest probability of malignancy, LRs from 5 to 10 indicate moderate increases, and LRs > 10 suggest that a positive test is adequate for ruling in a diagnosis of malignancy [25]. For LRs < 1.0, the smaller the LR, the greater the decrease in the probability of malignancy [25]. For surgically-resected nodules, the area under the curve (AUC) was obtained from receiver operating characteristic (ROC) curve analyses to assess the performances of US-based RSSs in predicting malignancy. In calculating the LRs and analyzing the ROC curves for the ATA nodule sonographic pattern system, three unclassifiable cases (isoechoic/hyperechoic nodules with at least one of the following features: irregular margins, microcalcification, taller-than-wide shape, rim calcification with small extrusive soft tissue component, and evidence of extrathyroidal extension) were excluded from analyses. Pairwise comparisons of AUCs of US-based RSSs were conducted using the DeLong test [27]. Pairwise comparisons including the ATA nodule sonographic pattern system were conducted after excluding the three cases that were unclassifiable under the ATA guidelines. MedCalc statistical software version 19.5.3 (MedCalc Software, Ostend, Belgium) was used for statistical analyses. The level of significance was set at p < 0.05 for two-tailed tests.
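As an illustration of how LRs for a category cutoff relate to a 2 × 2 table and to the posttest probability, a minimal sketch follows. The function names are hypothetical, and the 2 × 2 counts are invented for illustration; only the totals (39 malignant and 50 nonmalignant resected nodules) and the two LR values passed to `posttest_probability` come from this study. The analyses in the study itself were run in MedCalc, not with this code.

```python
def likelihood_ratios(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Positive and negative LR for one cutoff (e.g., category >= 5 counted as test-positive)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity == 1.0 or specificity == 0.0:
        raise ValueError("LR is undefined when specificity is exactly 0 or 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability through odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical split of the 39 malignant and 50 nonmalignant resected nodules.
lr_pos, lr_neg = likelihood_ratios(tp=18, fp=4, fn=21, tn=46)
print(round(lr_pos, 3), round(lr_neg, 3))

# Pretest probability among resected nodules (39/89) combined with reported positive LRs.
pretest = 39 / 89
print(round(posttest_probability(pretest, 5.368), 3))  # ATA, category >= 5
print(round(posttest_probability(pretest, 2.214), 3))  # EU-TIRADS, category >= 5
```

Under these assumptions, a positive LR of 5.368 raises the probability of malignancy among resected nodules from about 0.44 to roughly 0.81, whereas a positive LR of 2.214 raises it only to roughly 0.63.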

Characteristics of the Study Population and Nodules
During the five-year period, 1143 ND/UNS aspiration samples from 1125 patients were included (Table 1). The median age of the population was 54.0 years. Of the 1125 individuals, 843 (74.90%) were females. The median nodule size was 1.30 cm. Of 1143 thyroid nodules, 89 (7.79%) were surgically-resected.

Malignancy Rates According to Categories of US-Based RSSs
The malignancy rates are presented according to the categories of US-based RSSs (Table 2). Of the 89 excised nodules, 39 were malignant on surgical pathology, yielding maximum and minimum malignancy rates of 43.82% and 3.41%, respectively. Among these 39 malignant nodules, 32 (82.1%) were papillary thyroid carcinomas (PTCs), 6 (15.4%) were follicular thyroid carcinomas (FTCs), and the remaining nodule (2.6%) was a medullary thyroid carcinoma (MTC) (Table S1). With respect to all US-based RSSs, as the category advanced, the minimum malignancy rate tended to increase. Under the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS, category 5, compared to the other categories, was associated with a marked increase in the maximum and minimum malignancy rates. Category 5 of the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS showed minimum malignancy rates of 14.29-16.19%, while the other categories of these three US-based RSSs exhibited minimum malignancy rates of ≤3%. However, the minimum malignancy rate in category 5 of the EU-TIRADS was 7.82%, demonstrating a smaller difference from the values of categories 3-4 (1.87-3.00%).

Benign Rates According to Categories of US-Based RSSs
The benign rates are summarized according to the categories of US-based RSSs (Table 3). Among the 1143 thyroid nodules, 206 (18.02%) were followed up cytologically or histopathologically through repeat FNA, CNB, and/or surgical resection. Of the 206 nodules with cytological or histopathological follow-up data, 115 were confirmed as benign at final diagnosis, demonstrating benign rates of 55.83% (maximum) and 10.06% (minimum). The proportion of cytological or histopathological follow-up was higher in more advanced categories than in less advanced categories. Under the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS, 32-35% of nodules were followed up cytologically or histopathologically in category 5, whereas 19-20% were followed up in category 4, and only 10-15% were followed up in category 3. The proportion of cytological or histopathological follow-up was 25.51% in category 5, 18.67% in category 4, and 14.94% in category 3 of the EU-TIRADS. When we applied the ATA nodule sonographic pattern system, K-TIRADS, or ACR TIRADS, the maximum benign rate of category 5 was 30.56-33.33%, less than half of the maximum benign rate in category 4 (67.03-68.18%). Under the EU-TIRADS, the maximum benign rate was 46.77% in category 5 and 66.07% in category 4.
Table 4 presents the positive and negative LRs of US-based RSSs in diagnosing malignancy. Applying the cutoff of category ≥ 5 yielded the highest positive LRs of ≥5 in the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS (5.368, 5.449, and 5.128, respectively). However, the EU-TIRADS showed the highest positive LR of only 2.214 when the cutoff of category ≥ 5 was applied.

ROC Curves of US-Based RSSs to Predict Thyroid Malignancy
The ROC curves of all US-based RSSs in predicting malignancy lay significantly above the diagonal nondiscrimination line (Figure 2 and Table 5, p for the ROC curve of the EU-TIRADS = 0.0022, all others p = 0.0001). The differences in the AUC of US-based RSSs are summarized in Table 6. There were no significant differences in the AUC among the four US-based RSSs, although the AUC of the EU-TIRADS was numerically the lowest.
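For an ordinal score such as an RSS category, the AUC equals the probability that a randomly chosen malignant nodule has a higher category than a randomly chosen nonmalignant nodule, with ties counted as one half. The sketch below illustrates this pairwise definition. The function name and the category lists are hypothetical and are not data from this study. The sketch does not implement the DeLong test used for the pairwise AUC comparisons.

```python
def auc_from_categories(malignant_categories, nonmalignant_categories):
    """AUC as the fraction of malignant/nonmalignant pairs ranked correctly (ties count 0.5)."""
    if not malignant_categories or not nonmalignant_categories:
        raise ValueError("both groups must contain at least one nodule")
    score = 0.0
    for m in malignant_categories:
        for b in nonmalignant_categories:
            if m > b:
                score += 1.0
            elif m == b:
                score += 0.5
    return score / (len(malignant_categories) * len(nonmalignant_categories))


# Hypothetical categories for four malignant and four nonmalignant nodules.
malignant = [5, 5, 4, 3]
nonmalignant = [4, 3, 3, 2]
auc = auc_from_categories(malignant, nonmalignant)
print(auc)
```

An AUC of 0.5 corresponds to the diagonal nondiscrimination line, so a system whose categories carry no information about malignancy would score close to that value.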

Table 5. Area under the receiver operating characteristic curve of ultrasonography-based risk stratification systems to predict thyroid malignancy in nodules with nondiagnostic or unsatisfactory cytopathological results.
Table 6. Pairwise comparison of the area under the receiver operating characteristic curve of ultrasonography-based risk stratification systems to predict thyroid malignancy in nodules with nondiagnostic or unsatisfactory cytopathological results.

Discussion
We selected 1143 ND/UNS samples from 16,321 thyroid aspiration samples collected over a five-year period at a tertiary referral center. In these ND/UNS nodules, we estimated the malignancy/benign rates according to the categories of different US-based RSSs and compared the diagnostic performance of US-based RSSs. The ATA nodule sonographic pattern system, K-TIRADS, ACR TIRADS, and EU-TIRADS were effective in differentiating cases with a high ROM and reliably predicting thyroid malignancy among ND/UNS nodules, suggesting their role as useful diagnostic tools in these diagnostically challenging nodules. However, considering parameters including LRs, the EU-TIRADS demonstrated inferior diagnostic performance in predicting malignancy for ND/UNS nodules. In real-world practice, for ND/UNS nodules, cytological or histopathological follow-up was more common in the higher categories of US-based RSSs.
In this study, 7.79% of ND/UNS nodules were surgically resected, and the minimum malignancy rate of ND/UNS nodules was 3.41%, slightly above the upper limit of the ROM for benign cytology (0-3%). This level is lower than the suggested overall ROM for ND/UNS specimens in the 2017 TBSRTC (5-10%) [16] but within the 1-4% risk range recommended in the previous version of TBSRTC [15]. Considering that clinically suspicious nodules are more likely to undergo surgical resection early, it is reasonable to calculate the malignancy rate based on the total number of FNA cases [28]. Although previous studies reported a higher ROM (at least 5.3%) for ND/UNS nodules [17,24], they included cases with available follow-up CNB results only [24] or selected surgically-excised nodules, nodules with diagnostic results at repeat FNA, cases with nondiagnostic results at repeat FNA but no increase in size, or cases with a stable or decreased size during follow-up only [17]. Therefore, the possibility of ROM overestimation due to selection bias should be considered since nodules without histopathological or cytological follow-up, particularly those without even radiological follow-up, are more likely to be cases with low clinical suspicion. Therefore, it might be reasonable to estimate the overall ROM for ND/UNS nodules encountered in routine clinical practice as <5%.
In our study, cytological or histopathological follow-ups for ND/UNS nodules were more common in cases with higher categories of US-based RSSs. This suggests that although US-based RSSs were helpful in predicting benign diagnosis, further evaluations to confirm a benign or malignant status were performed more often in advanced categories of US-based RSSs. Because of this phenomenon, the benign rates defined through the cytological or histopathological follow-up results did not match the categories of US-based RSSs. The proportion of benign cases proven through cytological or histopathological follow-up was only 10.06% (minimum benign rate). However, only 18.02% of the ND/UNS nodules in our study were followed up cytologically or histopathologically. In the remaining 81.98%, repeat FNA, CNB, and/or surgical resection were not applied, and most of these nodules were presumably managed conservatively due to a lack of worrisome clinical/sonographic features. Although TBSRTC guidelines recommend repeat FNA with US guidance for ND/UNS nodules [16], in real practice, most ND/UNS nodules, especially those with low categories of US-based RSSs, were observed without repeat FNA, in the same way as nodules with benign cytology. This reality of clinical practice has not been reported before. Making decisions on nodules with uncertain cytological data is an example of dealing with "unknown knowns" (things we understand but are not aware of) [29,30]. When dealing with "unknown knowns", despite uncertainty, strict diagnostic efforts may be reduced for a significant proportion of ND/UNS nodules based on low category information of US-based RSSs, so that available resources and time are used efficiently. This is consistent with the current trend in clinical practice for thyroid nodules in which diagnostic and therapeutic interventions are focused on high-risk groups and conservative approaches are applied for low-risk groups.
The maximum benign rate of category 5 was less than half of that in category 4 of the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS. This phenomenon was not demonstrated when the EU-TIRADS was applied for ND/UNS nodules. This suggests that the highest category of the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS is particularly effective in predicting thyroid malignancy among ND/UNS nodules.
Category 5 in the ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS demonstrated a positive LR of >5. Stated in another way, malignant nodules were >5 times more likely to be in category 5 of these three US-based RSSs than were nonmalignant nodules. Therefore, these three systems, with no one system preferred over the others, may be useful diagnostic tests to guide the further management of ND/UNS nodules. However, category 5 in the EU-TIRADS exhibited a positive LR of only 2.214, demonstrating inferior performance compared with the other systems. The criteria to be classified as category 5 are the least demanding for the EU-TIRADS compared to the other systems. It is sufficient for a nodule to be classified as category 5 of EU-TIRADS if it has one of the following four features: irregular shape (nonparallel orientation), irregular (spiculated/microlobulated) margin, microcalcifications, and marked hypoechogenicity (and solid) [5]. However, in K-TIRADS [2], to be classified as category 5, a nodule needs to be solid hypoechoic and at least one of the three suspicious features (microcalcifications, nonparallel orientation, spiculated/microlobulated margin) should be present. Likewise, according to the ATA guidelines [4], to be stratified as a high suspicion nodule, it must be a solid hypoechoic nodule or the solid hypoechoic component of a partially cystic nodule and, at the same time, at least one of the following features must be present: irregular margin, microcalcification, nonparallel orientation, rim calcification with small extrusive soft tissue component, and evidence of extrathyroidal extension. Therefore, marked hypoechoic solid nodules without other suspicious features (category 4 in ATA nodule sonographic pattern system and K-TIRADS) and isoechoic/hyperechoic nodules with microcalcifications, nonparallel orientation, or spiculated/microlobulated margin (unclassifiable in ATA nodule sonographic pattern system and category 4 in K-TIRADS) are classified into category 5 only in EU-TIRADS.
The ACR TIRADS has a totally different system, which allocates points for every US feature of a nodule, and the nodule ACR TIRADS level is defined based on the total sum of the points [3]. Although the presence of marked hypoechogenicity, nonparallel orientation, irregular margin, or punctate echogenic foci suggesting the possibility of microcalcification increases the points, the presence of only one feature does not produce sufficient points to reach the ACR TIRADS 5. With respect to EU-TIRADS, these least stringent criteria to be stratified as category 5 may be associated with the highest number of cases in category 5 (Table S2) and the lowest positive LR of 2.214 in diagnosing malignancy for category ≥ 5, compared to the other three RSSs. Furthermore, differences in the epidemiology of thyroid tumors between Korea and Europe may have also partly affected the results. Each US-based RSS was established for its local population, and the epidemiology of thyroid tumors may vary with iodine intake and/or genetic variations [31]. In Korea, where our study was conducted, iodine intake is more than adequate, and PTCs were reported to constitute 92% of thyroid cancers, while FTCs account for only approximately 3% [13,32]. In the current study, although the proportion of FTCs among the excised ND/UNS nodules that turned out to be malignant (15.4%) was higher than the proportion among the total thyroid cancers reported in Korea, this proportion of FTCs is still much lower than that reported in European countries (27-37%) [33][34][35][36]. In Korea, application of the EU-TIRADS may be less appropriate, as follicular proliferative lesions, including follicular adenomas, FTCs, and NIFTPs, are much rarer than in Europe. The International Thyroid Nodule Ultrasound Working Group, representing major thyroid associations, including the ATA, has recently been developing universal guidelines on the US-based stratification of thyroid nodules, called the U-TIRADS [37]. In developing this U-TIRADS, regional epidemiological variations in thyroid tumor histology may need to be considered.
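The difference in category 5 criteria between the EU-TIRADS and K-TIRADS discussed above can be expressed as a minimal sketch. It encodes only the criteria as summarized in this Discussion, not the full guidelines [2,5]. The `Nodule` class, its field names, and the two functions are hypothetical, and treating marked hypoechogenicity as a form of hypoechogenicity for K-TIRADS is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    solid: bool
    echogenicity: str  # "marked_hypoechoic", "hypoechoic", "isoechoic", or "hyperechoic"
    microcalcifications: bool = False
    nonparallel_orientation: bool = False
    spiculated_or_microlobulated_margin: bool = False


def has_suspicious_feature(n: Nodule) -> bool:
    """True if any of the three suspicious US features named in the text is present."""
    return (n.microcalcifications
            or n.nonparallel_orientation
            or n.spiculated_or_microlobulated_margin)


def eu_tirads_category5(n: Nodule) -> bool:
    """Any one of the four features is sufficient, as summarized in the text."""
    marked_hypoechoic_solid = n.solid and n.echogenicity == "marked_hypoechoic"
    return has_suspicious_feature(n) or marked_hypoechoic_solid


def k_tirads_category5(n: Nodule) -> bool:
    """Solid hypoechoic AND at least one suspicious feature, as summarized in the text."""
    # Assumption: marked hypoechogenicity counts as hypoechoic here.
    hypoechoic = n.echogenicity in ("hypoechoic", "marked_hypoechoic")
    return n.solid and hypoechoic and has_suspicious_feature(n)


# The two nodule types that the text says reach category 5 only in EU-TIRADS.
marked_only = Nodule(solid=True, echogenicity="marked_hypoechoic")
isoechoic_microcalc = Nodule(solid=True, echogenicity="isoechoic", microcalcifications=True)
# A nodule that satisfies both systems.
classic = Nodule(solid=True, echogenicity="hypoechoic", microcalcifications=True)

for nodule in (marked_only, isoechoic_microcalc, classic):
    print(eu_tirads_category5(nodule), k_tirads_category5(nodule))
```

Because the EU-TIRADS rule uses a logical OR where K-TIRADS requires a conjunction, more nodules are assigned to category 5, which is consistent with the larger category 5 group and the lower positive LR described above.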
Several limitations of this study should be acknowledged. First, analyses were performed per nodule, not per patient, which may have led to an overestimation of our results. However, when the same analyses were repeated after excluding patients who contributed two or more nodules, the findings did not change. Second, we analyzed the positive and negative LRs and ROC curves of US-based RSSs for predicting thyroid malignancy only in surgically-resected ND/UNS nodules since the gold standard diagnosis based on surgical pathology is required to evaluate the performance of diagnostic tests. Therefore, this information is applicable to cases with high clinical suspicion that were referred for surgery, not to all ND/UNS nodules, most of which are managed conservatively. Third, our study is based on samples from Korea, an iodine-sufficient area. Therefore, caution should be applied before extrapolating our findings to countries with different dietary iodine contents and/or thyroid tumor epidemiology. Fourth, US-based RSSs that include new US parameters such as elastography or 4-dimensional (4D) vascularity [31] were not evaluated in the current study.

Conclusions
In conclusion, this real-world comparative study demonstrated the following important findings on the diagnostic performance and clinical applications of US-based RSSs for thyroid nodules with ND/UNS cytology. For ND/UNS nodules, although US-based RSSs aided in predicting benign lesions, in real-world practice, further diagnostic evaluations to confirm the benign status, including repeat FNA, were increased in higher categories of US-based RSSs. The ATA nodule sonographic pattern system, K-TIRADS, and ACR TIRADS, especially the highest category of these systems, were more competent in predicting malignancy from ND/UNS nodules. These systems may be useful diagnostic tools to guide the further management of ND/UNS nodules. The EU-TIRADS, although it was also helpful, exhibited relatively less effective diagnostic performance in predicting malignancy for nodules with ND/UNS cytology in Korea, where iodine intake is more than adequate. These findings have implications for developing and verifying the U-TIRADS and applying the U-TIRADS to ND/UNS nodules.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/cancers13081948/s1, Table S1: Histologic subtypes of excised nodules with nondiagnostic/unsatisfactory cytology; Table S2: Frequencies of ultrasound imaging characteristics and EU-TIRADS categories in included thyroid nodules with nondiagnostic/unsatisfactory cytology.

Informed Consent Statement:
Patient consent was waived by the IRB because all patient data were de-identified.

Data Availability Statement:
Restrictions apply to the availability of these data. Data were obtained from Samsung Medical Center and are available from the corresponding authors with the permission of Samsung Medical Center.