Sonographic Risk Stratification Systems for Thyroid Nodules as Rule-Out Tests in Older Adults

Simple Summary
The use of risk-stratification systems for thyroid nodules based on ultrasound features may reduce the number of biopsies to be performed. The aim of our study was to assess the diagnostic performance of these systems in different age groups. We confirmed that all systems had a significant discriminative performance in all age groups. The system proposed by the American College of Radiology was the best performing one, but all risk-stratification systems could avoid a sizable number of biopsies when applied as rule-out tests (to exclude malignancy) in elderly patients.

Abstract
Ultrasonographic risk-stratification systems (RSS), including various Thyroid Imaging Reporting and Data Systems (TIRADS), were proposed to improve reporting and reduce the number of fine-needle aspiration biopsies. However, age might be a confounder since some suspicious ultrasonographic features lack specificity in elderly patients. We aimed to investigate whether the diagnostic performance of the RSS varied between age groups. All patients consecutively referred for thyroid biopsy between November 1, 2015, and March 10, 2020, were included. The malignancy risk of each nodule was estimated according to five RSS: the American Association of Clinical Endocrinologists/American College of Endocrinology/Associazione Medici Endocrinologi guidelines, the American College of Radiology (ACR) TIRADS, the American Thyroid Association guidelines, the European TIRADS, and the Korean TIRADS. Overall, 818 nodules (57 malignant) were evaluated. The malignancy rate was higher in patients ≤ 65 years (8.1%) than in patients > 65 years (3.8%; p = 0.02). All RSS confirmed a significant discriminative performance in both age groups, with a negative predictive value of 100% in patients > 65 years, although specificity was lower in older patients. The ACR TIRADS was the best performing in both age groups. RSS could avoid a sizable number of biopsies when applied as rule-out tests in elderly patients.


Introduction
Various published risk-stratification guidelines [1][2][3][4][5] provide recommendations for the evaluation of thyroid nodules based on the combination of nodule size and ultrasonographic (US) appearance [6], with the aim of improving the standardization of thyroid ultrasound reporting and the identification of the small subset of nodules that warrant fine-needle aspiration biopsy (FNAB). The performance of these systems has been validated in retrospective [7][8][9][10] and prospective studies [11][12][13] and has also been confirmed by a recent meta-analysis [14]. Classification is usually based on the recognition of patterns of sonographic features, though the American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TIRADS) [4] assigns nodules points for each of five US categories, which are then added to determine a final class. The decision of whether to perform a biopsy or monitor the nodule is based on the maximum nodule diameter, with a different threshold for each risk class. For nodules in high-risk classes, FNAB is usually indicated if the maximum diameter is 1 cm or more. For nodules in lower risk classes, the size thresholds for FNAB range from 1.5 to 3 cm, depending on the risk-stratification system. It has been demonstrated that the various risk-stratification schemes vary in their ability to reduce the number of unnecessary FNABs. However, the ACR TIRADS has been found to outperform the other risk-stratification systems in its ability to decrease the number of biopsies while improving diagnostic accuracy [7,11,14].
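The points-based logic described above can be sketched in a few lines of Python. This is only an illustration: the point values, class cut-offs, and size thresholds are assumptions recalled from the published ACR TIRADS chart and are not given in this article, so they should be checked against reference [4] before any use. The names `acr_level` and `fnab_indicated` are hypothetical.

```python
# Illustrative sketch of a points-based risk class with size-gated biopsy.
# ASSUMPTION: all point values and thresholds below are recalled from the
# ACR TIRADS chart, not taken from this article; verify against reference [4].

ACR_POINTS = {
    "composition": {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2},
    "echogenicity": {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                     "hypoechoic": 2, "very_hypoechoic": 3},
    "shape": {"wider_than_tall": 0, "taller_than_wide": 3},
    "margin": {"smooth": 0, "ill_defined": 0, "lobulated_or_irregular": 2,
               "extrathyroidal_extension": 3},
}
# Echogenic foci are additive: every type that is present contributes points.
FOCI_POINTS = {"none": 0, "large_comet_tail": 0, "macrocalcifications": 1,
               "peripheral_rim": 2, "punctate": 3}

# Minimum maximum diameter (mm) at which FNAB is indicated, per class.
# Classes 1 and 2 have no threshold (no biopsy indicated).
FNAB_THRESHOLD_MM = {3: 25, 4: 15, 5: 10}


def acr_level(features):
    """Sum the points of the five categories and map the total to a class."""
    total = sum(ACR_POINTS[cat][features[cat]] for cat in ACR_POINTS)
    total += sum(FOCI_POINTS[f] for f in features.get("foci", []))
    if total <= 1:        # assumption: a total of 1 point is treated as class 1
        level = 1
    elif total == 2:
        level = 2
    elif total == 3:
        level = 3
    elif total <= 6:
        level = 4
    else:
        level = 5
    return total, level


def fnab_indicated(level, max_diameter_mm):
    """Biopsy is indicated only if the class has a threshold and it is met."""
    threshold = FNAB_THRESHOLD_MM.get(level)
    return threshold is not None and max_diameter_mm >= threshold


nodule = {"composition": "solid", "echogenicity": "hypoechoic",
          "shape": "wider_than_tall", "margin": "smooth", "foci": ["punctate"]}
points, level = acr_level(nodule)
print(points, level, fnab_indicated(level, max_diameter_mm=12))
```

The pattern-based systems would replace `acr_level` with a lookup of feature combinations, while the size-gated decision in `fnab_indicated` keeps the same shape with different thresholds.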
Most recently, the ACR TIRADS and the sonographic risk-stratification systems proposed by the American Thyroid Association (ATA) [2] and the American Association of Clinical Endocrinologists/American College of Endocrinology/Associazione Medici Endocrinologi (AACE/ACE/AME) guidelines [1] have been validated in a geriatric population [15]. In that study, it is suggested that age might be a confounder since some suspicious US features of thyroid nodules lack specificity in elderly patients [15].
The aim of this study was to investigate whether the diagnostic performance (and the number of avoided biopsies) of the five most widely used sonographic risk-stratification systems (also including the EU-TIRADS of the European Thyroid Association [3] and the K-TIRADS of the Korean Society of Thyroid Radiology [5]) varied between age groups.

Results
A total of 1349 thyroid nodule sonographic examinations before biopsy were evaluated. Some biopsies were performed multiple times on the same nodule during the study period (n = 119) due to cytology report suggestions, indeterminate cytology, non-diagnostic cytology, nodule growth, or the appearance of new suspicious features. In these cases, only the last examination was considered. The actual number of biopsied nodules was 1230 (1145 patients). Of these, 113 nodules were excluded because the maximum diameter was less than one centimeter, and 299 were excluded because of an inconclusive diagnosis (non-diagnostic or indeterminate cytology report without surgical pathology). To evaluate the potential impact of these exclusions on the age distribution of our final cohort, we compared the age distribution in the excluded and analyzed groups. Individuals with smaller nodules were younger (median 52 years (interquartile range, IQR 42-63) versus 57 years (IQR 47-67), p = 0.003), while patients with an inconclusive diagnosis were older (median 58 years (IQR 47-68 years) vs. 55 (IQR 46-66), p = 0.005). However, the age distribution was comparable between the group with excluded nodules and the final cohort (Figure 1).
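The cohort flow above can be checked with simple arithmetic. The short sketch below uses only the counts stated in the text; the variable names are illustrative, and the percentages are computed relative to the 1230 biopsied nodules.

```python
# Cohort flow, using only the counts reported in the text.
examinations = 1349
repeat_examinations = 119                      # earlier exams of re-biopsied nodules
biopsied_nodules = examinations - repeat_examinations

subcentimeter = 113                            # maximum diameter < 1 cm
inconclusive = 299                             # no conclusive reference diagnosis
final_cohort = biopsied_nodules - subcentimeter - inconclusive

# Share of biopsied nodules removed by each exclusion criterion.
pct_subcentimeter = round(100 * subcentimeter / biopsied_nodules, 1)
pct_inconclusive = round(100 * inconclusive / biopsied_nodules, 1)

print(biopsied_nodules, final_cohort, pct_subcentimeter, pct_inconclusive)
```

These figures agree with the 1230 biopsied nodules and the final cohort of 818 nodules reported in the article.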
The final cohort included 818 thyroid nodules, with a median maximum diameter of 20.7 (IQR 15-28.8) mm, of which 57 (7%) were classified as malignant. Seventy-five patients were submitted to surgery (23 benign nodules, and 52 of the malignant nodules), with a median maximum diameter of the biopsied nodule of 16.8 (IQR 13.1-27.7) mm, smaller than the not resected biopsied nodules (21.1 mm, IQR 15.4-29.1 mm; p = 0.025). The malignancy rate was higher in patients ≤ 65 years (8.1%) than in patients older than 65 years (3.8%; p = 0.02). The need for surgery was not significantly different between groups (13, 6.1% in the elderly group, and 62, 10.2% in the younger group; p = 0.096). We analyzed the distribution of single sonographic features (Table 1) and found no differences between the two age groups except for cystic nodules, which were more common in young patients, and calcifications, which were more frequent in the elderly.
When using these features to classify nodules according to the five sonographic risk-stratification systems, we found no differences in the distribution of the two age groups with the AACE/ACE/AME, ACR TIRADS, and K-TIRADS systems (Table 2). However, elderly patients more commonly harbored EU-TIRADS 5 nodules and lesions that were non-classifiable in the ATA scheme (i.e., isoechoic nodules with other suspicious features like microcalcification, irregular margins, taller-than-wide shape, disrupted rim calcifications with a small extrusive hypoechoic soft tissue component, or evidence of extrathyroidal extension). However, if non-classifiable nodules were grouped with intermediate-suspicion nodules, the difference disappeared (Chi-square test; p = 0.214). The malignancy rate for each sonographic risk class is reported for each age group in Table 2.
Finally, we evaluated the diagnostic accuracy of the five systems by calculating sensitivity, specificity, positive and negative predictive values, and area under the receiver operating characteristic curve (AUROC) for patients younger and older than 65 years (Table 3). All systems confirmed a statistically significant discriminative performance in both age groups, although specificity and positive predictive values were generally lower in older patients. Nevertheless, all systems achieved a negative predictive value of 100% in patients > 65 years, since no malignancy was missed by any of them. For the ATA system, this performance held only when non-classifiable nodules were submitted to biopsy: 16/172 (9.3%) of these nodules harbored a malignancy, and if they were not subjected to biopsy, the negative predictive value of the ATA system would decrease to 96.1% (95% CI 90.3-98.9%) in the > 65 group and to 94.1% (95% CI 90.4-96.6%) in patients ≤ 65 years. The application of these systems would avoid 13.2-45.3% of all FNABs in patients > 65 years. The ACR TIRADS was the best performing system, as it prevented the highest number of biopsies and achieved the best discriminative performance, as estimated by the AUROC, in both age groups.
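To make the rule-out reasoning concrete, the sketch below computes the accuracy measures from a 2x2 table in which "test positive" means that FNAB is indicated by a system. The counts in the example are hypothetical and are not taken from Table 3; the function name `diagnostic_metrics` is also illustrative. It assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures for a dichotomous test (FNAB indicated = positive).

    tp/fn: malignant nodules with/without an FNAB indication
    fp/tn: benign nodules with/without an FNAB indication
    Assumes every denominator is non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Nodules with no FNAB indication are the biopsies that would be avoided.
        "avoided_biopsies": (tn + fn) / total,
    }


# HYPOTHETICAL counts for an older age group (not the study data):
# no malignancy is missed (fn = 0), so the NPV is 100% even though
# specificity and PPV are low.
older = diagnostic_metrics(tp=8, fp=120, fn=0, tn=80)
for name, value in older.items():
    print(f"{name}: {value:.3f}")
```

With no false negatives, the negative predictive value is 1 regardless of how many benign nodules are biopsied, which is why a test can perform well as a rule-out tool despite a low specificity.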

Discussion
While the prevalence of thyroid nodules increases with increasing age, the malignancy rate is reported to be lower [16]; thus, the proper identification of the small number of lesions requiring clinical attention is of paramount importance in elderly patients. The chances of diagnosing asymptomatic thyroid nodules are increased by the frequent use of high-frequency ultrasound and cross-sectional imaging in routine clinical care [17]. However, while confirmed cancers in elderly patients are more likely to be aggressive [16], the risks associated with overtreatment of benign or low-risk malignant diseases should be carefully avoided in frail patients since the benefits are uncertain [18]. It is now clear that less aggressive treatment approaches are safe for low-risk thyroid malignancies [19,20], even if these are still relatively uncommon in real-world practice [21]. In elderly patients, an active surveillance approach may be used to defer or even definitively avoid surgery [22]. However, clinicians may be concerned by the potential occurrence of more aggressive tumors in older patients if a long-term follow-up protocol is adopted instead of immediate thyroid nodule biopsy.
In our cohort, we found that nodules submitted to biopsy in individuals > 65 years had more calcifications, even if the overall rate of malignancy was lower than in younger patients [16]. US-detected microcalcifications are associated with the presence of psammoma bodies [23] in papillary thyroid cancer. However, dystrophic or stromal calcifications and eosinophilic colloid may also appear as punctate hyperechogenic foci [24], similar to microcalcifications.
The distribution of risk categories was comparable between age groups in the different sonographic risk-stratification systems, with the exception of the EU-TIRADS and ATA guideline systems. Due to the higher rate of microcalcifications, the number of EU-TIRADS 5 and ATA non-classifiable nodules was higher in older patients. The ATA non-classifiable nodules are a significant proportion of the whole cohort and have a non-negligible malignancy rate, as previously reported by other authors [25]. This is due to the presence of key suspicious features in the context of isoechogenic nodules. For this reason, as suggested in the recent literature [25], non-classifiable nodules were counted as intermediate-suspicion nodules: in this way, the difference in ATA risk category distribution between age groups disappeared.
When analyzing the diagnostic performance of sonographic stratification systems, their discriminative ability was confirmed in people > 65 years, even if the low positive predictive values and specificities suggested the need to revise the definition or the relative weight of some features in the > 65 age group (microcalcifications seemed to be the most critical). These points should be taken into consideration when current guidelines are updated or in the development of new systems. These results were consistent with data reported by Di Fermo et al. [15], which supported the validity of sonographic stratification systems in elderly patients, even if the specificity of suspicious features, in this setting, was lower than expected. Conversely, in our cohort, there was no significant difference in diagnostic performance between the ATA and AACE/ACE/AME systems. The ACR TIRADS achieved the best discriminative performance. It is important to note that this system weighs microcalcifications and other punctate echogenic foci equally. Our results might be due to the low overall malignancy rate in our cohort. In settings with a higher pretest probability of malignancy (e.g., 18F-fluorodeoxyglucose positron emission tomography-positive nodules), TIRADS with a higher propensity to indicate FNAB may be preferred [26].
This study had some limitations. First of all, the sample size might be limited. Most malignancies were confirmed by surgical histology, but false positives could not be excluded for patients with cytological diagnoses of malignancy who opted for conservative management. For cytologically benign nodules, false negatives may occur, and a false negative rate of 3.7% has been reported [27]. Furthermore, we excluded subcentimeter nodules (9.2%) and lesions with inconclusive cytology (24.3%), although these exclusions did not alter the age distribution of our final cohort. The exclusion of indeterminate cytology nodules from the analysis might have reduced the number of follicular thyroid cancers. However, scoring systems have also been found to correctly classify these cancers [28], mainly due to the suggestion to biopsy nodules greater than 20-25 mm, regardless of their sonographic pattern.

Materials and Methods
All patients consecutively referred to our center for FNAB of a thyroid nodule between November 1, 2015, and March 10, 2020, were included in the study. The study was conducted with institutional review board approval (Sapienza University Ethics Committee, study number 806/16) and written consent.
Patients were referred by our thyroid nodule clinic and by other specialists, including hospitalists, endocrinologists, nuclear medicine physicians, and surgeons, based on clinical risk factors, sonographic risk features, or patient preference.
Prior to FNAB, each nodule was examined with a HI-VISION Avius® system (Hitachi Medical Corporation, Inc., Tokyo, Japan) and a 13-MHz linear-array transducer. During this re-examination, two clinicians experienced in thyroid imaging recorded their joint evaluation of the sonographic features of each nodule on a standardized form. Full details on the enrollment criteria and procedures used for sonographic assessment, risk stratification, and FNAB examination of the nodules have previously been published [11,29,30]. Subsets of this cohort were used in our previous studies, which compared the diagnostic performance of the systems, evaluated the impact of intrathyroidal location, and proposed a better definition of the taller-than-wide shape [11,31,32]. In summary, all nodule sonographic features were collected, and the malignancy risk of each nodule was estimated automatically according to five sonographic risk-stratification systems by applying an algorithmic approach: the AACE/ACE/AME guidelines, the ACR TIRADS, the ATA guidelines, the EU-TIRADS, and the K-TIRADS. Nodules that could not be classified with the ATA guidelines (i.e., iso- or hyperechoic nodules with high-suspicion features, including irregular margins, microcalcifications, taller-than-wide shape, disrupted rim calcifications with a small extrusive hypoechoic soft tissue component, or evidence of extrathyroidal extension) were considered intermediate-suspicion nodules [25]. Nodules with a maximum diameter of less than 1 cm were excluded from this study since none of the risk-stratification systems routinely recommend FNAB for subcentimeter thyroid nodules.

Reference Standard
Cytology was classified according to the criteria published in the Italian consensus for thyroid cytopathology [33,34], a six-tiered system comparable to the Bethesda System for Reporting Thyroid Cytopathology. If surgery had been performed, the reference standard diagnosis (malignant vs. benign) was based on histological examination of the resected nodule. If the nodule was not resected, a cytology-based reference standard was applied. Nodules were considered malignant if they were classified as TIR4 or TIR5 (corresponding to Bethesda classes V and VI), and benign if they were classified as TIR2, corresponding to Bethesda class II. Unresected nodules that were cytologically classified as TIR1 (non-diagnostic), 3A (low-risk indeterminate), or 3B (high-risk indeterminate) were excluded.
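The reference-standard rules above amount to a small decision function, sketched below. The function name, the string labels (for example "TIR3A"), and the use of `None` to mark an excluded nodule are illustrative choices, not part of the original study software.

```python
from typing import Optional

MALIGNANT_CYTOLOGY = {"TIR4", "TIR5"}   # Bethesda classes V and VI
BENIGN_CYTOLOGY = {"TIR2"}              # Bethesda class II


def reference_standard(cytology: str,
                       histology: Optional[str] = None) -> Optional[str]:
    """Return 'malignant', 'benign', or None when the nodule is excluded.

    histology: 'malignant' or 'benign' if the nodule was resected, else None.
    """
    # Histology takes precedence over cytology whenever surgery was performed.
    if histology is not None:
        if histology not in ("malignant", "benign"):
            raise ValueError(f"unexpected histology label: {histology!r}")
        return histology
    if cytology in MALIGNANT_CYTOLOGY:
        return "malignant"
    if cytology in BENIGN_CYTOLOGY:
        return "benign"
    # Unresected TIR1, TIR3A, and TIR3B nodules have no conclusive diagnosis.
    return None


print(reference_standard("TIR3B"))               # excluded (None)
print(reference_standard("TIR3B", "malignant"))  # resolved by histology
print(reference_standard("TIR2"))                # cytology-based benign
```

Because histology overrides cytology, an indeterminate nodule enters the analysis only if it was resected.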

Age Groups
Patients were divided by chronological age into a younger group (≤65 years) and an elderly group (>65 years). We adopted this conventional threshold, although it is subject to change based on comprehensive evidence from the social, cultural, and medical sciences [35].

Statistical Analysis
The nodules for which FNAB was indicated in each system were flagged as test positive. The sensitivity, specificity, positive and negative predictive values (PPV and NPV), and the AUROC, each with 95% confidence intervals, were computed for each system. Differences in categorical variables between groups were analyzed using the Chi-square test or the Fisher exact test.
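As a rough illustration of two of these quantities, the sketch below computes a 95% confidence interval for a proportion and an AUROC from ordinal risk classes. Both choices are assumptions: the article does not state which interval method was used (the Wilson score interval is shown here) or whether the AUROC was computed from the ordinal classes or from the dichotomous FNAB indication. The example data are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.959964 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def auroc(scores_malignant, scores_benign):
    """Probability that a malignant nodule gets a higher risk class than a
    benign one, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


# HYPOTHETICAL data: 80 of 80 test-negative nodules are benign (NPV = 100%).
low, high = wilson_interval(80, 80)
print(f"NPV 95% CI: {low:.3f}-{high:.3f}")

# HYPOTHETICAL risk classes (1-5) for malignant and benign nodules.
print(f"AUROC: {auroc([5, 4, 5], [2, 3, 4, 1]):.3f}")
```

Even with an observed proportion of 100%, the interval has a lower bound below 1, which is worth keeping in mind when a predictive value is estimated from a modest number of nodules.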
The proportions of biopsies that would not have been indicated by the various systems were compared using the McNemar test. Data were analyzed with IBM SPSS Statistics, version 25.0 (IBM Corp., Armonk, NY, USA). AUROCs were compared with the DeLong approach [36] using the easyROC package [37].
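The McNemar test suits this comparison because every nodule is classified by all systems, so the observations are paired. A minimal sketch of the exact (binomial) form of the test is given below; the article does not specify whether the exact or the chi-square form was used, and the counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pairs.

    b: nodules for which system A indicates FNAB and system B does not
    c: nodules for which system B indicates FNAB and system A does not
    Concordant pairs carry no information about the difference and are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null hypothesis each discordant pair falls on either side
    # with probability 1/2, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL discordant counts, not taken from the study.
print(mcnemar_exact(0, 10))   # strongly asymmetric disagreement
print(mcnemar_exact(5, 5))    # perfectly symmetric disagreement
```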

Conclusions
In conclusion, when current risk-stratification systems were applied in clinical practice as rule-out tests for older patients, all were able to avoid a sizable number of biopsies, with a negative predictive value of 100%. Indeed, no malignancy was missed by any of the systems, though this result required that non-classifiable nodules in the ATA guidelines be considered intermediate-suspicion lesions. As previously reported in the general population, the ACR TIRADS outperformed the other systems as it avoided the highest number of biopsies and had the best discriminative power in the > 65 age group.