Malignancy Analyses of Thyroid Nodules in Patients Subjected to Surgery with Cytological- and Ultrasound-Based Risk Stratification Systems

Background: Fine needle aspiration (FNA) cytology is the gold standard for the preoperative diagnosis of thyroid cancer. However, up to 30% of FNA examinations yield nondiagnostic or indeterminate results, which complicates patient management. Clinical features and ultrasound (US) patterns, including US risk stratification systems, could be useful in the preoperative diagnostic workup and prediction of malignancy, but the evidence is not unequivocal. Methods: 400 consecutive patients subjected to thyroid surgery were retrospectively enrolled at our institution in Calabria, Southern Italy. Preoperative US and FNA cytological descriptions, formulated according to the “Italian consensus for reporting thyroid fine-needle aspiration cytology” (ICCRTC) classification and three US risk stratification systems (those developed by the American Association of Clinical Endocrinologists, American College of Endocrinology and Associazione Medici Endocrinologi (AACE/ACE/AME), American Thyroid Association (ATA), and American College of Radiology (ACR-TIRADS)), were collected, along with histological results. Results: 153 thyroid cancer cases, the large majority of which were papillary carcinomas, were detected on final histological examination. Almost two-thirds of patients subjected to thyroid surgery for either benign or malignant lesions were female. Patient’s age ≤ 20 years and between 21–30 years were clinical features associated with increased risk of thyroid cancer in logistic regression analyses. US features associated with thyroid cancer included irregular margins, solid composition, microcalcifications, and marked hypoechogenicity. The AACE/ACE/AME, ATA, and ACR-TIRADS risk categories, corresponding to specific US patterns, were strong predictors of malignancy in both genders, but not in nodules with indeterminate cytology. A measured difference between the longitudinal (L) and the anteroposterior (AP) diameter > 5 mm, a proxy for a parallel-oriented oval


Introduction
Thyroid nodules, defined as discrete thyroid lesions that are radiologically distinct from the surrounding parenchyma [1], are among the most common endocrine disorders. Asymptomatic, non-palpable thyroid nodules can be diagnosed incidentally during an instrumental cervical examination in more than half of the general adult population, but only a small percentage (approximately 5-15% prevalence) account for thyroid cancer [1,2]. The prevalence of thyroid nodules increases with age and is two-to-four times higher in women than in men [3]. There is also a clear female predominance with respect to the occurrence of thyroid cancer, which is recognized as the leading endocrine malignancy across the globe [4,5]. Classically, thyroid cancer is categorized into four main histological types: papillary carcinoma (PTC), the most common type, accounting for 80-85% of thyroid cancer cases; follicular carcinoma (FTC), the second leading type, accounting for approximately 9-40% of thyroid cancer cases, in relation to the population studied and iodine intake; medullary thyroid carcinoma (MTC), accounting for less than 7% of thyroid cancer cases; and anaplastic thyroid carcinoma (ATC), the rarest and most serious type, accounting for less than 2% of thyroid cancer cases [5,6]. However, in 2017, several amendments to the histological diagnosis and classification of thyroid cancer were made by the World Health Organization (WHO), including a reinterpretation of the encapsulated follicular variant of PTC as a "non-invasive follicular thyroid neoplasm with papillary-like nuclear features" (NIFTP), more representative of its indolent nature, and recognition of Hürthle cell carcinoma (HCC) as a separate tumor entity from FTC variants [7]. FNA cytology is the gold standard and obligatory tool for the preoperative diagnosis of thyroid cancer [6].
Unfortunately, up to 30% of FNA examinations performed in practice yield nondiagnostic or indeterminate results and this complicates patient management [8]. Indeed, the optimal treatment strategies in these cases are uncertain, and, based on patients' specific clinical considerations and preference, may include surveillance, repeat FNA, molecular testing and, most remarkably, diagnostic thyroidectomy [9], which can be followed by permanent surgical complications in both its total and partial form [10,11]. Thus, predicting which thyroid nodules are malignant at final histological examination, while sparing most patients with benign nodules from unnecessary surgical procedures and related consequences, is a challenge for the endocrinologist.
In 2014, the "Italian consensus for reporting thyroid fine-needle aspiration cytology" (ICCRTC) devised a six-tiered class grouping as a variation of the Bethesda System, with the purpose of promoting more appropriate surgical indications for thyroid nodules with indeterminate cytology [12]. According to the ICCRTC classification, indeterminate cytology lesions can be split into two subcategories with different risk of malignancy: the low-risk (<10%) indeterminate TIR3A and the high-risk (approximately 15-30%) indeterminate TIR3B [12]. However, as evidenced in a recent meta-analysis of retrospective studies, thyroid cancer rates in nodules with low-risk and high-risk indeterminate cytology frequently differ from those expected from the ICCRTC classification, being 17% and 47%, respectively [13]. At the same time, more than half of patients harboring a benign nodule with TIR3B cytology still undergo an unnecessary thyroidectomy [14]. Indeed, only a few studies have considered the recent NIFTP reclassification in the assessment of malignancy rates for the ICCRTC [13,15,16], and none has specifically involved individuals from Calabria (Southern Italy), a geographical area historically exposed to iodine deficiency and endemic for goiter [17]. Over the last few decades, several benign thyroid disorders, including goiter and autoimmune thyroiditis, have considerably changed in their clinical presentation in this area, likely due to variations in environmental factors and iodine status [17,18]. Starting from the mid-1990s, the average annual frequency of autoimmune thyroiditis steadily increased, until reaching a plateau in the mid-2000s [18]. Coincidentally, both age at diagnosis and female-to-male ratio of this autoimmune thyroid disorder have progressively declined [18], along with the prevalence of goiter [17].
Even though less is known about the regional epidemiology of thyroid tumors, an excess thyroid cancer-specific mortality with respect to other Italian regions has been reported in Calabria up to 2009, which has been related to a former iodine deficit [19].
As documented in the literature, several clinical features could relate to an increased risk of thyroid cancer, including specific age groups, male gender, familial history for this tumor, radiation exposure, and the presence of a rapidly enlarging nodule causing compressive symptoms [3]. Ultrasound (US) assessment is also useful in the pre-FNA diagnostic workup, given that many US features of a nodule have often, but not consistently, been associated with malignancy risk, such as irregular margins, hypoechogenicity, microcalcifications, and solid composition [20].
Notwithstanding this, when considered in isolation, none of them is sufficiently sensitive to guide clinical decisions, and thus FNA represents a key decisional element [21,22]. Assessment of nodule shape with US can be used to better define malignancy risk, as nodules classified as spherical based on their longest-to-shortest axis ratio have been associated with thyroid cancer, whereas nodules classified as oval were found to carry a substantially lower risk [23]. However, the assessment of nodule orientation, which refers to whether the transverse (LL) or longitudinal (L) diameters are longer than the anteroposterior (AP) diameter on their relative planes, is another critical point in US descriptions, given that nonparallel, taller-than-wide nodules are more concerning than oval nodules with parallel orientation [24]. Notably, in recent years, several risk stratification systems based on specific US patterns of nodule features (each associated with an estimated risk of malignancy) have been developed by international endocrine and radiological societies, in order to improve the diagnostic performance of US and better identify thyroid nodules that should be subjected to FNA or followed up with more attention. Widely used systems include those proposed by the American Thyroid Association (ATA) [25], the American Association of Clinical Endocrinologists, American College of Endocrinology, and Associazione Medici Endocrinologi consensus (AACE/ACE/AME) [1,26], as well as the scoring model recommended by the American College of Radiology (ACR-TIRADS) [27]. Despite differences in the number and nomenclature of US patterns, classification of thyroid nodules with US risk stratification systems allows higher interobserver agreement than the evaluation of individual features [28], and reduces the number of unnecessary FNAs on benign lesions [29].
Furthermore, there is initial evidence that the US risk categories might be useful not only to set the threshold for FNA, but also to personalize patient management and surgical indications, especially when the cytological diagnosis is indeterminate [22,30]. Nevertheless, other than giving information for repeating US and FNA examinations during follow up, the current guidelines do not yet recommend a different management for cytological indeterminate thyroid nodules based on US risk stratification systems [22].
On the basis of these premises, the purpose of this study was to determine thyroid cancer rates and cytohistological correlations, with reference to the ICCRTC classification, in a large retrospective cohort of patients who underwent thyroid surgery at our institution in Calabria, Southern Italy. Furthermore, we aimed at identifying the preoperative predictors of malignancy, particularly in the subset of thyroid nodules with indeterminate cytology, and verifying the diagnostic performance of the ATA, AACE/ACE/AME, and ACR-TIRADS US risk stratification systems for the detection of thyroid cancer.

Study Design
We retrospectively enrolled 400 consecutive patients subjected to total or partial thyroidectomy due to nodular thyroid disease at the Endocrine Surgery Unit (University Hospital Mater-Domini, Catanzaro, Italy) between January 2010 and September 2019. All eligible patients underwent preoperative US and FNA cytology assessments at our endocrinology outpatient clinic (University Hospital Mater-Domini, Catanzaro, Italy). The indications for surgery were thyroid lesions with US and/or cytological features suspicious for malignancy, or rarely, in case of benign and indeterminate cytology features, compressive symptoms, patient's specific factors, and personal preference [9]. Data concerning demographic features of eligible patients, preoperative US, and cytological descriptions, along with postoperative histopathologic results, formulated according to the WHO Classification of thyroid cancer, were reviewed and collected from digital medical records, as detailed in the following sections. The exclusion criteria were: missing or inadequate preoperative US imaging data or having performed preoperative cytological examination in other outpatient clinics. For patients with multiple thyroid nodules subjected to FNA procedures, only the one corresponding to the highest risk cytological category, as predicted by the ICCRTC classification (TIR5 (malignant) > TIR4 (suspicious of malignancy) > TIR3B (high risk indeterminate) > TIR3A (low risk indeterminate) > TIR1/1C (non-diagnostic) > TIR2 (benign)) [12], was considered for the outcome analyses. The data collection was approved by the ethics committee of Regione Calabria Sezione Area Centro (protocol registry no. 343 of 21 November 2019). As the data were analyzed anonymously, there was no need for written informed consent. The study was performed in accordance with the Declaration of Helsinki.
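The index-nodule rule described above (one nodule per patient, chosen by the highest-risk ICCRTC category) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `select_index_nodule`, the dictionary layout, and the tie-breaking behavior (first nodule listed wins) are assumptions, since the text does not say how ties were resolved.

```python
# Rank order taken from the text: TIR5 > TIR4 > TIR3B > TIR3A > TIR1/1C > TIR2.
ICCRTC_RANK = {
    "TIR2": 0,
    "TIR1": 1,
    "TIR1C": 1,
    "TIR3A": 2,
    "TIR3B": 3,
    "TIR4": 4,
    "TIR5": 5,
}


def select_index_nodule(nodules):
    """Return the nodule with the highest-risk ICCRTC cytological category.

    `nodules` is a list of dicts with a "cytology" key (hypothetical layout).
    On ties, max() keeps the first nodule listed; this is an assumption.
    """
    if not nodules:
        raise ValueError("at least one nodule with FNA cytology is required")
    return max(nodules, key=lambda n: ICCRTC_RANK[n["cytology"]])


# Illustrative patient with three aspirated nodules (made-up data).
patient = [
    {"id": "right lobe", "cytology": "TIR2"},
    {"id": "isthmus", "cytology": "TIR3B"},
    {"id": "left lobe", "cytology": "TIR1C"},
]
index_nodule = select_index_nodule(patient)
print(index_nodule["id"], index_nodule["cytology"])
```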

US Assessment
Real-time US assessment was performed on longitudinal and transverse planes with the use of a high-resolution ultrasound system (10 MHz, Aplio XG, Model SSA 790A, Toshiba Corp., Tokyo, Japan) by two expert endocrinologists, trained in cervical sonography, under routine clinical practice conditions. Standardized US description of thyroid lesions included position, composition, echogenicity, presence of calcifications, margins, vascularity, size, and shape. Position was defined as the location of a nodule within the thyroid gland in relation to isthmus, left lobe or right lobe. The composition was defined based on the ratio of the cystic component to the nodule volume as solid (liquid component ≤10%), mixed (liquid component >10% and ≤90%), cystic (liquid component >90%), and spongiform (multiple heterogeneous microcystic areas separated by thin septations occupying ≥50% of the nodule volume). Echogenicity was defined as the predominant brightness of the solid component of a nodule in comparison with thyroid parenchyma and/or adjacent infrahyoid muscles, using the following terms: isoechoic, hyperechoic, hypoechoic (relative to thyroid parenchyma), markedly hypoechoic (relative to adjacent infrahyoid muscles), and anechoic (in case of cystic lesions). When present, calcifications were classified as micro- (<1 mm intranodular hyperechoic punctate foci), macro- (>1 mm hyperechoic foci), and peripheral rim calcifications. In order to improve visualization and assessment of microcalcifications, which are US features suspicious for malignancy, additional Micropure™ image processing (Aplio XG, Toshiba Corp., Tokyo, Japan) was used [31]. Margins were described in terms of definition (presence or absence of a clear demarcation between the nodule and the surrounding thyroid parenchyma) and regularity (absence or presence of edges and/or lobules) as well-defined, ill-defined, regular, or irregular.
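The composition thresholds given above translate directly into a small decision rule. The sketch below is illustrative only: the function name `classify_composition` and the boolean `spongiform` flag are assumptions, because the spongiform pattern is defined by its microcystic architecture and cannot be derived from the liquid fraction alone.

```python
def classify_composition(liquid_fraction, spongiform=False):
    """Map the liquid fraction of a nodule (0.0-1.0) to a composition label.

    Thresholds follow the text: solid <=10%, mixed >10% and <=90%, cystic >90%.
    `spongiform` is a hypothetical flag set by the sonographer.
    """
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must lie between 0 and 1")
    if spongiform:
        return "spongiform"
    if liquid_fraction <= 0.10:
        return "solid"
    if liquid_fraction <= 0.90:
        return "mixed"
    return "cystic"


for fraction in (0.05, 0.10, 0.50, 0.90, 0.95):
    print(fraction, classify_composition(fraction))
```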
Nodule vascularity, with reference to the amount and distribution (peripheral or central) of blood flow on Power Doppler US analysis, was described as absent, perinodular, intranodular, and peri-intranodular. To estimate nodule size, three diameters were measured: AP, LL, and L diameter. The AP and LL diameters were measured on the transverse scan as the largest nodule dimensions from front to back and side to side, respectively, whereas the L diameter was measured on the longitudinal scan as the largest dimension from side to side (Figure 1). The maximum diameter of the nodule was defined as the largest diameter on any imaging plane. Nodule volume and surface were calculated using the ellipsoid formulas. Based on comparisons among AP, LL, and L diameters, nodule shape was defined as round (when the AP diameter was equal to LL and L diameters), ovoid with parallel orientation (when the AP diameter was shorter than LL and L diameters) and taller-than-wide with nonparallel orientation (when the AP diameter of a nodule was longer than its LL or L diameter). Cervical lymph nodes were classified as normal or, in case of one or more suggestive US features (e.g., round shape, absence of an echogenic hilus, hypo- or hyper-echogenicity, presence of calcifications, or structural and vascular changes), as suspicious [32]. US data were collected and stored for further analysis. Sets of US features were reviewed from digital medical records to classify the malignancy risk of thyroid nodules, according to the internationally endorsed ATA [25], AACE/ACE/AME [1,26] and ACR-TIRADS [27] US risk stratification systems.
By using the approach reported in a previous study with similar retrospective design [33], a yes or no answer to each of the US features derived from the ATA, AACE/ACE/AME, and ACR-TIRADS guidelines was input, for the index nodule, into a prespecified Microsoft Excel worksheet (Microsoft Office 2016, Redmond, WA, USA) and then fitted into the corresponding category for the relevant system. However, in our work, the investigator responsible for data input was not blind to cytological and histological outcomes [33].
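The size and shape definitions in this section can be sketched in a few lines of Python. The text only says that "the ellipsoid formulas" were used, so the specific choices here are assumptions: volume as π/6 × AP × LL × L, surface by the Knud Thomsen approximation (an ellipsoid has no elementary closed-form surface), and a hypothetical measurement tolerance `tol_mm` for deciding when two diameters count as equal. Diameter combinations not covered by the three definitions are returned as "unclassified".

```python
import math


def ellipsoid_volume_ml(ap_mm, ll_mm, l_mm):
    """Ellipsoid volume (pi/6 x AP x LL x L), converted from mm^3 to mL."""
    return math.pi / 6.0 * ap_mm * ll_mm * l_mm / 1000.0


def ellipsoid_surface_mm2(ap_mm, ll_mm, l_mm, p=1.6075):
    """Approximate ellipsoid surface in mm^2 (Knud Thomsen formula; assumed)."""
    a, b, c = ap_mm / 2.0, ll_mm / 2.0, l_mm / 2.0  # semi-axes
    mean = ((a * b) ** p + (a * c) ** p + (b * c) ** p) / 3.0
    return 4.0 * math.pi * mean ** (1.0 / p)


def classify_shape(ap_mm, ll_mm, l_mm, tol_mm=0.5):
    """Shape label from the AP, LL and L diameters, as defined in the text.

    `tol_mm` is a hypothetical tolerance for treating diameters as equal.
    """
    if ap_mm > ll_mm + tol_mm or ap_mm > l_mm + tol_mm:
        return "taller-than-wide (nonparallel)"
    if ap_mm < ll_mm - tol_mm and ap_mm < l_mm - tol_mm:
        return "ovoid (parallel)"
    if abs(ap_mm - ll_mm) <= tol_mm and abs(ap_mm - l_mm) <= tol_mm:
        return "round"
    return "unclassified"


# Illustrative nodule (made-up diameters in mm).
print(classify_shape(8, 12, 20), round(ellipsoid_volume_ml(8, 12, 20), 2))
```

A tolerance is used because exact equality of three measured diameters is rarely observed in practice; setting `tol_mm=0` recovers the literal definitions.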

FNA Cytology Assessment
FNA cytology assessment was performed under US guidance and aseptic conditions, by using a 22- to 27-gauge needle attached to a 20 mL disposable syringe within a syringe holder (Cameco syringe pistol). The smeared slides of FNA specimens were fixed in 95% ethyl alcohol, stained with the Papanicolaou stain, and interpreted according to the routine ICCRTC diagnostic classification by two expert thyroid cytopathologists. If eligible patients were enrolled before the introduction of ICCRTC in 2014 [12], available preoperative FNA slides were reviewed and reclassified for the study aims.

Statistical Analysis
Initially, continuous data were tested for normality using the Shapiro-Wilk test. Continuous variables are reported as mean and standard deviation (SD) or median and interquartile range (IQR), as appropriate, whereas discrete variables are reported as numbers and percentages. Differences between continuous variables, including nodule diameters on different imaging planes and relative comparisons, were analyzed using the Mann-Whitney test, whereas the two-tailed Fisher exact test was used for the comparison of proportions. Logistic regression analyses were used to evaluate the effects of distinct nodule features and demographic characteristics as possible predictors of thyroid cancer. Odds ratios (ORs) with 95% confidence intervals were calculated. In each logistic regression analysis, missing data on predictive variables were handled with listwise deletion. When addressing the differences between nodule diameters, the decision about the most appropriate cut-off point for discrimination of malignancy risk was supported by calculating the maximum Youden's index J for a receiver operating characteristic (ROC) curve, along with the minimum d value. In all analyses, statistical significance was fixed at an α level of 0.05. The statistical software package SPSS Statistics 20.0 (IBM Corp., Armonk, NY, USA) was used for data analysis.
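The cut-off selection described above can be illustrated with a short, dependency-free sketch. It is not the SPSS procedure used in the study. It assumes the usual definitions, J = sensitivity + specificity − 1 and d = distance from the ROC point to the ideal corner (sensitivity = 1, specificity = 1), and treats "value > threshold" as a positive test. The function names and the data are made up for illustration.

```python
import math


def roc_points(values, is_positive, thresholds):
    """Return (threshold, sensitivity, specificity) for each candidate cut-off.

    A case is called test-positive when its value is strictly above the cut-off.
    """
    n_pos = sum(1 for y in is_positive if y)
    n_neg = len(is_positive) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for v, y in zip(values, is_positive) if y and v > t)
        tn = sum(1 for v, y in zip(values, is_positive) if not y and v <= t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def best_cutoffs(points):
    """Cut-offs with the maximum Youden's J and with the minimum distance d."""
    by_j = max(points, key=lambda p: p[1] + p[2] - 1.0)
    by_d = min(points, key=lambda p: math.hypot(1.0 - p[1], 1.0 - p[2]))
    return by_j[0], by_d[0]


# Made-up L-minus-AP differences (mm); "positive" here means a benign nodule.
benign = [3, 6, 7, 8, 9, 12]
malignant = [1, 2, 3, 4, 5, 7]
values = benign + malignant
labels = [True] * len(benign) + [False] * len(malignant)

points = roc_points(values, labels, thresholds=[2, 4, 5, 6])
print(best_cutoffs(points))  # (5, 5) for this toy data set
```

The two criteria often agree, as in this toy data set, but they can select different cut-offs when the ROC curve is asymmetric, which is presumably why both were reported.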

Characteristics of Study Participants and Final Histological Outcomes
Out of 400 patients subjected to endocrine surgery due to nodular thyroid disease, 153 thyroid cancer cases were detected on final histological examination: 132 (86.3%) PTC, 8 (5.2%) FTC, 9 (5.9%) MTC, 1 (0.7%) ATC, 2 (1.3%) HCC, and 1 (0.7%) thyroid lymphoma. Among the histological variants of PTC, there were only 5 (3.8%) NIFTP. Out of 247 patients with benign nodules, the most commonly reported disease was nodular goiter (N = 193, 78.1%), including 33 cases of nodular adenomatous hyperplasia, followed by follicular adenoma (N = 31, 12.5%), with 4 cases of Hürthle cell variants, and chronic lymphocytic thyroiditis (N = 23, 9.3%). Table 1 summarizes the demographic features of study participants, in relation to the final histological outcome for their resected index nodule. As shown in Table 1, approximately two-thirds of patients subjected to surgery for either benign or malignant thyroid nodules were female. On average, patients with thyroid cancer were significantly younger than patients with benign thyroid conditions. In particular, patients aged ≤20 years and 21-30 years were more likely to harbor a malignant nodule (p = 0.016 and p < 0.001, respectively), whereas patients between 51-60 years of age were more likely to undergo surgical procedures due to benign thyroid diseases (p = 0.046). No significant differences in thyroid cancer rates were found among the other age groups (Table 1). Table 2 summarizes the US features for the relevant nodules. As shown in Table 2, malignant nodules were significantly smaller and more often accompanied by radiologically suspicious cervical lymph nodes than their benign counterparts (p < 0.001).
Moreover, at US assessment, thyroid cancer frequently appeared as a solid, markedly hypoechoic nodule, with irregular and/or ill-defined margins (p < 0.001), whereas benign nodules presented a mixed composition (p < 0.001), with mild hypoechogenicity of the solid component (p = 0.007), or rarely, appeared as cystic/spongiform masses. Interestingly, an oval shape with parallel orientation was predominant in benign nodules (p < 0.001), while a tendency toward a nonparallel, taller-than-wide shape was found in malignant nodules. As expected, the presence of microcalcifications was significantly more common in malignant than in benign nodules (p = 0.015). No significant differences in vascularity patterns were observed, although malignant nodules showed a slight tendency toward an increased (peri-)intranodular flow.
Regarding the AACE/ACE/AME, ATA, and ACR-TIRADS risk stratification systems, thyroid nodules could be classified into three-to-five risk categories, according to their estimated malignancy risk and specific US patterns [1,25-27]. Malignant nodules were significantly more frequently classifiable as highly suspicious lesions, when compared to benign nodules (p < 0.001), irrespective of the US risk stratification system used. However, only the AACE/ACE/AME system could categorize the large majority of thyroid cancer cases as high-risk lesions (N = 102; 66.7%). Most benign nodules were classifiable as intermediate-risk lesions according to the AACE/ACE/AME system, and as moderately suspicious lesions according to ACR-TIRADS (Table 2). When we analyzed the final histological outcomes in the high-risk categories for each classification system, we found that almost all predicted malignant nodules (range 87.3-90.2%) were PTCs (Supplementary Table S1). Still, there were 77 nodules based on AACE/ACE/AME, 33 nodules based on ATA, and 17 nodules based on ACR-TIRADS, which were categorized as highly suspicious for malignancy, but proved benign on final pathology. In these cases, as evidenced in Supplementary Table S1, nodular goiter was the most recurrent finding (range 57.1-60.6%).

Rates of Malignancy for the ICCRTC Categories
Thyroid nodules were also classified according to the cytological ICCRTC classification [12]. As evidenced in Table 3, benign and malignant nodules differed significantly in their preoperative FNA cytological features, except in case of repeatedly nondiagnostic TIR1 specimens. In line with the strong predictive values previously reported for the extreme categories of the ICCRTC classification [34], there was a limited number of cytologically malignant nodules (N = 10, TIR4; N = 1, TIR5) with final benign pathology on surgery (namely six cases of nodular goiter, two cases of nodular adenomatous hyperplasia, two cases of chronic lymphocytic thyroiditis, and one case of follicular adenoma). At the same time, there were only a few cytologically benign nodules (N = 6, TIR2) that proved malignant at final histological examination (five PTC cases and one FTC) (Table 3). When we examined the rates of thyroid cancer for the different ICCRTC diagnostic categories, either considering NIFTP as malignant, or excluding it from thyroid cancer cases (N = 148), we found small, non-significant reductions of malignancy rates in TIR3B (29.6% vs. 27.5%), TIR1/1C (17.8% vs. 14.2%), TIR5 (98.3% vs. 96.6%), and TIR2 (7.0% vs. 5.8%), but not in other risk classes (TIR3A: 25.6%; TIR4: 77.3%).

Predictors of Malignancy in Thyroid Nodules and Diagnostic Performance of US Risk Stratification Systems
To test a series of variables for predicting final pathology outcomes in thyroid nodules, we performed logistic regression analyses. The tested categorical variables included a measured difference between L and AP diameters >5 mm, following ROC analyses showing that comparative differences between nodule diameters measured on the transverse and longitudinal scans were significantly associated with benign thyroid lesions (Supplementary Figures S1 and S2). Although the differences of both the L and LL diameters from the AP diameter were positively and significantly associated with benign thyroid pathology on surgery, as expected from the protective effect of an oval shape for a nodule [23], we focused on differences between L and AP diameters, given the higher area under the curve of this predictive model (Supplementary Figure S1). Based on the ROC curve in Supplementary Figure S2, 5 mm for a measured difference between L and AP diameter, corresponding to the maximum Youden's J and minimum d, with a sensitivity for benign diagnosis of 70.4% and a specificity of 59.7%, could be used as the most appropriate cut-off value for discrimination of malignancy risk.
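As a worked check of the figures reported above, the Youden's J and d values at the 5 mm cut-off can be recomputed from the stated sensitivity and specificity. The formulas are the conventional ones (J = sensitivity + specificity − 1; d = Euclidean distance from the ROC point to the ideal corner) and are assumed here, since the text does not spell them out.

```python
import math

sensitivity = 0.704  # reported sensitivity for a benign diagnosis at 5 mm
specificity = 0.597  # reported specificity at the same cut-off

youden_j = sensitivity + specificity - 1.0
distance_d = math.hypot(1.0 - sensitivity, 1.0 - specificity)

print(f"J = {youden_j:.3f}, d = {distance_d:.3f}")  # J = 0.301, d = 0.500
```

A J of about 0.30 indicates moderate discrimination, consistent with the cut-off being proposed as a supporting criterion and not as a stand-alone test.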

Predictive Values of a Measured Difference between L and AP Diameter ≤5 mm
Finally, mindful of the gold standard role of FNA for the preoperative diagnosis of thyroid cancer [6,34], and in order to confirm the reliability of our proposed critical cut-off for a measured difference between L and AP diameters in discriminating malignancy, sensitivity (SEN), specificity (SPE), and predictive values (positive, PPV; negative, NPV) of a difference between L and AP diameter ≤5 mm were calculated and compared with those of the ICCRTC system. TIR3B was used as the cytodiagnostic threshold for the definition of a positive test [34] (Table 6). Positive and negative predictive values of a measured difference between L and AP diameter ≤5 mm in predicting malignancy in all thyroid nodules, regardless of the corresponding ICCRTC category, were 52.8% and 75.6%, respectively, with an overall accuracy of 66.4% (Table 6). When positive and negative predictive values were tested in the subset of thyroid nodules classified as indeterminate according to the ICCRTC, we found results of 38.8% and 80.3%, respectively, thus confirming that a measured difference between L and AP diameter >5 mm could be a good predictor of benign thyroid lesions even in indeterminate-risk cases, irrespective of their cytological subdivision (NPV in TIR3A nodules 80.4%; NPV in TIR3B nodules 80.3%), and with comparable accuracy (Table 6).
Table 6. Predictive values of a measured difference between AP and L diameter ≤5 mm and FNA cytodiagnostic test.
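For readers who want to reproduce this kind of comparison, the sketch below shows how SEN, SPE, PPV, NPV, and accuracy follow from a 2×2 table in which a positive test is "L − AP difference ≤5 mm" and the reference standard is final histology. The function name and the counts are purely illustrative; they are not the study data, which are reported only as percentages in Table 6.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 contingency table.

    tp: test-positive, malignant    fp: test-positive, benign
    fn: test-negative, malignant    tn: test-negative, benign
    """
    total = tp + fp + fn + tn
    return {
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=40, fp=30, fn=10, tn=120)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of malignancy in the sample, so values obtained in a surgical series would not transfer directly to an unselected population of nodules.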

Discussion
The determination of the nature of a thyroid nodule is crucial for an optimal patient management [5,6,35]. A reduction in the number of thyroid nodules subjected to final histological examination with routine preoperative US and FNA diagnostic procedures is found to have an important socio-economic impact, diminishing the number of unnecessary thyroidectomies and associated comorbidities. Our logistic analysis confirms some well-established predictors of malignancy, including young patient age (≤30 years), and typical US characteristics of irregular margins, microcalcifications, solid composition, and marked nodule hypoechogenicity [3,20,35]. However, we found no association between patient's age >60 years and risk of thyroid cancer, whereas patient's age between 51-60 years emerged as a protective factor. These findings are in contrast with other works, reporting the peak occurrence of thyroid cancer in patients aged 51-60 years [36] and increased risk of malignancy in older individuals [3,20,37]. Although a selection bias cannot be excluded due to the retrospective design of our study, we believe that these results might be, at least in part, indicative of a changing epidemiology of thyroid cancer related to nationwide prophylaxis programs aimed at increasing iodine intake [38]. Indeed, over the last few decades, a progressive and significant improvement of iodine nutrition has been observed in Calabria, a Southern Italian region where a mild to moderate iodine deficiency leading to endemic goiter has persisted until recently [17]. Increased iodine intake would also explain the relatively low prevalence of FTC (favored by iodine deficiency) and predominance of PTC among patients with malignant nodules in our study [4,6,38]. However, due to missing regional epidemiological data about earlier periods, this suggestion requires further confirmation.
Notwithstanding a slight tendency toward an increased intranodular flow in malignant nodules, this vascular pattern was not associated with thyroid cancer. While some investigations have stated that intranodular vascularity could relate to abnormal tumor growth, others have questioned its usefulness in the preoperative assessment of malignancy, because of the possibility of detection in hyperplastic nodules [14,39,40]. Moreover, fibrosis within PTC has been described as a common histological finding (up to 89% of papillary carcinomas), which would explain a low intranodular vascularity [14]. Although a larger size has been reported as an independent predictor of malignancy [3,37,41], the relationship between nodule size and risk of thyroid cancer remains uncertain. In concordance with the results of other studies [42-45], we observed that benign thyroid nodules were significantly larger than malignant nodules, and that the risk of thyroid cancer was inversely related to the largest nodule diameter. Even more relevant, benign lesions were more frequently oval-shaped and parallel-oriented. In a retrospective cohort study with initial prospective validation, nodules with a more spherical shape have been related to higher malignancy rates than oval-shaped nodules [23]. However, while all solid nodules with a longest-to-shortest axis ratio of 2.5 or more were found to be benign on cytological analysis, only a few patients would have been spared unnecessary surgery with this cut-off [23]. Furthermore, a longest-to-shortest axis ratio of 1.0-1.49, defining nodules with a more spherical shape, could not be regarded as pathognomonic of malignancy [23,46].
As opposed to the arbitrary categories of longest-to-shortest axis ratios proposed in this previous study and others [23,24], for the first time, we pinpointed the measured difference between L and AP diameters to better define the preoperative risk assessment of malignancy in thyroid nodules. Indeed, in our surgical series, a measured difference between L and AP diameter >5 mm, a proxy for a parallel-oriented oval shape of a nodule, was inversely associated with thyroid cancer, and this association remained significant even in high-risk indeterminate TIR3B cytology nodules.
In indeterminate thyroid lesions classified as either TIR3A or TIR3B according to ICCRTC, we found that US risk stratification systems would have failed to adequately estimate the risk of malignancy, given that irregular margins and solid composition were the only conventional suspicious US features associated with thyroid cancer. In nodules with indeterminate cytology, specific US patterns have been associated with different histological outcomes, suggesting that the ATA system could be used not only to set the thresholds for FNA, but also to personalize patient management regarding the urgency and purpose of intervention [30]. On the other hand, a suboptimal diagnostic accuracy has also been reported [47], so that the role of US risk stratification systems in the assessment of malignancy risk remains controversial. Even though the sensitivity of the ATA guidance could be improved by adjusting for nodule size [47], a drawback of this system is that the solid composition of a nodule is not categorized as an independent risk factor for thyroid cancer [24,45]. Based on our results, a measured difference between L and AP diameters >5 mm could be considered as an additional and practical tool for ruling out malignancy in indeterminate cytology nodules, with the potential to reduce unnecessary surgical procedures. However, even if clinical decisions are led by US and FNA assessments, a complete clinical workup of patients with thyroid nodules should always be made [33,35]. Indeed, several benign lesions could mimic thyroid cancer at US examination. Patients' specific clinical details and changes in nodule morphology on the multiple imaging planes of US can be relevant for the differential diagnosis, often preventing further FNA and surgical procedures. For example, ill-defined hypoechoic nodules are common findings in patients affected by subacute granulomatous thyroiditis, frequently, but not always, presenting with painful neck swelling, fever, or both [48].
Despite those malignancy-mimicking features, these transient thyroid lesions tend to appear elongated on the longitudinal, but not on the transverse, scan [48]. Also, changes in nodule morphology on serial US, including size reduction and the appearance of suspicious features (e.g., spiculated margins, marked hypoechogenicity, taller-than-wide shape, solid content, micro- and macrocalcifications), causing an upgrade in the corresponding risk stratification category, may occasionally accompany the natural course of degenerating thyroid nodules [49]. In cystic or mixed thyroid lesions, the liquid content can shrink, either spontaneously or after an FNA procedure, whereas in benign solid lesions, an iatrogenic injury can trigger venous thrombosis and/or internal bleeding, resulting in fibrosis and degenerative histological changes [49,50]. In such instances, even FNA may produce variable results, ranging from benign findings to atypia of undetermined significance (equivalent to TIR3A of the ICCRTC classification) and nondiagnostic specimens [49]. As 32 patients with benign thyroid nodules and an inconclusive cytological diagnosis underwent repeated FNA procedures in our surgical series (data not shown), one could hypothesize that the preoperative prevalence of suspicious US features in these cases might have been related, at least partially, to FNA-induced injury and degeneration.
To date, conflicting data on gender predominance in thyroid cancer have been reported. Although thyroid nodules are more common in women than in men [3], as is thyroid cancer [4–6], male gender might be considered an independent risk factor for malignancy in patients with nodular thyroid disease [3,51]. However, our findings confirm the similar female-to-male ratio (approximately 3:1) in both benign and malignant thyroid nodules recently reported in a retrospective cohort of Asian ethnicity [45]. Furthermore, no significant gender-based differences were observed regarding the performance of US risk stratification systems in predicting thyroid cancer. In other logistic regression models, the diagnostic performance of ACR-TIRADS and AACE/ACE/AME US categories in predicting a high-risk cytological diagnosis (≥TIR3B) at FNA assessment was significantly improved by considering younger age and male gender as independent risk factors for malignancy [33]. Discrepancies between results may arise from the fact that the high-risk indeterminate TIR3B cytological category was significantly and independently associated with a low probability of malignancy at final histological examination in our logistic regression analyses, in contrast to studies lacking a complete validation of cytological outcomes [33]. In this respect, it should be noted that, in our work, thyroid cancer rates for the high-risk indeterminate TIR3B cytological category were in remarkable agreement with those estimated by the ICCRTC classification [12], even after excluding NIFTP from malignant cases.
The main strength of our study is the relatively large population of patients subjected to thyroid surgery who underwent a comprehensive preoperative US and FNA evaluation at a single tertiary-level referral center for endocrine disorders. This facilitates the assessment of rigorous cytohistological correlations and limits the interobserver variations in measuring thyroid nodules commonly observed under routine clinical practice conditions [52]. However, some important limitations of this work should be outlined, including its retrospective design and lack of blinding for data input, which may have caused a selection bias. The study sample may not be representative of the entire Calabrian population with nodular thyroid disease, given that only people subjected to surgery and to preoperative FNA and US assessments at our institution were included. This can be particularly relevant in the case of indeterminate cytology nodules, as surgical management may depend upon other patient-specific factors (e.g., presence of contralateral nodules, hypothyroidism, familial history of thyroid cancer, fluorodeoxyglucose avidity on positron emission tomography scan) [9], besides age, gender, and nodule-specific US appearance. Because they were not available for the majority of patients, neither these clinical details nor other biochemical parameters, including serum TSH and anti-thyroid antibody status, which may help in the preoperative diagnosis of thyroid cancer and management of thyroid nodules [53], were analyzed. More importantly, given that surgical indications were not based on size criteria, nodules of all sizes were included in this study. As surgical excision should be considered for growing benign lesions causing compressive symptoms or cosmetic concerns, this may have biased our analyses.
Finally, in view of their retrospective nature, our results are preliminary, and the predictive values of a measured difference between L and AP diameters >5 mm will need to be externally validated in a prospective cohort.
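As an illustration of how such an external validation could be scored, the following minimal Python sketch applies the L minus AP > 5 mm criterion as a rule-out test and derives sensitivity, specificity, and negative predictive value from the resulting 2x2 table. The `Nodule` record, the function names, and the six-nodule `toy_cohort` are hypothetical and serve only to show the arithmetic; they are not data or code from this study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Nodule:
    """Hypothetical record: US diameters plus the final histological outcome."""
    longitudinal_mm: float      # L diameter
    anteroposterior_mm: float   # AP diameter
    malignant: bool             # final histology (reference standard)


def l_minus_ap_exceeds(nodule: Nodule, threshold_mm: float = 5.0) -> bool:
    """True when L - AP is strictly greater than the threshold (rule-out sign)."""
    return (nodule.longitudinal_mm - nodule.anteroposterior_mm) > threshold_mm


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def rule_out_metrics(nodules: Iterable[Nodule],
                     threshold_mm: float = 5.0) -> Dict[str, Optional[float]]:
    """Score the criterion against histology.

    A nodule is 'test negative' when L - AP > threshold (malignancy ruled out)
    and 'test positive' otherwise.
    """
    tp = fp = tn = fn = 0
    for nodule in nodules:
        ruled_out = l_minus_ap_exceeds(nodule, threshold_mm)
        if nodule.malignant:
            if ruled_out:
                fn += 1   # cancer missed by the rule-out sign
            else:
                tp += 1
        else:
            if ruled_out:
                tn += 1   # benign nodule correctly ruled out
            else:
                fp += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical toy cohort, NOT study data.
toy_cohort = [
    Nodule(20.0, 12.0, False),
    Nodule(18.0, 10.0, False),
    Nodule(15.0, 13.0, False),
    Nodule(14.0, 12.0, True),
    Nodule(22.0, 15.0, True),
    Nodule(10.0, 9.0, True),
]

metrics = rule_out_metrics(toy_cohort)
```

For a rule-out criterion, the negative predictive value is the quantity of main clinical interest, since it estimates how often a nodule meeting the criterion is truly benign. Because it depends on the prevalence of malignancy in the cohort, it would need to be re-estimated in any prospective population.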

Conclusions
In this large, retrospective, Southern Italian patient population subjected to surgery due to nodular thyroid disease, thyroid cancer rates and histological subtypes were in agreement with those expected from the ICCRTC classification. Overall, patient age and several conventional suspicious US features, along with the ATA, ACR-TIRADS, and AACE/ACE/AME risk categories, were able to predict malignancy at final histological examination. However, controversy surrounds the diagnostic performance of these US risk stratification systems for the detection of thyroid cancer in the subgroup of nodules with indeterminate cytology, raising doubts about their significance in patient management and suggesting their use only to set the thresholds for FNA. A measured difference between L and AP diameters >5 mm might represent an adjunctive practical tool for ruling out malignancy, with the potential to reduce the number of unnecessary thyroidectomies.

Funding: This publication is co-financed with the support of the European Commission, FESR FSE 2014-2020 and Regione Calabria. The support of the European Commission and Regione Calabria for the production of this publication does not constitute an endorsement of the contents, which reflect the views only of the authors, and neither body can be held responsible for any use that may be made of the information contained therein.