In this prospective study, 200 consecutive patients were included (167 women, 33 men; average age = 53.5 years, range = 12–88 years) who had been referred to the Department of Radiology of the University of Pécs for thyroid US between February and June 2019 and were deemed suitable for the study. The exclusion criteria were negative thyroid US, diagnoses of anatomical variants, and diagnoses of pathologies not strictly related to the thyroid (e.g., parathyroid adenoma, adjacent abscess, or adjacent malignancy of other origin), since the CAD system used in the present study (see CAD analysis) was developed to analyze only thyroid lesions. Further exclusion criteria were refusal of FNAB and non-diagnostic or inconclusive FNAB (i.e., Thy 1, Thy 3, and Thy 4 according to the Bethesda system for reporting thyroid cytopathology [44]) without the possibility of re-biopsy or surgery before the manuscript was prepared. These cases were excluded so that nodules could be clearly dichotomized as “benign” or “requiring surgery/malignant” (see diagnosis definitions), which was the target outcome of the study. Figure 1 depicts the flow chart of the study population selection.
All patients who underwent FNAB signed the general institutional informed consent form covering the advantages and risks of FNAB and the possible use of anonymized data for research purposes. The institutional review board approved the use of anonymized patient data in support of this study and waived the need for additional informed consent, since the study did not burden patients with any procedures beyond those required by the present clinical recommendations [8] (Code: No. 7751-PTE 2019; date: 10 January 2019).
2.2. Ultrasonography (US) Examination and Fine Needle Aspiration Biopsy (FNAB), Diagnosis Definitions
All US examinations were performed using a high-end, real-time US system (RS85 A; Samsung Medison Co. Ltd., Seoul, Korea) and a 3–12 MHz linear probe at a fixed frequency of 10 MHz. Patients were examined by K.M., a radiologist specializing in head and neck radiology with over ten years of experience in diagnostic and interventional thyroid US.
Standard thyroid US examination was performed with all patients in a supine position and their neck in hyperextension. Neck lymph node regions, major vessels of the neck, and major salivary glands were also scanned; however, their findings were not included in the study.
If a thyroid nodule was present, the radiologist evaluated its US morphological features, which are presented in Table 1. Based on these features, a K-TIRADS (Korean Thyroid Imaging Reporting and Data System) score was assigned (2 = benign, 3 = low suspicion, 4 = intermediate suspicion, and 5 = high suspicion) [8]. This scoring system was chosen because it is integrated into the applied CAD system (see CAD analysis), which allowed direct comparisons. A nodule was regarded as possibly benign with a K-TIRADS score of 2 or 3 and as possibly malignant with a score of 4 or 5.
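The dichotomization described above can be written down as a small lookup. The following is a minimal sketch only; the function name and the returned strings are illustrative assumptions and are not taken from the CAD system or from the study's software.

```python
# Illustrative sketch; names and return strings are assumptions, not study code.
K_TIRADS_LABELS = {
    2: "benign",
    3: "low suspicion",
    4: "intermediate suspicion",
    5: "high suspicion",
}


def dichotomize_k_tirads(score: int) -> str:
    """Map a K-TIRADS score of 2-5 to the binary reading used in the text."""
    if score not in K_TIRADS_LABELS:
        raise ValueError(f"K-TIRADS score outside the 2-5 range: {score!r}")
    # Scores 2 and 3 are read as possibly benign, 4 and 5 as possibly malignant.
    return "possibly benign" if score <= 3 else "possibly malignant"


if __name__ == "__main__":
    for s in (2, 3, 4, 5):
        print(s, K_TIRADS_LABELS[s], "->", dichotomize_k_tirads(s))
```

Raising an error for scores outside 2–5 is a design choice of this sketch, so that unexpected values are not silently assigned to either class.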
Additional nodule features were also recorded: “coarse calcification” if macrocalcification was present with a largest diameter exceeding 50% of the nodule’s largest diameter, and “inspissated colloid cystic nodule” for well-circumscribed, completely avascular, not entirely anechoic nodules that contained colloid particles producing a comet tail artifact and were completely evacuated during aspiration.
If a patient had more than one nodule, the nodule showing the highest risk of malignancy was included in the study; if several nodules carried the same malignancy risk, the largest of them was included. An axial plane B-mode image of each included nodule at its largest diameter was saved for further CAD analysis.
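The selection rule for patients with several nodules amounts to a two-level ordering. The sketch below assumes that "risk of malignancy" is expressed by the K-TIRADS score, which the text does not state explicitly; the `Nodule` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Nodule:
    """Hypothetical record for one nodule; field names are illustrative."""
    label: str
    k_tirads: int                 # assumed proxy for malignancy risk
    largest_diameter_mm: float


def select_index_nodule(nodules: Sequence[Nodule]) -> Nodule:
    """Return the nodule with the highest risk; ties go to the largest one."""
    if not nodules:
        raise ValueError("patient has no nodules to select from")
    # Tuples compare element by element: risk first, diameter as tie-breaker.
    return max(nodules, key=lambda n: (n.k_tirads, n.largest_diameter_mm))


if __name__ == "__main__":
    patient = [
        Nodule("A", k_tirads=3, largest_diameter_mm=22.0),
        Nodule("B", k_tirads=4, largest_diameter_mm=9.0),
        Nodule("C", k_tirads=4, largest_diameter_mm=14.0),
    ]
    print(select_index_nodule(patient).label)
```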
FNAB was indicated according to the present international guidelines [8] and was performed by K.M. In case of discrepancy, the guideline indicating FNAB was applied. US-guided FNAB was performed using the parallel needle-to-probe technique with a 22 G needle, a 10 mL syringe, and a Cameco biopsy gun; the nodules were panned across to sample as large a portion of them as possible. The aspirated material was rapidly expressed onto two glass slides, and two smears were created using the one-step smear method. One slide was fixed in 95% ethanol for H&E staining, and one was air-dried for May–Grünwald Giemsa staining. The rest of the obtained material was rinsed in formaldehyde solution for processing as a cell block. Aspiration was repeated if the material macroscopically appeared to be scanty or bloody. The cytological specimen was submitted to the cytopathology laboratory along with all relevant clinical and US information. The cytological analysis was performed by a cytopathologist (E.K.) with over 20 years of experience in cytopathology. Results were classified according to the Bethesda system for reporting thyroid cytopathology [44].
Patients with thyroiditis were included in the study. Thyroiditis criteria in the present study included clinically [46] and radiologically [37] established thyroiditis, or thyroiditis substantiated by FNAB.
In patients with thyroiditis, the following categories were specified: (a) focal inhomogeneity, either proven by biopsy not to be a nodule or completely unchanged compared with previous examinations over a timespan of at least 2 years and consistently regarded by the examiner as thyroiditis-related focal inhomogeneity; (b) pseudonodule, either substantiated by biopsy or completely unchanged compared with previous examinations over a timespan of at least 2 years and consistently regarded by the examiner as a pseudonodule; (c) true nodule in addition to thyroiditis, which was managed in the same way as nodules without thyroiditis. For focal inhomogeneities and pseudonodules, the examiner assigned a K-TIRADS score of 1 (no nodule) for further statistical analyses.
An axial plane B-mode image representing each of these entities at its largest diameter was also saved for further CAD analysis to assess the accuracy of nodule detection in thyroiditis.
Nodules were regarded as malignant or requiring surgery (hereafter referred to as “malignant/surgery”) if the cytological result was suspicious for malignancy (Thy 5) or malignant (Thy 6), and/or malignancy was evident in the surgical specimen. A benign nodule was diagnosed when any of the following criteria was met: (i) confirmation of benign status in a surgical specimen; (ii) benign or cystic cytology of an FNAB (Thy 1c or Thy 2); (iii) benign traits evident on US, including spongiform or partially cystic nodules with comet tail artifacts, or pure cysts; (iv) low-suspicion (K-TIRADS 3) nodules under 15 mm in diameter that were completely unchanged compared with previous examinations over a timespan of at least 2 years, with no clinical poor prognostic factors present, so that FNAB was not indicated.
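The diagnosis definitions above can be read as a decision rule. The sketch below is one possible encoding under stated assumptions: the malignant/surgery criteria are checked before the benign criteria (the text does not specify a precedence), and the `NoduleFindings` class with all of its field names is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoduleFindings:
    """Hypothetical container for the findings named in the text."""
    cytology: Optional[str] = None            # e.g. "Thy 2"; None if no FNAB
    surgical_histology: Optional[str] = None  # "benign", "malignant", or None
    benign_us_traits: bool = False            # spongiform, comet tail, pure cyst
    k_tirads: Optional[int] = None
    largest_diameter_mm: Optional[float] = None
    years_unchanged: float = 0.0
    poor_prognostic_factors: bool = False


def reference_diagnosis(f: NoduleFindings) -> Optional[str]:
    """Return "malignant/surgery", "benign", or None if no criterion applies."""
    # Assumption: malignant/surgery criteria take precedence over benign ones.
    if f.cytology in {"Thy 5", "Thy 6"} or f.surgical_histology == "malignant":
        return "malignant/surgery"
    # Criterion (iv): small, stable, low-suspicion nodule without risk factors.
    stable_low_suspicion = (
        f.k_tirads == 3
        and f.largest_diameter_mm is not None
        and f.largest_diameter_mm < 15
        and f.years_unchanged >= 2
        and not f.poor_prognostic_factors
    )
    if (
        f.surgical_histology == "benign"       # criterion (i)
        or f.cytology in {"Thy 1c", "Thy 2"}   # criterion (ii)
        or f.benign_us_traits                  # criterion (iii)
        or stable_low_suspicion
    ):
        return "benign"
    return None  # no criterion met, e.g. inconclusive cytology


if __name__ == "__main__":
    print(reference_diagnosis(NoduleFindings(cytology="Thy 5")))
    print(reference_diagnosis(NoduleFindings(cytology="Thy 2")))
```

A return value of `None` corresponds to nodules that meet neither definition, such as those with inconclusive cytology and no re-biopsy or surgery, which the study excluded.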
2.4. Statistical Analysis
Data were analyzed using MedCalc Statistical Software, version 18.11.3 (MedCalc Software bvba, Ostend, Belgium; https://www.medcalc.org; 2019) [49].
First, we aimed to identify US features and entities associated with CAD system misdiagnosis. We therefore selected the cases in which the radiologist’s diagnosis was correct and created two subgroups: one in which the CAD was correct and another in which the CAD was incorrect. Between these groups (CAD correct vs. CAD incorrect), the rates of entities such as focal inhomogeneity related to thyroiditis and pseudonodule related to thyroiditis (as defined earlier) were statistically compared using the comparison of two rates tool [50]. Next, only true nodules were kept in the CAD correct and CAD incorrect groups (cases of focal inhomogeneity related to thyroiditis and pseudonodule related to thyroiditis were excluded), and the rates of nodule US features assessed by the radiologist (coarse calcification, macrocalcification without coarse calcification, inspissated colloid cystic nodule, true nodule related to thyroiditis, composition, echogenicity, orientation, margin, spongiform state, shape, and microcalcification) were statistically compared using the comparison of two rates tool [50].
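To illustrate the kind of comparison made between the CAD correct and CAD incorrect groups, the sketch below implements a pooled two-proportion z-test. This is a stand-in chosen for illustration: the procedure implemented by the MedCalc tool may differ, and the counts in the usage example are invented, not study data.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    events_a: int, n_a: int, events_b: int, n_b: int
) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Stand-in for illustration only, not necessarily MedCalc's method.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # Feature present in all cases or in none: no difference to test.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Invented counts: feature seen in 30/100 "CAD incorrect" cases
    # and in 10/100 "CAD correct" cases.
    z, p = two_proportion_z_test(30, 100, 10, 100)
    print(f"z = {z:.3f}, p = {p:.5f}")
```

With small subgroups an exact test may be more appropriate than this normal approximation.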
Second, to assess the effect of the entities and US features associated with CAD system misdiagnosis on the overall diagnostic performance, receiver operating characteristic (ROC) curves were constructed with the K-TIRADS scores provided by the examiner or the CAD as variables and the benign or malignant/surgery diagnosis as the classification variable. The curves of the following groups were compared using the comparison of independent ROC curves with the methodology by DeLong et al. [51]: (a) total cohort, human rating; (b) total cohort, CAD rating; (c) a subgroup derived from the total cohort by excluding all cases in which the entities or US features identified to be significantly associated with CAD system misdiagnosis were present (“screened subgroup”), CAD rating; and (d) the same screened subgroup, human rating.
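As a rough illustration of how an ordinal score such as K-TIRADS yields an area under the ROC curve (AUC), the sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen malignant/surgery case receives a higher score than a randomly chosen benign case, with ties counted as one half. It does not implement the DeLong comparison of curves used in the study, and the example data are invented.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC via pairwise comparison of positive and negative cases."""
    if len(scores) != len(is_positive):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented K-TIRADS scores and reference diagnoses (True = malignant/surgery).
    k_tirads = [2, 3, 3, 5]
    malignant = [False, False, True, True]
    print(roc_auc(k_tirads, malignant))
```

Because K-TIRADS takes only a few distinct values, ties are frequent, which is why the half-credit term matters here.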
Third, the sensitivity, specificity, and accuracy of these four groups were compared using the McNemar test for dependent samples and Pearson’s Chi-squared test for independent samples.
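The performance measures and the paired comparison can be sketched as follows. The McNemar test is shown in its exact binomial form, which is an assumption of this sketch, since the text does not say which variant was used; the function names and the counts in the usage example are hypothetical.

```python
import math
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from a 2x2 confusion table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases rated correctly by the first reader only
    c: cases rated correctly by the second reader only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Lower binomial tail with success probability 0.5, doubled and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Invented counts for illustration.
    print(diagnostic_metrics(tp=8, fp=2, tn=18, fn=2))
    print(mcnemar_exact(b=10, c=0))
```

To compare sensitivity between two ratings of the same patients, the discordant counts would be taken among the malignant/surgery cases only; for specificity, among the benign cases only.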
All of these statistical analysis steps were additionally run in the group comprising only those patients who had undergone FNAB (FNAB-only group).
As an ancillary step, the number of cases in which the radiologist’s diagnosis was incorrect but the CAD system diagnosis was correct was calculated.
A p-value of <0.05 was considered statistically significant.