Radiologist versus Non-Radiologist Detection of Lymph Node Metastasis in Papillary Thyroid Carcinoma by Ultrasound: A Meta-Analysis

Papillary thyroid carcinoma (PTC) is the most common thyroid cancer worldwide and is known to spread to adjacent neck lymphatics. Lymph node metastasis (LNM) is a known predictor of disease recurrence and is an indication for aggressive resection. Our study aims to determine whether the ultrasound (US) operator's degree of training influences overall LNM detection. PubMed, Embase, and Scopus were searched for relevant articles. Two investigators independently screened the articles and extracted the data. Diagnostic test parameters were determined for all studies, studies reported by radiologists, and studies reported by non-radiologists. The total sample size amounted to 5768 patients and 10,030 lymph nodes. Radiologists performed ultrasounds in 18 studies, while non-radiologists performed ultrasounds in seven studies, corresponding to 4442 and 1326 patients, respectively. The overall sensitivity of LNM detection by US was 59% (95%CI = 58–60%), and the overall specificity was 85% (95%CI = 84–86%). The sensitivity and specificity of US performed by radiologists were 58% and 86%, respectively. The sensitivity and specificity of US performed by non-radiologists were 62% and 78%, respectively. Summary receiver operating characteristic (sROC) curve analysis found that radiologists and non-radiologists detect LNM on US with similar accuracy (p = 0.517). Our work suggests that both radiologists and non-radiologists alike detect overall LNM with high accuracy on US.


Introduction
Thyroid cancer is the most common endocrine malignancy [1]. Papillary thyroid carcinoma (PTC) is the most common thyroid cancer, accounting for 90% of thyroid cancer diagnoses [2,3]. Though sometimes described as an indolent disease, PTC often metastasizes into adjacent neck lymphatics, increasing the risk of disease recurrence [4]. The risk of PTC recurrence can be as high as 30% [5], and accordingly the determination of factors that can predict PTC aggressiveness is critical. One known independent risk factor for local PTC recurrence is lymph node metastasis (LNM) [5,6].
Cervical LNM is found in 20-50% of PTC patients [7]. Most commonly, lymphatic spread carries PTC metastases toward the central compartment of the neck (level VI). Metastasis is directly related to recurrence and mortality, making an accurate and efficient diagnosis of LNM paramount in managing these patients [8,9]. Furthermore, both the American Thyroid Association (ATA) and the British Thyroid Association (BTA) guidelines often recommend prophylactic central neck dissection with total thyroidectomy as the standard for operative PTC management [10,11]. Current ATA guidelines recommend ultrasound (US) as the first-line diagnostic technique in assessing LNM in PTC patients [11]. Though US is widely available and considerably cheaper than other imaging modalities, US diagnosis is operator-dependent and has been demonstrated to possess variable sensitivity and specificity [12,13]. Additionally, retrosternal, retropharyngeal, and mediastinal visualization may be challenging depending on the sonographer's level of training [14]. Importantly, accurate lymph node mapping allows for a targeted surgical approach, minimizing the area of neck dissection, decreasing mortality, and improving cosmetic outcomes [15].
Since US is common practice for patients with thyroid disease and yet is an operator-dependent technique, we sought to investigate the diagnostic value of US imaging performed by a radiologist, as opposed to a non-radiologist, such as an ultrasound technician or surgeon.

Methods
This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) guidelines for diagnostic test accuracy [16].

Literature Search
Three databases were searched in this meta-analysis: PubMed, Embase, and Scopus. The search terms were as follows: "thyroid" AND "lymph node metastasis" OR "LNM" AND "ultrasonography" OR "sonography" OR "ultrasound" OR "US." The search was performed in April 2022 and conducted without time or language restriction. All abstracts and the subsequent full texts were screened to determine the final articles.

Inclusion and Exclusion Criteria
Studies were included in our analysis if they (1) were cohort studies, case-control studies, or randomized controlled trials, (2) reported pertinent parameters with respect to LNM detection on preoperative US in PTC patients, and (3) confirmed the presence and/or absence of LNM by surgical pathology. Importantly, each study must have reported sonographer qualification and diagnostic performance metrics such as sensitivity and specificity (or data from which these could be calculated). Works that did not report original research or were not of the above-mentioned study types were excluded, including letters, opinions, editorials, case reports, singular abstracts, and reviews.

Data Extraction
Eligible articles were screened and reviewed by two investigators (P.P.I. and A.A.), who subsequently extracted the data. Any inconsistencies were settled by a senior author. Data relevant to the study were extracted, including study characteristics such as author, publication year, country and institution, study design, study period, and sample size. Importantly, the sonographer and their level of radiologic training (radiologist, US technician, or surgeon) and the number of lymph nodes detected were also recorded. Sensitivity and specificity were either extracted or manually calculated from the number of true positives, false positives, true negatives, and false negatives.
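As a minimal illustration of the manual calculation described above, sensitivity and specificity follow directly from a study's 2x2 counts. The Python sketch below is not the authors' extraction procedure; the function name and the example counts are hypothetical and are not taken from any included study.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from 2x2 diagnostic counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one pathology-positive and one pathology-negative case")
    sensitivity = tp / (tp + fn)  # share of pathology-positive cases flagged on US
    specificity = tn / (tn + fp)  # share of pathology-negative cases cleared on US
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=60, fp=15, tn=85, fn=40)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these hypothetical counts, 60 of 100 pathology-positive cases and 85 of 100 pathology-negative cases are classified correctly.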

Statistical Analysis
Statistical analysis was performed using Meta-DiSc 1.4 software (Unit of Clinical Biostatistics, Madrid, Spain) [17]. Sensitivity, specificity, likelihood ratios (LR), and the diagnostic odds ratio (DOR) were calculated with 95% confidence intervals (CI). The area under the curve (AUC) was estimated for each group. Diagnostic accuracy measures were compared between studies with data recorded by radiologists and those by non-radiologists (US technicians and surgeons) using a Student's t-test.
We quantified heterogeneity using the I² statistic and the chi-squared test. A fixed-effects model was used when the selected studies were consistent (I² < 50% and p > 0.05). Heterogeneity was considered present if I² > 50% or p < 0.05, and pooled estimates were then calculated using the random-effects model if there was no obvious reason for heterogeneity. Possible heterogeneity caused by the threshold effect was tested. Threshold effects were considered present if there was a strong positive correlation between the logit of sensitivity and the logit of 1 - specificity (p < 0.05), as assessed by Spearman's correlation coefficient. Furthermore, meta-regression models were conducted to trace other sources of heterogeneity according to the study design (retrospective versus prospective), sample size (400 patients or more versus fewer than 400), and year of publication (published in 2015 or more recently versus before). Regression diagnostic odds ratios (rDOR) were reported.
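For readers less familiar with these measures, the following Python sketch shows how the likelihood ratios and DOR relate to a single 2x2 table, using the standard textbook formulas. It is not a reproduction of the Meta-DiSc implementation; the function name, the 0.5 continuity correction for zero cells, and the example counts are assumptions made for illustration.

```python
import math


def diagnostic_summary(tp, fp, tn, fn, z=1.96):
    """Likelihood ratios and DOR (with a Wald 95% CI) for one 2x2 table."""
    # Assumed convention: add 0.5 to every cell if any cell is zero.
    if 0 in (tp, fp, tn, fn):
        tp, fp, tn, fn = (x + 0.5 for x in (tp, fp, tn, fn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)        # how much a positive US raises the odds of LNM
    lr_neg = (1 - sens) / spec        # how much a negative US lowers the odds of LNM
    dor = (tp * tn) / (fp * fn)       # equals lr_pos / lr_neg
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / tn + 1 / fn)
    ci = (math.exp(math.log(dor) - z * se_log_dor),
          math.exp(math.log(dor) + z * se_log_dor))
    return {"LR+": lr_pos, "LR-": lr_neg, "DOR": dor, "DOR 95% CI": ci}


# Hypothetical counts, for illustration only.
summary = diagnostic_summary(tp=60, fp=15, tn=85, fn=40)
print(summary)
```

A DOR of 1 means the test does not discriminate between metastatic and non-metastatic nodes; larger values indicate better discrimination.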
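The heterogeneity and threshold-effect checks described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the software's actual routine: I² is derived from Cochran's Q in the usual way, Spearman's coefficient is computed as the Pearson correlation of average ranks, no p-value is computed, and all function names and input values are hypothetical.

```python
import math


def cochran_q_i2(effects, variances):
    """Cochran's Q and I^2 (%) for study effects (e.g. log DOR) with known variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def logit(p):
    return math.log(p / (1 - p))


# Hypothetical per-study values, for illustration only.
q, i2 = cochran_q_i2(effects=[0.0, 2.0, 4.0], variances=[0.5, 0.5, 0.5])
sens = [0.45, 0.55, 0.62, 0.70]
spec = [0.92, 0.88, 0.80, 0.75]
rho = spearman_rho([logit(s) for s in sens], [logit(1 - s) for s in spec])
print(f"Q = {q:.1f}, I^2 = {i2:.1f}%, Spearman rho = {rho:.2f}")
```

In this hypothetical data set, sensitivity rises as specificity falls across studies, which is the pattern a threshold effect would produce.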

Results

Literature Search
A total of 1654 unique articles (2423 total, 670 duplicates) were found using search terms previously mentioned. We excluded 1536 articles as they did not meet inclusion criteria. The remaining articles were reviewed in-depth and considered until the final number of studies was reached. A total of 25 studies were included in our meta-analysis. All studies were published between 2007 and 2022 and include works from Korea (11 studies), China (7 studies), the United States (5 studies), and Chile (1 study) [13,. Four works were published within the last two years (2020 and beyond), suggesting heightened interest in determining the accuracy of diagnostic imaging. The workflow of the literature search is depicted in Figure 1.
Biomedicines 2022, 10, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/biomedicines


Characteristics of the Study Population
The characteristics of the included studies are detailed in Table 1. Of 25 studies, 17 were retrospective in study design and eight were prospective. The overall study period included patients from 1993 to 2022. The total sample size consisted of 5768 patients with 10,030 lymph nodes analyzed. Radiologists performed ultrasounds in 18 studies, while non-radiologists performed ultrasounds in seven studies, corresponding to 4442 and 1326 patients, respectively. All LNM diagnoses were confirmed by surgical pathology. In the study design column of Table 1, "Pro" designates prospective, and "Retro" designates retrospective. US = Ultrasound.
After adjustment of study covariates including study design, US operator, sample size, and the year of publication, meta-regression analysis did not show any significant results. The rDOR based on the sonographer was 1.15 (95%CI = 0.77-1.72; p = 0.48) (Supplementary Table S2).
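As a rough consistency check on the reported rDOR, a two-sided p-value can be approximated from the published estimate and its 95% CI on the log scale. The Python sketch below assumes a normal approximation, which may differ from the test used in the meta-regression; the function name is hypothetical.

```python
import math


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for a ratio from its 95% CI (normal approximation)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of the log ratio
    z = math.log(estimate) / se
    # Two-sided standard normal tail area via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


p = p_from_ratio_ci(1.15, 0.77, 1.72)
print(f"approximate p = {p:.2f}")
```

The result is close to 0.5, broadly in line with the reported p = 0.48; exact agreement is not expected because the published estimate and interval are rounded.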

Discussion
Preoperative assessment of LNM in PTC patients is imperative in surgical planning and therefore directly impacts patient outcomes. Importantly, malignancy staging (i.e., TNM staging) is often more important than malignancy grading in determining patient prognosis. Since US is the most common and widely available imaging technique for the thyroid gland, its diagnostic accuracy, including sensitivity, specificity, DOR, and AUC, has been studied extensively [33,42]. Yet, to the best of our knowledge, this is the first meta-analysis to investigate the diagnostic accuracy of overall LNM detection by US performed by radiologists and non-radiologists.
Ultrasound is currently the gold standard and first line of practice in preoperative assessment of PTC and in detecting LNM. It is easy to perform, widely available, low cost, and safe with no risk of radiation [43]. Previous studies have consistently reported sensitivity and specificity of 30-57% and 82-92%, respectively [33,44,45]. Our work corroborates the current literature, finding the overall sensitivity and specificity of LNM detection by US to be 59% and 85%, respectively. While sub-group analyses for central and lateral compartment LNM are warranted, we found that further sub-stratification limited the patient cohorts significantly. Since LNM is an important independent predictor of patient prognosis, several studies have suggested adjunct imaging modalities, such as computed tomography (CT), to compensate for the low sensitivity often seen with US and so reduce the number of missed metastases [46,47]. Though prophylactic lymph node dissection during thyroidectomy has been debated [48-51], complete and thorough surgical resection can positively affect patient survival [52,53]. Consequently, determining factors that could optimize the accuracy of LNM detection, such as the impact of the US operator, is important in improving patient outcomes.
Traditionally, radiologists performed ultrasound prior to operative management. However, there has been a movement towards surgeon-performed ultrasound as an extension of operative management [54,55]. Oltmann et al. found that surgeons documented lymph node status more often than radiologists and that patients who underwent surgeon-performed US had less disease recurrence (0% versus 12%, p = 0.01) [56]. With respect to ultrasound-guided thyroid fine-needle aspiration (FNA), Graciano et al. reported no difference in efficacy when performed by radiologists or non-radiologists [57]. Other studies demonstrate that experience greater than seven years increases the positive predictive value and confidence of LNM detection [58]. While other meta-analyses have shown differences in diagnostic testing accuracy of US versus CT staging, our study is the first to search and analyze the literature for differences in US staging performed by radiologists compared to non-radiologists [45,59]. Our work found that US performed by radiologists was of similar diagnostic accuracy (p = 0.517) to that performed by non-radiologists. Though our study analyzes over 10,000 lymph nodes, our findings are not generalizable to the analysis of other cancers and warrant further study, such as randomized controlled trials, to further determine the role of US technicians in the non-radiologist grouping.
Detection of LNM on preoperative evaluation is critical in patients undergoing active surveillance management. Active surveillance management is the careful monitoring of patients with low-risk primary PTC (small size or without suspicious US features) by yearly or bi-yearly US and/or CT to detect thyroid nodule changes [60]. Importantly, LNM detection is an indication to immediately terminate active surveillance management and proceed with surgical treatment. Since early detection of cancer is vital to patient prognosis, and allowing patients with undetected LNM to remain under active surveillance management can increase mortality by as much as 130% [5], it is imperative that US diagnostic accuracy be optimized. Further work should elucidate other potentially relevant factors such as patient body habitus [28].
Finally, we acknowledge several limitations of this study. Although the large sample size allowed for robust analyses, the majority of studies included were retrospective. Furthermore, sub-group analysis by cervical compartment was not feasible, as the four-way split (radiologist vs. non-radiologist as well as central vs. lateral LNM) limited the study population significantly. Additionally, studies took place in multiple countries. While this may lead to a more diverse patient population and greater generalizability, different training qualifications may exist for radiologists, US technicians, and surgeons across locations, which may make it difficult to compare diagnostic testing parameters. In addition, the limited number of studies reporting US performance metrics of surgeons necessitated a non-radiologist versus radiologist comparison. Future work should look to determine the influence of surgeons and US technicians separately, as their training differs significantly. Another limitation of our study is the lack of studies reporting the readings of endocrinologists, leaving their detection accuracy unexplored. Finally, whether our findings are consistent with other cancer imaging studies is unknown and warrants further investigation.

Conclusions
The diagnostic accuracy of LNM detection on US was similar whether performed by radiologists or non-radiologists. Our work suggests that both radiologists and non-radiologists alike detect LNM with high accuracy on US.