Comparison of Liquid-Based Preparations with Conventional Smears in Thyroid Fine-Needle Aspirates: A Systematic Review and Meta-Analysis

Simple Summary: We compared the diagnostic accuracy of conventional smears and liquid-based preparations for detecting thyroid lesions with fine-needle aspiration cytology, reviewing 15,861 samples from 17 studies. There was no significant difference between conventional smears and liquid-based preparations in diagnostic accuracy or in the proportion of inadequate smears. Among the liquid-based preparations, SurePath outperformed ThinPrep in diagnostic accuracy. Recommending one method over another requires weighing cost, feasibility, and accuracy, and additional research is needed.

Abstract
Background: To compare conventional smears (CSs) and liquid-based preparations (LBPs) for diagnosing malignant or suspicious thyroid lesions. Methods: We searched PubMed, SCOPUS, Embase, Web of Science, and the Cochrane database for studies published up to December 2023 and reviewed 17 studies including 15,861 samples. Results: For CS, the diagnostic odds ratio (DOR) was 23.6674 and the area under the summary receiver operating characteristic curve (AUC) was 0.879, with sensitivity, specificity, negative predictive value, and positive predictive value of 0.8266, 0.8668, 0.8969, and 0.7841, respectively; the rate of inadequate specimens was 0.1280. For LBP, the DOR was 25.3587 and the AUC was 0.865, with sensitivity, specificity, negative predictive value, and positive predictive value of 0.8190, 0.8833, 0.8515, and 0.8562, respectively; the rate of inadequate specimens was 0.1729. For CS plus LBP, the AUC was 0.813 and the DOR was 9.4557, lower than for either method alone. Diagnostic accuracy did not differ significantly among CS, LBP, and CS plus LBP. In a subgroup analysis comparing ThinPrep and SurePath, the DORs were 29.1494 and 19.7734, respectively, and SurePath had a significantly higher AUC. Conclusions: There was no significant difference in diagnostic accuracy or proportion of inadequate smears between CS and LBP. SurePath demonstrated higher diagnostic accuracy than ThinPrep.
Recommendations for fine-needle aspiration cytology should consider cost, feasibility, and accuracy.


Introduction
Thyroid nodules are common, with a prevalence of approximately 4-7% in the general population, and most are benign [1]. Papillary thyroid carcinoma, the most frequent malignant lesion, has been increasing in prevalence [2,3]. Because the thyroid is extensively vascularized, histological biopsies are challenging to perform. Consequently, fine-needle aspiration cytology (FNAC) has become the primary minimally invasive method for diagnosing these nodules [4-6].
Conventional smear (CS) is a common FNAC preparation method for thyroid lesions [7]. It is recognized for its simplicity and convenience [8] and is relatively safe, repeatable, and low risk [6,7,9,10]. However, CS results can vary with uneven sampling of thyroid tissue and with the cytopathologist's experience [8,10,11]. Artifacts may also arise as specimens dry, and results can vary by technician [9]. In addition, fibrosis and cystic lesions can result in poor cellularity [8]. These limitations can lead to an approximately 50% increase in inadequate specimens, complicating accurate diagnosis by pathologists [12].
The liquid-based preparation (LBP) method is a newer approach in FNAC and is widely used for breast and salivary gland cytology [1,13,14]. Introduced in 1996 as an alternative to the conventional Papanicolaou smear, LBP aims to standardize samples by minimizing the artifacts and errors inherent in CS [1,8,10,15]. Two commonly used kits are ThinPrep (Hologic, Marlborough, MA, USA) and SurePath (BD Diagnostics-TriPath Imaging, Burlington, NC, USA). In LBP, aspirates are collected in a fixative solution and processed by an automated instrument that reduces cell debris, inflammatory cells, red blood cells, and artifacts, producing a Papanicolaou-stained slide with evenly distributed cells [8,9]. Through homogenization, vacuum application, and sedimentation, it provides well-preserved cells against a clean background [8,10].
However, the effectiveness of LBP for diagnosing thyroid lesions, in which the shape of cell clusters and the background are crucial, remains debatable [8]. Few studies and reviews have compared CS and LBP, and many show a bias toward LBP and use varied criteria for evaluating sensitivity and specificity [8,10,13,14,16-18]. We therefore performed a meta-analysis comparing the diagnostic accuracy and rate of inadequate smears (RIS) of CS and LBP in FNAC of malignant or suspicious thyroid lesions, incorporating the latest research. In addition, we conducted subgroup analyses comparing two common LBP kits, ThinPrep and SurePath.

Study Protocol and Registration
This systematic review and meta-analysis followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines [19] and was conducted in accordance with recommendations for optimal literature searches in systematic reviews in the field of surgery [20]. The study protocol was prospectively registered on the Open Science Framework (https://osf.io/zj4hv/, accessed on 11 December 2023).

Literature Search Strategy
Clinical studies were sourced from PubMed, SCOPUS, Embase, Web of Science, and the Cochrane Central Register of Controlled Trials up to December 2023. The search terms included 'thyroid gland', 'fine-needle aspiration', 'fine-needle aspiration biopsy', 'cytology', 'cytopathology', 'conventional smear', 'direct smear', and 'liquid-based preparation'. We also reviewed the reference lists of identified articles to ensure that no relevant studies were overlooked. Two reviewers independently screened all titles and abstracts and excluded studies that did not address the cytologic diagnosis of malignant or suspicious thyroid lesions by fine-needle aspiration confirmed by surgical histologic examination.

Selection Criteria
The inclusion criteria were as follows: patients undergoing fine-needle aspiration biopsy for thyroid lesions; prospective or retrospective design; comparison of the diagnostic accuracy of CS and LBP against surgical histologic findings; and availability of data for sensitivity and specificity analysis. The exclusion criteria were case reports, review articles, studies of other head and neck lesions such as neck lymph nodes or neck masses, and studies whose data could not be used to assess diagnostic accuracy. The study selection process is summarized in a flow diagram (Figure 1).

Data Extraction and Risk of Bias Assessment
Among the studies included in the meta-analysis, those published after 2010 classified thyroid lesions using the Bethesda System for Reporting Thyroid Cytopathology. Included studies that did not use the Bethesda system were also compared, using their categories corresponding to the Bethesda categories of suspicious for malignancy, malignant, and nondiagnostic or inadequate. Because the cytologic categories other than malignant differed among the included studies, benign and atypia lesions could not be evaluated consistently. Consequently, the risk of malignancy in each diagnostic category could not be estimated, and we instead summarized the numbers of cases in the excluded categories (Supplementary Table S1).
The diagnostic odds ratio (DOR) was calculated as (true positive (TP)/false positive (FP))/(false negative (FN)/true negative (TN)), which is equivalent to (TP × TN)/(FP × FN), to assess diagnostic accuracy, with 95% confidence intervals (CIs) obtained from random-effects models that account for both within- and between-study variation. DOR values range from 0 to infinity, with higher values indicating better diagnostic performance; a value of 1 indicates that the test provides no diagnostic advantage. The summary receiver operating characteristic (sROC) curve is preferred for meta-analyses of studies reporting pairs of sensitivity and specificity. As the discriminatory power of a test increases, the sROC curve approaches the top-left corner of the ROC space, where sensitivity and specificity both equal 1 (100%) [42]. The area under the sROC curve (AUC), which ranges from 0 to 1, reflects test performance: values of 0.90-1.0 are considered excellent, 0.80-0.90 good, 0.70-0.80 fair, 0.60-0.70 poor, and 0.50-0.60 failures [43].
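The DOR and AUC definitions above can be made concrete with a short calculation. The Python sketch below computes per-study accuracy measures from a 2 × 2 table against surgical histology and maps an AUC onto the verbal bands quoted above. The `TwoByTwo` container and the function names are hypothetical, and the 0.5 continuity correction for zero cells and the Woolf (log-scale) confidence interval are common conventions assumed here, not methods stated in this review.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical container for one study's counts against surgical histology."""
    tp: int
    fp: int
    fn: int
    tn: int


def diagnostic_summary(t: TwoByTwo, z: float = 1.96, cc: float = 0.5) -> dict:
    """Accuracy measures for one study.

    The DOR follows the definition in the text, (TP/FP)/(FN/TN), which equals
    (TP*TN)/(FP*FN). Adding `cc` to every cell when any cell is zero is an
    assumed convention, not one stated in the source.
    """
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    ppv = t.tp / (t.tp + t.fp)
    npv = t.tn / (t.tn + t.fn)

    a, b, c, d = t.tp, t.fp, t.fn, t.tn
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    log_dor = math.log((a / b) / (c / d))
    # Woolf standard error of ln(DOR); the CI is built on the log scale
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_dor - z * se_log), math.exp(log_dor + z * se_log))
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "npv": npv,
            "dor": math.exp(log_dor), "dor_ci": ci,
            "log_dor": log_dor, "se_log_dor": se_log}


def grade_auc(auc: float) -> str:
    """Map an AUC to the verbal bands quoted in the text [43]."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    for cutoff, label in ((0.90, "excellent"), (0.80, "good"),
                          (0.70, "fair"), (0.60, "poor"), (0.50, "fail")):
        if auc >= cutoff:
            return label
    # Values below 0.50 are not covered by the quoted scale
    return "below 0.50"
```

For example, `grade_auc(0.879)` returns "good", the band into which the pooled AUC for CS falls. Boundary values such as 0.80 are assigned to the higher band, which is one reading of the overlapping ranges in the text.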
Data extracted from the studies included the number of patients and the TP, TN, FP, and FN counts used for the AUC and DOR calculations. The Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) tool was used to evaluate methodological quality and risk of bias [44].
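Given per-study TP, FP, FN, and TN counts, one simple way to draw an sROC curve and integrate its AUC is the Moses-Littenberg regression of D = logit(sensitivity) - logit(1 - specificity) on S = logit(sensitivity) + logit(1 - specificity). The review does not state which sROC model was fitted in R, so this Python sketch illustrates the general idea under that assumption and does not reproduce the reported AUCs; all function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def moses_littenberg(studies, cc=0.5):
    """Fit D = a + b*S by ordinary least squares.

    studies: iterable of (tp, fp, fn, tn) tuples.
    D = logit(TPR) - logit(FPR) (the log DOR); S = logit(TPR) + logit(FPR).
    cc is added to every cell to avoid logit(0); an assumed convention.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in studies:
        tp, fp, fn, tn = tp + cc, fp + cc, fn + cc, tn + cc
        lt = logit(tp / (tp + fn))   # logit sensitivity
        lf = logit(fp / (fp + tn))   # logit false-positive rate
        d_vals.append(lt - lf)
        s_vals.append(lt + lf)
    n = len(d_vals)
    if n < 2:
        raise ValueError("need at least two studies")
    s_bar, d_bar = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - s_bar) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("S must vary across studies to fit the regression")
    b = sum((s - s_bar) * (d - d_bar) for s, d in zip(s_vals, d_vals)) / sxx
    a = d_bar - b * s_bar
    return a, b


def sroc_tpr(fpr, a, b):
    # Solving D = a + b*S for logit(TPR) gives (a + (1 + b) * logit(FPR)) / (1 - b)
    return 1 / (1 + math.exp(-(a + (1 + b) * logit(fpr)) / (1 - b)))


def sroc_auc(a, b, steps=10_000):
    # Midpoint rule over FPR in (0, 1), which never evaluates logit(0) or logit(1)
    h = 1 / steps
    return h * sum(sroc_tpr((i + 0.5) * h, a, b) for i in range(steps))
```

When b = 0 the fitted curve has a constant DOR of exp(a); the code assumes b is not 1, where the curve is undefined.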

Statistical Analysis and Outcome Measurements
R statistical software (version 4.3.2; R Foundation for Statistical Computing, Vienna, Austria) was used for the meta-analysis. Heterogeneity was assessed with the Q statistic. Subgroup analyses were conducted according to the type of LBP kit. Forest plots were used to depict sensitivity and specificity, and sROC curves were plotted. Begg's funnel plot and Egger's linear regression test were used to evaluate potential publication bias.
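The random-effects pooling and Q statistic described above can be illustrated with the DerSimonian-Laird estimator applied to per-study log DORs. The source names R 4.3.2 but not the package or estimator, so the choice of DerSimonian-Laird and the function name below are assumptions; this is a minimal univariate sketch, not the authors' code.

```python
import math


def dersimonian_laird(effects, ses):
    """Random-effects pooling (DerSimonian-Laird) of per-study log DORs.

    effects: per-study ln(DOR); ses: their standard errors.
    Returns the pooled log DOR, its SE, tau^2, the Q statistic, df, and I^2.
    """
    k = len(effects)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching SEs")
    w = [1 / s ** 2 for s in ses]                      # fixed-effect weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"log_dor": pooled, "se": se, "tau2": tau2, "q": q, "df": df, "i2": i2}
```

`math.exp(result["log_dor"])` gives the pooled DOR. Q is compared with a chi-squared distribution with `df` degrees of freedom (for example, `scipy.stats.chi2.sf(q, df)`), and I² expresses the share of total variation attributed to between-study heterogeneity.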

Search and Study Selection
In total, 17 studies, including 15,861 samples, were included in the analysis (Figure 1). The characteristics of the studies are detailed in Table 1, and the risk of bias assessment results are given in Table 2. Egger's test was significant (p < 0.05), which suggests funnel-plot asymmetry, whereas Begg's funnel plot suggested no apparent bias in the included studies (Figure 2).
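Because Egger's test is interpreted above, a sketch of what it computes may help: each study's standardized effect (effect/SE) is regressed on its precision (1/SE), and an intercept that differs from zero indicates funnel-plot asymmetry, which can reflect publication bias or other small-study effects. The Python function below is a hypothetical helper that assumes per-study log DORs and their SEs as input.

```python
import math


def egger_test(effects, ses):
    """Egger's regression: standardized effect (y/SE) regressed on precision (1/SE).

    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    n = len(effects)
    if n < 3 or n != len(ses):
        raise ValueError("need at least three studies with matching SEs")
    x = [1 / s for s in ses]                          # precision
    z = [y / s for y, s in zip(effects, ses)]         # standardized effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("SEs must differ across studies")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    df = n - 2
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_int = math.sqrt(rss / df * (1 / n + x_bar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else math.nan  # undefined for a perfect fit
    return intercept, se_int, t, df
```

The two-sided p-value is the tail probability of |t| under a t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)`.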
While there were no statistically significant differences in diagnostic accuracy and RIS among CS, LBP, and their combination, CS plus LBP appeared to have relatively lower diagnostic accuracy than CS or LBP alone. Conversely, combining CS with LBP tended to reduce the RIS (Supplementary Table S2).
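The RIS values reported in this review (0.1280 for CS and 0.1729 for LBP) are pooled proportions. The source does not describe how proportions were pooled; the Python sketch below assumes a common approach, inverse-variance pooling on the logit scale with a DerSimonian-Laird between-study variance, and its function name is hypothetical.

```python
import math


def pooled_inadequate_rate(inadequate, totals, cc=0.5):
    """Random-effects pooled proportion of inadequate specimens.

    inadequate, totals: per-study counts. Proportions are pooled on the logit
    scale with inverse-variance weights plus a DerSimonian-Laird tau^2, then
    back-transformed. cc is an assumed correction for 0% or 100% studies.
    """
    ys, vs = [], []
    for e, n in zip(inadequate, totals):
        if e == 0 or e == n:
            e, n = e + cc, n + 2 * cc
        ys.append(math.log(e / (n - e)))          # logit of the study's rate
        vs.append(1 / e + 1 / (n - e))            # approximate variance of that logit
    w = [1 / v for v in vs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return 1 / (1 + math.exp(-pooled_logit))
```

Pooling on the logit scale keeps the back-transformed estimate between 0 and 1, which a direct average of raw proportions with normal-approximation weights does not guarantee.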

Subgroup Analysis of Diagnostic Accuracy According to the Methods of LBP
Several LBP kits were used in the included comparative studies, including ThinPrep, SurePath, CellPrepPlus, and an unspecified tool. Among these, ThinPrep and SurePath are the most commonly used. A subgroup analysis was conducted to determine which method is more accurate for diagnosing malignant or suspicious thyroid lesions.
There were no statistically significant differences in diagnostic accuracy among CS, ThinPrep, and SurePath (Supplementary Table S3). However, a direct comparison of the two kits showed a significant difference in AUC (ThinPrep 0.791 vs. SurePath 0.841; p = 0.019), suggesting that SurePath might be more accurate.
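The source does not state which test produced p = 0.019 for the AUC comparison. As a generic illustration only, two estimates from independent subgroups (such as ThinPrep and SurePath studies) are often compared with a Wald-type z-test on their difference, sketched below in Python; the function name is hypothetical, and the test assumes the two subgroups share no studies.

```python
import math
from statistics import NormalDist


def compare_independent_estimates(est1, se1, est2, se2):
    """Two-sided z-test for the difference between two independent subgroup estimates."""
    diff = est1 - est2
    # Variances add only because the subgroups are assumed independent
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p
```

The inputs would be the two subgroup estimates and their standard errors; those SEs are not reported in this section, so no worked values are shown.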

Discussion
FNAC of thyroid lesions is a well-established, safe, and informative test that plays an important role in diagnosing thyroid lesions and guiding treatment [9]. However, the relative efficacy of CS and LBP for FNAC in diagnosing thyroid lesions remains contentious. We compared the diagnostic accuracy of CS and LBP using DOR, sensitivity, specificity, and AUC.
Our findings revealed no significant difference in diagnostic accuracy among CS, LBP, and the combination of CS with LBP. The diagnostic accuracy of CS and LBP was similar, whereas that of CS combined with LBP was numerically lower. Sensitivity, specificity, and negative predictive value were also lower for the combination than for CS or LBP alone, but the differences were not statistically significant. Previous studies have reported no significant difference in diagnostic accuracy between CS and LBP [10,30,40,50]. Reported sensitivity has varied, ranging from 78.9% to 93.6% for CS and from 65.9% to 93.9% for LBP [25,26,30].
Previous studies have shown that combining CS with LBP reduces unnecessary thyroidectomies. LBP serves as a useful adjunct to CS for identifying malignant or suspicious thyroid lesions [9,46]. The rate of non-diagnostic results decreased, although not significantly, when CS was combined with LBP compared with CS alone [51]. In our study, however, combining CS with LBP resulted in relatively low diagnostic accuracy. Rossi et al. reported that, when slide adequacy was assessed on CS, using CS and LBP together increased the non-diagnostic rate [52]. In LBP, cells are preserved in a solution, which prevents real-time assessment of sample adequacy [23,31]. Nonetheless, provided that CS is not used for on-site adequacy evaluation, LBP can yield a relatively lower non-diagnostic rate owing to its clearer background and fewer drying artifacts [51]. If slide cellularity is insufficient, additional slides can be prepared [7,30]. Combining CS and LBP enables efficient slide preparation without compromising slide adequacy [53,54].
Our subgroup analysis of LBP kits revealed no significant differences in diagnostic accuracy among CS, ThinPrep, and SurePath. SurePath had a lower DOR but a significantly higher AUC than ThinPrep, whereas differences in DOR, sensitivity, specificity, negative predictive value, and positive predictive value were not significant. A previous study reported that SurePath or ThinPrep achieved similar or marginally better sensitivity and specificity than CS [8]. However, most studies of SurePath have been conducted in Belgium or Korea, which limits their generalizability. Other studies have reported high sensitivity and specificity for both CS and SurePath [55]. Further studies of different LBP kits in various countries could improve the generalizability of these results.
Regarding the RIS, previous studies have reported values for LBP ranging from 10% to 25% [24,38,56]. One study observed better sample adequacy with LBP than with CS [8], whereas another reported a higher inadequacy rate for LBP than for CS [31]. Consistent with a previous study [5], combining CS with LBP resulted in a lower RIS than either method alone, although the difference was not statistically significant. In LBP, the RIS may increase because of cell dilution in the suspension medium or loss of colloid during processing [9]. Repeated processing of LBP samples could mitigate this, improving sample adequacy and diagnostic accuracy [56]. Accumulated clinical experience and progress along the learning curve are essential to improve the adequacy of LBP samples. As a newer technology than CS, LBP requires greater technical skill, such as in syringe cleaning, to address problems such as scant cytoplasm in samples. In addition, cytopathologists need experience with LBP preparations, particularly in recognizing colloid and follicles [10]. The uncertain effects of preservative solutions and the artifacts that reduce inflammatory cells in LBP should also be considered [10].
In previous studies, ease of interpretation has generally not correlated with the RIS. Only 3-5% of studies have rated LBP as good for ease of interpretation compared with CS [9,38]. Moreover, studies assessing inadequate specimen rates have focused mainly on the SurePath and ThinPrep kits; more research is needed on other new LBP tests and technologies.
In our study, there were no significant differences between CS and LBP, and combining CS with LBP did not yield better results. Subgroup analyses also suggested that using CS or LBP alone might be preferable, with SurePath being the recommended choice when using LBP. However, the combination could be advantageous when specimen collection is challenging, as its RIS was lower, although not significantly.
Cytomorphological differences between LBP and CS may also vary among papillary, anaplastic, and medullary carcinomas. Papillary carcinoma can exhibit diverse cell arrangements [57,58]. Although neither CS nor LBP allows detailed observation of tissue architecture, understanding the clinical significance of morphological features is important [6]. Adding LBP can help distinguish benign from malignant lesions because nuclei are better visualized against a clear background [51]. In addition, immunocytochemical and molecular studies should be considered concurrently for malignant or suspicious lesions because they can provide additional diagnostic information. If FNAC results are uncertain, testing for mutations such as BRAF and RAS can be useful [59]. Nuclei remain stable for up to 6 months in LBP preservative solution, which may support reliable mutation testing [7,60].
This study had several limitations. First, statistical heterogeneity was high, which is common in pathological studies [61], and sampling methods and LBP techniques were not uniformly represented. The heterogeneity in the SurePath and ThinPrep subgroup analyses might have stemmed from varied study designs and differences in examiners' LBP proficiency. The presence or absence of a cytopathologist and the use of different instruments and of ultrasonography during FNAC also contributed to variability. To increase diagnostic accuracy, other appropriate diagnostic criteria, such as the Thyroid Imaging Reporting and Data System for ultrasonography, could have been applied together with the Bethesda system. Second, histologic follow-up was not included in the analysis, potentially limiting the evaluation, as most cases were analyzed on initial diagnoses without considering final pathological diagnoses or FNAC diagnoses revised after surgery. Further studies incorporating histological follow-up are needed. Third, the retrospective design of several studies could have introduced bias, as non-diagnostic nodules were excluded after surgery; cases suspected of follicular neoplasm and non-diagnostic lesions without surgical intervention may have been omitted. Fourth, CS, LBP, and their combination might not have been performed on the same thyroid lesion. Fifth, because the cytologic categories differed among the included studies, benign and atypia lesions were difficult to evaluate; further studies that assess the malignancy risk of benign and atypia lesions using the same cytologic categories are needed. Finally, most studies were from the United States, Europe, and Korea, which may have introduced bias owing to limited racial diversity.
To improve the accuracy and feasibility of CS and LBP, repeated processing may be required to improve sample adequacy and diagnostic accuracy. Accumulated clinical data and progress along the learning curve are critical for improving the adequacy of LBP samples, as LBP demands more advanced technical skills and greater experience from cytopathologists. Our subgroup analyses indicated that using CS or LBP alone may be preferable, with SurePath being the recommended option when using LBP. Although the difference was not statistically significant, the combination may be beneficial when specimen collection is difficult.

Conclusions
There were no significant differences in diagnostic accuracy or RIS among CS, LBP, and their combination. CS and LBP showed similar accuracy, whereas combining CS and LBP was associated with lower diagnostic accuracy and a lower RIS. There were no significant differences in diagnostic accuracy among CS, ThinPrep, and SurePath; however, a significant difference in AUC suggests that the SurePath kit might be more accurate. Therefore, cost, feasibility, and accuracy should all be considered when choosing an FNAC method.

Figure 1. Diagram of the selection of studies for meta-analysis.

Figure 4. The summary receiver operating characteristic curves of (a) all included studies, (b) conventional smear, (c) liquid-based preparations, and (d) combination of conventional smear with liquid-based preparations. Thick curve, summary receiver operating characteristic curve; thin circular line, 95% confidence region; small circle, summary estimate.

Table 1. Characteristics of the included studies.

Table 2. Methodological quality of the individual non-randomized controlled trials.
A star rating system was used to indicate study quality, with a maximum of nine stars. A study could be awarded a maximum of one star for each numbered item within the selection and exposure categories.
a: Selection (4 items): adequacy of case definition; representativeness of the cases; selection of controls; and definition of controls.
b: Comparability (1 item): comparability of cases and controls on the basis of the design or analysis.
c: Exposure (3 items): ascertainment of exposure; same method of ascertainment for cases and controls; and non-response rate (same rate for both groups).
Cancers 2024, 16, x 7 of 18