Comparison of Diagnostic Test Accuracy of Cone-Beam Breast Computed Tomography and Digital Breast Tomosynthesis for Breast Cancer: A Systematic Review and Meta-Analysis Approach

Background: Cone-beam breast computed tomography (CBBCT) and digital breast tomosynthesis (DBT) remain the main 3D modalities for X-ray breast imaging. This study aimed to systematically review and meta-analyze the diagnostic accuracy of CBBCT and DBT in characterizing breast cancers. Methods: Two independent reviewers identified screening or diagnostic studies published from 1 January 2015 to 30 December 2021 that reported at least sensitivity and specificity for CBBCT or DBT. A univariate pooled meta-analysis was performed using the random-effects model to estimate sensitivity and specificity, while other diagnostic parameters, namely the area under the ROC curve (AUC), positive likelihood ratio (LR+), and negative likelihood ratio (LR−), were estimated using the bivariate model. Results: For the 17 studies included in the DBT arm, the pooled sensitivity, specificity, LR+, and LR− (each with 95% confidence interval) and AUC were 86.7% (80.3–91.2), 87.0% (79.9–91.8), 6.28 (4.40–8.96), 0.17 (0.12–0.25), and 0.925, respectively; for the five studies in the CBBCT arm, the corresponding values were 83.7% (54.6–95.7), 71.3% (47.5–87.2), 2.71 (1.39–5.29), 0.20 (0.04–1.05), and 0.831. Conclusions: DBT showed better diagnostic performance than CBBCT on all estimated diagnostic parameters, with a statistically significant improvement in AUC. CBBCT might nevertheless be a useful modality for breast cancer detection; thus, we recommend more prospective studies on CBBCT application.


Introduction
Breast cancer is the most commonly diagnosed cancer among women and a leading cause of cancer death in women of all ages [1,2]. This mortality can be reduced drastically if cancers are detected early [1]. Digital mammography (DM) has been the conventional tool for early breast cancer diagnosis [3,4]. Recent research on both randomized controlled trials and observational studies has indicated that regular screening

Materials and Methods
This systematic review and meta-analysis was prospectively registered at PROSPERO (registration number CRD42020180192) [22]. The review was performed by two independent reviewers (TEK and OAO, or CZ and GY) using a well-established protocol adapted from the Cochrane Collaboration approach for evaluating diagnostic test accuracy [23], together with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [24]; see Supplementary File S1. The two reviewers discussed any discrepancies between their results, and a more experienced third reviewer (XY, JZ, or ML) was consulted if consensus was not reached. We searched for studies of women who underwent breast imaging with either CBBCT or DBT that reported the characterization of malignant and benign lesions with well-documented diagnostic accuracy. The two modalities were searched separately because no available literature reported a direct comparison of CBBCT and DBT for diagnostic or screening purposes. The search included comparative, prospective, and retrospective studies, as well as interrater consensus studies.

Data Sources and Search Strategy
PubMed, Inspec, Web of Science, and the Cochrane Central Register of Controlled Trials (CENTRAL) were searched for relevant literature published from January 2015 up to and including December 2021. We used selected controlled terms extracted from studies retrieved from each database to build the text words and subject terms: "breast computed tomography", "Sensitivity", and "Specificity" for the CBBCT arm, and "Digital breast tomosynthesis", "Sensitivity", and "Specificity" for the DBT arm, as shown in the complete PRISMA search path (Figure 1). These terms gave a wide representation for the review. In PubMed and CENTRAL, the selected controlled terms were entered as MeSH terms, while in Web of Science and Inspec they were used as text words; for details, see Supplementary File S2.
Sensors 2022, 22, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/sensors

Eligibility Criteria
Studies were eligible for inclusion in this meta-analysis if they met eligibility criteria adapted from the Cochrane diagnostic test accuracy protocol using PRISMA guidelines [24]. Literature was included if it used dedicated CBBCT or DBT to detect breast cancer and reported at least the sensitivity and specificity. The included studies were retrospective and prospective studies, an observer performance study, clinical trials, and comparative studies across different modalities. The exclusion criteria were literature reviews, phantom or simulation studies, studies of radiation applications other than CBBCT and DBT (such as radiotherapy), and studies with computer-aided detection (CAD), i.e., machine and deep learning applications in diagnostic accuracy.
Additionally, a study that reported only a combination of two or more modalities, such as DBT with DM or contrast-enhanced CBBCT (CE-CBBCT) with non-contrast CBBCT (NC-CBBCT), was excluded. However, if such a study reported the modalities separately, the data for the modality under consideration were extracted. Likewise, for multiple publications that reported the same study or a subset of it, the most detailed publication in terms of data availability was used.

Study Selection
Articles retrieved for both arms were manually sorted and duplicates were removed. Titles and abstracts were screened first, followed by the full text, according to the predefined search criteria, and the final eligible studies were selected.

Data Collection Process
A standardized extraction sheet was developed, and two independent blinded reviewers (TEK and OAO, or CZ and GY) extracted the required information from the eligible studies and resolved conflicts by consensus. The extracted items were: study type (prospective or retrospective), clinical setting (diagnostic or screening), number of patients and their mean age, diagnostic equipment model, mean glandular dose, number of radiologists who interpreted the index test and their years of experience, sensitivity, and specificity. The positive and negative likelihood ratios were computed when they could not be extracted [25]; other details of the formulations of the estimated diagnostic test accuracy parameters can be found in [26]. Additionally, the percentage of benign and malignant cases, with a brief description of the intervention, is included (Table 1).
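As an illustration of how likelihood ratios can be derived when a study reports only sensitivity and specificity, the following minimal Python sketch applies the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The function name and the example values are hypothetical and are not taken from any included study; the exact formulation used in the review is the one given in [25,26].

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Hypothetical study with 90% sensitivity and 80% specificity.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

Note that pooled likelihood ratios from a meta-analysis are generally not obtained by plugging pooled sensitivity and specificity into these formulas; the sketch applies to a single study only.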

Risk of Bias and Quality Appraisal
The quality of included studies was assessed using Quality Assessment of Diagnostic Accuracy Studies-Comparative (QUADAS-C), a tool for comparative diagnostic accuracy tests with different cohorts [27], a modified version of QUADAS-2 [28] to ensure appropriateness for comparing the two modalities. The domains assessed were patient selection, index tests, reference standard, flow and timing, and applicability. Two reviewers performed an independent quality assessment, and the final result was based on consensus. The overall study quality is shown in Figure 2.

Data Analysis
A univariate meta-analysis was performed separately for sensitivity and specificity in both CBBCT and DBT to estimate the diagnostic accuracy of each modality using the random-effects (RE) model [29]. The primary outcomes were sensitivity, specificity, and the summary receiver operating characteristic (SROC) curve. We calculated point estimates and 95% confidence intervals (CI) for each study to ensure consistency in sensitivity and specificity. To plot the SROC curve and estimate its AUC, we used a bivariate meta-analysis of sensitivity and specificity in R version 4.1.2 with RStudio version 2021.09.1+372, using the R packages "mada" and "meta" [30]. Additionally, secondary outcomes, namely the positive and negative likelihood ratios, were estimated using MetaDiSc 1.4 software [31]. Statistical heterogeneity between studies was evaluated with Cochran's Q test and the I² statistic [32]. For the I² statistic, values of 0-40% imply insignificant heterogeneity, 30-60% moderate heterogeneity, and 75-100% considerable heterogeneity. Publication bias was evaluated and visualized by constructing a funnel plot [33]. The p-values were based on two-sided tests, and a p-value < 0.05 was considered statistically significant.
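The random-effects pooling and heterogeneity statistics described above can be sketched as follows. This is a minimal illustration and not the authors' implementation, which used the R packages "meta" and "mada". It assumes DerSimonian-Laird pooling of proportions on the logit scale with a 0.5 continuity correction added to every study; the function names and the example counts are hypothetical.

```python
import math


def expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Returns (pooled proportion, CI low, CI high, Cochran's Q, I2 in percent).
    """
    # Per-study logit and approximate variance, with a 0.5 continuity correction.
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5
        y.append(math.log(a / b))
        v.append(1.0 / a + 1.0 / b)

    # Fixed-effect weights give Cochran's Q, from which I2 and tau^2 follow.
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add the between-study variance tau^2.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return expit(mu), expit(mu - 1.96 * se), expit(mu + 1.96 * se), q, i2


# Hypothetical per-study counts (true positives, diseased patients); not data from the review.
tp = [45, 80, 30, 60]
diseased = [50, 100, 40, 65]
pooled, low, high, q, i2 = pool_logit_random_effects(tp, diseased)
print(f"pooled sensitivity = {pooled:.3f} (95% CI {low:.3f}-{high:.3f}), Q = {q:.2f}, I2 = {i2:.0f}%")
```

The same function applies to specificity by passing true negatives and the number of non-diseased patients per study.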
The CBBCT arm comprises only five studies: retrospective observer studies [12,47], a prospective study [48], and a retrospective diagnostic study [11]. These are mostly comparison studies, i.e., CBBCT vs. DM [12,13] and CBBCT vs. DM vs. US or MRI [11,49]. All the studies reported both the sensitivity and specificity of the diagnostic equipment, while the AUC of the SROC was estimated separately, as in the DBT arm. All the studies reported the number of benign and malignant cases, and 80% of the studies acquired data with the Koning Breast CT (KBCT 1000) model [11–13,49].

Quality Assessment and Publication Bias
In the DBT arm, one study had a high risk of bias due to inappropriate exclusions and the method of patient selection [47]. Two studies (11.8%) had an unclear risk of bias because the diagnostic threshold was not specified and no information was given on whether the readers were blinded to the clinical outcomes [34,44]. One study (5.9%) did not give enough information about the pathological findings or about whether follow-up was made when necessary, resulting in an unclear risk of bias for the reference standard [40]. Three studies (17.6%) did not give detailed information on whether the patients received the reference standard or whether the time interval between the reference standard and the index test was appropriate, resulting in an unclear risk of bias for flow and timing [34,40,51]. Additionally, eight studies (47.1%) had high applicability concerns regarding patient selection, as the criteria for selecting patients did not exactly match our review question; three studies (17.6%) had high or unclear applicability concerns regarding the index test; and only one study (5.9%) had unclear applicability concerns regarding the reference standard. The risk of bias, the applicability concerns, and the reviewers' judgment about each domain for all the included studies are shown in Figure 2.
Likewise, for the CBBCT arm, none of the studies had a high risk of bias, although an unclear risk of bias exists for patient selection, reference standard, and flow and timing due to scanty information [12,48]. The overview of bias and applicability risk is shown in Figure 3.
A visual assessment of the funnel plots revealed an asymmetrical distribution around the inverted funnel for the included DBT studies, which signifies publication bias that might be attributed to reporting bias [33], as shown in Figure 4. Publication bias might also exist in the CBBCT arm, given the small number of studies included in the meta-analysis.
More details about the risk of bias and applicability concerns from the QUADAS-2 assessment are shown in Figure 3.

DBT Meta-Analysis
A total of 17 studies with different observations on sensitivity, specificity, and AUC contributed to the meta-analysis of the DBT arm [10,34–49]. The forest plots of sensitivity and specificity, with point estimates and 95% confidence intervals across the studies, are shown in Figure 5. The pooled sensitivity was 86.7% (95% CI: 80.3-91.2, I² = 89%) and the pooled specificity was 87.0% (95% CI: 79.9-91.8, I² = 95%). Since Higgins' I² for both sensitivity and specificity was above 75% and the p-value of the Cochran Q statistic was less than 0.05, there is substantial heterogeneity between studies. To show both the practical and the statistical significance of the difference between DBT and CBBCT, the differences in pooled sensitivity and specificity between the modalities were estimated: the difference in effect size was 3% for sensitivity (p-value = 0.7622) and 16.4% for specificity (p-value = 0.0622). The effect size for DBT exceeded that for CBBCT by 3% and 15.3% for sensitivity and specificity, respectively, which indicates better performance for DBT, although the difference is not statistically significant since both p-values are greater than 0.05.
The pooled positive likelihood ratio (LR+) is 6.28 (95% CI: 4.40-8.96, I² = 93%), while the pooled negative likelihood ratio (LR−) is 0.17 (95% CI: 0.12-0.25, I² = 92%), as shown in Figure 6. The pooled AUC of the SROC is 0.925, as shown in Figure 7a.
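One way to test the difference between two independent pooled proportions is a two-sided z-test on the logit scale, with standard errors back-calculated from the reported 95% confidence intervals. The sketch below illustrates this with the pooled estimates reported for the two arms. It is an assumption-laden illustration: the text does not state which test the authors used, so the output should not be expected to reproduce the reported p-values exactly, and all function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def se_from_ci(low, high, z_crit=1.96):
    """Back-calculate the logit-scale standard error from a 95% CI."""
    return (logit(high) - logit(low)) / (2.0 * z_crit)


def compare_pooled(p1, ci1, p2, ci2):
    """Two-sided z-test for the difference of two independent pooled proportions."""
    diff = logit(p1) - logit(p2)
    se = math.sqrt(se_from_ci(*ci1) ** 2 + se_from_ci(*ci2) ** 2)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Pooled estimates and 95% CIs as reported for the DBT and CBBCT arms.
z_sens, p_sens = compare_pooled(0.867, (0.803, 0.912), 0.837, (0.546, 0.957))
z_spec, p_spec = compare_pooled(0.870, (0.799, 0.918), 0.713, (0.475, 0.872))
print(f"sensitivity: z = {z_sens:.2f}, p = {p_sens:.3f}")
print(f"specificity: z = {z_spec:.2f}, p = {p_spec:.3f}")
```

Working on the logit scale keeps the back-calculated confidence limits symmetric around the estimate, which is why the standard error is recovered from the transformed interval rather than from the raw percentages.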

CBBCT Meta-Analysis
A total of five different observational studies were included in the meta-analysis of the CBBCT arm; the summary of all necessary information is tabulated in Table 1. The pooled sensitivity across the studies is 83.7% (95% CI: 54.6-95.7, I² = 94%), while the pooled specificity is 71.3% (95% CI: 47.5-87.2, I² = 94%), as shown in Figure 8. There is substantial heterogeneity between studies for both sensitivity and specificity, as the value of I² is higher than 75% and the p-value is less than 0.05. Due to the small number of included studies, further subgroup analyses for evaluating potential sources of heterogeneity were not performed. The pooled positive likelihood ratio (LR+) is 2.71 (95% CI: 1.39-5.29, I² = 95%), while the pooled negative likelihood ratio (LR−) is 0.21 (95% CI: 0.07-0.32, I² = 97%), as shown in Figure 9. The pooled AUC of the SROC is 0.831, as shown in Figure 7b.
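To make the pooled likelihood ratios easier to interpret, the short sketch below converts a pre-test probability into post-test probabilities through the odds form of Bayes' theorem (post-test odds = pre-test odds × LR). The 20% pre-test probability is a hypothetical value chosen only for illustration, and the function name is likewise an assumption; the likelihood ratios are the pooled values reported for the two arms.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.20  # hypothetical prevalence in a diagnostic population

# Pooled (LR+, LR-) as reported for each arm.
pooled_lrs = {"DBT": (6.28, 0.17), "CBBCT": (2.71, 0.21)}

results = {}
for modality, (lr_pos, lr_neg) in pooled_lrs.items():
    after_positive = post_test_probability(PRE_TEST, lr_pos)
    after_negative = post_test_probability(PRE_TEST, lr_neg)
    results[modality] = (after_positive, after_negative)
    print(f"{modality}: P(cancer | positive) = {after_positive:.3f}, "
          f"P(cancer | negative) = {after_negative:.3f}")
```

A larger LR+ moves the probability further upward after a positive test, and a smaller LR− moves it further downward after a negative test.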

Discussion
The systematic review identified 17 studies for the DBT arm and five studies for the CBBCT arm, and compared their diagnostic accuracy using sensitivity, specificity, AUC of the SROC, and positive and negative likelihood ratios as figures of merit. Our results showed that the pooled sensitivity of DBT, 86.7% (95% CI: 80.3-91.2), was higher than the pooled sensitivity of CBBCT, 83.7% (95% CI: 54.6-95.7), by about 3% (p-value = 0.7622). Likewise, the pooled specificity of DBT, 87.0% (95% CI: 79.9-91.8), showed an improvement of 16.4% over that of CBBCT, 71.3% (95% CI: 47.5-87.2). The pooled LR+ of DBT is 6.28 (95% CI: 4.40-8.96), which is higher than the pooled LR+ of CBBCT, 2.71 (95% CI: 1.39-5.29). This signifies that a positive DBT result is about six times more likely in patients with breast cancer than in patients without breast cancer; an LR+ greater than 10 and an LR− less than 0.1 indicate the greatest diagnostic efficiency [25]. The pooled AUC of the SROC of the DBT arm, 0.925, was significantly higher than that of the CBBCT arm, 0.831 (p-value = 0.016). The pooled LR+ and LR− of CBBCT are 2.71 and 0.21, respectively, which cause only a small change in the pre-test probability [25]. Compared with the results of Uhlig et al. [19], who reported a pooled sensitivity of 78.9%, a specificity of 69.7%, and an AUC of 0.817, our CBBCT arm showed higher pooled sensitivity, specificity, and AUC. The summary of the pooled results is shown in Table 2.
We checked the effect of the study protocol (prospective vs. retrospective) on diagnostic performance by conducting a sub-group analysis. The retrospective studies had a sensitivity of 84.6% (95% CI: 74.6-91.1, I² = 84% for 8 studies), while the prospective studies had a sensitivity of 86.7% (95% CI: 80.3-91.3, I² = 89% for 9 studies), indicating no significant heterogeneity between the sub-groups in sensitivity, as shown in Appendix A (Figure A1). In addition, the specificity is 83.0% (95% CI: 69.2-91.3, I² = 93% for 6 studies) for retrospective studies and 87.0% (95% CI: 79.9-91.8, I² = 96% for 9 studies) for prospective studies (Appendix A, Figure A1). The result indicates that prospective studies of DBT show a slight, non-significant improvement over retrospective studies in terms of sensitivity and specificity, with a p-value of 0.2509.
The higher AUC of DBT might have resulted from the high sensitivity and specificity recorded by most of the included studies [34–36,39,40,42–44]. In contrast, lower specificity was recorded in several CBBCT studies [12,48,49], although two CBBCT studies [11,13] reported specificity as high as that of the DBT studies. This is in line with Chappell et al. [30], who note that an effective diagnostic test should have correspondingly high sensitivity and specificity, both of which contribute substantially to the AUC of the SROC curve. The pooled result of our study demonstrates the better diagnostic performance of DBT over CBBCT in sensitivity, specificity, positive and negative likelihood ratios, and AUC. Belair et al. [20] reported sensitivities of 87% (95% CI: 80-92) and 70% (95% CI: 60-79) and specificities of 81% (95% CI: 72-87) and 67% (95% CI: 57-77) for DBT and CBBCT, respectively. Compared with these values, our pooled sensitivity for DBT is within the same range, while the pooled specificity has improved by approximately 7.2%; for CBBCT, our pooled sensitivity and specificity have improved by 13.7% and 4.3%, respectively. Zuley et al. [21], who studied lesion visibility and diagnostic accuracy of CBBCT, DBT, and MRI, estimated AUCs of 0.84 and 0.75 for DBT and CBBCT, respectively; our pooled AUC results improved on these by 11.3% and 10.8%.
The result shows a statistically significant improvement in the pooled AUC for DBT (p-value = 0.016), and the AUC provides better diagnostic power than univariate sensitivity and specificity. Abbreviated 3D breast MRI has been used to screen patients at high risk of breast cancer because of its high sensitivity (80-94%) and specificity (80-100%) [52,53]; however, some small lesions of less than 5 mm in size and ductal carcinoma in situ (DCIS) are not easily visible due to their diffuse pattern of spread [53,54]. Additionally, the cost and duration of each MRI examination have limited its widespread application [55]. Previous studies comparing CBBCT with DM have shown higher performance of CBBCT in breast mass characterization [12,13] and in cancer detection [48], as well as improved performance and good interrater agreement among readers [47], making CBBCT a potential modality for improved diagnosis of breast cancer.
This study has several limitations. Firstly, the results of the two arms were not extracted from the same studies (a comparison with different cohorts, according to Yang et al. [27]), as no studies directly comparing CBBCT and DBT were available within the scope and range of years covered, which might have introduced a potential bias between the results. Secondly, the CBBCT arm included about one-third as many studies as the DBT arm, so its pooled estimates may lack statistical power and the CBBCT results are underrepresented; the statistical significance of the CBBCT findings might change as more studies become available, since a larger sample tends to increase statistical power. Thirdly, due to the recent introduction of CBBCT as a screening or diagnostic imaging modality, no large multicenter prospective studies or clinical trials are available and there is no standardized acquisition protocol [19], which makes a direct comparison with DBT difficult.

Conclusions
Our study demonstrates that DBT shows improved diagnostic performance over CBBCT in pooled sensitivity, specificity, AUC, and positive and negative likelihood ratios. The improvement is statistically significant for the AUC, a parameter that represents higher diagnostic power than univariate sensitivity and specificity. We believe that the diagnostic performance of CBBCT will continue to improve with a better understanding of the underlying imaging physics of this modality, coupled with computer-aided detection applications and greater radiologist experience. We recommend more prospective studies on the direct comparison of the diagnostic accuracy of CBBCT and DBT for breast cancer characterization and detection.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/s22093594/s1, File S1: PRISMA checklist table; File S2: Detailed search strategy describing the MeSH terms and text words for all the databases.
Author Contributions: T.E.K.: methodology, validation, formal analysis, investigation, data curation and conceptualization, writing-original draft, writing-reviewing and editing. C.Z.: methodology, validation, formal analysis, investigation, data curation, writing-original draft, writing-reviewing and editing. O.A.O.: methodology, validation, formal analysis, investigation, data curation, writing-original draft, writing-reviewing and editing. G.Y.: methodology, validation, formal analysis, investigation, writing-reviewing and editing. Q.D.: methodology, validation, investigation and formal analysis. M.L.: methodology, validation, investigation and formal analysis. J.Z.: methodology, validation, investigation and formal analysis, writing-reviewing and editing. X.Y.: methodology, validation, formal analysis, investigation, data curation and conceptualization, writing-original draft, writing-reviewing and editing, project administration and supervision. All authors have read and agreed to the published version of the manuscript.
Acknowledgments: The authors acknowledge Kayode Charles Komolafe at Jackson State University, United States of America, for proofreading this article and the anonymous reviewers for their constructive criticism.

Conflicts of Interest: The authors declare no competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.

Appendix A
Figure A1. Univariate sub-group analysis of sensitivity and specificity with the random-effects model based on the different study protocols; g denotes the sub-group, with g = 0 for retrospective studies and g = 1 for prospective studies.