The Potential of Adding Mammography to Handheld Ultrasound or Automated Breast Ultrasound to Reduce Unnecessary Biopsies in BI-RADS Ultrasound Category 4a: A Multicenter Hospital-Based Study in China

The appropriate management strategy for BI-RADS category 4a lesions on handheld ultrasound (HHUS) remains a matter of debate. We aimed to explore the role of automated breast ultrasound (ABUS) or second-look mammography (MAM) as an adjunct to ultrasound (US) for 4a masses in reducing unnecessary biopsies. Women aged 30 to 69 underwent HHUS and ABUS from 2016 to 2017 at five high-level hospitals in China, with those aged 40 or older also undergoing MAM. Logistic regression analysis assessed image variables correlated with false-positive lesions in US category 4a. Unnecessary biopsies, invasive cancer (IC) yields, and diagnostic performance among different biopsy thresholds were compared. A total of 1946 women (44.9 ± 9.8 years) were eligible for analysis. The false-positive rate of category 4a in ABUS was 65.81% (77/117), which was similar to that of HHUS (67.55%; 127/188). Orientation, architectural distortion, and duct change were independent factors associated with false-positive lesions in category 4a on HHUS, whereas postmenopausal status, calcification, and architectural distortion were significant features on ABUS (all p < 0.05). For HHUS, both the unnecessary biopsy rate and IC yields were significantly reduced when changing biopsy thresholds by adding MAM for US 4a in the total population (scenario #1: BI-RADS 3, 4, and 5; scenario #2: BI-RADS 4 and 5) compared with the current scenario (all p < 0.05). Notably, for ABUS, scenario #1 reduced false-positive biopsies without affecting IC yields when compared to the current scenario (p < 0.001; p = 0.125). The high unnecessary biopsy rate of category 4a on ABUS was similar to that of HHUS. However, second-look MAM as an adjunct to ABUS has the potential to safely reduce false-positive biopsies compared with HHUS.


Introduction
Mammography (MAM) is widely used as the standard modality for screening and early detection of breast cancer. However, the diagnostic accuracy of MAM is limited in women with dense breasts [1][2][3]. Another barrier to the sustainable adoption and expansion of MAM is the lack of equipment, especially in low-resource areas [4].
Conventional handheld ultrasound (HHUS) offers a low-cost and portable way of breast cancer detection without the limitations of breast density [5], thereby increasingly being used in clinical breast examination. However, operator dependence has long been a concern for HHUS and causes interobserver variability. Automated breast ultrasound (ABUS) is a newly designed tool with the potential to overcome the criticism of HHUS by separating image acquisition from interpretation to increase reproducibility [6]. Multiplanar reconstructions also provide an advantage for evaluating breast lesions which might help improve the diagnosis accuracy [6].
To standardize the reporting of ultrasound (US) findings and aid quality assurance and risk assessment, the Breast Imaging Reporting and Data System (BI-RADS) is used worldwide [7]. The latest nationwide survey in mainland China reported that the average utilization rate of BI-RADS was 87.02% among 5460 departments providing ultrasound diagnoses [8]. However, the subdivision of category 4 in the fifth edition of the BI-RADS lexicon poses a challenge for managing BI-RADS 4a lesions. The malignancy rate of BI-RADS category 4a is low (2-10%), yet immediate biopsy referral is recommended, whereas BI-RADS category 3 refers to probably benign masses (<2%) for which short-term follow-up imaging is recommended [7]. To avoid missed diagnoses, observers tend to upgrade breast masses to 4a when it is difficult to decide between category 3 and 4a, but this may result in unnecessary biopsies.
The benign biopsy rate among patients with category 4a lesions on breast US is considerable (more than 50%) [9,10]. Unnecessary biopsies can have negative consequences for women without cancer, including the risk of complications, psychological anxiety, and additional financial costs [11][12][13]. Previous studies on stratifying and managing 4a patients mainly focused on incorporating elastography into US workflows or developing predictive models that include radiomics and clinical factors [9,10,14,15]. However, whether excessive biopsies of HHUS category 4a lesions can be avoided by supplementing other techniques remains to be explored.
The diagnostic performance of ABUS and HHUS has been shown to be comparable based on the fifth BI-RADS edition [16]. However, to our knowledge, there is not yet established evidence on whether the accuracy of category 4a on ABUS is higher than on HHUS. Furthermore, given its advantages in diagnosing calcified lesions [17], MAM provides a potential complementary option to improve diagnostic performance when combined with US (ABUS or HHUS). Few studies have evaluated whether adding MAM for lesions assessed as US category 4a improves diagnostic accuracy and reduces the unnecessary biopsy rate. Therefore, in this exploratory analysis, we aimed to assess the diagnostic performance of ABUS or second-look MAM as an adjunct to US in reducing false-positive diagnoses among 4a patients without impacting the breast cancer detection rate.

Study Population and Design
The research design has been published in detail elsewhere [18]. Briefly, this multicenter cross-sectional study was conducted in five high-level hospitals located in China (in Beijing, Tianjin, Shanghai, Hangzhou, and Guangzhou) from February 2016 to March 2017. Female outpatients with breast-related complaints were recruited. The exclusion criteria were age <30 or ≥70 years; a previous diagnosis of or treatment for breast cancer; surgical or percutaneous breast procedures in the past 12 months; a history of lumpectomy, contralateral mastectomy, or breast augmentation; and current pregnancy, breastfeeding, or planning to become pregnant.
All participants were invited to attend both HHUS and ABUS, while those aged 40 years and above also underwent MAM. Patients with the most severe category on three modalities, including BI-RADS 4 and 5, were considered positive findings and required a biopsy, whereas those with BI-RADS 1, 2, or 3 were categorized as negative findings. The study was registered in the Chinese Clinical Trial Registry (ChiCTR1800017908) and approved by the Institutional Review Board of Cancer Institute, Chinese Academy of Medical Sciences (IRB approval No.15-061/988), and the Institutional Review Board of all participating hospitals. According to the study aims, we included the participants with HHUS or ABUS categories 3 and 4a as the analysis set. The study design was shown in Figure 1 in detail.

Image Acquisition and Interpretation
The participants underwent ABUS using Invenia ABUS (GE Healthcare, Sunnyvale, CA, USA), performed by technicians who had received 3 days of training and interpreted by radiologists with 3-6 months of experience with ABUS. Three views (lateral, anteroposterior, and medial) were acquired for each breast. The images from the three views were transmitted to the workstation, reconstructed, and displayed as 3D volumes. The HHUS images were acquired with one of the following devices: GE LOGIQ9 (GE Medical Systems, Milwaukee, WI, USA), iU22 Ultrasound System (Philips Medical System, Bothell, WA, USA), S2000 (Siemens Medical Solutions, Mountain View, CA, USA), or the Aixplorer system (Supersonic Imagine, Aix en Provence, France). HHUS was performed by qualified radiologists with 5-25 years of experience in the five hospitals. All MAM examinations were performed with one of three systems, GE Senographe DS (GE Medical Systems, Milwaukee, WI, USA), Hologic Selenia (Hologic, Bedford, MA, USA), or Fujifilm FDR MS-2500 (Fujifilm Corp, Tokyo, Japan), and interpreted by doctors with 5-25 years of experience. All screening physicians involved in the study were trained in the protocol and related technical specifications and diagnostics before starting the study.
During the study, different experienced radiologists reviewed and interpreted the images from the three modalities and were blinded to each other's results. However, they were provided with information on the participants' clinical examinations.

Statistical Analysis
We analyzed and compared the detection rate of normal/benign and malignant lesions classified as US categories 3, 4a, 4b, and 4c using the Chi-squared test for trend. In clinical practice, even highly qualified experts often have difficulty distinguishing category 3 from category 4a lesions. Therefore, to evaluate the clinical and image features influencing false-positive lesions in category 4a, we selected as the analysis set those lesions that were categorized as 3 or 4a, underwent biopsy, and were confirmed as benign. With category 3 as the reference group, multivariable logistic regression analysis was used to estimate odds ratios (ORs) and confidence intervals (CIs). For HHUS, the following characteristics of the lump were included in the analysis: maximum diameter, shape, orientation, margin, posterior feature, calcification, architectural distortion, duct change, and vascularity. For ABUS, we also analyzed the retraction phenomenon in the coronal view. Furthermore, age, menopausal status, breast density, and palpability of the mass were included in the logistic regression analysis to control for potential confounding. Variables that were statistically significant in the univariate analysis were prioritized for inclusion in the multivariable analysis, and variables that were non-significant in the univariate analysis but clinically relevant were also considered.
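As a simplified illustration of the odds ratios and confidence intervals mentioned above, the sketch below computes an unadjusted OR with a Wald 95% CI from a single 2x2 table. This is not the multivariable model fitted in SAS for the study; the function name `odds_ratio_wald_ci` and all counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a: feature present, category 4a    b: feature present, category 3
    c: feature absent,  category 4a    d: feature absent,  category 3
    Category 3 acts as the reference group.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
or_value, ci_low, ci_high = odds_ratio_wald_ci(30, 20, 70, 180)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An OR above 1 with a CI that excludes 1 would suggest that the feature is more common among benign lesions assigned to 4a than among those assigned to category 3. A multivariable model additionally adjusts each OR for the other covariates, which this sketch does not do.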
Unnecessary biopsy rate, invasive cancer (IC) detection rate, malignancy rate of biopsy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC) were calculated to evaluate the diagnostic performance among different biopsy thresholds, which were compared using McNemar's test or the Chi-squared test. The statistical analysis was performed with SAS, version 9.4 (SAS Institute, Cary, NC, USA). A p-value <0.05 was considered statistically significant.
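For readers less familiar with these measures, a minimal sketch follows. It computes sensitivity, specificity, PPV, and NPV from a 2x2 table and a paired comparison using the exact (binomial) form of McNemar's test. The source does not state which form of McNemar's test was used, so the exact form here is an assumption; the function names and all counts are hypothetical.

```python
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Basic diagnostic performance measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant pair counts, e.g. cases classified
    positive by one biopsy threshold but negative by the other.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, for illustration only (not study data).
metrics = diagnostic_metrics(tp=40, fp=60, fn=10, tn=190)
p_value = mcnemar_exact(b=0, c=4)
print(metrics, p_value)
```

Because McNemar's test uses only the discordant pairs, it suits comparisons of two biopsy thresholds applied to the same women.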

Clinical and Imaging Factors Associated with False-Positive Lesions in Category 4a
Among 371 benign lesions assessed as categories 3 and 4a on HHUS, 127 were assessed as false-positive cases in 4a. Meanwhile, the false-positive cases were 77 in ABUS 4a among 357 benign lesions in 3 and 4a. Tables 2 and 3

Diagnostic Performance of Adding MAM to HHUS or ABUS
We evaluated the effect of changing biopsy thresholds for women with US category 4a lesions who underwent MAM among women aged 40 and above (HHUS, 138 women; ABUS, 94 women). Three scenarios about different biopsy thresholds are shown in Table 4, including all women with BI-RADS-US (HHUS or ABUS) category 4a undergoing biopsy (current scenario), women with BI-RADS-US category 4a and BI-RADS-MAM category 3, 4, and 5 undergoing biopsy (scenario #1), and women with BI-RADS-US category 4a and BI-RADS-MAM category 4 and 5 undergoing biopsy (scenario #2).
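The three biopsy thresholds described above can be written as a small decision rule. The sketch below assumes the lesion has already been assessed as BI-RADS-US category 4a and that the MAM assessment is given as an integer major category; the function name `biopsy_for_us_4a` and the scenario keys are hypothetical labels, not part of the study protocol.

```python
# MAM categories that trigger a biopsy for a US category 4a lesion.
# None means that MAM is not consulted (current practice).
MAM_BIOPSY_CATEGORIES = {
    "current": None,         # every US 4a lesion is biopsied
    "scenario1": {3, 4, 5},  # US 4a and MAM category 3, 4, or 5
    "scenario2": {4, 5},     # US 4a and MAM category 4 or 5
}


def biopsy_for_us_4a(mam_category, scenario):
    """Return True if a US category 4a lesion is referred for biopsy."""
    if scenario not in MAM_BIOPSY_CATEGORIES:
        raise ValueError(f"unknown scenario: {scenario!r}")
    allowed = MAM_BIOPSY_CATEGORIES[scenario]
    if allowed is None:
        return True
    return mam_category in allowed


# A US 4a lesion rated MAM category 3 is biopsied under the current
# scenario and scenario #1, but not under scenario #2.
for name in ("current", "scenario1", "scenario2"):
    print(name, biopsy_for_us_4a(3, name))
```

Explicit sets, rather than a numeric cutoff, keep the rule limited to the categories named in the text.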
The diagnostic performance of different scenarios was compared among women with BI-RADS-US category 3 and 4a lesions (Table 4). The AUCs of the combination of HHUS and MAM (both scenarios #1 and #2) were similar to that of the current scenario (p = 0.238; p = 0.095). Meanwhile, only scenario #1, which adds MAM to ABUS, obtained a similar AUC compared with the current scenario (p = 0.277). Although sensitivity was significantly lower in both new scenario groups than in the current scenario group, specificity and PPV improved for HHUS and ABUS (all p < 0.05). Table 5 shows the effect of increasing biopsy thresholds on unnecessary biopsy rate, IC detection rate, and malignancy rate of biopsy when integrating MAM with HHUS or ABUS. For HHUS, the unnecessary biopsy rate was significantly reduced to 39.86% (55/138) and 28.26% (39/138) for scenario #1 and scenario #2 compared with the current scenario, respectively (all p < 0.001), and the malignancy rate of biopsy increased to 45.54% (46/101) and 51.25% (41/80), respectively (p = 0.102; p = 0.008). However, both new scenarios had significantly lower IC detection rates than the current scenario (all p < 0.001). Similar patterns were recorded for ABUS, apart from scenario #1, which significantly reduced the false positive biopsies (p < 0.001) without decreasing IC yield (p = 0.125).
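The HHUS percentages above can be reproduced from the reported fractions. The sketch below assumes, as the fractions suggest but the text does not state explicitly, that the unnecessary biopsy rate is the number of benign biopsies divided by all women with US 4a lesions who underwent MAM (138), and that the malignancy rate of biopsy is the number of malignant biopsies divided by all biopsies performed under that scenario. The variable names are hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * numerator / denominator, 2)


WOMEN_US_4A_WITH_MAM = 138  # HHUS, women aged 40 and above

# Counts taken from the fractions reported for HHUS.
hhus = {
    "scenario #1": {"benign": 55, "malignant": 46, "biopsies": 101},
    "scenario #2": {"benign": 39, "malignant": 41, "biopsies": 80},
}

rates = {}
for name, n in hhus.items():
    # Benign plus malignant biopsies add up to the biopsy denominator.
    assert n["benign"] + n["malignant"] == n["biopsies"]
    rates[name] = {
        "unnecessary_biopsy_rate": pct(n["benign"], WOMEN_US_4A_WITH_MAM),
        "malignancy_rate_of_biopsy": pct(n["malignant"], n["biopsies"]),
    }

print(rates)
```

The check that benign and malignant counts sum to the number of biopsies (55 + 46 = 101 and 39 + 41 = 80) supports this reading of the denominators.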

Value of Adding MAM to HHUS or ABUS in Reducing Unnecessary Biopsy
We also compared the unnecessary biopsy rates, IC yields, and malignant rate of biopsy between the two new scenarios and the current scenario by age, breast density, and palpability of the mass (Table 5). In all subgroups, a lower unnecessary biopsy rate was always significantly noted for the two new scenarios in both HHUS and ABUS (all p < 0.001). The IC yields of the two new biopsy thresholds were not inferior to the current scenario for HHUS in women with less dense breasts (p = 1.000) and those with palpable masses (p = 0.063). For ABUS, we did not observe a significant difference in diagnostic performance in all subgroups between the two new scenarios and the current scenario, except for the IC yields of scenario #2 of women with dense breasts (p = 0.016).

Discussion
The potentially large number of unnecessary biopsies resulting from the current recommendation for BI-RADS-US category 4a creates an additional burden for women and strains clinical resources. Our findings showed that the false-positive rate of category 4a in ABUS was 65.81%, which was similar to that of HHUS (67.55%). Meanwhile, the clinical and sonographic factors influencing 4a false-positive lesions differed between HHUS and ABUS, which might be associated with differences in radiologists' experience and equipment. Of note, second-look MAM as an adjunct to HHUS 4a showed potential added value in reducing unnecessary biopsy procedures among women with less dense breasts or palpable masses without influencing invasive cancer detection. In addition, the new strategy combining ABUS category 4a and MAM 3, 4, and 5 as a new biopsy threshold would have the potential to safely reduce false-positive biopsies.
Current criticisms of HHUS include concern about false-positive results and the associated unnecessary biopsies [19]. The range of the malignancy rate for BI-RADS-US 4 lesions is wide (2-95%) [7]. In particular, the considerable overlap of image features between benign and malignant lesions in category 4a makes malignancy difficult to distinguish. The primary reason is the lack of objective criteria for the subclassification of category 4 lesions, which is largely based on the experience of the sonographers [20]. Our results also showed that the benign biopsy rate of 4a was high even when examinations were performed by highly qualified experts from high-level hospitals, which is consistent with the conclusions of previous studies [9,10].
The potential of ABUS in the diagnostic setting of breast cancer has become a research focus because of its benefits [21]. Some unique features obtained through multiplanar reconstructions can provide additional information for differentiating benign and malignant masses [22]. For example, the retraction phenomenon, a specific feature observed in the ABUS coronal view, has been suggested to be a predictive characteristic of breast cancer [23]. Our previous studies have suggested that specificity and PPV were significantly higher for ABUS than for HHUS [24,25]. However, this study showed that the unnecessary biopsy rate of ABUS among 4a masses is similar to that of HHUS. This might be explained by the more limited ability of radiologists who review ABUS images to evaluate category 4a, even after standardized training. Additionally, there are not yet well-established, ABUS-specific criteria worldwide for determining lesion characteristics on ABUS images; interpretation is primarily based on BI-RADS-US descriptors. Of note, the non-essential biopsy rate was 72.84% in 81 patients assessed with both ABUS and HHUS. Thus, the inherent technological limitations of US equipment may be another important reason.
In routine clinical practice, the interpretation criterion for category 4a is a mass with a benign ultrasound appearance but exhibiting any suspicious sign [26]. Among all benign lesions in our study, duct change, nonparallel orientation, and architectural distortion increased the level of suspicion and favored an HHUS assessment of BI-RADS 4a over 3. This may partly be explained by changes in the surrounding background tissue, which result in poor demarcation between masses and normal tissue and impede accurate evaluation of the breast lesion [27][28][29]. Meanwhile, calcification was associated with false-positive cases among category 4a lesions examined using ABUS. Due to the influence of probe frequency, tissue background echo, and operator technique, US is not ideal for detecting microcalcification in lesions, even though it is a key imaging feature for the diagnosis of breast cancer [30]. Notably, we also found that menopausal status tended to be associated with a higher probability of false positives. ABUS separates image acquisition (performed by the technicians) from interpretation. Therefore, compared with HHUS, readers may pay more attention to the clinical characteristics of patients, such as menopausal status. Above all, benign possibilities should be taken into account when these features are found, which suggests that examiners need to integrate other important image features when interpreting ultrasound images and would benefit from specific training on BI-RADS descriptors. More importantly, supplementing other diagnostic tools might be an effective strategy to help triage lower-risk populations by deferring biopsy and avoiding unnecessary recommendations.
Previous works have explored the management strategies of US category 4a. Several studies mainly focused on the new US imaging technique, elastography, and have confirmed the potential of combined shear wave and strain elastography to US to reduce unnecessary biopsies in breast cancer diagnostics [10,14,15]. However, the evidence of evaluating the value of other methods added to US is scant. To date, no other trials of integrated MAM in US have reported results. Lacking sufficient evidence to reduce breast cancer mortality could be a barrier to implementing the widespread US as the stand-alone screening modality. Currently, supplemental US to MAM has become a mainstay of diagnostic breast imaging for women with mammographically dense breasts. Some low-and middle-income countries (LMICs) are exploring HHUS application as a primary screening method for breast cancer because of the advantages of being cheap, having higher access, and being noninvasive [31][32][33]. A systematic review demonstrated that studies focusing on HHUS applications in LMICs have risen nearly by 60%, which reveals the increasing adoption of HHUS equipment worldwide [34]. However, given the lower specificity and higher false-positive rate of HHUS, it is important to explore US-based diagnostic strategies in combination with other techniques.
US (HHUS or ABUS) category 4a combined with positive MAM results (category 4 and above) as the biopsy threshold can significantly improve diagnostic performance and reduce false-positive biopsies when compared to the current scenario, but probably with the risk of missing invasive cancer. The most likely explanation is that more than 70% of participants were aged 40 and older and almost 50% of them were premenopausal women who underwent MAM and were found to have dense breasts in our study, which may be associated with the lower sensitivity of MAM [18]. Of note, our findings also revealed that the new biopsy threshold did not affect the invasive cancer yield, which was comparable with that of the current scenario, for women with less dense breasts. Furthermore, this study was conducted in hospitals, and the conclusions came from a symptomatic population that has a higher risk of breast cancer than an asymptomatic population. In view of these issues, whether an immediate biopsy strategy is needed for this group still depends on clinicians' perceptions of acceptable risk, weighing the pros and cons for each individual patient.
Notably, we found that the added value of the second-look MAM adjunct to HHUS 4a could acquire higher cancer yields when breast masses were palpable, which might be related to the probability of malignancy being fairly high in palpable lesions. Palpability is likely to be viewed with more suspicion by these masses, providing information to aid diagnosis for radiologists [35]. A previous study showed the combination of MAM and HHUS could potentially increase the negative predictive value among women with palpable breast abnormality [36].
Most importantly, this study provides a more practical perspective: a biopsy threshold combining BI-RADS 3 and above on MAM with BI-RADS 4a on ABUS has benefits over the current biopsy strategy, reducing false-positive biopsies without affecting detection performance. Findings from a prospective study indicated that ABUS has a higher ability than HHUS to detect architectural distortions, one of the risk factors for subsequent breast cancer in mammographic findings [37], on the coronal plane [38]. Additionally, ABUS can supplement mammography to detect more non-calcified carcinomas compared with HHUS in women with dense breasts [38]. This might explain the higher diagnostic performance of the biopsy threshold (scenario #2) for ABUS than that of HHUS. Furthermore, we acknowledge that the difference between HHUS and ABUS might result from the limited sample size in category 4a.
The reasons for false-positive findings need to be identified through external quality assessment in clinical practice. Some cases without abnormal pathological findings might have image changes that mimic the appearance of precancerous lesions, resulting in misclassification as positive results. This group then needs to be given priority attention, because the image feature abnormalities are more likely to be risk markers of breast cancer [39]. A previous retrospective study performed by Hofvind et al. showed that a higher interval breast cancer rate appeared after a false-positive result in a MAM-based screening program [40]. Biological susceptibility may contribute to the increased risk of breast cancer [41]. Thus, risk-based stratification management strategies play a vital role for women with false-positive results. However, because of our cross-sectional study design, future work should explore safe screening intervals for false-positive recalls.
The main strength of this study is that it is the first to evaluate the added value of the second-look MAM adjunct to US (HHUS or ABUS) category 4a. It possibly contributes to the understanding that MAM might be a useful additional tool for US in breast cancer diagnostics to better distinguish which patients require a histopathologic confirmation of suspicious lesions on imaging. This also provides potentially helpful strategies for improving diagnostic performance in areas where US is applied as the first-line breast diagnostic method.
This study had several limitations. First, the experience of the radiologists among the five research centers could affect image acquisition and interpretation. However, the variability among radiologists might have been reduced to some extent by the standardized training before the research. Another limitation is that the absence of follow-up information may affect the accurate evaluation of long-term outcomes in patients with false-positive biopsies of US 4a among different biopsy thresholds. In addition, the study participants were recruited from hospital outpatients with a higher risk of breast cancer, so the results do not reflect the application of the new biopsy thresholds in the general population. To address this issue, we are now conducting real-world research to explore the screening effectiveness of HHUS, ABUS, and MAM in average-risk populations.

Conclusions
The high unnecessary biopsy rate of category 4a on ABUS was very similar to that of HHUS, suggesting that the image factors influencing false-positive 4a lesions should be the focus of integrated training. The second-look MAM adjunct to HHUS had the potential to reduce overdiagnosis for women with less dense breasts or palpable breast masses. Notably, BI-RADS 3 and above for MAM combined with BI-RADS 4a for ABUS showed benefit over the current biopsy strategy and safely reduced false-positive biopsies. Future work is still needed to explore the appropriate follow-up interval for false-positive patients in specific populations.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The dataset analyzed during the current study is available from the corresponding author upon reasonable request.