Diagnostic Efficacy across Dense and Non-Dense Breasts during Digital Breast Tomosynthesis and Ultrasound Assessment for Recalled Women

Background: To compare the diagnostic efficacy of digital breast tomosynthesis (DBT) and ultrasound across breast densities in women recalled for assessment. Methods: A total of 482 women recalled for assessment from January 2017 to December 2019 were selected for the study. Women met the inclusion criteria if they had undergone DBT, ultrasound and had confirmed biopsy results. We calculated sensitivity, specificity, PPV, and AUC for DBT and ultrasound. Results: In dense breasts, DBT showed significantly higher sensitivity than ultrasound (98.2% vs. 80%; p < 0.001), but lower specificity (15.4% vs. 55%; p < 0.001), PPV (61.3% vs. 71%; p = 0.04) and AUC (0.568 vs. 0.671; p = 0.001). In non-dense breasts, DBT showed significantly higher sensitivity than ultrasound (99.2% vs. 84%; p < 0.001), but no differences in specificity (22% vs. 33%; p = 0.14), PPV (69.2% vs. 68.8%; p = 0.93) or AUC (0.606 vs. 0.583; p = 0.57). Around 73% (74% dense and 71% non-dense) and 77% (81% dense and 72% non-dense) of lesions assigned a RANZCR 3 by DBT and ultrasound, respectively, were benign. Conclusion: DBT has higher sensitivity, but lower specificity and PPV than ultrasound in women with dense breasts recalled for assessment. Most lesions rated RANZCR 3 on DBT and ultrasound are benign and may benefit from short interval follow-up rather than biopsy.


Introduction
Breast cancer screening using two-dimensional (2D) mammography is currently the primary standard of care [1]. In women aged 40-69 years, screening resulted in a significant 44% reduction in breast cancer mortality, compared to 16% in those who were not screened [2]. The decline in mortality can be attributed to timely detection of breast cancer through screening and to developments in breast cancer management and treatment [3][4][5]. This drop in mortality is a major step toward minimising the breast cancer burden. Despite the benefits accrued through screening, limitations remain around the diagnostic accuracy of digital mammography (DM). In particular, 2D mammography misses between 20% and 30% of breast cancers because of a masking effect in dense breast parenchyma, resulting in low sensitivity [6]. On the other hand, superimposition of normal fibroglandular tissue can mimic lesions on mammograms, causing a high rate of unnecessary recalls [7,8].

Study Design
We retrospectively reviewed the radiologists' reports for the recalled women. In the BreastScreen program, the screening mammography cases are independently interpreted by two radiologists. All cases were interpreted by radiologists trained in mammography image interpretation and dedicated to breast imaging, and all were involved directly in the clinical and screening activities within the BreastScreen program. Cases independently rated as being suspicious of malignancy by two radiologists were recalled based on the RANZCR breast imaging lesion classification used by BreastScreen Australia [28]. This classification system is based on a simple 1-5 grading scale: 1 = 'no significant abnormality', 2 = 'benign', 3 = 'equivocal', 4 = 'suspicious lesion' and 5 = 'malignant lesion'. Two mammographic views were acquired for each breast: cranio-caudal (CC) and medio-lateral oblique (MLO). Further mammography spot views were acquired if deemed necessary. The case is returned to routine screening if it is classified as no significant abnormality (RANZCR 1) or benign (RANZCR 2) but recalled for assessment if it is classified as equivocal (RANZCR 3), suspicious (RANZCR 4), or malignant (RANZCR 5). Cases rated RANZCR 3, 4, or 5 on screening mammography were assessed using DBT, ultrasound, and percutaneous needle biopsy (fine needle aspiration (FNA) cytology or core biopsy). Breast density was reported according to the Breast Imaging Reporting and Data System (BI-RADS, 5th edition) [29]: BI-RADS A, "The breasts are almost entirely fatty"; BI-RADS B, "There are scattered areas of fibroglandular density"; BI-RADS C, "The breasts are heterogeneously dense, which may obscure small masses"; and BI-RADS D, "The breasts are extremely dense, which lowers the sensitivity of mammography". DBT and ultrasound examinations were performed before patients were referred for biopsies. DBT scanning was performed using Selenia Dimensions, Hologic.
Radiologic features such as calcification, stellate lesions, discrete masses, non-specific density, architectural distortion, and multiple masses were used to describe lesions identified on DBT.
Real-time B-mode breast ultrasound was performed using an ACUSON S2000 Ultrasound System (HELX Evolution with Touch Control, Siemens Medical Solutions), equipped with a 12L4 linear array transducer. Colour Doppler was also used for characterisation of breast lesions. Where a lesion was detected on ultrasound, the sonographic features were described. The descriptions included indeterminate mass, cystic mass, solid mass (probably benign), solid mass (probably malignant) and axillary lymph nodes. Lesions detected on DBT and ultrasound were also rated using the RANZCR breast imaging lesion classification scale. Information such as lesion size, lesion location, tumour grade, patient age and personal/family history of breast cancer was retrieved from the database. Both DBT and ultrasound were interpreted by one radiologist depending on the digital mammographic findings.
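The recall rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the names `Ranzcr` and `screening_outcome` are hypothetical and are not part of any BreastScreen software.

```python
from enum import IntEnum


class Ranzcr(IntEnum):
    """RANZCR breast imaging lesion classification (1-5 scale)."""
    NO_SIGNIFICANT_ABNORMALITY = 1
    BENIGN = 2
    EQUIVOCAL = 3
    SUSPICIOUS = 4
    MALIGNANT = 5


def screening_outcome(grade: int) -> str:
    """Map a RANZCR grade to the pathway described in the text.

    Grades 1-2 return to routine screening; grades 3-5 are recalled
    for assessment. Values outside 1-5 raise ValueError.
    """
    ranzcr = Ranzcr(grade)
    if ranzcr <= Ranzcr.BENIGN:
        return "routine screening"
    return "recall for assessment"
```

For example, `screening_outcome(3)` places an equivocal lesion on the recall pathway, which is the group examined most closely in this study.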

Histopathological Testing
All cases graded as equivocal, suspicious, or malignant were biopsied using needle core biopsy or FNA with image guidance, e.g., ultrasound or mammography, as part of the BreastScreen Australia program. Needle core biopsy was the procedure of choice, while FNA was limited to simple cysts and lymph nodes. Needle core biopsy provided histological confirmation of malignant status (e.g., invasive or non-invasive), cancer type, and tumour grade in breast malignancies.

Statistical Analysis
Using these data, we calculated the diagnostic performance of DBT and ultrasound in terms of sensitivity, specificity, positive predictive value (PPV) and the area under the receiver operating characteristic curve (AUC) across dense and non-dense breasts. For the analysis, the RANZCR breast imaging lesion classifications of 1 and 2 were considered as negative findings, and classifications of 3, 4, and 5 were considered positive findings. For breast density, cases categorised as BI-RADS A and B were considered non-dense breasts, and those classified as BI-RADS C and D were considered dense breasts. Cancer sizes in dense and non-dense breasts were compared using a Mann-Whitney U test. McNemar's test was used to compare the sensitivity and specificity of DBT and ultrasound in dense and non-dense breasts. The two-proportion z-test was used to compare the PPVs of DBT and ultrasound in dense and non-dense breasts. The method for paired sample design, devised by DeLong et al. [30], was employed to compare the AUCs of DBT and ultrasound in dense and non-dense breasts. A p-value ≤ 0.05 was considered statistically significant. These statistical analyses were conducted using the open-source Jamovi software (1.6.22) and R statistical software (4.0.3).
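The dichotomisation and tests described above can be illustrated with a short Python sketch. It is not the analysis code used in the study, which was run in Jamovi and R. All function names are hypothetical, the exact binomial form of McNemar's test is an assumption (the text does not say which variant was used), and the toy inputs in the usage note are invented for illustration.

```python
import math


def is_positive(ranzcr_grade: int) -> bool:
    """RANZCR 3, 4 and 5 count as positive; 1 and 2 as negative."""
    return ranzcr_grade >= 3


def is_dense(birads_density: str) -> bool:
    """BI-RADS C and D are dense; A and B are non-dense."""
    return birads_density.upper() in {"C", "D"}


def diagnostic_metrics(grades, malignant):
    """Sensitivity, specificity and PPV against biopsy results.

    grades: RANZCR grades (1-5); malignant: matching booleans from
    histopathology. The data must contain at least one cancer, one
    benign case and one positive rating, otherwise a division by
    zero occurs.
    """
    tp = fp = tn = fn = 0
    for grade, cancer in zip(grades, malignant):
        positive = is_positive(grade)
        if positive and cancer:
            tp += 1
        elif positive and not cancer:
            fp += 1
        elif not positive and cancer:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value (assumed variant).

    b and c are the discordant pair counts, e.g. cancers positive on
    DBT but negative on ultrasound, and the reverse.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z statistic and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

As a usage example on invented data, `diagnostic_metrics([1, 2, 3, 4, 5, 3], [False, False, False, True, True, True])` gives a sensitivity of 1.0, a specificity of 2/3 and a PPV of 0.75, because the single benign lesion graded RANZCR 3 counts as a false positive.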

Discussion
Even with excellent mammographic technique and independent double reading, an image may be difficult to interpret, a lesion may not be well categorised, or radiologists may show inter-reader disagreement. Assessment modalities, such as DBT and ultrasound, may be required to thoroughly evaluate recalled mammographic findings. The two techniques, however, may differ in their classification of breast lesion types. In this study, we compared the diagnostic performance of DBT and ultrasound in women with dense and non-dense breasts. We found that, in dense breasts, DBT showed significantly higher sensitivity, but significantly lower specificity, PPV and AUC, than ultrasound. In non-dense breasts, the sensitivity of DBT was also significantly higher than that of ultrasound; however, no significant differences were found in specificity, PPV or AUC.
Our study differs from previous studies [25][26][27] for the following reasons. First, it is the first to compare DBT and ultrasound in mammography-recalled women across dense and non-dense breasts. Second, the synoptic breast imaging report used by BreastScreen Australia differs from the interpretation strategies used in non-Australian studies. A score of 3 in the US BI-RADS lexicon indicates that the lesion will require a six-month follow-up, whereas a score of 3 in the RANZCR scoring scheme indicates further assessment and biopsy, which may increase unnecessary recalls. Third, our data contains many recalled calcifications, which remain a challenge for ultrasound [31,32].
Our finding that adding DBT and ultrasound to screening programs enhances the early detection of breast cancers in both dense and non-dense breasts is consistent with the literature [12,33]. Importantly, we found that cancers detected by DBT and ultrasound are small and/or invasive: DBT and ultrasound detected many breast cancers, including 79% of invasive cancers and 49% of cancers smaller than 1 cm. Early cancer detection for small, invasive cancers is beneficial for prognosis and treatment [6,34,35]. It has been reported that the 10-year survival from breast cancer is substantially higher for women with small-sized cancers. For example, the 10-year survival from breast cancers no larger than 1 cm is 87%, compared to 76% and 75% for cancers ranging in size from 1.1-2 cm and 2.1-3 cm, respectively [36]. Therefore, imaging tools that improve the detection of small-sized malignant tumours could improve treatment outcomes. Another interesting finding with respect to tumour size was that cancers detected in dense breasts were significantly larger than those in non-dense breasts. Larger tumours in dense breasts can be explained by two factors. Firstly, dense tissue contains a high proportion of stromal cells, which regulate the proliferation of epithelial cells and are progenitors of collagen, which binds to growth hormone to support tumour reorganisation. These factors may act together to facilitate rapid growth of tumours in dense compared to fatty breasts. Secondly, some of the lesions in dense tissue may be interval cancers or cancers that were missed at previous mammography screening due to the masking effect of mammographic density [37][38][39]. Regardless, the findings suggest the need to tailor screening intervals and pathways according to mammographic density to detect small-sized early-stage disease.
We observed that calcifications were more likely to be classified as positive on DBT relative to ultrasound in all breast compositions. This is indicated by the significantly higher sensitivity on DBT compared to ultrasound. Nevertheless, the advantages of DBT in detecting calcifications (100% sensitivity) should be balanced against its disadvantages, which include low specificity and PPV, particularly in dense breasts. Given the high prevalence of calcifications in the screening population, reasonable positive thresholds are required to increase both specificity and PPV for DBT. A previous work shows that DBT improves diagnostic accuracy in suspicious calcification features, with an excellent AUC of 0.903 in dense breasts and 0.904 in non-dense breasts [40]. We found that DBT shows very low AUCs in suspicious calcifications, for both dense (0.509) and non-dense breasts (0.500). It should be noted that the previous work included only women who had a biopsy for suspicious calcifications (BI-RADS 4A or higher), and classified BI-RADS 4A as a negative finding since it indicates a low risk of malignancy according to the BI-RADS Atlas. These differences in study methods could have influenced the results [40].
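One way to read AUCs close to 0.5 alongside 100% sensitivity is through the identity for a single-threshold (dichotomised) test, whose ROC curve has one operating point, so that AUC = (sensitivity + specificity) / 2. The sketch below assumes the AUCs are based on the dichotomised RANZCR ratings. The text does not state this, although the dense-breast DBT figures in the abstract (sensitivity 98.2%, specificity 15.4%, AUC 0.568) are consistent with it. The function names are hypothetical.

```python
def binary_auc(sensitivity: float, specificity: float) -> float:
    """AUC of a test with a single positive/negative threshold."""
    return (sensitivity + specificity) / 2


def implied_specificity(auc: float, sensitivity: float) -> float:
    """Specificity implied by an AUC under the same assumption."""
    return 2 * auc - sensitivity


# Dense-breast DBT values from the abstract (assumed dichotomised AUC).
dense_dbt_auc = binary_auc(0.982, 0.154)

# Calcifications on DBT: 100% sensitivity with AUCs of 0.509 and 0.500.
dense_calc_spec = implied_specificity(0.509, 1.0)
non_dense_calc_spec = implied_specificity(0.500, 1.0)
```

Under this assumption, an AUC of 0.500 with 100% sensitivity corresponds to a specificity of zero, meaning that essentially every recalled calcification was rated positive on DBT whether or not it proved malignant.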
Furthermore, we observed that ultrasound underestimated most calcification features, classifying them as RANZCR grade 1. This RANZCR rating suggests that the calcifications may have been missed or dismissed on ultrasound, particularly in dense breasts, where the sensitivity was only 37%. It has been shown that Cooper's ligaments and ductal walls may mimic calcifications, particularly in fibrocystic changes [41,42], and may be responsible for calcifications being dismissed. It should also be noted that the efficacy of ultrasound is dependent on the operator's experience and the transducer technology. For example, a 7.5 MHz transducer has a lateral resolution of approximately 1 mm, while calcifications typically measure 0.1-1.0 mm. Therefore, ultrasound could miss calcifications smaller than 1 mm [42]. The inverse relationship between transducer frequency and beam penetration may have also contributed to the low sensitivity of ultrasound for calcification in dense breasts [32]. Our findings suggest that mammography-recalled calcifications should not be completely ruled out solely on ultrasound findings.
In noncalcified lesions, DBT was comparable to ultrasound in dense breasts, but showed significantly higher sensitivity in non-dense breasts. Noncalcified lesions are mostly hypoechoic; the contrast between hypoechoic tumour and echogenic dense tissue may contribute to the high sensitivity of ultrasound in dense breasts [43,44]. The high sensitivity of DBT in recalled women suggests that 2D DM spot views may not be needed where this assessment tool is available. This is further supported by the finding that, in all breast compositions, the RANZCR grades assigned to the additional spot views and the DBT assessments were in significant agreement, as shown in Supplementary Materials (Table S1).
Recalling lesions with a low probability of cancer [28] using the RANZCR Grade 3 (a combination of BI-RADS 3/4A) may be justifiable for the following reasons. First, the screening program's primary purpose is to detect cancers in their earliest stages [45]. Second, DM is also affected by breast density, prompting the use of assessment tools to optimise breast cancer diagnosis. However, our findings show that using the same biopsy threshold for both DBT and ultrasound should be reconsidered because reducing false positives is just as important as improving breast cancer detection [7]. Our findings on DBT are consistent with a recent Australian study [21], which reported improved cancer detection rates but increased false positives. The high number of false positives could be caused by excessive use of the RANZCR Grade 3 in Australia. It is possible that the high false-positive rate associated with DBT may decrease in Australia as screening program readers gain expertise with this technology [21,46].
We found that DBT changed lesion classification in 36% of cases rated RANZCR 3 on DM, with an upgrade in 22% (86% of which had breast cancer) and a downgrade in 14% (90% of which were benign). By contrast, ultrasound changed the classification in 71% of lesions rated RANZCR 3 on DM, with an upgrade in 29% (68% of which had breast cancer) and a downgrade in 42% (72% of which were benign). Another interesting finding was that the benign rate of lesions where DBT and ultrasound led to no change in classification (RANZCR 3) was much higher than the cancer rate. Biopsy results revealed that around 73% (74% dense and 71% non-dense) and 77% (81% dense and 72% non-dense) of lesions assigned a RANZCR 3 by DBT and ultrasound, respectively, were benign (Table 4). Thus, considering short-term follow-up instead of biopsy for cases classified as equivocal (RANZCR 3) on DM and during DBT and ultrasound assessment may reduce unnecessary biopsies, particularly in dense breasts (Figure 2). This strategy may lead to a small number of invasive cancers being missed, but will significantly decrease the false-positive rates, as well as decrease overdiagnosis in the screening program. Previous studies [47][48][49] show that low-risk category ratings (BI-RADS 3/4A/4B) safely decreased the rate of biopsies for false-positive results. In addition, cancers originally classified as low-risk were found to be early-stage tumours when biopsies were conducted during a short follow-up or at a subsequent screening; this did not lead to clinically significant delays in breast cancer diagnosis. These findings suggest that the use of the RANZCR Grade 3 during DBT and ultrasound assessments in Australia should be reconsidered, as should views regarding short interval follow-ups. Reconsidering the use of RANZCR grade 3 may provide a baseline for identifying what thresholds will optimise cancer detection, while minimising unnecessary biopsies. This study is not without limitations.
All radiologists had prior knowledge of the original mammographic findings when interpreting subsequent DBT and ultrasound images. This factor could influence the radiologists' decision-making, leading to the increased false-positive rates for both modalities. Furthermore, our data is from a single centre. Larger multi-centre studies are needed to verify and translate the results of our study. Conversely, our data represents real-life clinical experience of using DBT and ultrasound for assessment of mammography-recalled women and accounts for the independent double reading system practised in Australia, which has been associated with increased breast cancer detection [50]. To our knowledge, this is the first study that has compared the diagnostic performance of DBT and ultrasound in women recalled due to digital mammographic findings in BreastScreen Australia. Therefore, our study provides baseline data to inform the choice of assessment modalities for women recalled for additional imaging due to mammography findings.

Conclusions
Digital breast tomosynthesis has higher sensitivity, but lower specificity and positive predictive value, than ultrasound in women with dense breasts recalled for assessment. Both DBT and ultrasound demonstrate significant limitations in the assessment of calcifications, with DBT limited in correctly characterising benign calcifications and ultrasound underestimating the malignant potential of many malignant calcifications. Most lesions rated RANZCR Grade 3 on DBT and ultrasound assessments are benign and may benefit from short interval follow-up rather than biopsy. Therefore, optimising the assessment of calcifications and lesions rated RANZCR 3, as well as the thresholds for biopsy recommendations for these lesions, may reduce unnecessary biopsies and improve the management of women with such lesions.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics12061477/s1, Table S1: The agreement of RANZCR breast lesion classifications between DBT and DM spot views on 219 lesions.