Diagnostic Value of Different Risk-Stratification Algorithms in Solid Breast Lesions

In the past few years, elastography has gained ground as a complementary method to ultrasonography in noninvasive breast cancer screening. Despite positive outcomes, there is a further need to refine the method, especially regarding BIRADS scores 3 and 4A, where the distinction between benignancy and malignancy is established. The aim of the present study was to evaluate the best risk-stratification system using both qualitative and semiquantitative elastographic methods for solid breast nodules. A total of 1405 solid nodules, described in 657 female patients, were examined in our endocrine unit between January 2018 and December 2019. The inclusion criterion for our retrospective study was the presence of any solid breast mass in women of all ages (mean, 40.85 ± SD 27.11), detected during ultrasound examination using a HITACHI PREIRUS machine (Hitachi Medical Corporation, Tokyo, Japan). The Breast Imaging Reporting and Data System (BIRADS)–US criteria were used in the assessment of each nodule by conventional US (gray-scale mode) and Doppler evaluation. The Ueno score and strain ratio were also measured for all the described lesions. We considered multiple algorithms for the risk reassessment of solid breast nodules: classical BIRADS–US, EFSUMB BIRADS, worst-case scenario BIRADS and BIRADS TM. There were 93 malignant nodules out of 1405. The diagnosis was based on histopathological results for all the malignant lesions. Benign lesions were diagnosed based on histopathological results, Tru-Cut biopsy, mammography and MRI. 
The Sensitivity (Se), Specificity (Sp), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Accuracy (Acc) were obtained for all the proposed risk-stratification reporting systems: conventional BIRADS-US (Se, 74.23%; Sp, 63.95%; PPV, 13.53%; NPV, 97.79%; Acc, 65%); EFSUMB BIRADS (Se, 71.23%; Sp, 81.55%; PPV, 22.68%; NPV, 97.99%; Acc, 81%); worst-case scenario BIRADS (Se, 84.23%; Sp, 58.23%; PPV, 13.29%; NPV, 98.84%; Acc, 60%); BIRADS TM (Se, 81.23%; Sp, 75.84%; PPV, 20.35%; NPV, 98.81%; Acc, 77%). We found that the most efficient risk-stratification reporting system was the proposed one, BIRADS TM, which considers both upgrading and downgrading the conventional BIRADS-US, followed by the worst-case scenario BIRADS and EFSUMB BIRADS.


Introduction
Breast cancer among women, although one of the most common cancers, can have a good prognosis if detected and treated promptly [1]. For this reason, diagnostic methods must be continuously improved and adjusted in order to increase patient acceptance. Attention must be focused on improving noninvasive diagnostic methods in order to minimize unnecessary surgery.
Mammography screening was proven to detect cancer at an early stage with an estimated relative reduction of mortality of 15% in women aged 50 and over [2]. Although effective, mammography has its limitations [3]. Several factors such as dense breasts and patient age under 40 limit its applicability [4][5][6].
As a complementary diagnostic method, breast ultrasonography circumvents the age and breast-density limitations of mammography. It also differentiates solid from cystic lesions, allows the evaluation of palpable breast lesions not visible on mammography and is suitable for pregnant patients [7][8][9][10].
With the subsequent introduction of high-frequency transducers, ultrasonography also allows the differentiation of malignant from benign breast masses [8,11,12], making it a candidate first-choice diagnostic method in breast cancer screening [13]. As a simple, noninvasive, nonirradiating and painless technique, it can easily be performed routinely, which helps address the issue of patient access to screening.
Real-time elastography is a recent imaging technique used complementarily to conventional US, resulting in increased diagnostic accuracy [8,14,15].
Sonoelastography screening for breast cancer is a valuable diagnostic tool, leading to a decrease in unnecessary biopsies [16,17]. Recent studies show a 9.4% unnecessary biopsy rate if breast sonoelastography is implemented as the screening method [16]. Despite this, breast cancer screening through ultrasonography can still be improved to reduce the number of unneeded biopsies [18][19][20], which is required in order to avoid causing the patients additional discomfort and emotional distress [21,22].
The American College of Radiology (ACR) developed a common and reproducible way of breast ultrasound interpretation. Breast lesions are evaluated based on well-defined criteria including the shape, boundary, echo pattern, orientation, margin and posterior acoustic features and classified into BIRADS (Breast Imaging Reporting and Data System) categories 1 to 5 [23].
Current recommendations by the EFSUMB and WFUMB guidelines are to downgrade the BIRADS scores 4A and below in cases with low stiffness but not in cases with a BIRADS score of 4B or higher [24].
One contribution of our study was the introduction of two new risk-stratification reporting systems (BIRADS TM and BIRADS worst case) combining both qualitative and semiquantitative elastography methods with conventional breast ultrasound, allowing both downgrading and upgrading the conventional BIRADS US score, a novel approach not yet identified in the current literature.
Another contribution with regard to the currently available literature on the subject was the evaluation of a significantly higher number of solid breast nodules. Comparable studies have analyzed breast nodule sample sizes ranging from 99 to 200 [18,19,25]. We analyzed 1405 solid nodules, of which 93 were later confirmed to be malignant.
The current retrospective study evaluated the best risk-stratification reporting system using any combination of conventional US, Doppler evaluation and strain elastography (both qualitative and semiquantitative techniques) through multiple algorithms for the risk reassessment of solid breast nodules in order to identify the optimal threshold for malignancy.

Solid Breast Lesions
The retrospective study included 1405 solid nodules examined in our endocrine unit between January 2018 and December 2019.
The inclusion criterion for our study was the presence of any solid breast mass in women of all ages, detected during ultrasound examination.
The exclusion criteria were previous breast surgery for malignant lesions, radiation therapy, a completely cystic lesion and subcutaneous lipomas.
The standard criterion for the presence or absence of breast cancer was a pathologic result from fine-needle cytology, core-needle biopsy, excision biopsy or a combination of mammography and breast MRI.
The study was performed in accordance with the ethical guidelines of the Helsinki Declaration and was approved by the Ethics Committee of our Center. Written informed consent was obtained from all patients in order to use the pathology results and ultrasound images for a post hoc analysis.

Classical Ultrasound
Bilateral High-Resolution B-mode grey scale ultrasonography was performed on all cases, using a HITACHI PREIRUS machine (Hitachi Medical Corporation, Tokyo, Japan) with Doppler and elastography software. A special breast probe, EUP-L53L, 920 mm wide, with a water bag device, was used for gray-scale evaluation, with the patient resting in the supine position and both hands bent under the head, and ductal breast evaluation was performed according to Amy's technique [26]. The aforementioned is a radial technique, examining the breast with the nipple positioned in the left upper corner and the peripheral lobar structure in the right part of the screen, visualizing all recommended layers on the screen starting from the skin (upper layer) to the rib structures and pleura (lower layers).
A complete US evaluation was performed for each solid lesion individually including conventional gray-scale, color Doppler scanning and strain elastography performed in the same session.
The malignancy criteria according to the American College of Radiology [27] were considered as follows: markedly hypoechoic lesion; inhomogeneity; ill-defined borders; "taller than wide" shape; spiculated, poly-lobular or rough margins; posterior acoustic shadowing; intralesional microcalcifications; tortuous internal vascularity; proliferative infiltration of the afferent duct; and altered lymph nodes: round, with no evidence of hilum, eccentric hilum or cortical edema, with a cortical width greater than 3 mm.
The Breast Imaging Reporting and Data System (BIRADS)-US criteria [24] were used in the evaluation of each nodule in conventional US-mode gray-scale and Doppler evaluation, resulting in a BIRADS score for each nodule.

Elastography
Real-time elastography was performed after the conventional ultrasound evaluation, in the same examination session, using a HITACHI PREIRUS machine (Hitachi Medical Corporation, Tokyo, Japan) equipped with real-time elastography software, using a 5-18 MHz linear multifrequency probe positioned perpendicular to the skin during compression.
The qualitative SE (standard blue-red-green color map) was determined for each nodule. A Ueno score of 1 to 5 was assigned for each nodule based on the color balance of green and blue inside the tumor and the surrounding area. At least two measurements were performed for each solid nodule, in radial and anti-radial sections. All ultrasounds were performed by the same operator prior to any surgery, biopsy or fine-needle aspiration.
The semiquantitative technique used was the Fat-to-Lesion Ratio (FLR), automatically computed by the machine. After the Region of Interest (ROI) is placed around the entire solid lesion, the machine automatically generates a second ROI in the surrounding pre-glandular subcutaneous fat, and the resulting stiffness ratio, called the FLR score, is attributed to each nodule. An FLR higher than 4.5 was considered indicative of malignancy [19].
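As a rough illustration of the semiquantitative step, the sketch below computes a strain ratio from two mean strain values and applies the 4.5 cut-off. It assumes the usual definition of the fat-to-lesion ratio (mean strain in the fat ROI divided by mean strain in the lesion ROI); the function names and the input values are hypothetical, and the scanner's own software, whose internals are not described here, may differ.

```python
FLR_CUTOFF = 4.5  # threshold considered indicative of malignancy in the text


def fat_to_lesion_ratio(mean_strain_fat: float, mean_strain_lesion: float) -> float:
    """Assumed definition: strain in the fat ROI divided by strain in the lesion ROI.

    A stiff lesion deforms less than fat, so its strain is small and the ratio is large.
    """
    if mean_strain_lesion <= 0:
        raise ValueError("lesion strain must be positive")
    return mean_strain_fat / mean_strain_lesion


def flr_suggests_malignancy(flr: float, cutoff: float = FLR_CUTOFF) -> bool:
    """True when the FLR is strictly higher than the cut-off."""
    return flr > cutoff
```

With hypothetical strain values of 0.90 for fat and 0.15 for the lesion, the ratio is about 6, which would fall above the cut-off.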

Risk Stratification
Risk stratification was quantified using the following BIRADS scores for each nodule: the BIRADS US, defined according to ACR guidelines; BIRADS EFSUMB, according to EFSUMB guidelines; and the two new proposed risk-stratification reporting systems BIRADS TM and BIRADS worst case.
The BIRADS US score, based solely on ultrasound evaluation results, was calculated separately per nodule on a five-point grading scale, increasing with the likelihood of malignancy. Furthermore, a BIRADS score of 4 can be divided into 4A, 4B or 4C if 1, 2 or more than 3 malignancy US markers were observed. For each nodule, we assigned the appropriate BIRADS score as follows: 1 indicates a normal result; 2, a benign lesion; 3, a solid hypoechoic lesion without any US malignancy marker; 4, a lesion with one or more US malignancy markers; 5, a malignant lesion; 0, an indeterminate result.
We defined and integrated the BIRADS EFSUMB score (US-ELASTO BIRADS) [19], where stiffness was considered an extra risk factor according to the current recommendations of the EFSUMB guidelines [36]. The conventional downgrade of low stiffness was performed for BIRADS 4A and below, with cases 4B, 4C and 5 not downgraded.
Our proposed algorithm "BIRADS TM" reassessed classical ultrasound BIRADS scores, both upgrading and downgrading based on elastographic results. The scores were upgraded irrespective of the initial result if high stiffness was found, but only scores equal to or lower than 4B were downgraded in the case of low stiffness.
Finally, a "worst-case scenario" (BIRADS worst case) was also analyzed, considering the highest risk category defined by any of the previous evaluations, allowing only a risk upgrade strategy and no downgrade regardless of any additional information.
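The reassessment rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stiffness label ("low", "intermediate" or "high") is taken as an input because the exact way the Ueno score and FLR are combined into it is not specified in this section, each upgrade or downgrade is assumed to move the score by one category, and the scale is assumed to bottom out at BIRADS 2. All function names are hypothetical.

```python
# Ordinal BIRADS categories, from lowest to highest risk (assumed floor at "2").
ORDER = ["2", "3", "4A", "4B", "4C", "5"]


def _shift(score: str, step: int) -> str:
    """Move a score up or down the ordinal scale, clamped at both ends."""
    i = ORDER.index(score) + step
    return ORDER[max(0, min(i, len(ORDER) - 1))]


def _at_most(score: str, limit: str) -> bool:
    """True when score is equal to or lower than limit on the ordinal scale."""
    return ORDER.index(score) <= ORDER.index(limit)


def birads_efsumb(us_score: str, stiffness: str) -> str:
    """Downgrade only: low stiffness lowers scores of 4A and below."""
    if stiffness == "low" and _at_most(us_score, "4A"):
        return _shift(us_score, -1)
    return us_score


def birads_tm(us_score: str, stiffness: str) -> str:
    """Upgrade on high stiffness for any score; downgrade on low stiffness up to 4B."""
    if stiffness == "high":
        return _shift(us_score, +1)
    if stiffness == "low" and _at_most(us_score, "4B"):
        return _shift(us_score, -1)
    return us_score


def birads_worst_case(us_score: str, stiffness: str) -> str:
    """Highest risk category produced by any of the previous evaluations."""
    candidates = [
        us_score,
        birads_efsumb(us_score, stiffness),
        birads_tm(us_score, stiffness),
    ]
    return max(candidates, key=ORDER.index)
```

Under these assumptions, a BIRADS 4B nodule with low stiffness stays at 4B in the EFSUMB variant but becomes 4A in BIRADS TM, while the worst-case variant never returns a category below the conventional US score.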

Statistical Analysis
Microsoft Excel was used for experimental data handling, and all solid nodules were centralized in a table. For each diagnostic criterion, the sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV) and accuracy were calculated; afterwards, the Receiver Operating Characteristic (ROC) curve was drawn for each of the diagnostic criteria used in order to visualize the differences in diagnostic value. Lastly, the Area Under the ROC Curve (AUC) was computed as a differentiating performance index for all the considered risk-stratification reporting systems.
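The five measures follow directly from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the study itself used Microsoft Excel, so the code, the names and the example counts are illustrative assumptions only and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malignant nodules classified as positive
    fp: int  # benign nodules classified as positive
    tn: int  # benign nodules classified as negative
    fn: int  # malignant nodules classified as negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return Se, Sp, PPV, NPV and Acc as percentages.

    Assumes every denominator is non-zero (at least one malignant nodule,
    one benign nodule, one positive call and one negative call).
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "Se": 100 * c.tp / (c.tp + c.fn),
        "Sp": 100 * c.tn / (c.tn + c.fp),
        "PPV": 100 * c.tp / (c.tp + c.fp),
        "NPV": 100 * c.tn / (c.tn + c.fn),
        "Acc": 100 * (c.tp + c.tn) / total,
    }


# Hypothetical counts, chosen only to show the calculation.
example = diagnostic_metrics(ConfusionCounts(tp=8, fp=20, tn=70, fn=2))
```

In a sample where malignancy is rare, the PPV can stay low even when sensitivity and specificity are both reasonable, because false positives drawn from the large benign group outnumber the true positives.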

Results
This retrospective study included 1405 solid nodules described in 657 female patients, aged between 16 and 78 years old (mean age, 40.85 ± SD 27.11), examined in our Endocrine Unit between January 2018 and December 2019. Of the 1405 nodules, 93 were found to be malignant (three cases of Carcinomas in Situ-CIS), detected in 63 patients, aged between 16 and 78 (mean age, 47 ± SD 13.61). The following anamnestic data were also documented but were not considered in the risk-stratification analysis. Only the sonoelastographic parameters were considered. For the patients with malignancies, they were as follows: the body mass index (BMI) (mean, 25 ± SD 4.40; median, 25), menarche (mean, 14 ± SD 1.20; median, 14), climax age (mean, 47 ± SD 5.00; median, 49), number of births (mean, 1 ± SD 0.80; median, 1); breastfeeding (mean, 9 months ± SD 9.48; median, 5), the use of oral contraceptives (17 out of 63 patients, with mean, 3 years ± SD 3.63; median, 2); breast cancer heredity (9 out of 63 patients), and smokers (23 out of 63 patients). None of them had ovarian cancer, and 2 out of 63 underwent irradiation therapy for other malignancies.
The reasons for presenting for US evaluation were routine screening, breast tenderness, the palpation of a breast lump, skin changes, nipple retraction, nipple discharge or previous breast trauma.
The malignancy results for the 1405 solid nodules assessed by histopathological results, Tru-Cut biopsy, mammography or MRI are shown in Table 1.

Conventional Gray-Scale US
In order to define the BIRADS-US score for each nodule, the ultrasound malignancy markers according to the ACR [27] were analyzed. The results can be seen in Table 2, where the total number of occurrences is stated for each malignancy marker, together with the absolute and relative values for malignancy, as checked against later investigations by histopathological results or a combination of mammography and MRI.
The diagnostic power of each conventional US characteristic for identifying breast cancer was then computed. The results are given as the following measures of diagnostic accuracy: Sensitivity (Se), Specificity (Sp), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Accuracy (ACC), all expressed as percentages (Table 3). Although not explicitly used later in our study, the diagnostic power of the considered malignancy markers is relevant, as the BIRADS-US score assessment is based on their occurrence as described by the WFUMB guidelines [24].
Based on the malignancy marker occurrence shown in Table 2 and the WFUMB guidelines [24], a BIRADS score was attributed to each nodule. Table 4 states the total number of occurrences for each of the four considered BIRADS scores. Of the possible BIRADS scores of one through five, scores of one and two were attributed to normal breast and cystic lesions and were excluded from the study. After the assessment, the results were checked against later investigations by histopathological results or a combination of mammography and MRI. The numbers of confirmed malignant and benign nodules are shown for each BIRADS score.

Elastography Results
Following qualitative real-time elastography, each nodule was given a Ueno score of one through five. Ueno 1 scores were only found in cystic lesions, which were excluded from the study. Table 5 shows the total occurrence for each Ueno score as well as the number of malignant and benign nodules as confirmed by later investigations with histopathological results or a combination of mammography and MRI.
Lastly, semiquantitative real-time elastography was performed, and Fat-to-Lesion Ratio (FLR) values were automatically given for each nodule. The FLR values were split into three groups by likelihood of malignancy, as described earlier in Section 2.2, and are shown in Table 6. The total occurrence per group is shown alongside the malignant and benign occurrences as confirmed by later investigations using histopathological results or a combination of mammography and MRI.

Diagnostic Value of All Analyzed Methods
After attributing all nodules a BIRADS score according to each proposed risk-stratification system described in Section 2.3 (BIRADS US, EFSUMB BIRADS, BIRADS TM and worst-case scenario BIRADS), the diagnostic power was computed for each risk-stratification system alongside the ones for purely elastographic scores. The following measures of diagnostic accuracy are shown (Table 7): Sensitivity (Se), Specificity (Sp), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Accuracy (ACC), all expressed as percentages.

Receiver Operating Characteristic Curve
In order to graphically show the diagnostic value of each proposed risk-stratification system and remove the influence of the cut-off values, a Receiver Operating Characteristic Curve (ROC curve) was drawn for each proposal. In the following figure (Figure 1), all the ROC curves are plotted alongside the ROC curve for a "chance" diagnostic.
After plotting the ROC curves, the area under each curve was calculated using the trapezoidal rule. The resulting areas with values between 0.5 for chance and 1 for a theoretically perfect differentiating method are shown in the following table (Table 8). This allows for a quick, one-parameter comparison between the diagnostic powers of the different proposed risk-stratification systems.
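The trapezoidal calculation can be sketched as below. The code assumes that each nodule carries a numeric ordinal score (for example, a BIRADS category mapped to an integer) and a binary label with 1 for malignant, and that a nodule is called positive when its score is at or above the threshold; the function names and the small data set are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, from (0, 0) to (1, 1).

    One point is produced per distinct score used as a threshold,
    going from the strictest threshold to the most permissive one.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ordinal scores and outcomes for six nodules.
scores = [5, 4, 4, 3, 3, 3]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_trapezoid(roc_points(scores, labels))
```

Because the scores are ordinal with only a few levels, the curve has only a few vertices, and the trapezoidal rule then gives the exact area under the piecewise-linear curve.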

Discussion
The main goal of the present study was to evaluate the best risk-stratification reporting system for sonoelastographic breast examination, in order to improve its diagnostic value, considering multiple algorithms of risk reassessment for 1405 solid breast nodules.
The addition of elastography has been shown to give the best outcomes for ultrasonography in the evaluation of solid breast nodules [14,20,34,37]. It has been previously shown that elastography can improve the accuracy of classical ultrasonography by up to 26% [14].

There is still the need for improvement, especially in the BIRADS categories 3 and 4A, in order to properly differentiate malignant from benign lesions. BIRADS 4A lesions may be the group where elastography should play a decisive role when deciding between immediate biopsy and only regular follow-up [18,38,39].
In our study, both the semiquantitative and qualitative methods proved to be highly accurate (Acc, 94%), as can be seen in Table 7, with Se values of 79.56% and 55.15%, respectively, and Sp values of 95.20% and 95.56%. These results are comparable to those of other studies, which reported Se = 88%, Sp = 83% and Acc = 85% for the fat-to-lesion ratio and Se = 86.5%, Sp = 89.8% and Acc = 88.3% for the elastography score [14,18].
The optimal FLR value from the data set (obtained by maximizing the difference between the sensitivity and the false-positive rate) was 4.34 and can be seen in Table 9. This is comparable with the threshold of 4.88 from a previous study carried out in our endocrine unit [19]. Other studies found optimal FLR values from 2.24 to 5.6 [24,25,31,39]. Although the initially considered threshold of FLR ≥ 4.5 differs from the optimal value obtained from the data set, the diagnostic metrics shown in the table are nearly identical, so both can be used with the same degree of confidence.
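One way to obtain such a threshold is to scan the candidate cut-offs and keep the one with the largest difference between sensitivity and false-positive rate (Youden's J statistic). The sketch below reads the selection criterion in that sense, which is an interpretation of the text; the function name and the sample values are hypothetical and are not study data.

```python
def best_cutoff(values, labels):
    """Return (cutoff, J) maximizing J = sensitivity - false-positive rate.

    A nodule is called positive when its value is >= the cutoff.
    labels: 1 for malignant, 0 for benign. Ties keep the lowest cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= c and y == 0)
        j = tp / pos - fp / neg
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Hypothetical FLR values for six nodules, two of them malignant.
flr_values = [1.2, 2.0, 3.1, 4.4, 5.0, 6.3]
outcomes = [0, 0, 0, 1, 0, 1]
cutoff, j = best_cutoff(flr_values, outcomes)
```

When J changes little over a range of cut-offs, as the near-identical metrics for 4.34 and 4.5 suggest, neighbouring thresholds can be expected to perform almost the same.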
Our study showed (Table 7) that the sensitivity, specificity and accuracy of classical ultrasound for the diagnosis of malignant lesions, using a BIRADS cut-off value of 4A, were 74.23%, 63.95% and 65%, respectively, and its positive and negative predictive values were 13.53% and 97.79%, respectively. The low PPV indicates a high proportion of false-positive results, and classical US cannot be used alone to rule out malignancy given the probability of false-negative results. As can be seen in Table 4, there were 19 malignancies out of 93 in BIRADS category 3 when assessing by classical US alone.
The BIRADS score downgraded according to EFSUMB proved to have better diagnostic quality compared to classical BIRADS, showing a 71.23% sensitivity, an 81.55% specificity and an accuracy of 81%. The PPV and NPV were 22.68% and 97.99%, respectively. While there was no significant difference in sensitivity after combining conventional US with elastography, the accuracy increased by 16 percentage points and the PPV by 9.15 percentage points.
The newly proposed risk-reassessment BIRADS TM showed an 81.23% sensitivity, 75.84% specificity, 77% accuracy, 20.35% PPV and 98.81% NPV. It has similar diagnostic value to EFSUMB BIRADS but with a sensitivity 10 percentage points higher. It appears that considering both upgrading and downgrading the conventional BIRADS is more advisable than downgrading alone.
We also considered the worst-case scenario BIRADS score. The sensitivity, specificity and accuracy values for this assumption were 84.23%, 58.23% and 60%, respectively, while the PPV and NPV values were 13.29% and 98.84%, giving this assumption the best sensitivity but the poorest PPV of all the considered risk-stratification reporting systems.
The "worst-case scenario" is the most conservative approach, with the highest sensitivity but also the lowest specificity. Using this approach may lead to an increase in false-positive malignancy results and subsequent biopsy recommendations, which would run against the goal of this study of lowering unnecessary biopsy rates.
The Receiver Operating Characteristic (ROC) curve was used to evaluate the diagnostic value of each of the proposed diagnostic methods. Furthermore, the area under each ROC curve (AUC) was calculated for use as a one-dimensional measure of diagnostic value.
The ROC curves of the fat-to-lesion ratios and Ueno elastography scores showed much better AUC values of 0.90 and 0.88, respectively (a theoretically perfect diagnostic having AUC = 1), than the curve for classical ultrasound BIRADS (0.77). Other studies also showed that the values are slightly better for FLR in comparison with Ueno scores (0.85 and 0.79, respectively) [15].
For the different risk-reassessment algorithms, the AUC values were 0.80 for EFSUMB BIRADS and 0.83 for worst-case scenario BIRADS, and the proposed BIRADS TM showed the best result of 0.86 (as can be seen in Table 8).
In our study, elastography improved the AUC value of breast cancer ultrasound screening, starting from 0.77 for classical ultrasound and improving to 0.86 when adjusting the classical ultrasonography BIRADS score by upgrading or downgrading based on both qualitative and semiquantitative elastographic results ("BIRADS TM").

Conclusions
By including elastography as a parameter of the BIRADS score and considering both upgrading and downgrading the conventional BIRADS as opposed to only downgrading according to EFSUMB recommendations, a more efficient risk-stratification reporting system was established in the form of BIRADS TM. The proposed BIRADS TM risk-stratification system, while having slightly lower accuracy compared to BIRADS EFSUMB, yields better diagnostic value, as shown by higher AUROC values. Further development and standardization of the method may improve the evaluation of high-risk nodules.
The second proposed risk-stratification reporting system in the form of the worst-case BIRADS shows the lowest accuracy, specificity and positive predictive values but, on the other hand, yields the highest sensitivity and negative predictive values of all the analyzed risk-stratification methods. This shows it is the most conservative approach, leading to the highest number of assumed malignant lesions and, in turn, to an increase in biopsy rates, which would be against our goal of reducing unnecessary invasive diagnostic method usage. For these reasons, we do not consider it advisable.
The strong points of the study are the evaluation of a significant number of solid breast nodules and the assessment of multiple risk-stratification reporting systems combining classical ultrasound and elastography (both qualitative and semiquantitative methods), two being novel proposals and one (BIRADS TM) being shown to yield potentially better diagnostic value than the current EFSUMB recommendations.
The main limitation of the study is the lack of histopathological results for all the evaluated nodules: histopathological confirmation was available for the 93 malignancies, whereas the other nodules were diagnosed based on a combination of mammography, breast MRI and Tru-Cut biopsy.