Artificial Intelligence in Breast Ultrasound: From Diagnosis to Prognosis—A Rapid Review

Background: Ultrasound (US) is a fundamental diagnostic tool in breast imaging. However, US remains an operator-dependent examination. Research into and the application of artificial intelligence (AI) in breast US are increasing. The aim of this rapid review was to assess the current development of US-based artificial intelligence in the field of breast cancer. Methods: Two investigators with experience in medical research performed literature searching and data extraction on PubMed. The studies included in this rapid review evaluated the role of artificial intelligence concerning BC diagnosis, prognosis, molecular subtypes of breast cancer, axillary lymph node status, and the response to neoadjuvant chemotherapy. The mean values of sensitivity, specificity, and AUC were calculated for the main study categories with a meta-analytical approach. Results: A total of 58 main studies, all published after 2017, were included. Only 9/58 studies were prospective (15.5%); 13/58 studies (22.4%) used an ML approach. The vast majority (77.6%) used DL systems. Most studies were conducted for the diagnosis or classification of BC (55.1%). At present, all the included studies showed that AI has excellent performance in breast cancer diagnosis, prognosis, and treatment strategy. Conclusions: US-based AI has great potential and research value in the field of breast cancer diagnosis, treatment, and prognosis. More prospective and multicenter studies are needed to assess the potential impact of AI in breast ultrasound.


Introduction
Breast cancer (BC) is the most common malignancy in women and the second leading cause of cancer death, so early diagnosis of BC remains crucial [1]. Mammography is the first-line examination for breast cancer screening. In several settings, breast ultrasound (US) is another fundamental tool. In particular, US is recommended as a first-line examination in young women and during pregnancy or breastfeeding, and as an additional examination after mammography in women with dense breasts [2]. US has several advantages, including safety, low cost, and rapid execution. However, US remains an operator-dependent examination. For these reasons, artificial intelligence (AI) in breast ultrasound has developed strongly in recent years. AI systems involve three working steps: image processing, segmentation, and feature extraction. AI is used to provide predictive models based on the analysis of features extracted from radiological data. The first step is lesion detection and segmentation (unsupervised or supervised). Then, radiomic analysis is performed: biomarkers are extracted and analyzed to obtain information for diagnosis or prognosis, as explained later. AI systems process a large amount of image data from different imaging modalities to obtain output information [3]. In this framework, two broad approaches can be distinguished:
Machine learning (ML): a complex multistep process that uses texture analysis to extract quantitative information from radiological images to create prediction models and decision support tools.
Deep learning (DL): an evolution of ML in which the system extracts its inputs directly from the images and passes them to a multilayer neural network. It does not require the manually designed features of traditional methods but learns features automatically [4].
Currently, radiomics is a complex process that involves several steps, and its application is generally not fully automated. Unsupervised machine learning does not need a training phase on labeled data, but it is typically used only to group similar cases. Supervised machine learning has more general applications, for example, regression and prediction. Deep learning is a further development of machine learning. As several studies in the literature have demonstrated, AI looks promising in diagnosing breast lesions, predicting molecular subtypes of breast cancer, evaluating axillary lymph node status, and evaluating the response to neoadjuvant chemotherapy. The aim of this rapid review was to assess the current development and research status of US-based AI in the field of breast cancer.

Identification of Studies
Two investigators (N.B. and A.T.) with extensive experience in medical research and systematic reviews, and with specific expertise in breast cancer imaging, ultrasound imaging, and radiomics, performed the literature search and data extraction on PubMed. The literature search was conducted with the following search strategy: ((("diagnostic imaging"(MeSH Subheading) OR ("diagnostic"(All Fields) AND "imaging"(All Fields)) OR "diagnostic imaging"(All Fields) OR "ultrasound"(All Fields) OR "ultrasonography"(MeSH Terms) OR "ultrasonography"(All Fields) OR "ultrasonics"(MeSH Terms) OR "ultrasonics"(All Fields) OR "ultrasounds"(All Fields) OR "ultrasound s"(All Fields)) AND ("breast"(MeSH Terms) OR "breast"(All Fields) OR "breasts"(All Fields) OR "breast s"(All Fields)) AND ("artificial intelligence"(MeSH Terms) OR ("artificial"(All Fields) AND "intelligence"(All Fields)) OR "artificial intelligence"(All Fields)))) AND (2017:2023(pdat)).
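As a rough illustration of how a search strategy of this kind can be assembled and submitted programmatically, the sketch below builds an abbreviated version of the query and the corresponding request URL for the NCBI E-utilities esearch service. It is not the procedure used in this review. The helper names (or_block, build_query, build_esearch_url) and the shortened term lists are assumptions made for the example, the field tags are written in PubMed's square-bracket form, and no request is actually sent.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_block(terms):
    """Join alternative terms for one concept into a parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*concepts):
    """Combine one OR group per concept with AND (hypothetical helper)."""
    return " AND ".join(or_block(terms) for terms in concepts)


def build_esearch_url(query, year_from, year_to, retmax=200):
    """Return an esearch URL restricted by publication date; nothing is fetched."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",        # filter on publication date
        "mindate": str(year_from),
        "maxdate": str(year_to),
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Abbreviated term lists: only a subset of the terms in the full strategy.
query = build_query(
    ['"ultrasonography"[MeSH Terms]', '"ultrasound"[All Fields]'],
    ['"breast"[MeSH Terms]', '"breast"[All Fields]'],
    ['"artificial intelligence"[MeSH Terms]',
     '"artificial intelligence"[All Fields]'],
)
url = build_esearch_url(query, 2017, 2023)
print(url)
```

Keeping one OR group per concept (imaging, breast, AI) mirrors the structure of the strategy above and makes it easy to add or remove synonyms without disturbing the AND logic.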
Only English-language publications after 2017 were considered. Case reports or case series, review articles, letters, comments, and studies with incomplete data were excluded. The studies included in this rapid review evaluated the role of artificial intelligence in breast ultrasound and provided data on BC diagnosis, prognosis, or staging. Figure 1 presents a schematic representation of a typical workflow of data extraction from US images. After segmentation is completed, a radiomic analysis is carried out, extracting quantitative features from the obtained volumes. The extracted data are processed by software for possible clinical use, such as diagnosis, reducing the number of biopsies, or prognostic tools. We used only data available in the published studies, without contacting the authors.

Studies Examination and Data Extraction
Two authors independently extracted the data from the eligible studies. Fifty-eight main studies were identified, as reported in Figure 2. Discrepancies were resolved by discussion between the two authors. From each study, we extracted the following data: first author, publication year, study design (retrospective or prospective, single center or multicenter), study population, aim of the study, imaging modality, AI modality, test dataset, and training-validation dataset. Papers without solid statistical analysis were excluded. We then performed a descriptive analysis of these studies.
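The extraction fields listed above map naturally onto a small record type. The following is a minimal sketch of such a record and of a descriptive tally over it. The class name StudyRecord, the field names, and the two placeholder entries are assumptions made for illustration and do not correspond to any study included in this review.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyRecord:
    """One row of the extraction sheet (field names are illustrative)."""
    first_author: str
    publication_year: int
    prospective: bool                 # False means retrospective
    multicenter: bool                 # False means single center
    study_population: int             # number of patients
    aim: str
    imaging_modality: str
    ai_modality: str                  # "ML" or "DL"
    test_dataset_size: Optional[int]  # None when not reported
    training_validation_size: Optional[int]


def describe(records):
    """Descriptive summary: counts by design, aim, and AI modality."""
    return {
        "n": len(records),
        "prospective": sum(r.prospective for r in records),
        "multicenter": sum(r.multicenter for r in records),
        "by_aim": Counter(r.aim for r in records),
        "by_ai_modality": Counter(r.ai_modality for r in records),
    }


# Placeholder entries, NOT real studies.
records = [
    StudyRecord("Author A", 2021, False, False, 500,
                "diagnosis", "B-mode US", "DL", 100, 400),
    StudyRecord("Author B", 2022, True, True, 1200,
                "lymph node status", "B-mode US + SWE", "ML", None, 1200),
]
summary = describe(records)
print(summary)
```

Using None for unreported dataset sizes keeps missing information distinct from a true value of zero.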

Results
A total of 58 main studies, all published after 2017, were included. Only 9/58 studies were prospective (15.5%), and 13/58 studies (22.4%) used an ML approach. The vast majority (77.6%) used DL systems. Of the included studies, 32/58 (55.1%) evaluated the role of AI in the diagnosis of BC, 8/58 (13.8%) assessed lymph node status, 6/58 (10.3%) predicted the response to neoadjuvant chemotherapy (NAC), and 8/58 (13.8%) predicted molecular subtypes of BC. Only four studies investigated the role of AI in the upstaging of ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC) and in the prediction of BC prognosis. Only 8/58 studies (13.8%) also used shear wave elastography (SWE)/quantitative ultrasound (QUS) and color Doppler flow imaging (CDFI) in addition to B-mode US. Tables 1-4 show all the included studies, their aims, and the performance of the AI algorithms.
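The proportions reported above follow directly from the study counts. A short sketch of the arithmetic is given below. The dictionary keys and the helper name percent are labels chosen for this example, while the counts are those stated in the text.

```python
TOTAL_STUDIES = 58

# Counts as reported in the Results paragraph.
counts_by_aim = {
    "diagnosis": 32,
    "lymph node status": 8,
    "NAC response": 6,
    "molecular subtypes": 8,
    "DCIS upstage or prognosis": 4,
}


def percent(count, total=TOTAL_STUDIES):
    """Share of the included studies, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The aim categories partition the 58 included studies.
assert sum(counts_by_aim.values()) == TOTAL_STUDIES

shares = {aim: percent(n) for aim, n in counts_by_aim.items()}
prospective_share = percent(9)
ml_share = percent(13)
dl_share = percent(TOTAL_STUDIES - 13)

for aim, share in shares.items():
    print(f"{aim}: {counts_by_aim[aim]}/{TOTAL_STUDIES} = {share}%")
```

Because 32/58 is 55.17%, the one-decimal figure for the diagnosis category depends on whether values are rounded (55.2%) or truncated (55.1%).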

Diagnosis of Breast Cancer
US is a fundamental tool for evaluating and characterizing breast masses after mammography. Interpreting breast ultrasound is challenging. According to BI-RADS [59], many features have to be analyzed to evaluate a breast nodule, including size, shape, margin, echogenicity, posterior acoustic features, and orientation. Diagnosis depends on the experience of the radiologist, and thus great interobserver variability may occur. The malignancy risk of BI-RADS 4 lesions ranges from 2% to 95%. After breast US, 5-15% of patients are recalled, and 4-8% of these undergo biopsy, with the problem of false positive examinations [2]. One of the advantages of AI systems is the possibility of detecting ultrasound features that cannot be distinguished by the human eye. Especially in BI-RADS 4 breast lesions, AI could help to reduce the rate of unnecessary biopsies. When AI is part of the radiologist's clinical workflow, it can be used as a "second opinion" in the assessment of suspicious breast nodules [60]. Several studies have assessed the value of an AI system in the decision-making process (downgrading and upgrading). According to Lyu et al. [8], an adjusted BI-RADS classification after using AI could lead to a decreased biopsy rate (from 100% to 67.29% in their study), with a great reduction in unnecessary biopsies. As also demonstrated by Wang et al. [11], after hypothetical downgrading with AI, 14 biopsy samples would have been avoided without missing any cancer. Shen et al. [12], based on a large amount of data, showed that radiologists assisted by artificial intelligence could decrease false positive rates by 37.3% and reduce requested biopsies by 27.8% while maintaining the same sensitivity. Gu et al. [27], in an important multicenter study with a large training cohort, an internal test cohort, and external test cohorts, demonstrated that a DL system performed similarly to expert radiologists and significantly better than inexperienced radiologists. AI could also have a fundamental role in reducing the US interpretation time, as demonstrated by Lai et al. [34], in whose study it was reduced by approximately 40% with AI system support.
In conclusion, many studies have so far shown that AI has excellent diagnostic performance in breast cancer diagnosis [61,62]. However, most are single-center studies, and in most cases, there was no independent external test set.
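The quantities discussed in this section (sensitivity, specificity, false positive rate, and biopsy rate) can all be derived from a 2x2 table of biopsy recommendation against final pathology. The sketch below shows the calculation on purely hypothetical counts, chosen only to illustrate how AI-assisted downgrading can lower the biopsy rate without changing sensitivity. The numbers are not taken from any of the cited studies, and the function name diagnostic_metrics is an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Metrics from a 2x2 table where 'positive' means biopsy recommended.

    tp: malignant lesions sent to biopsy   fp: benign lesions sent to biopsy
    tn: benign lesions not biopsied        fn: malignant lesions not biopsied
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "biopsy_rate": (tp + fp) / total,
    }


# Hypothetical cohort of 100 BI-RADS 4 lesions, 30 of them malignant.
# Without AI, every lesion is biopsied.
before = diagnostic_metrics(tp=30, fp=70, tn=0, fn=0)

# With AI, 30 benign lesions are downgraded and no cancer is missed.
after = diagnostic_metrics(tp=30, fp=40, tn=30, fn=0)

print("biopsy rate:", before["biopsy_rate"], "->", after["biopsy_rate"])
print("sensitivity:", before["sensitivity"], "->", after["sensitivity"])
```

In this toy example, the avoided biopsies come entirely from benign lesions, which is why sensitivity is unchanged while specificity rises. A downgrade that missed a cancer would appear as a nonzero fn and a lower sensitivity.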

Prediction of Molecular Subtypes of Breast Cancer
There are four molecular subtypes of breast cancer: luminal A, luminal B, HER2-overexpressing, and triple-negative breast cancer (TNBC). Each subtype has different biological characteristics, imaging characteristics, prognosis, and treatment [63]. Molecular subtype can affect the response to NAC: HER2+ and TNBC cancers have a higher probability of responding well to preoperative therapy but, on the other hand, patients with nonluminal disease have worse recurrence-free and disease-specific survival [64]. Therefore, it is essential to make a histological and molecular diagnosis before surgery. Currently, the gold standard is microhistological (core or vacuum) biopsy. One of the main limitations of biopsy is possible underestimation due to tumor heterogeneity; for this reason, the biopsy sample may sometimes not be representative of the cancer [65]. Several studies in the literature address the prediction of breast cancer molecular subtypes with US-based AI systems. Ma et al. [42] developed an ML system to differentiate luminal/HER2+/TNBC versus other subtypes, ER-positive versus ER-negative, HER2-positive versus HER2-negative, and high versus low Ki-67 expression, showing that a machine learning model can assist radiologists in evaluating the molecular subtype of breast cancer.
Other studies have focused only on the diagnosis of the TNBC subtype. Several retrospective studies showed that TNBC was more likely to show benign features, such as an oval or round shape or smooth or circumscribed margins, and was less likely to have an echogenic halo [66]. All these characteristics may lead to a diagnostic delay, resulting in a poor outcome. Therefore, the AI-based diagnosis of TNBC is crucial in clinical diagnosis and treatment. Every study included in this review showed good performance in distinguishing TNBC from other histological subtypes. Zhou et al. [41] established a multimodal approach based on US, SWE, and CDFI to predict the subtype of breast cancer. This multimodal AI system performed better than US alone (AUC: 0.89-0.96 vs. 0.81-0.84) and, surprisingly, also performed better than core needle biopsy (AUC: 0.89-0.99 vs. 0.67-0.82, p < 0.05). In clinical practice, a mismatch between the biopsy result and the AI algorithm could lead to a rebiopsy with sampling at another site. This model also achieved excellent results (AUC: 0.934-0.970) in differentiating TNBC from non-TNBC. AI has the potential to provide a noninvasive method of assessing tumor biology before surgical or medical treatment.
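Because most results in this review are reported as AUC values, it may help to recall what the AUC measures: the probability that a randomly chosen positive case (for example, a TNBC lesion) receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The scores are invented for illustration, and the function name auc is an assumption.

```python
def auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs; tied scores contribute 0.5.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model outputs (probability of TNBC), NOT data from any study.
tnbc_scores = [0.9, 0.8, 0.6, 0.4]
non_tnbc_scores = [0.7, 0.3, 0.2, 0.1]

print(auc(tnbc_scores, non_tnbc_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Unlike sensitivity and specificity, the value does not depend on a particular decision threshold, which is one reason AUCs are easier to compare across studies that use different cut-offs.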

Prediction of Axillary Lymph Node Metastases in Breast Cancer
Identification of lymph node metastasis in breast cancer is crucial for correct diagnostic and therapeutic planning. US has a fundamental role in determining axillary lymph node (ALN) status. Preoperative prediction of lymph node metastases can provide valuable information for determining adjuvant therapy and for planning surgery (lymph node dissection), which is often associated with complications such as lymphedema. Several studies have described many breast US characteristics associated with lymph node metastasis; in addition, lymphatic invasion and the size of the breast cancer are associated with the presence of metastatic cells [67]. At US examination, lymph node metastasis is characterized by unclear margins, irregular shape, and loss of the fatty hilum, but lymph nodes with micrometastases are missed [68]. In addition, all these evaluations depend on the expertise and experience of the operator. In all the analyzed studies, AI algorithms improved metastatic lymph node detection compared with standard radiological evaluation. The studies mainly focused on predicting the presence or absence of ALN metastasis, but a maximum sensitivity below 0.7, although promising, is relatively low for entry into clinical practice soon. We acknowledge that differences in cut-off values and reference standards across the studies could influence the results. In addition, we do not know to what extent the relatively low specificity of US in predicting LN metastasis could lead to overtreatment or undertreatment. Prospective, well-designed, and possibly multicenter studies are needed to clarify the role of US in LN metastasis prediction. Guo et al. [50] analyzed 3049 US images of 937 patients and developed a DL radiomics-based prediction model to assess the risk of metastasis of sentinel lymph nodes (SLNs) (AUC = 0.84, sensitivity = 98.4%) and nonsentinel lymph nodes (NSLNs) (AUC = 0.81, sensitivity = 98.4%).
Zhou et al. [47] used three different AI algorithms, and all three performed better than radiologists (the sensitivity and specificity of radiologists were 73% and 63%, respectively). Other studies also predicted the metastatic burden of ALNs. Zheng et al. [45] showed good results in predicting 1-2 (low metastatic burden) or ≥3 (heavy metastatic burden) ALN metastases (AUC = 0.90). Predicting lymph node metastasis by combining the US characteristics of the tumor, the US characteristics of the lymph node, and AI features could have a substantial diagnostic impact in clinical practice, provided that the diagnostic yield, beyond that of standard radiological evaluation, is at least equal to that of standard sentinel lymph node assessment.

Prediction Response to NAC in Breast Cancer
According to several guidelines [69,70], neoadjuvant chemotherapy (NAC) is the standard of care for patients with locally advanced breast cancer (LABC), biologically aggressive tumors, or nonsurgical patients. LABC is defined as a tumor greater than 5 cm and with skin and/or chest wall involvement or a tumor with many metastatic lymph nodes [71]. Pathological complete response (pCR) is associated with better clinical outcomes than nonresponse or partial response. Combining US imaging with AI to extract quantitative information from US images can provide more objective information about the response to neoadjuvant chemotherapy. All the analyzed studies reported good results in predicting the response to therapy. Jiang et al. [54] used AI to establish a pCR prediction model based on breast cancer US images acquired before and after neoadjuvant chemotherapy in locally advanced breast cancer. The model had a good predictive value (AUC = 0.94). Gu et al. [55] used US images at different NAC time points (before NAC, after the second course, and after the fourth course) to build a NAC response prediction model, with an AUC that increased during the course of therapy (0.812 after the second course and 0.937 after the fourth course). Gu et al. [55] also identified 90.5% of nonresponder patients. Jiang et al. [54] further demonstrated that prediction within the hormone receptor-positive/human epidermal growth factor receptor 2 (HER2)-negative, HER2+, and triple-negative subgroups achieved good discrimination performance, with AUCs of 0.90, 0.95, and 0.93, respectively. Predicting the NAC response before starting treatment could be extremely useful to clinicians for risk stratification and treatment targeting, and in the future, it could lead to precision medicine that permits therapy adjustments.

Upstaging of DCIS to IDC
A correct histological diagnosis of BC before surgery is fundamental. In fact, sentinel lymph node biopsy is not usually performed in the case of partial mastectomy for DCIS [72]. These patients may need to undergo SLN biopsy if DCIS is upgraded postoperatively to invasive cancer. This could lead to a second surgery, with increased costs for the health system. One of the main limitations of microhistological biopsy is possible underestimation due to tumor heterogeneity [65]. For this reason, an upgrade of DCIS to IDC is sometimes observed. AI could also have an important role in this kind of prediction. Qian et al. [73] used a DL algorithm to predict whether pure DCIS diagnosed with core needle biopsy would be upgraded to invasive cancer after surgical excision. The proposed model achieved good sensitivity, specificity, and accuracy (0.733, 0.750, and 0.742, respectively).

Predicting BC Prognosis
Classification of molecular subtypes has a key role in the management strategy of BC. Several studies investigated the role of AI in BC prognosis, especially for TNBC. TNBC has a worse overall survival because of its higher nuclear grade, larger tumor size, and more aggressive proliferative index. Extracting 460 radiomic characteristics, Wang et al. [74] established an ML model for predicting disease-free survival in TNBC. In addition, Yu et al. [75], in a large multicenter study, reported that radiomics is a promising biomarker for risk stratification in TNBC patients.

Conclusions
US-based AI has great potential and research value in the field of breast cancer diagnosis, treatment, and prognosis. AI is considered a "hot topic" in radiology, and it has the potential to carry us into the era of personalized medicine. AI looks promising in every field of study evaluated in this review. More prospective and multicenter studies are needed to assess the potential impact of AI in breast ultrasound and to understand how to integrate AI into the clinical workflow of radiologists.

Data Availability Statement:
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.