Quantitative Measurements of Breast Density Using Magnetic Resonance Imaging: A Systematic Review and Meta-Analysis

Breast density, a measure of dense fibroglandular tissue relative to non-dense fatty tissue, is an established independent risk factor for breast cancer. Although interest in the quantitative assessment of breast density is increasing, no research has investigated the optimal technical approach to breast MRI for this purpose. We therefore performed a systematic review and meta-analysis to analyze the current studies on quantitative assessment of breast density using MRI and to determine the most appropriate technical/operational protocol. Databases (PubMed, EMBASE, ScienceDirect, and Web of Science) were searched systematically for eligible studies. Single-arm meta-analysis was conducted to determine quantitative values of MRI in breast density assessments. Combined means with their 95% confidence intervals (CIs) were calculated using a fixed-effect model. In addition, subgroup meta-analyses were performed with stratification by breast density segmentation/measurement method. Furthermore, alternative groupings based on statistical similarities were identified via a nearest neighbor (single linkage) cluster analysis of study means and standard deviations. A total of 38 studies matched the inclusion criteria for this systematic review. Twenty-one of these studies were judged to be eligible for meta-analysis. The results generally indicated high levels of heterogeneity between study means within groups and between study variances within groups. The studies in the two main clusters identified by the cluster analysis were also subjected to meta-analyses. The review confirmed high levels of heterogeneity within the breast density studies, considered to be due mainly to differences in MR breast-imaging protocols and in breast density segmentation/measurement methods. Further research should be performed to determine the most appropriate protocol and method for quantifying breast density using MRI.


Introduction
Breast density, a measure of dense fibroglandular tissue relative to non-dense fatty tissue, is an independent risk factor for breast cancer [1–3]. Consistent with this risk relationship, women with dense breasts are about four times more likely to develop breast cancer than women with fatty breasts [4,5]. Most of the information regarding breast density has been acquired with mammography, a two-dimensional imaging technique. However, the evaluation of breast density on mammograms is limited by overlapping tissues, variations in breast compression, and inappropriate positioning, which lead to artefacts (skin folds) and the inclusion of insufficient breast tissue [6,7]. These factors could limit the ability of mammography to measure small changes in breast density precisely and reliably over brief timespans [8,9].
Magnetic resonance imaging (MRI), an alternative modality in breast imaging, can estimate the actual breast density value because it provides a three-dimensional volume assessment of breast tissue, with excellent contrast resolution for differentiating fibroglandular from fatty tissue [10–13]. Conventionally, breast density is assessed qualitatively using the American College of Radiology (ACR) Breast Imaging-Reporting and Data System (BI-RADS) atlas, a classification system commonly used for mammography, according to which density has four categories based on the amount of fibroglandular tissue: "(1) almost entirely fat, (2) scattered fibroglandular tissue, (3) heterogeneous fibroglandular dense and (4) extreme fibroglandular tissue" [14,15]. The same four categories are also applied to MRI. Despite its long clinical success, BI-RADS scoring is subjective and varies between readers, and even within the same reader [16]. To overcome subjective assessment of breast density and to reduce inter- and intra-reader variability, different methods for quantifying breast density have been proposed, with a range of algorithms reported in the literature [17–22]. Each of these methods, which rely on semi-automatic thresholding and segmentation approaches for the quantitative assessment of breast density, has been shown to have advantages and limitations.
MRI is one of the most useful modalities for breast imaging, and quantitative analysis of breast density is a well-established approach. Although extensive research has been carried out on breast density measurements, no consensus has been reached on the optimal approach to quantifying breast density using MRI. Therefore, the purpose of this review is to analyze the methods for the quantitative assessment of breast density using MRI published over the past decade. Because heterogeneity among MRI scanning protocols was expected, both a systematic review and a meta-analysis were performed to analyze the available studies.

Materials and Methods
This systematic review and meta-analysis were performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) criteria [23,24]. No ethics committee approval was required.

Search Strategy and Eligibility Criteria
A systematic literature review was conducted of studies that analyzed breast density quantitatively using MRI. Briefly, a search for studies published between 1 January 2009 and 31 December 2018 was conducted in four databases: PubMed (MEDLINE, U.S. National Library of Medicine and National Institutes of Health, Bethesda, MD, USA), EMBASE (Elsevier, Amsterdam, The Netherlands), ScienceDirect (Elsevier, Amsterdam, The Netherlands), and Web of Science (Clarivate Analytics, Philadelphia, PA, USA) using the search terms detailed below.
Systematic search expressions were built using MeSH (medical subject headings) in PubMed and the thesaurus in EMBASE, ScienceDirect, and Web of Science. The search structure combined three main terms: "breast density," "quantitative analysis," and "MRI." The exact search expressions were "Breast Density" (MeSH term) OR "fibroglandular tissue" (Text word) OR "breast densit*" (Text word) OR "FGT" (Text word) OR "FT" (Text word) OR "fibroglandular densit*" (Text word) AND "Quantitative analysis" (Subject heading) AND "Magnetic Resonance Imaging" (MeSH term) OR "nuclear magnetic resonance imaging" (Text word) OR "MRI" (Text word) OR "magnetic resonance imaging" (Text word). Studies were selected for eligibility on the basis of their title, abstract, and subsequently the full text; selection was performed independently by two reviewers (R.S. and Z.S.). Studies addressing the quantitative analysis of breast density using MRI were considered eligible for inclusion if they involved human subjects, had been published in peer-reviewed journals since 2009, and were written in English. For inclusion, the subjects must have undergone breast MRI and the breast density measurement method had to be reported. Eligible studies were retrieved, and full manuscripts were read. No restrictions were applied in terms of study characteristics, study purpose, or results. Publications were included in the analysis only if breast density had been measured quantitatively, regardless of the MRI technique or breast density segmentation/measurement method.

Data Extraction
On completing the eligibility screening, data were extracted manually from the included studies by the same two reviewers. Descriptive data were extracted for the following variables: the first author's surname; year of publication; journal of publication; study type; total number of participants/patients; mean age; age range of participants/patients; MRI technique (pulse sequence/breast-imaging protocol and static magnetic field strength); and breast segmentation/measurement method. For each study analyzed, estimates of breast volume, fibroglandular-tissue volume, and percentage breast density were recorded as arithmetic means and standard deviations, whenever appropriate. Some of the included studies reported their results as a median and interquartile range (IQR) rather than as a mean and standard deviation. Accordingly, these studies were stratified separately and excluded from the meta-analysis only.

Data Synthesis
The combinations of MRI techniques and breast segmentation/measurement methods encountered in the studies were considered to be technologically heterogeneous. To address this issue and obtain more reasonable estimates, the analyses were stratified by breast segmentation method into three discrete groups: fuzzy c-mean clustering (FCM), FCM with nonparametric nonuniformity normalization (N3), and signal intensity thresholding. In each subgroup meta-analysis, studies were selected on the basis of the degree of homogeneity of their breast density segmentation/measurement results.

Statistical Analysis
Breast density as measured by MRI using semi- or fully-automated segmentation methods was assessed. The primary outcome was the percentage breast density (%BD). Data input for each study within a group consisted of the study size (N), the 'raw' study mean (i.e., with no re-scaling or standardization), and the study standard deviation. The data were analyzed with the "metamean" function of the "meta" package in R, Version 3.4.1 (http://www.r-project.org/). This function supports the meta-analysis of single-arm studies, as opposed to the traditional two-arm design with a control group and a treatment group, and is equivalent to a one-way analysis of variance. A forest plot was generated, displaying the individual study (%BD) means with 95% confidence interval (CI) limits, inverse variance study weights, and the pooled mean and confidence limits. Heterogeneity of study means was assessed using Cochran's Q-test, and heterogeneity of study variances was assessed with Bartlett's test. Studies were considered suitable for pooling only if both heterogeneity tests were non-significant at the 5% level.
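The quantities described above (inverse-variance weights, fixed-effect pooled mean with 95% CI, Cochran's Q, and Bartlett's statistic) can be sketched from study-level summaries alone. The following is a minimal standard-library illustration under stated assumptions, not the "metamean" implementation: the function names and the three example studies are hypothetical, and a normal-approximation interval (z = 1.96) is assumed.

```python
import math


def fixed_effect_pooled_mean(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of raw study means.

    studies: list of (n, mean, sd) tuples, one per study.
    Returns (pooled_mean, ci_low, ci_high, cochran_q, degrees_of_freedom).
    """
    # The variance of a raw study mean is sd^2 / n, so its weight is n / sd^2.
    weights = [n / sd ** 2 for n, _, sd in studies]
    total_weight = sum(weights)
    pooled = sum(w * m for w, (_, m, _) in zip(weights, studies)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of study means from the pooled mean.
    q = sum(w * (m - pooled) ** 2 for w, (_, m, _) in zip(weights, studies))
    return pooled, pooled - z * se, pooled + z * se, q, len(studies) - 1


def bartlett_statistic(studies):
    """Bartlett's statistic for equal variances from (n, mean, sd) summaries.

    Returns (statistic, degrees_of_freedom).
    """
    k = len(studies)
    dfs = [n - 1 for n, _, _ in studies]
    total_df = sum(dfs)  # N - k
    pooled_var = sum(d * sd ** 2 for d, (_, _, sd) in zip(dfs, studies)) / total_df
    numerator = total_df * math.log(pooled_var) - sum(
        d * math.log(sd ** 2) for d, (_, _, sd) in zip(dfs, studies)
    )
    correction = 1.0 + (sum(1.0 / d for d in dfs) - 1.0 / total_df) / (3.0 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical (n, mean %BD, SD) summaries, for illustration only.
example_studies = [(40, 18.0, 6.0), (60, 22.0, 9.0), (25, 15.0, 5.0)]

pooled, low, high, q, q_df = fixed_effect_pooled_mean(example_studies)
bartlett, b_df = bartlett_statistic(example_studies)
print(f"pooled mean = {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"Cochran's Q = {q:.2f} on {q_df} df; Bartlett = {bartlett:.2f} on {b_df} df")
```

Both statistics are referred to a chi-squared distribution with one fewer degree of freedom than the number of studies. Large values indicate heterogeneous means (Q) or heterogeneous variances (Bartlett).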
As an alternative to grouping the studies on a technological basis, a cluster analysis was run to investigate any similarities between studies with respect to two attributes, namely study mean and study standard deviation. The International Business Machines Statistical Package for the Social Sciences (IBM SPSS) Statistics software Version 25.0 was used for cluster analysis. The procedure provides for a wide selection of combinations of distance measures and clustering methods, but for the current application, the simplest of these was chosen, namely Euclidean distance and nearest neighbor agglomeration. This algorithm calculates a proximity matrix of distances between all possible pairs of studies and allocates the closest pair into a cluster, then examines the remaining clusters to identify which is the next nearest or whether there is a pair that are closer to one another, and so on.
Figure 1 presents an overview of the systematic search of the literature through different databases. The complete search yielded 941 studies.
After removing duplicates (n = 70), 871 studies were screened by title, which excluded 765; a further 27 were excluded on the basis of their abstracts. The full manuscripts of the remaining 79 studies were retrieved and reviewed. Forty-one studies did not meet the inclusion criteria: no adequate breast density data (n = 20), qualitative analysis (n = 12), editorials (n = 4), conference abstracts (n = 3), post-mortem study (n = 1), and phantom study (n = 1). Finally, 38 studies met the inclusion criteria [1][2][3]5,11, and were included in the analysis as shown in Table 1.
Table 1 presents the main characteristics of the 38 included studies, while Figure 2 shows details of the study design and MRI system used in these studies. Several MRI sequences were used to enable precise differentiation between adipose and fibroglandular tissues; of these, non-contrast-enhanced T1-weighted imaging was widely used, with either 2D spin echo or 3D gradient echo. In fact, 16 studies (41.03%) used non-contrast-enhanced T1-weighted imaging [1,2,26–29,31,33,35,44,45,48–51,53], while in 12 studies (30.77%) non-contrast-enhanced images were integrated with contrast-enhanced images [25,36–44,47,49]. In terms of breast density segmentation/measurement, the majority of the studies (20 studies; 51.28%) used the FCM clustering algorithm [1,2,11,25–29,31–42], while 7 studies (17.95%) used the FCM and N3 algorithms [45–51], 4 studies (10.26%) used an interactive thresholding algorithm [3,5,52,53], 4 studies (10.26%) used in-house customized software [29,53–55], and one study (2.56%) used manual software [57]; two studies did not provide this information [43,44].

Subgroup Analyses
The final inclusion consisted of a total of twenty-one studies in the meta-analyses; the forest plots and pooled results are shown in Figure 3.
The FCM subgroup consisted of 13 studies, of which 10 reported breast density as percentage breast density (%BD) [1,11,25–32], whereas 3 reported it as a percentage of the dense breast volume (%DBV) [36–38]. The 10 %BD studies comprised 634 patients. As Figure 3A shows, the means and standard deviations (SDs) of these studies vary widely, indicating substantial heterogeneity among study means (Cochran's Q test: χ² = 86.93, P < 0.0001) and among study variances (Bartlett's test: χ² = 110.59, P < 0.0001). The three %DBV studies comprised 528 patients. Figure 3B shows a high level of homogeneity among study means (Cochran's Q test: χ² = 0.13, P = 0.94) and among study variances (Bartlett's test: χ² = 0.12, P = 0.94), which would be expected because those studies used an identical combination of MR technique and breast density segmentation/measurement approach.

FCM and Nonparametric Nonuniformity Normalization (N3)
The FCM and N3 subgroup included 4 studies comprising 126 patients [45,48–50]. As Figure 3C shows, the means and SDs of these studies vary widely, indicating substantial heterogeneity among study means (Cochran's Q test: χ² = 99.94, P < 0.0001) and among study variances (Bartlett's test: χ² = 45.41, P < 0.0001), which would be expected because those studies used different MR breast-imaging protocols.

Interactive Semi-Automated Threshold
Two studies [3,54] comprising 58 patients were included in the analysis, which indicated considerable heterogeneity among study means (Cochran's Q test: χ² = 10.26, P = 0.0014). In contrast, there was no evidence of heterogeneity among study variances (Bartlett's test: χ² = 1.61, P = 0.2072).
A further two studies comprising 67 patients [53,55] were analyzed, as shown in Figure 3E. There was no evidence of heterogeneity among study means (Cochran's Q test: χ² = 3.01, P = 0.0825), which would be expected because those studies used the same MRI technique and breast density measurement method. However, there was substantial heterogeneity among study variances (Bartlett's test: χ² = 18.84, P < 0.0001).

Cluster Analysis
The results of the cluster analysis ("Dendrogram using Single Linkage") are shown in Figure 4. The dendrogram is a hierarchical diagram showing the distances (0–25) at which studies joined the various groups. On that basis, six clusters were identified. Cluster membership, study means, SDs, and coefficients of variation (CVs, expressed as percentages) are listed in Table 3. A scatter plot of the study means versus SDs is shown in Figure 5; the legend in the scatter plot indicates the number of studies in each cluster. Cluster markers with solid fill indicate clusters with two or more studies, whereas open markers indicate singletons.
Cluster 1 included nine studies that analyzed breast density with a combination of contrast-enhanced and non-contrast-enhanced T1-weighted imaging, with either 2D spin echo or 3D gradient echo; the exception was Choi [49], which used a diffusion-weighted technique. From the data in Table 3 (Cluster 1), it is apparent that the CVs vary in value, but in Choi's study [49] the CV is almost 100% because the mean and SD are almost identical. In contrast, the CVs for Chan [48] and Chen [53] are much lower than those of the other studies in the cluster, largely because of their small SDs and the breast segmentation methods used, which are the FCM and N3 and the interactive semi-automated threshold algorithms, respectively.
In contrast, cluster 2 consisted of 8 studies that assessed breast density with a combination of contrast- and non-contrast-enhanced T1-weighted imaging with 3D gradient echo; however, Chen [31] and Chen [45] used non-contrast- and contrast-enhanced T1-weighted imaging with 2D spin echo, respectively. In addition, Chen [45] analyzed breast density using the FCM and N3 algorithms. From the data in Figure 5 and Table 3 (Cluster 2), it is apparent that the CVs fall within a narrow range, except for Chen [45], where the CV is much lower than in the remaining studies because of the small SD and the breast segmentation method mentioned above. Wengert [55] used the Dixon method as the breast-imaging protocol and measured breast density using in-house customized software, yet its mean and SD do not differ from those of the other included studies.
The most striking result to emerge from the data in Figure 5 and Table 3 (Clusters 3–6) is the Chen [1] study (i.e., Cluster 3): although it used non-contrast-enhanced T1-weighted imaging with 3D gradient echo and analyzed breast density with the FCM algorithm, its CV (11.67%) is much lower than those of the remaining studies, mainly because of the small SD. Cluster 4 included two studies, Chan [48] and Chen [50], that assessed breast density using the FCM and N3 algorithms and non-contrast-enhanced T1-weighted imaging with 3D gradient echo. As can be seen from the data in Figure 5 and Table 3 (Clusters 3–6), their study means and SDs do not differ. In Cluster 5, by contrast, the Tagliafico study [3] used a 3D contrast-enhanced T1-weighted gradient echo sequence and analyzed breast density with a semi-automated interactive threshold method (MedDensity). As Figure 5 and Table 3 (Clusters 3–6) show, its study mean is much higher than those of the remaining studies, largely because of the technical method used. Finally, cluster 6 consisted of Lodger [54], the only study that used a proton density-weighted sequence.
Detailed information on cluster membership, study means, SDs, and CVs is shown in Table 3 and Figures 4 and 5.
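The nearest neighbor (single linkage) agglomeration on Euclidean distances between (study mean, study SD) pairs, together with the CV calculation, can be sketched as follows. This is a minimal pure-Python illustration, not the SPSS procedure: the function names, the cut distance, and the six (mean, SD) pairs are hypothetical, and distances are left in raw units rather than rescaled to the 0–25 dendrogram axis.

```python
import math
from itertools import combinations


def coefficient_of_variation(mean, sd):
    """CV expressed as a percentage of the mean."""
    return 100.0 * sd / mean


def single_linkage(points, cut_distance):
    """Agglomerate points until the nearest pair of clusters is farther apart
    than cut_distance. The distance between two clusters is the smallest
    Euclidean distance between any member of one and any member of the other.
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of current clusters (ia < ib by construction).
        (ia, ib), nearest = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if nearest > cut_distance:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return [sorted(c) for c in clusters]


# Hypothetical (study mean, study SD) pairs, for illustration only.
example_points = [
    (10.0, 5.0), (11.0, 5.5), (12.0, 6.0),  # three studies close together
    (30.0, 10.0), (31.0, 9.0),              # a second tight pair
    (60.0, 4.0),                            # an isolated study (singleton)
]

example_clusters = single_linkage(example_points, cut_distance=3.0)
for members in example_clusters:
    cvs = [round(coefficient_of_variation(*example_points[i]), 1) for i in members]
    print(members, "CVs (%):", cvs)
```

Single linkage tends to chain nearby points into elongated clusters, so the number of clusters obtained depends strongly on where the dendrogram is cut.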
Switching from the technology-based groupings to the groupings identified by the cluster analysis, meta-analysis of cluster 1 revealed that the study means and study variances are both heterogeneous (Cochran's test for heterogeneity of study means, χ² = 22.26, P = 0.0045, and Bartlett's test for heterogeneity of study variances, χ² = 21.47, P = 0.0060; see Figure 6A). When Choi [49] was excluded (because of its very large CV), the cluster improved somewhat, which would be expected because this study used a different protocol (i.e., diffusion-weighted imaging). It can be seen from the data in Figure 6B that the study variances are no longer heterogeneous (χ² = 8.84, P = 0.2641), although the study means remain heterogeneous (χ² = 19.54, P = 0.0066). In contrast, meta-analysis of cluster 2 indicated that the study means are not heterogeneous (χ² = 4.77, P = 0.6874), while the study variances are mildly heterogeneous (χ² = 15.54, P = 0.0206; see Figure 7).
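The P values reported with each Cochran's Q and Bartlett statistic are consistent with a chi-squared reference distribution on k − 1 degrees of freedom, where k is the number of studies in the group. The sketch below, which uses only the Python standard library, recomputes a few of the reported P values and applies the pooling rule (both tests non-significant at the 5% level). The degrees of freedom are our inference from the study counts rather than values stated in the text, the function names are hypothetical, and small differences from the reported figures are to be expected because the published statistics are rounded.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def can_pool(p_means, p_variances, alpha=0.05):
    """Pooling rule: both heterogeneity tests must be non-significant."""
    return p_means >= alpha and p_variances >= alpha


# (label, reported statistic, number of studies k, reported P); df = k - 1 is assumed.
reported = [
    ("Interactive threshold, Cochran's Q", 10.26, 2, 0.0014),
    ("FCM %DBV, Cochran's Q", 0.13, 3, 0.94),
    ("Cluster 1, Cochran's Q", 22.26, 9, 0.0045),
    ("Cluster 2, Cochran's Q", 4.77, 8, 0.6874),
]

for label, statistic, k, p_reported in reported:
    p = chi2_sf(statistic, k - 1)
    print(f"{label}: recomputed P = {p:.4f}, reported P = {p_reported}")

# Cluster 2: means homogeneous but variances heterogeneous, so the rule is not met.
print("Cluster 2 poolable:", can_pool(0.6874, 0.0206))
```

The closed-form sums avoid a dependency on a statistics library; they are exact for integer degrees of freedom, which is all these tests require.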

Discussion
The present systematic review and meta-analysis was performed to analyze the current studies on quantitative breast density using MRI and to determine the most appropriate technical/operational protocol. Our review of 38 studies from the literature showed that, despite the many methods and protocols available, no gold standard has been established; the studies used a wide range of heterogeneous methods and protocols. To the best of our knowledge, this is the first comprehensive systematic review and meta-analysis to pool the results of all breast density segmentation/measurement methods using MRI data. The analysis indicated that non-contrast-enhanced T1-weighted acquisition was the most commonly used MR breast-imaging protocol. Another important finding was that FCM is the most frequently used algorithm among the breast density segmentation/measurement methods. The results also showed that the high level of heterogeneity was mainly associated with the breast-imaging protocols and the breast density segmentation/measurement methods.
Clustering methods and meta-analysis were then used in a further attempt to identify groups of studies that are as homogeneous as possible within groups and as heterogeneous as possible between groups. The included studies were grouped into clusters based on their nearest neighbor Euclidean distances. On that basis, clusters 1 and 2 were considered the most valuable results. Briefly, cluster 1 consisted of 9 studies [25–30,48,49,53]. Table 3 and Figure 6A show that the CVs vary in value, but in Choi [49] the CV is almost 100% because the mean and SD are almost identical. This result may be explained by the fact that among the 8 studies [11,31,32,36–38,45,55], the breast-imaging protocol was a combination of contrast- and non-contrast-enhanced T1-weighted imaging with either 2D spin echo or 3D gradient echo, while in Choi [49] the MRI protocol was diffusion-weighted imaging. Consequently, it is advisable to exclude this study from the meta-analysis to reduce the heterogeneity within cluster 1. Consistent with this hypothesis, the results improved somewhat: after the exclusion the study variances are not heterogeneous (P > 0.05), although the study means remain heterogeneous (P < 0.05) (Figure 6B). Because excluding Choi [49] did not eliminate the heterogeneity, these results should be interpreted with caution. The remaining discrepancy could be largely attributed to the fact that, although the MR breast-imaging protocols are similar (i.e., contrast- and non-contrast-enhanced T1-weighted imaging), the breast segmentation/measurement methods differ (i.e., FCM, FCM and N3, and in-house customized software). In contrast, cluster 2 included 8 studies; in 3 of these, breast density was reported as %DBV, and in the remainder as %BD. Among these studies, contrast- and non-contrast-enhanced T1-weighted imaging was often used. From the data in Table 3 and Figure 7, it is apparent that the study means are not dissimilar (P > 0.05), although the study variances are heterogeneous (P < 0.05). Among the 21 studies included in the cluster analysis, although the fixed-effect meta-analysis of cluster 2 improved slightly, heterogeneity within the group still exists. There are two likely causes for this heterogeneity: the MR breast-imaging protocol applied and the breast density segmentation/measurement method used.
Although the study has confirmed the variation in breast density segmentation/measurement methods using MRI data, the findings are subject to several limitations. First, the heterogeneity of study aims, study designs, and technical/operational methods (for instance, the MR breast-imaging protocol, the MR scanner manufacturer, and the static magnetic field strength) presents challenges for performing the meta-analysis. Second, the breast density segmentation/measurement algorithm used is another source of variation. Although we classified the included studies into discrete subgroups (i.e., FCM, FCM and N3, and interactive semi-automated threshold) and applied stratified analyses, the heterogeneity remains. Third, the definition of breast density was inconsistent, because some studies reported it as a percentage of dense breast volume and others as a percentage breast density. Fourth, among the 38 studies included in this analysis, only 21 were eligible for meta-analysis, because the input values must be expressed with identical measures of central tendency and dispersion. In addition, some of the included studies used the same set of subjects multiple times for different purposes and features. Even though we addressed this issue by selecting one of the results at random, or by a meaningful clinical criterion, the heterogeneity continues to exist. Notwithstanding these limitations, the study further supports the idea of developing a standard MRI protocol for the quantitative assessment of breast density.
The findings of this review suggest directions for future research. A recent study has reported the feasibility of creating a realistic 3D printed breast phantom for quality control purposes [58]. We therefore consider that 3D printing could be used to develop patient-specific breast phantoms with different breast compositions in order to quantify the volume of fibroglandular tissue (FGT). Further, such a 3D printed model could be used to examine several MR breast-imaging protocols, not only to measure breast density but also to assess the impact of various image quality parameters (e.g., FOV, matrix size, and slice thickness) on the segmentation/measurement of breast density. Finally, the accuracy of different breast density/FGT segmentation methods could be determined.

Conclusions
This systematic review and meta-analysis confirms the variation among breast density segmentation/measurement methods using MRI. Furthermore, subgroup meta-analyses and cluster analysis indicated that significant heterogeneity exists within and between groups. The analysis confirmed that non-contrast-enhanced T1-weighted acquisition was the most commonly used MR breast-imaging protocol and that FCM is the most frequently used algorithm among the breast density segmentation/measurement methods. Future work will need to determine the most appropriate protocol and method for quantifying breast density using MRI.