A Cross-Machine Comparison of Shear-Wave Speed Measurements Using 2D Shear-Wave Elastography in the Normal Female Breast

Quantitative measures of radiation-induced breast stiffness are required to support clinical studies of novel breast radiotherapy regimens and exploration of personalised therapy; however, variation between shear-wave elastography (SWE) machines may limit the usefulness of shear-wave speed (c s) for this purpose. Mean c s measured in four healthy volunteers' breasts and a phantom using the 2D-SWE machines Acuson S2000 (Siemens Medical Solutions) and Aixplorer (Supersonic Imagine) were compared. Shear-wave speed was measured in the skin region, subcutaneous adipose tissue and parenchyma. In vitro, c s estimates were on average 2.3% greater when using the Aixplorer compared to the S2000. In vivo, c s estimates were on average 43.7%, 36.3% and 49.9% greater, a significant difference (p << 0.01), when using the Aixplorer compared to the S2000, for the skin region, subcutaneous adipose tissue and parenchyma, respectively. In conclusion, despite the relatively small differences between machines observed in vitro, large differences in absolute measures of shear-wave speed were observed in vivo, which may prevent pooling of cross-machine data in clinical studies of the breast.


Introduction
Breast-conserving surgical local excision, followed by radiotherapy (RT) to improve local control and survival, is a successful treatment for early breast cancer. Approximately 31,000 women receive adjuvant breast RT in the UK per year, with local relapse rates as low as ~6% at 10 years [1]. Given that many women are now surviving decades beyond their breast cancer treatment, a priority now is to reduce the long-term side effects of radiotherapy to the breast. The overall cosmetic outcome has been shown to be an important factor influencing patient psychosocial morbidity after treatment [2]. Evaluation of novel RT regimens must weigh the positive outcomes (including local control, disease-free survival, and overall survival) against treatment-related toxicity and the impact on the patient's quality of life. Currently, morbidity-related endpoints are primarily derived from subjective measures of function, patient comfort and cosmesis, as evaluated by clinicians or patients, which can be subject to variation due to psychological and social factors [3,4]. Such variation may preclude the pooling of results from multiple clinical studies of radiotherapy [5], which is required to support studies of genetic predisposition to radiation toxicity [6]. Quantitative, reproducible measures of morbidity could remove these sources of variation, allowing more objective assessment of treatment toxicity and facilitating exploration of personalised treatment.
Shear-wave elastography (SWE) has the potential to quantify tissue stiffness for the purpose of disease diagnosis, prognosis or staging, and to monitor the response of tissues to treatment [7][8][9][10][11][12]. SWE is expected to be able to quantify properties related to breast tissue stiffness using shear-wave speed c s, which under a number of assumptions is related to Young's modulus (E) of the tissue by $E = 3\rho c_s^2$, where $\rho$ is tissue density [10]. Absolute or relative measures of c s or E may be useful. One such relative measure could be the spatial distribution of c s, which can be compared with the radiation dose distribution delivered to the patient, potentially helping to further increase understanding of the relationship between dose and radiation-induced morbidity. Two-dimensional SWE (2D-SWE), which provides the spatial distribution, or map, of c s, may be used for this purpose. Here, we adopt the convention for describing different forms of elastography outlined in Bamber et al. [13].
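As an illustration of the relation between shear-wave speed and Young's modulus stated above, the following minimal sketch converts a speed in m/s to a modulus in kPa. The function name and the default density of 1000 kg/m³ are assumptions made for this example; they are not taken from the study.

```python
def youngs_modulus_kpa(shear_wave_speed_m_s, density_kg_m3=1000.0):
    """Return Young's modulus in kPa from E = 3 * rho * c_s**2.

    The default density (1000 kg/m^3) is an illustrative assumption,
    not a value reported in the study.
    """
    if shear_wave_speed_m_s < 0 or density_kg_m3 <= 0:
        raise ValueError("speed must be non-negative and density positive")
    modulus_pa = 3.0 * density_kg_m3 * shear_wave_speed_m_s ** 2
    return modulus_pa / 1000.0  # Pa -> kPa


if __name__ == "__main__":
    # Hypothetical speeds, chosen only to show the quadratic dependence.
    for speed in (2.0, 3.0):
        print(speed, "m/s ->", youngs_modulus_kpa(speed), "kPa")
```

Because E depends on the square of c s, a given ratio between two machines' speed estimates corresponds to the square of that ratio in Young's modulus; for example, a speed ratio of 1.5 corresponds to a modulus ratio of 2.25.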
In the context of large multi-centre clinical trials, required to provide the clinical evidence to change practice [1], measures of morbidity must be reproducible between centres. There is evidence to suggest that c s estimates vary between machines provided by different vendors, transducer types and imaging depth [14][15][16]. This may detrimentally affect the application of SWE to the assessment of normal tissue toxicity because it is unlikely that the same SWE equipment will be available across different clinical centres.
Investigations that compare c s estimates made with various machines, transducers and system settings have been largely limited to in vitro studies [17][18][19], albeit with some in the liver in vivo [20,21]. No study has compared c s estimates between machines in normal breast tissues. The purpose of the current study was to compare absolute and relative measures of breast c s between 2D-SWE machines in vivo to determine if these could be used as reliable metrics in planned cross-centre and cross-machine clinical studies of radiation-induced fibrosis. Previous work has shown that the skin thickens post radiotherapy [22,23] and other work has indicated that women with greater breast size are more likely to experience breast hardening [24,25]. Larger breast size is associated with a greater proportion of adipose tissue. It may therefore be important to measure hardening in component breast tissues, i.e., skin, adipose and breast parenchyma. It is expected that different breast tissues will have different c s [26][27][28] and that these differences are consistent between machines. Previous studies of point SWE (pSWE) suggest that c s estimates are influenced by transducer orientation with respect to Cooper's ligaments [28]. If 2D-SWE c s estimates are similarly affected, care must be taken to reproduce transducer orientation across centres.
In this study, we investigate the difference in c s measured using two machines that provide 2D-SWE, the Aixplorer ® (Supersonic Imagine, Aix-en-Provence, France) and the Acuson S2000 ® (Siemens Healthcare GmbH, Erlangen, Germany). Both machines generate shear waves in tissue using an acoustic radiation force impulse (ARFI), which creates a local tissue displacement at the focus of the ARFI beam. This small displacement of tissue generates shear waves that emanate from the focus. Although the approach to measuring c s is similar, there may be technical differences that lead to a variation in the value of c s measured. For example, the Acuson builds up a 2D image by focusing the ARFI beam at multiple lateral locations at multiple depths. The shear waves are tracked using multiple tracking beams [13,29]. The Aixplorer uses a different approach, sweeping the ARFI focus over the depth of imaging faster than the speed of the shear waves, generating a cone-shaped shear wave, known as a Mach cone. This scanner tracks the propagation of the shear wave using ultrafast plane wave imaging, allowing the shear waves to be tracked in 2D [30]. Furthermore, each machine uses proprietary software to generate values of c s using the tracking data. Consequently, differences in how the displayed value of c s is calculated may also exist.
Variation between absolute measures of mean c s values was investigated in different regions of the breast, specifically, the skin region, subcutaneous adipose tissue and, where it could be clearly differentiated, breast parenchyma. Relative measures investigated included the tissue c s ratio (the ratio of c s in pairs of the above tissue types) and the anisotropy ratio (the ratio of c s estimates acquired in transducer orientations that were radial and anti-radial with respect to the nipple).

Machines and Machine Settings
Cross-centre and cross-machine shear-wave speed (c s) measurements were performed on an ultrasound elasticity phantom and four healthy female volunteers at two centres, the Royal Marsden National Health Service (NHS) Foundation Trust (RM) and Cambridge University Hospitals NHS Foundation Trust (CUH), on the same day. At RM, the Aixplorer ® was used with an L10-2 linear probe and at CUH the Acuson S2000 ® with Virtual Touch Imaging Quantification (VTIQ ®) technology and a 9L4 linear probe was used. Both machines produce a 2D elastogram, i.e., a map of c s. The machine settings for the Aixplorer for B-mode imaging were "Gen/Med", dynamic range = 61 dB, image persistence = medium, speed of sound = 1540 m s⁻¹, transmit frequency = 5 MHz and, for shear-wave elastography (SWE) mode, Penetration = Med, SWE map persistence level = high, Smoothing = 5 and G = 70%. At CUH, the S2000 settings for B-mode imaging were F = 9 MHz, dynamic range = 65 dB, speed of sound = 1540 m s⁻¹, persistence setting = 3. A maximum elastogram depth of 3.5 cm was used for both machines.

In Vitro Measurements
A tissue-mimicking phantom (Model 049 Elasticity QA Phantom, Computerized Imaging Reference Systems Inc. (CIRS), Norfolk, VA, USA) was used to obtain c s estimates. The phantom was made of Zerdine® polymer and contained two groups of four spherical inclusions with four different elastic moduli, each different to the elastic modulus of the background material. One group was placed at a depth of 15 mm and the other at a depth of 35 mm; the former group was used in this study.
Three independent observers acquired shear-wave elastograms and c s estimates from the four shallow inclusions (depth = 1.5 cm, diameter = 1.0 cm) and background material with both machines; deeper inclusions were too deep to fully visualise in SWE mode using the linear transducers used in the current study, which are designed for breast examinations. To determine Aixplorer c s estimates within the inclusions, circular regions of interest (ROIs), called Q-Boxes, with a diameter of 1.0 cm were positioned over each inclusion and c s estimates were recorded. The S2000 c s estimates were acquired using nine 1.5 mm × 1.5 mm square (fixed by the manufacturer) ROIs positioned randomly within each inclusion and within the background material and c s estimates were recorded. For both machines, all ROIs were positioned using the B-mode image, except for Aixplorer when artefacts were excluded and the elastogram was used to indicate the position of the artefacts. The artefacts are discussed in more detail in Section 2.4. Observers independently performed measurements three times and were blinded to the results of other observers.

In Vivo Imaging
Four healthy female volunteers with no history of breast malignancy were recruited for this study. The Surrey and South East Coast National Health Service Research Ethics Committee approved the study and informed consent was obtained from all the volunteers prior to scanning. Volunteers were aged 60, 38, 28 and 25 years on the date of scanning. A single consultant breast radiologist (RS at CUH, or EO at RM) acquired elastograms of the breast at each centre. Prior to acquiring data for analysis, radiologists practised acquiring elastograms without precompression, which may produce an overestimation of c s and introduce SWE elastogram artefacts in vivo [31]. To avoid precompression, copious amounts of ultrasonic gel were used to create a stand-off. Volunteers were asked to hold their breath to make it easier to maintain this stand-off during each image acquisition and reduce the likelihood of motion artefacts.
Images were obtained first with the Aixplorer at RM, after which volunteers travelled to CUH and the imaging was repeated with the S2000. There was an average of six hours between scanning at the two centres. Eight images were acquired from each breast with the transducer in radial and anti-radial transducer orientations, as explained by Figure 1. The transducer positions employed at RM were marked using a surgical marker pen to enable interrogation of the same regions of tissue at CUH. Volunteers were scanned supine with their arm raised behind their head. B-mode images and elastograms were acquired using the technique described above.

Shear-Wave Speed Estimates
The S2000 generated a shear-wave speed grayscale elastogram adjacent to the B-mode image. The Aixplorer provided an elastogram superimposed on the B-mode image where red and blue indicated greater and lesser c s, respectively, as shown in Figure 2b. The smallest available ROI, 2 mm in diameter, was placed at six different locations on the skin, adipose and parenchyma, respectively. The anatomical regions of interest and the c s measurement points selected by the radiologist and annotated on S2000 images were used to guide the placement of the ROIs on Aixplorer elastograms, which was performed by EH.
Image artefacts of high c s at regularly spaced intervals were observed in Aixplorer elastograms. These are visible as a regular pattern of greater shear-wave speed in Figure 2b and as vertical lines of greater shear-wave speed, illustrated using an elastogram of a homogeneous region of the phantom in Figure 3a. To study the effect of this artefact, the Aixplorer analysis was performed twice: once by choosing measurement locations that avoided the artefacts, using both the B-mode image and the elastogram to position the ROIs, as in Figure 2b, and once with measurement locations selected using only the B-mode image, by increasing the c s scale and reducing the opacity of the on-screen display of the elastogram to zero (see Figure 2c). In some elastograms, it was not possible to avoid some overlap of ROIs with regions that appeared to contain artefacts. In some cases, minimising the inclusion of artefacts reduced the number of measurement positions available. The number of c s estimates obtained using the Aixplorer when avoiding artefacts varied between two and six.

Data Analysis
When analysing data from the elastography phantom, the agreement between machines was assessed using the percentage difference between c s estimates measured using the Aixplorer and using the S2000. The symmetric percentage difference was defined as the difference between estimates divided by the mean of the estimates, multiplied by 100 [32], and is reported relative to Aixplorer values, i.e., a positive value indicates that the Aixplorer c s estimate was greater than the S2000 c s estimate. There were three values of percentage difference in c s, one per observer. Interobserver variability of the c s estimates for each inclusion and the background was quantified using the coefficient of variation (CV).
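The two agreement metrics defined above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) in the CV is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def symmetric_percentage_difference(aixplorer_cs, s2000_cs):
    """Difference between estimates divided by their mean, times 100.

    Positive values mean the Aixplorer estimate exceeded the S2000 estimate.
    """
    pair_mean = (aixplorer_cs + s2000_cs) / 2.0
    if pair_mean == 0:
        raise ValueError("mean of the two estimates must be non-zero")
    return 100.0 * (aixplorer_cs - s2000_cs) / pair_mean


def coefficient_of_variation(estimates):
    """Standard deviation divided by the mean of the observers' estimates.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(estimates) / mean(estimates)


if __name__ == "__main__":
    # Hypothetical values in m/s, for illustration only.
    print(symmetric_percentage_difference(3.0, 2.0))
    print(coefficient_of_variation([1.0, 2.0, 3.0]))
```

The symmetric form treats neither machine as the reference, so swapping the two arguments changes only the sign of the result.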
For each in vivo elastogram, the mean c s estimate was determined across all measurement locations (up to six). Median, interquartile range (IQR), maximum and minimum values of these mean c s estimates were determined for all breasts, for radial and anti-radial transducer orientations and for skin, adipose and parenchyma, and for individual subjects. For the Aixplorer this was done twice, including and excluding artefacts. Agreement between machine estimates of c s was assessed using symmetric percentage differences in c s estimates for each of the variables described above. The correlation between c s estimates measured using S2000 and Aixplorer was determined using Pearson's correlation coefficient. In addition, the ability of the two machines to record the same contrast in mean c s between the skin, adipose and parenchyma tissues was determined using tissue c s ratios: skin to parenchyma, skin to adipose and adipose to parenchyma. Anisotropy of c s was assessed per breast quadrant using the ratio of the mean c s estimated from the anti-radial elastogram to that estimated from the corresponding radial elastogram.
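The relative measures described above reduce to simple ratios of per-elastogram mean c s values. The sketch below shows one way to compute them; the function names, dictionary keys and example numbers are hypothetical and serve only to make the definitions concrete.

```python
from statistics import mean


def mean_cs(roi_estimates):
    """Mean c_s over the ROIs placed in one tissue on one elastogram."""
    if not roi_estimates:
        raise ValueError("at least one ROI estimate is required")
    return mean(roi_estimates)


def ratio(numerator, denominator):
    """Ratio of two mean c_s values, guarding against a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def tissue_cs_ratios(skin, adipose, parenchyma):
    """Tissue c_s ratios for the three tissue pairs named in the text."""
    return {
        "skin/parenchyma": ratio(skin, parenchyma),
        "skin/adipose": ratio(skin, adipose),
        "adipose/parenchyma": ratio(adipose, parenchyma),
    }


def anisotropy_ratio(anti_radial_mean_cs, radial_mean_cs):
    """Anti-radial mean c_s divided by the corresponding radial mean c_s."""
    return ratio(anti_radial_mean_cs, radial_mean_cs)


if __name__ == "__main__":
    # Hypothetical ROI estimates in m/s for one breast quadrant.
    anti_radial = mean_cs([2.0, 3.0])
    radial = mean_cs([2.0, 2.0])
    print(anisotropy_ratio(anti_radial, radial))
    print(tissue_cs_ratios(skin=4.0, adipose=1.0, parenchyma=2.0))
```

Ratios of this kind cancel any purely multiplicative bias between machines, which is why they are candidates for cross-machine comparison when absolute values disagree.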
Statistical analysis was performed with MATLAB software (Version 2017a, The MathWorks Inc., Natick, MA, USA). Kolmogorov-Smirnov tests were used to test for normality. Wilcoxon signed-rank tests were used to perform a pair-wise analysis to test for agreement in the median of the mean c s estimates, tissue c s ratios, and anisotropy ratios, between the machines. Pair-wise analysis was also performed to test for agreement between mean c s estimates obtained using different transducer orientations in the skin, adipose and parenchyma.
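For readers unfamiliar with the paired test named above, the following sketch implements a two-sided Wilcoxon signed-rank test using only the Python standard library. It is not the MATLAB routine used in the study: it assumes a normal approximation with no continuity or tie correction, whereas statistical packages may use an exact method for small samples, so p-values can differ. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, w_minus, p_value). Zero differences are discarded and
    tied absolute differences receive the average of their ranks.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation to the null distribution of w_plus.
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, w_minus, p_value


if __name__ == "__main__":
    # Hypothetical paired mean c_s values (m/s) from two machines.
    machine_a = [3.1, 2.9, 3.5, 4.0, 3.3, 3.8]
    machine_b = [2.0, 2.1, 2.2, 2.5, 2.3, 2.4]
    print(wilcoxon_signed_rank(machine_a, machine_b))
```

The test uses only the signs and ranks of the paired differences, so it does not require the differences to be normally distributed.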

Difference in c s Estimates between Machines In Vitro
Representative images of the phantom inclusions acquired using both machines are shown in Figure 3b. Table 1 lists the c s estimates in the phantom measured by each observer, the symmetric percentage differences in c s estimates between machines, and the nominal values provided by the phantom manufacturer. The Aixplorer gave c s estimates that were between 0.2% and 3.06% greater than those of the S2000. The CVs of the c s estimates were <0.1 for all regions of the phantom and both machines (see Table 1). Figure 2 shows example images of the lesions contained in the phantom acquired using both machines.
Table 1. Individual and mean shear-wave speed (c s) estimates from the Aixplorer and the S2000 and the coefficient of variation (CV) of c s estimates across three observers. Individual estimates were the mean of three repeat measurements. The means and standard deviations of the symmetric percentage differences are given for each observer in the last column. The manufacturer's nominal c s values for background and inclusions are given.

Difference in c s Estimates between Machines In Vivo
A total of 128 elastograms were acquired, 64 for each machine. Ultrasound imaging revealed that one volunteer had small dense breasts comprising mainly parenchymal tissue; this volunteer had a maximum imaging depth (skin to pectoral muscles) of approximately 2 cm. The other three volunteers had breasts of various sizes with differing amounts of adipose and parenchymal tissues; the maximum imaging depths in these volunteers were greater than 4 cm. Artefacts, in the form of ill-defined vertical bands of higher-than-average c s appearing at approximately regularly spaced lateral intervals (as in Figure 3a), were observed in 100% of Aixplorer elastograms. In two Aixplorer images, artefacts covered the entire skin and adipose regions, and c s estimates from regions without artefacts could not be obtained. In five Aixplorer elastograms, the elastogram-overlay box was incorrectly positioned below the outermost layer of the skin and c s in the skin could not be obtained. A gel stand-off was clearly visualised on 96% of Aixplorer images and 100% of S2000 images. In two volunteers, the B-mode image did not clearly differentiate adipose tissue from the parenchyma. In these volunteers, c s estimates were obtained in skin and parenchyma only.
Figure 4a illustrates that shear-wave speed estimates were significantly greater when measured using the Aixplorer (avoiding artefacts) compared to the S2000 in all tissue types (p-values < 0.001). Symmetric percentage differences in median c s estimates between machines were 40.0%, 50.0% and 51.4% for skin, adipose and parenchyma tissues, respectively. The variance of Aixplorer mean c s estimates was also statistically significantly greater than the variance of S2000 estimates (p-values < 0.001) in all tissues. The median and IQR of the mean c s estimates were significantly less when measured excluding artefacts in the skin and adipose (p-values < 0.001) and gave better agreement with S2000 c s estimates.
All subsequent Aixplorer data presented were determined using c s estimates measured avoiding artefacts. Pearson correlation coefficients between c s estimates measured using individual elastograms acquired using S2000 and Aixplorer in the skin, adipose and parenchyma were 0.25, 0.15 and 0.33, respectively, and were not statistically significant (p > 0.05). When c s estimates were averaged over the whole breast (eight elastograms), Pearson correlation coefficients were 0.16 (p = 0.5), 0.75 (p = 0.057) and 0.75 (p << 0.01), respectively. All data are tabulated in Table 2.
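Pearson's correlation coefficient, as used above, can be computed as in the following minimal sketch. The function name and the example values are hypothetical; the example is included to show that correlation and agreement are separate properties.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical c_s values (m/s): the second machine reads 1.5 times higher.
    machine_b = [2.0, 2.5, 3.0, 3.5]
    machine_a = [1.5 * value for value in machine_b]
    print(pearson_r(machine_a, machine_b))
```

In this hypothetical case the two machines would be perfectly correlated despite a large bias, so such a bias could in principle be corrected by a scale factor. A low correlation, as reported above for individual elastograms, indicates that no simple rescaling would bring the two sets of estimates into agreement.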

Comparison of Tissue c s Ratios between Machines
Figure 5 provides a box and whiskers plot of tissue c s ratios per elastogram for both machines. For both machines, all median tissue c s ratios were statistically significantly greater than unity, indicating that, on average, the skin had greater c s than parenchyma and adipose tissues, and parenchyma had greater c s than adipose tissue. There was no statistically significant difference in median tissue c s ratios between machines.
[Figure caption; opening text missing] … Figure 2 caption for full explanation of box plot). Aixplorer data were acquired using ROIs that did not include artefacts. The IQR and range may be influenced by variation between volunteers, breast quadrants and transducer orientation. ** or *** indicates where median c s estimates for different tissue types were significantly different with p-value < 0.01 or < 0.001, respectively.
Figure 6 gives the median, IQR and range of anisotropy ratios for individual breast quadrants. Mean c s estimates (per elastogram) were statistically significantly greater when the transducer was placed in an anti-radial orientation for all tissues and both machines, except for estimates measured using the Aixplorer in skin. When c s estimates were averaged over the whole breast for anti-radial and radial positions, only for the S2000 did anti-radial c s estimates remain statistically significantly higher than radial c s estimates.
Figure 6. Anisotropy ratios (ratio of mean anti-radial c s to mean radial c s in individual pairs of elastograms obtained at the same location in the same breast quadrant on the same breast in the same individual) measured in skin, adipose and parenchyma for Aixplorer and S2000. *, ** or *** indicates where median c s estimates for anti-radial and radial transducer orientations were significantly different with p-value < 0.05, < 0.01 or < 0.001, respectively.

Difference in c s Estimates between Machines In Vitro
Two studies have compared Siemens S2000 (or S3000) point SWE (pSWE) or 2D-SWE and Aixplorer c s estimates in vitro using the same transducers that were used in the current study [15,18]. Using a phantom with nominal c s ≈ 2.3 m s⁻¹, Shin et al. [15] found that S2000 pSWE gave statistically significantly lower c s estimates than Aixplorer, by 0.1 m s⁻¹ and 0.4 m s⁻¹ at depths of 3.0 cm and 4.0 cm, respectively. Dillman et al. [18] compared S3000 pSWE with S3000 2D-SWE and Aixplorer 2D-SWE, at depths of 1.0, 2.5 and 4.0 cm in soft (c s ≈ 0.9 m s⁻¹) and hard (c s ≈ 2.1 m s⁻¹) phantoms and found small (<0.2 m s⁻¹) statistically significant differences between them; however, no consistent bias between machines was observed.
Hall [14] compared S2000 pSWE and Aixplorer 2D-SWE using curvilinear transducers at depths of 3.0, 4.5 and 7.0 cm across multiple machines at multiple sites. The Aixplorer gave significantly greater c s estimates than the S2000, by up to ~0.2 m s⁻¹ in a soft phantom (c s ≈ 1.0 m s⁻¹). However, the use of a phased array transducer with the S2000 at one of the test sites may have biased the results; the S2000 c s estimates at that site were significantly lower than the S2000 estimates at the other four sites. There was no statistically significant difference between the two machines in mean c s estimates in a harder phantom (c s ≈ 2.1 m s⁻¹). Similar to the above studies, the phantom data obtained in this study revealed only small differences in c s between the machines, which were on the order of the interobserver variation. Estimates were obtained at a depth of 1.5 cm using linear transducers for both systems and therefore no bias due to depth or transducer geometry was expected.

Rationale for the Exclusion of Artefacts from Aixplorer Elastogram Analysis
All Aixplorer shear-wave elastograms obtained in vivo contained artefacts that appeared to correspond with the position of the acoustic radiation force beams; they moved with the lateral translation of the transducer and their number varied with the lateral size of the elastogram box. Artefacts were also observed in vitro and were most clearly visualised in the two stiffer inclusions. Across all volunteers, Aixplorer c s estimates (including artefacts) in adipose tissues were significantly greater than c s estimates in parenchyma (Figure 4b), which was contrary to what has been reported in the literature. When artefacts were excluded, the findings of the current study agreed with three large (minimum number of women 89) clinical studies that measured c s in normal breast tissue using S2000 2D-SWE and found that breast parenchyma had statistically significantly greater c s than adipose tissue [26][27][28]. In addition, Athanasiou et al. [33] found similar results using Aixplorer in 46 women.
On visual examination of the elastograms, artefacts appeared most intense (greater c s) below the gel-skin and skin-adipose interfaces. 'Artefactual vertical bands of stiffness' were observed by Berg et al. [34] in their study of breast lesions using Aixplorer 2D-SWE. Berg et al. [34] attributed these artefacts to tissue compression and noted they were difficult to avoid at superficial depths despite using large amounts of ultrasonic gel and a no-compression scanning protocol. We concluded that c s estimates obtained in regions with artefacts were less reliable and only data from regions that were free from artefacts were used for subsequent analysis of anisotropy ratios.

Difference in c s Estimates between Machines In Vivo
This study found that in vivo, S2000 c s estimates were on average less than c s estimates obtained using the Aixplorer. The differences between machines were greater in vivo than in vitro, with a maximum cross-machine c s -ratio of 1.70 in vivo compared to 1.07 in vitro.
To the authors' knowledge, this is the first study comparing c s estimates of different elastography machines in the normal breast in vivo. Furthermore, c s estimates between machines were uncorrelated except for c s estimates for parenchyma averaged across the whole breast. The observed bias and poor correlation between machines exclude the use of an absolute measure of breast stiffness such as c s or Young's Modulus, i.e., a simple value of breast stiffness would not be reproducible between centres. Further studies comparing these machines are required to determine if these biases can be corrected to provide consistent measures of breast tissue between centres. It is important to note that there were large differences in the variation between machines observed in vitro and in vivo. It appears from the data presented here that phantom studies would not be adequate to provide data for cross-calibrations of machines.
Several studies have compared liver c s measured using 2D-SWE and pSWE in vivo [17,35–37] using a range of different commercially available pSWE and 2D-SWE machines and found that 2D-SWE and pSWE could not be used interchangeably to grade the severity of liver fibrosis in patients with liver disease. Ferraioli et al. [20] also compared two 2D-SWE machines (Aixplorer and Aplio 500 (Canon/Toshiba, Japan)) in 21 patients with liver disease. The mean difference between the two 2D-SWE machines was −0.6 m s⁻¹, with 95% limits of agreement (LOA) of −1.4 m s⁻¹ and 1.2 m s⁻¹; which of the machines gave the greatest c s estimate was not reported.
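The mean difference and 95% LOA quoted above are the kind of summary produced by a Bland-Altman analysis of paired measurements. The following is a minimal sketch only, assuming the conventional definition (bias ± 1.96 standard deviations of the paired differences); the function name and the example speeds are hypothetical and are not data from this study or from Ferraioli et al. [20].

```python
from statistics import mean, stdev

def bland_altman(machine_a, machine_b):
    """Return (bias, lower LOA, upper LOA) for paired measurements.

    Assumes the conventional Bland-Altman definition: the bias is the mean
    of the paired differences (a - b) and the 95% limits of agreement are
    bias +/- 1.96 sample standard deviations of those differences.
    """
    if len(machine_a) != len(machine_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(machine_a, machine_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical paired shear-wave speeds (m/s) at the same sites.
speeds_a = [1.0, 2.0, 3.0]
speeds_b = [1.5, 2.0, 3.5]
bias, lower, upper = bland_altman(speeds_a, speeds_b)
print(f"bias = {bias:.2f} m/s, 95% LOA = [{lower:.2f}, {upper:.2f}] m/s")
```

The sign of the bias depends on which machine is subtracted from which, so a report that omits this cannot be used to tell which machine gave the greater estimate.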
Differences between c s measured by different machines may, in part, be due to differences in the frequency content of the shear-waves. This would be consistent with our observation that such differences are greater for measurements in tissue than in phantoms since tissue is likely to possess increased shear wave dispersion due to viscoelastic damping or scattering of the shear wave relative to that in the phantom. Furthermore, it is possible that the way in which the ARFI pushing beams are executed, as described in the introduction, may result in different shear wave frequencies. Even with similar shear-wave bandwidth, if, to estimate c s , vendors rely on a model of shear-wave propagation that assumes the spatial and temporal characteristics of the acoustic radiation force impulse (ARFI) push and the elastic or viscoelastic properties of the tissue, differences in models may result in differences in estimated c s . It is also feasible that differences between machines arise due to differences in the ways in which their respective measurements of shear-wave speed are affected by the region within the lateral boundaries of the ARFI pushing beam, where a direct and immediate displacement from the ARFI push, whose magnitude decays with off-axis distance according to the ARFI intensity beam profile, is summed with the spatiotemporally varying displacement due to the propagating shear wave.
Further work is required to improve our understanding of the cause of machine bias and why it appears to be greater in tissue than in current phantoms. Nevertheless, even if standardisation of the frequency content of shear wave pulses used in SWE is not feasible, it would be helpful if manufacturers were able to provide the shear-wave frequency spectrum generated in typical tissues at a number of imaging depths, as part of the machine specification, so that authors could incorporate the information into publications such as this, as called for by Dietrich et al. [38].

Comparison of Tissue c s Ratios and Anisotropy Ratios between Machines
For both machines, tissue c s ratios were greater than unity indicating that skin had higher c s estimates than parenchyma, and parenchyma had higher c s estimates than adipose tissue. No significant differences in any tissue c s ratios between machines were detected, suggesting that whilst the machines differed in absolute values of c s estimates, relative values between tissue types are similar.
Zhou et al. [28] reported anisotropy ratios (radial to anti-radial) of 1.28 and 1.30 in adipose and parenchyma tissue in the normal breasts of 137 patients, respectively, measured using S2000 pSWE. These findings were consistent with the radial orientation of ducts and Cooper's ligaments and that material properties of the tissue are dependent on the orientation of tissue fibres. For example, Kruse et al. [39] and Gennisson et al. [40] in observations of muscle, and Coutts et al. [41] and Coutts et al. [42] in observations of skin, found that shear modulus and stiffness vary according to the direction of shear wave propagation and strain respectively, with respect to the muscle or dermal collagen fibres, and that c s and stiffness were greatest when measured parallel to the fibres.
The current study detected a difference between c s estimates measured in the radial and anti-radial orientations. The effect, however, was the reverse of that observed by Zhou et al. [28], with greater c s measured with the transducer in the anti-radial orientation. The reason for this is unknown, although it may be significant that Zhou et al. [28] measured the c s only at a single point with the probe in both orientations. The current study, which used a much smaller number of volunteers, compared average c s across many measurement points in each image plane. The same volume of tissue was not interrogated in each orientation, nor was the shear wave propagation direction truly anti-radial for any points other than those at the central beamlines of the anti-radial scan, which may account for the different findings. For a direct comparison with the findings of Zhou et al. [28], a modification of our experimental method would be required, but this would be contrary to the objective of our study as discussed below.
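The anisotropy ratio discussed above can be illustrated with a short sketch. It assumes the ratio is formed from the mean c s over all measurement points in each image plane (radial divided by anti-radial, following the convention reported by Zhou et al. [28]); the function name and the speeds are hypothetical and are not measurements from this study.

```python
from statistics import mean

def anisotropy_ratio(radial_speeds, anti_radial_speeds):
    """Ratio of mean radial to mean anti-radial shear-wave speed.

    A value above 1 means the radial orientation gave the greater mean
    speed; a value below 1 means the anti-radial orientation did.
    """
    return mean(radial_speeds) / mean(anti_radial_speeds)

# Hypothetical shear-wave speeds (m/s) sampled across each image plane.
radial = [1.8, 2.0, 2.2]
anti_radial = [2.4, 2.5, 2.6]
ratio = anisotropy_ratio(radial, anti_radial)
print(f"anisotropy ratio (radial / anti-radial) = {ratio:.2f}")
```

With these illustrative numbers the ratio falls below unity, which is the direction of the effect described above, whereas the ratios of 1.28 and 1.30 reported by Zhou et al. [28] lie above unity.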
The differences in c s estimates between transducer orientations indicate that c s estimates should be acquired consistently using one, or both, transducer orientations. Care should be taken to acquire data in the correct orientation, and perhaps this can be performed with reference to anatomical landmarks such as the nipple. The focus of the current study is the comparison of the machines for their ability to pool data, and the observed anisotropy in c s estimates, although not consistent with the findings of Zhou et al. [28], was consistent between machines. This may suggest bias in the measurement technique, for example, a change in gel stand-off between the different transducer orientations; however, no systematic differences in gel stand-off between the images acquired using radial and anti-radial orientations were observed.
In the context of the future measurement of radiation toxicity to the breast and the use of the unirradiated breast as a reference measure of c s , large variation in relative c s estimates between normal breasts between machines would reduce the sensitivity of 2D-SWE to detect a change in stiffness in the breast due to radiation-induced fibrosis, i.e., any radiation-induced change in c s , whether absolute or relative to the unirradiated breast, should be significantly greater than the equivalent natural variation in the population of women receiving radiotherapy for breast cancer. The magnitude of the effect of radiation on c s in the tissues of the breast is as yet unknown. This cohort of volunteers is too small to investigate the underlying natural variation between breasts; however, we have observed large variation in shear wave speeds between samples (Figure 4) and patients. Encouragement that it may be possible to use c s for this purpose can be gained from two observations: semi-quantitative strain elastography using a calibrated stand-off pad demonstrated that, in all three of three patients who had received radiotherapy to the breast some years earlier, the irradiated breast had a higher Young's modulus than the unirradiated breast [43]; and even subjective palpation is capable of grading the level of breast stiffness changes that are caused by radiation damage and of distinguishing between the irradiated and unirradiated breast [1]. If the effect is smaller than the average variation between patients, it may be possible to use the contralateral breast as a reference or acquire a baseline (pre-treatment image) to measure the change in stiffness.

Limitations of This Study
A limitation of this study is the small number of volunteers. To compare SWE machines a cross-centre study had to be performed, which limited the number of volunteers available. Despite the small number of volunteers, differences in c s estimates between machines were significant and consistent for all breasts and all tissue types. This study compared c s estimates between machines in the skin. In skin, and other layered tissues such as the cornea, shear waves may be guided, modifying the c s and introducing dispersion [44], and the relationship between c s and the elastic modulus can no longer be simply approximated by $E = 3\rho c_s^2$. Variation in the skin boundary conditions may also affect c s [40]. Previous studies of 2D-SWE in skin have controlled for this effect. To investigate the relative stiffness of sclerotic and normal skin, the relative c s of sclerotic and normal skin was normalised by the ratio of skin thicknesses [45][46][47]. Similarly, in a study of corneal collagen cross-linking using the Aixplorer, direct comparison of the stiffness of corneas with and without collagen cross-linking was justified by monitoring the central cornea thickness and ensuring there was no difference in cornea thickness between paired measurements [48]. In this work, cross-machine comparisons of c s with the transducer in the same position on the breast were made and therefore skin thickness will be the same for both measurements. For future studies of radiation-induced breast toxicity using 2D-SWE, relative measures of radiation-induced skin toxicity that rely on comparison with the contralateral breast should consider variations in skin thickness and boundary conditions [45]. Another limitation is the use of a different operator to acquire in vivo data from each machine, which may introduce bias in c s estimates.
We also expect that interoperator variation may contribute to the poor correlation observed between the elastograms acquired with different machines but of the same tissue sample. A further source of variation may be the placement of the probe on the breast, including changes in orientation. We used a surgical marker pen to mitigate this variation; however, we expect that the heterogeneous nature of the breasts means that a small variation in probe placement could result in a small variation in c s between scans. Nevertheless, our use of two operators was by design, because it mimics the way that data would be acquired in a future multicentre study, and interoperator variation, which includes variations in probe placement, was expected to be a small effect. Studies report good agreement between operators both in vitro and in vivo [14,19,20,37], and care was taken to give the two radiologists the same instructions and to allow them to practise data acquisition using a phantom.
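To illustrate why a cross-machine bias in c s matters for absolute stiffness measures, the sketch below applies the bulk-tissue approximation E = 3ρc s ² mentioned above. It is an illustration only: it assumes an incompressible, isotropic, purely elastic medium and a tissue density of 1000 kg m⁻³ (an assumed value, not one measured in this study), and, as noted above, the approximation does not hold in thin layered tissue such as skin. The function names are hypothetical.

```python
def youngs_modulus_kpa(shear_wave_speed, density=1000.0):
    """Young's modulus in kPa from shear-wave speed in m/s.

    Uses E = 3 * rho * c_s**2, which assumes an incompressible, isotropic,
    purely elastic bulk medium. The default density of 1000 kg/m^3 is an
    assumption made for illustration.
    """
    return 3.0 * density * shear_wave_speed ** 2 / 1000.0

def modulus_ratio(speed_ratio):
    """Ratio of Young's moduli implied by a ratio of shear-wave speeds.

    Because E scales with c_s squared, density cancels and the modulus
    ratio is the square of the speed ratio.
    """
    return speed_ratio ** 2

# Maximum cross-machine speed ratios reported in this study.
in_vivo = modulus_ratio(1.70)
in_vitro = modulus_ratio(1.07)
print(f"in vivo modulus ratio:  {in_vivo:.2f}")
print(f"in vitro modulus ratio: {in_vitro:.2f}")
```

Under these assumptions, because the modulus scales with the square of the speed, any cross-machine ratio in c s is amplified when results are reported as Young's modulus.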

Conclusions
Differences in c s estimates between 2D-SWE machines were much greater in vivo than in vitro. Aixplorer gave a greater estimate of c s than S2000 in all tissues. Differences in c s estimates limit the use of absolute measures of c s as a quantitative biomarker in cross-centre studies unless this bias can be reliably corrected. Shear-wave speed estimates must be acquired using the same transducer orientation across centres and across machines.