Influence of Examiners’ Experience and Region of Interest Location on Semiquantitative Elastography Validity and Reliability

Semi-quantitative elastography is a promising imaging technique for evaluating tissue stiffness, providing data on relative stiffness differences between two targets. The aims of this study were to assess the validity, inter-examiner reliability and variability of semi-quantitative elastography for calculating strain ratios (SR) in a homogeneous gel phantom at different locations within the image. A diagnostic accuracy study was performed in a phantom of homogeneous stiffness. Four examiners participated (two novice and two experienced). Each examiner assessed the SR in two locations. The difference between examiners, variability of measurements, SR error and absolute error, mean error of the measurements and coefficient of variation were calculated. Agreement between examiners and validity were higher, and variability of measurements lower, in the central area than in the lateral areas of the images. Moreover, the experience of the examiner was relevant for the concordance of the measurements in the lateral areas of the images (SR difference of 0.14 ± 0.05; p < 0.001), but not in the central area (SR difference of 0.05 ± 0.02; p > 0.05). Our data suggest that semi-quantitative elastography is an accurate tool for assessing small-magnitude stiffness differences within the same image in central areas, but the experience of the examiner is a determinant factor.


Introduction
Ultrasound imaging (US) is a safe, portable, low-cost imaging method for assessing soft tissues, including skeletal muscle and viscera, and is widely used by professionals from different specialties (e.g., physiotherapists, cardiologists, radiologists, hepatologists or gynecologists) [1]. In recent years, several technical reports have assessed the validity and/or reliability of different imaging procedures [2] and imaging methods [3,4].
Elastography is a US physics-based imaging technology sensitive to tissue stiffness. It has been further developed and refined in recent years to make quantitative assessments of tissue stiffness [5]. While the first elastography method was "strain imaging" (which consists of manual compression of the tissue with an ultrasound transducer), the most recent technology is "shear wave imaging", which measures the tissue displacement generated by shear waves propagating perpendicular to the direction of the force produced with the transducer [6].
Recent developments have made strain elastography more accurate by providing real-time feedback on the optimal pressure, using an indicator bar scaled from 1 to 6 (where 1 is not appropriate and 6 is the most appropriate) [7]. Strain elastography provides qualitative information based on a color map and semi-quantitative information expressed as the stiffness comparison between two areas within the same image. This comparison, the strain ratio (SR), is calculated as the mean strain in the reference divided by the mean strain in the target.
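As a minimal sketch of the SR definition above, assume that the strain samples inside each region of interest have already been exported from the scanner (a hypothetical input format; the equipment's internal computation is not described here). Under the usual assumption of uniform stress, lower strain indicates stiffer tissue, so SR > 1 means the target is stiffer than the reference, and a homogeneous material is expected to give SR = 1.

```python
from statistics import fmean


def strain_ratio(reference_strain, target_strain):
    """Strain ratio: mean strain in the reference ROI / mean strain in the target ROI.

    Inputs are sequences of strain samples from each ROI (hypothetical export format).
    """
    mean_reference = fmean(reference_strain)
    mean_target = fmean(target_strain)
    if mean_target == 0:
        raise ValueError("target ROI has zero mean strain; SR is undefined")
    return mean_reference / mean_target


# Homogeneous material: both ROIs deform about equally, so SR should be close to 1.
print(strain_ratio([0.011, 0.010, 0.012], [0.011, 0.011, 0.011]))
# A target that deforms half as much as the reference (i.e., is stiffer) gives an SR of 2.
print(strain_ratio([0.02, 0.02], [0.01, 0.01]))
```

Because the SR is a ratio of two relative deformations, it carries no absolute stiffness units, which is why a phantom with a known SR of 1 is a convenient validity reference.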
Prior studies comparing SR accuracy between strain and shear wave elastography in calibrated phantoms reported greater accuracy for the shear wave method [8,9]. One reason for these findings is that strain elastography applies an unknown compression to the tissue, so the deformation obtained is relative. Although shear wave elastography, unlike strain elastography, does not depend on manual compression by the operator, it is more expensive and less accessible; moreover, all US methods are susceptible to bias from shadowing, reverberation, clutter artifacts or operator experience [10].
Therefore, considering (1) the accessibility advantages of strain elastography and (2) the inconsistency of semi-quantitative SR results across the methods evaluated, the systems used and the positions of the reference region of interest [8], an important preliminary step before semi-quantitative elastography can be used and interpreted correctly in research or clinical practice is to establish the validity, reliability and variability of its measurements.
The rationale for this study is twofold. First, the accuracy of SR depends directly on the magnitude of the stiffness difference between both targets [11]. Although strain elastography has previously been widely assessed in phantoms under optimal conditions, these reports compared targets and backgrounds with large stiffness differences, ranging from 11 to 55 kPa [8-11]. However, a recent study assessing stiffness differences between active and latent myofascial trigger points (MTrP) and control points within the same muscle found smaller differences than those assessed in vitro (ranging from 0.04 to 2.76 kPa) [12]. Therefore, further research assessing the accuracy of SR calculation in tissues with similar stiffness is needed. Second, evidence on the influence of examiner experience on SR assessment is inconsistent [8,11,13].
The aims of this study were to determine the validity, inter-examiner reliability and variability of semi-quantitative elastography SR measurements, considering different locations within the image and the operators' experience, using a phantom of homogeneous stiffness under optimal conditions (in a homogeneous material, any two regions have zero stiffness difference, so the expected SR is 1).

Study Design
This diagnostic accuracy study was conducted between September and November 2020 at a private university in Madrid (Spain) and comprised construct validity, variability of measurement and inter-examiner reliability analyses. This type of study focuses on judgements based on the accumulation of evidence obtained with a specific measuring instrument (here, semi-quantitative elastography SR).
Construct validity is calculated by comparing the scores obtained for the evaluated target with a value known to be related to the construct measured by the instrument (here, a homogeneous gel phantom with a known SR of 1). Variability of measurement is calculated by performing repeated measurements of the same target at different points in time, and inter-examiner reliability by analyzing the equivalence of ratings obtained by observers with different experience levels [14].
This study followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines and checklist [15]. No ethics committee approval was needed since neither animals nor humans were involved in this research.

Imaging Acquisition Procedures
A SonoSite Blue Phantom™ Vascular Access BPO100 (Sarasota, FL, USA) with homogeneous stiffness was placed on a rigid table. A single model was chosen since SR = 1 for all homogeneous phantoms. All images were acquired with an Alpinion eCube i8 (Anyang-si, Gyeonggi-do, Ltd., Korea) using a 4-cm-wide linear transducer, E8-PB-L3-12T (frequency bandwidth 3-12 MHz). A linear transducer was chosen since convex probes generally show higher intra-observer variability [11].
Room lighting, temperature and all US parameters were kept identical for all four examiners. Frequency was set to 12.0 MHz, gain to 55 dB, dynamic range to 85, brightness to 17 and depth to 4 cm. To ensure optimal sound wave incidence, the transducer was placed perpendicular to the phantom surface, obtaining a long-axis image of the internal cylindrical structure without inclination and capturing the lumen of the cylinder at its maximum width across the whole image (Figure 1a).
This US equipment displays a 1-to-6 scale bar in the upper-left part of the image indicating the quality of the applied pressure (uniformity of the compression across all areas) to optimize the elastography measurement. The transducer pressure was therefore adjusted according to this scale so that the optimal pressure was applied during acquisition of all images (Figure 1b). Four examiners participated in this procedure: two were experienced (10 years of practice in US imaging) and two were novices (1 year of practice). All examiners performed the transducer placement as described, and each captured 50 images. Acquisitions were performed in 5 series of 10 captures, with a 1 min interval between captures and a 30 min interval between series.
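The acquisition schedule can be checked with simple arithmetic. The sketch below lists each capture's offset from the start of one examiner's session, assuming (the text does not specify this) that the 30 min interval runs from the last capture of one series to the first capture of the next. Under that assumption, one examiner's 50 captures span 5 × 9 min + 4 × 30 min = 165 min, and the four examiners together yield the 200 images analyzed.

```python
from datetime import timedelta

N_SERIES = 5
CAPTURES_PER_SERIES = 10
CAPTURE_INTERVAL = timedelta(minutes=1)
SERIES_INTERVAL = timedelta(minutes=30)  # assumed: last capture of a series -> first of the next
N_EXAMINERS = 4


def capture_offsets():
    """Offset of every capture from the start of one examiner's session."""
    offsets = []
    start_of_series = timedelta(0)
    for _ in range(N_SERIES):
        for capture in range(CAPTURES_PER_SERIES):
            offsets.append(start_of_series + capture * CAPTURE_INTERVAL)
        # The next series starts 30 min after this series' last capture.
        start_of_series = offsets[-1] + SERIES_INTERVAL
    return offsets


offsets = capture_offsets()
print(len(offsets), offsets[-1])   # 50 2:45:00
print(N_EXAMINERS * len(offsets))  # 200
```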

Measurement Assessment Procedures
Once captured, all images were assessed using the offline measurement tools of the US equipment to calculate the SR. Relative SRs were calculated as the mean strain of the reference area divided by the mean strain of the comparator area. Two different SRs were calculated as follows (Figure 2): (1) Lateral areas of the image: first, the caliper was used to measure 1 cm leftward from the top-right corner of the image. The area selector tool was then used to draw a rectangle 1 cm wide, with a height equal to the distance from the most superficial limit of the phantom to the most superficial limit of the cylindrical structure. Finally, another rectangle with the same dimensions (height and width) was placed in the top-left corner of the image to obtain the SR. (2) Central area of the image: within the central 2 cm not included in the previous measurement, the distance from the surface of the phantom to the upper limit of the cylindrical structure was divided by 2. A rectangle 2 cm wide was then drawn over the upper half of this distance, from the surface toward the fake vessel. Finally, the SR between the upper rectangle and the lower rectangle was calculated.
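The ROI placement rules can be expressed as rectangle coordinates. The following sketch assumes a 4 cm image width (matching the transducer footprint), measures depth from the top of the image, and uses hypothetical depths for the phantom surface and the upper wall of the cylinder. It also assumes that the lower central rectangle covers the remaining half of the surface-to-vessel distance, which the text implies but does not state. The function names are illustrative only, not part of the scanner software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    """Axis-aligned rectangle in image coordinates (cm)."""
    x_left: float    # lateral position, measured from the left image edge
    x_right: float
    z_top: float     # depth, measured from the top of the image
    z_bottom: float


def lateral_rois(image_width, surface_depth, vessel_top_depth, width=1.0):
    """Two 1 cm wide ROIs at the right and left edges, from phantom surface to vessel."""
    right = ROI(image_width - width, image_width, surface_depth, vessel_top_depth)
    left = ROI(0.0, width, surface_depth, vessel_top_depth)
    return right, left


def central_rois(image_width, surface_depth, vessel_top_depth, width=2.0):
    """Upper and lower halves of the surface-to-vessel distance in the central 2 cm."""
    x_left = (image_width - width) / 2
    midpoint = surface_depth + (vessel_top_depth - surface_depth) / 2
    upper = ROI(x_left, x_left + width, surface_depth, midpoint)
    lower = ROI(x_left, x_left + width, midpoint, vessel_top_depth)
    return upper, lower


# Hypothetical depths: phantom surface at the top of the image, cylinder wall at 1.6 cm.
print(lateral_rois(4.0, 0.0, 1.6))
print(central_rois(4.0, 0.0, 1.6))
```

With a 4 cm image, the two lateral ROIs occupy 0-1 cm and 3-4 cm, so the central 2 cm (1-3 cm) does not overlap them, as the protocol requires.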
All measurements were performed by the same operator, who had 10 years of experience. To blind the rater, every image was labeled with an alphanumeric code and presented in randomized order.

Statistical Analysis
Data analysis was conducted with the Statistical Package for the Social Sciences (SPSS) version 21 for Mac OS. Normal distribution of the SR data was verified using the Shapiro-Wilk test. Inter-examiner reliability of SR calculation was assessed by calculating the mean of the measurements with upper and lower 95% confidence limits (95% CI) according to examiner experience, and the mean difference between examiners (DBE = SR scored by the experienced examiner − SR scored by the novice examiner). Variability of the measurements (SoM% = standard deviation of the error × 100) and validity were also assessed according to examiner experience by calculating the mean error (E = known SR − SR obtained by the examiner = 1 − SR obtained by the examiner), the mean absolute error (AE = absolute value of E), the mean error of the measurements (MEM = (mean AE of the experienced examiner + mean AE of the novice examiner)/2), the mean percent error (PE% = AE/MEM × 100) and the mean coefficient of variation (CV% = standard deviation/mean). All analyses were performed for the SRs calculated in both the lateral areas and the center of the images. Student's t-test for independent samples was used to determine differences between examiners, between locations and with respect to the reference. All tests were two-tailed, and p-values < 0.05 were considered significant.
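The error and agreement formulas above can be collected into a short helper. This is a minimal sketch, not the SPSS procedure used in the study: the 95% CI uses a normal approximation (an assumption; a t-based interval would be slightly wider for small samples), DBE is computed between group means because each examiner acquired separate images, and PE% and CV follow the definitions exactly as written in the text. The input lists are hypothetical SR values for one ROI location.

```python
from statistics import NormalDist, fmean, stdev

KNOWN_SR = 1.0  # any two ROIs in a homogeneous phantom should give SR = 1


def mean_ci95(values):
    """Mean and approximate 95% CI (normal approximation, an assumption here)."""
    m = fmean(values)
    half_width = NormalDist().inv_cdf(0.975) * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


def error_summary(srs):
    """E, AE, SoM% and CV for one examiner group's SRs at one ROI location."""
    errors = [KNOWN_SR - sr for sr in srs]          # E = 1 - SR
    return {
        "mean_E": fmean(errors),
        "mean_AE": fmean(abs(e) for e in errors),   # AE = |E|
        "SoM%": stdev(errors) * 100,                # SD of the error x 100
        "CV": stdev(srs) / fmean(srs),              # SD / mean, as written in the text
    }


def compare_examiners(experienced_srs, novice_srs):
    """DBE, MEM and PE% for one ROI location (lateral or central)."""
    exp_summary = error_summary(experienced_srs)
    nov_summary = error_summary(novice_srs)
    # Images were acquired independently, so DBE is taken between group means.
    dbe = fmean(experienced_srs) - fmean(novice_srs)
    mem = (exp_summary["mean_AE"] + nov_summary["mean_AE"]) / 2
    if mem == 0:
        raise ValueError("MEM is zero; PE% is undefined")
    for summary, srs in ((exp_summary, experienced_srs), (nov_summary, novice_srs)):
        # PE% = AE / MEM x 100, following the text's definition literally
        summary["PE%"] = fmean(abs(KNOWN_SR - sr) / mem * 100 for sr in srs)
    return {"DBE": dbe, "MEM": mem, "experienced": exp_summary, "novice": nov_summary}


# Hypothetical SRs for one ROI location.
experienced = [1.00, 1.02, 0.98, 1.01, 0.99]
novice = [1.10, 1.15, 1.05, 1.12, 1.08]
print(mean_ci95(experienced))
print(compare_examiners(experienced, novice))
```

With PE% defined relative to MEM, the two groups' mean PE% values always add up to 200%, because MEM is the average of the two mean AEs.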

Results
A total of 200 images were captured and included in the analysis, 50 per examiner (two experienced and two novice examiners). From these 200 images, 200 measurements were obtained from the lateral areas and 200 from the central area of the images. Table 1 shows the inter-examiner reliability (agreement between examiners, expressed as the difference between their measurements, which indicates the extent to which the results can be reproduced under the same conditions) and the instrument variability of SR calculations (the extent to which measurements diverge from the average value). In general, agreement between examiners was higher, and variability of measurements lower, in the central area than in the lateral areas of the images (the difference between examiners ranged from 0.00 to 0.01 in the central area and from 0.00 to 0.13 in the lateral areas). Moreover, the experience of the examiner was relevant for the concordance of the measurements in the lateral areas of the images (p < 0.05), but not in the central area (p > 0.05).
Table 1. Inter-examiner reliability and variability of semi-quantitative elastography stiffness ratio measurements.
Like reliability, validity is a key criterion for evaluating research quality: even if an instrument shows excellent reliability and its results are reproducible, they might not be correct. Validity therefore reflects the accuracy of an instrument by comparing a known value with the obtained value. Validity estimates of semi-quantitative elastography for calculating SR within the same image are reported in Table 2. In general, the experience level of the examiner did not seem to be an influential factor, since error and absolute error showed no significant differences between novice and experienced examiners (p > 0.05).
However, the mean error of the measurements was significantly higher (p < 0.001) in the lateral areas than in the central area of the image (0.14 ± 0.05 and 0.05 ± 0.02, respectively). In the central area, SR calculations of both experienced and novice examiners showed no statistically significant differences from the known reference (p > 0.05). In the lateral areas, both novice examiners showed significant differences between the known SR and their measurements (p < 0.001), unlike the experienced examiners (p > 0.05).
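The comparisons above were made, according to the Statistical Analysis section, with Student's t-test for independent samples. The following is a minimal standard-library sketch of the pooled-variance (equal-variance) t statistic and its degrees of freedom; it does not compute the p-value, and the absolute-error values in the example are hypothetical, not the study data.

```python
from math import sqrt
from statistics import fmean, variance


def student_t_independent(a, b):
    """Student's t statistic for two independent samples (pooled, equal variances)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    return (fmean(a) - fmean(b)) / standard_error, df


# Hypothetical absolute errors for lateral vs central ROIs.
lateral_ae = [0.12, 0.18, 0.10, 0.15, 0.16]
central_ae = [0.04, 0.06, 0.05, 0.03, 0.07]
t, df = student_t_independent(lateral_ae, central_ae)
print(f"t = {t:.2f}, df = {df}")
```

The two-tailed p-value follows from the t distribution with `df` degrees of freedom; where SciPy is available, `scipy.stats.ttest_ind(a, b)`, which assumes equal variances by default, returns the same statistic together with that p-value.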

Discussion
This study assessed the reliability, variability of measurements and validity of semi-quantitative elastography SR calculation under optimal conditions (controlling the 90° transducer position and the pressure, using a cylindrical vessel-mimicking structure as a reference), considering the experience level of the examiners. In general, reliability, variability of measurements and validity of semi-quantitative elastography SR calculation were acceptable. Elastography may be one of the most important technological breakthroughs in ultrasound imaging since the development of Doppler imaging or panoramic US, and it retains the main advantages of US over other imaging techniques (e.g., low cost, short examination time, noninvasiveness and accessibility) [16].
After an extensive literature search, we found several recently published studies assessing the reliability of elastography in different soft tissue structures [17], including nerves [18], muscles [19], tendons [20,21] and arteries [22]. Features were assessed both with quantitative elastography (including transverse velocity, transverse stiffness, cranio-caudal velocity and cranio-caudal stiffness, showing poor repeatability of measurements with wide limits of agreement) and with semi-quantitative elastography assessing SRs [8,9].
Previous phantom studies [8,11] reported that qualitative, semi-quantitative and quantitative data collected with strain and shear wave elastography can properly classify targets as harder or softer than backgrounds. However, SRs were more accurate with shear wave than with strain elastography. In line with these studies, our results showed that central areas of the images were more accurate, reliable and stable than lateral areas for calculating SR. In addition, previous findings on the influence of examiner experience were conflicting [11,13]; we found lower SR inter-examiner reliability and validity for novice examiners when the region of interest was placed in the lateral areas. Two possible explanations for these findings are the higher number of crossing sound waves in the middle of the transducer, which yields better greyscale and elastography image quality, and a more uniform pressure beneath the center of the transducer.
There is also evidence of better SR accuracy and lower CVs (CV = 0.08-0.65 for SRs ranging from 1.57 to 2.47) when the stiffness difference between a target and a control point is large, whereas CVs are higher for small differences (CV = 1.22-1.7 for SRs ranging from 0.40 to 0.60) [8].
This could be explained by the relative nature of SRs: if the difference between two points is small, the equipment might not be sensitive enough to detect it, whereas larger differences are detected easily. In this study, the phantom was homogeneous and the known SR was 1. Our results showed smaller CVs in the center of the image than in the lateral areas (0.05-0.06 and 0.14-0.15, respectively). Thus, experience does not seem to be a limitation for obtaining reproducible and valid measurements in central areas of the image.

Clinical Implications
These findings on SR calculation could help in developing specific semi-quantitative protocols for musculoskeletal tissues in both research and clinical practice. Tendon is probably the musculoskeletal tissue most often assessed with elastography, based on the hypothesis that stiffness is altered in the presence of tendon injury [23]. Although a previous study reported elastography patterns in healthy tendons, describing them as uniformly firm structures or heterogeneous tissue with interwoven longitudinal or spindle-shaped soft tissue strands [24], how to interpret B-mode US changes without elastography abnormalities, or elastography changes with a normal B-mode image [25], remains controversial.
In addition, elastography imaging has been used to assess many muscle pathologies, including muscular dystrophy [26] and myositis [27], as well as stiffness differences after exercise [28] or between patients and controls [29]. Myofascial trigger points (MTrPs) have also been assessed with different US imaging methods, since manual identification of MTrPs shows poor reliability [30] and imaging techniques to visualize the MTrP twitch response and changes in stiffness are needed. Previous B-mode US studies consistently described MTrPs as hypoechoic regions [31], and Doppler US studies found a specific pulsatility index response in MTrPs [32].
Furthermore, Jafari et al. [33] assessed MTrP stiffness with elastography imaging to quantitatively distinguish MTrPs from normal tissue and found significant differences between MTrPs and the normal part of the muscle. However, a recent study found no stiffness differences between active and latent MTrPs, or between MTrPs and control points [12]. Since the methodology assessed in this study showed good validity and reliability, it could be applied to calculate the SR between an MTrP and a control point, or between two control points, and to study its correlation with pain pressure thresholds and SR changes after treatment.
As in tendons, a previous study reported poor reproducibility of elastography in skeletal muscles [34], probably due to a non-standardized contraction/relaxation state of the muscle or to soft tissue anisotropy in the image.

Limitations
Finally, this study has some limitations. It was performed on an artificial material under optimal conditions, and we do not know whether similar results would be observed in real subjects (e.g., image anisotropy due to the round morphology of muscle bellies or vessels, or a non-standardized state of the muscle and subject). Although the reliability of SR calculation using strain elastography can be assessed in real subjects, validity cannot be analyzed with semi-quantitative elastography, as all data obtained are relative and the real stiffness difference is unknown. Further research comparing these semi-quantitative analyses with quantitative results expressed in metric units in real subjects is therefore needed.
In addition, only one homogeneous material, one ultrasound machine and one transducer were used. Evaluation with additional homogeneous materials and different machines and probes is needed to confirm our findings.

Conclusions
Although previous studies have assessed the utility of semi-quantitative strain elastography for calculating SR, most of them examined references and targets with large stiffness differences [8,9,11,13]. However, recent research on MTrPs showed that stiffness differences in real tissues are smaller than those analyzed in phantoms [12]. As previous reports showed that the accuracy of strain elastography depends directly on the stiffness difference between two points, the main novelty of this study lies in the use of a homogeneous material to calculate inter-examiner reliability and instrument validity, considering the location of the region of interest and the experience of the examiners. We found that semi-quantitative elastography SR calculation shows acceptable inter-examiner reliability, validity and variability of measurement. Reliability, validity and variability were similar regardless of the examiners' experience with ultrasound and transducer use. However, our results suggest that central areas of the image are more reliable and accurate than lateral areas. This paper proposes technical considerations regarding examiner experience and the most accurate region of interest for future studies assessing SRs of soft tissues with small stiffness differences.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author (JA Valera-Calero), upon reasonable request.