The Influence of Sonographer Experience on Skeletal Muscle Image Acquisition and Analysis

The amount of experience with ultrasonography may influence measurement outcomes while images are acquired or analyzed. The purpose of this study was to identify the interrater reliability of ultrasound image acquisition and image analysis between experienced and novice sonographers and image analysts, respectively. Following a brief hands-on training session (2 h), the experienced and novice sonographers and analysts independently performed image acquisition and analyses on the biceps brachii, vastus lateralis, and medial gastrocnemius in a sample of healthy participants (n = 17). Test–retest reliability statistics were computed for muscle thickness (transverse and sagittal planes), muscle cross-sectional area, echo intensity and subcutaneous adipose tissue thickness. The results show that image analysis experience generally has a greater impact on measurement outcomes than image acquisition experience. Interrater reliability for measurements of muscle size during image acquisition was generally good–excellent (ICC2,1: 0.82–0.98), but poor–moderate for echo intensity (ICC2,1: 0.43–0.77). For image analyses, interrater reliability for measurements of muscle size for the vastus lateralis and biceps brachii was poor–moderate (ICC2,1: 0.48–0.70), but excellent for echo intensity (ICC2,1: 0.90–0.98). Our findings have important implications for laboratories and clinics where members possess varying levels of ultrasound experience.


Introduction
The use of ultrasonography for skeletal muscle imaging in the field of kinesiology is growing. This is likely a result of its affordability, validity, and reliability compared to advanced imaging techniques such as magnetic resonance imaging and computerized tomography [1][2][3][4]. Establishing the reliability of skeletal muscle ultrasound is critical since it is commonly used to assess skeletal muscle adaptations with exercise training [1,2,5], muscle disuse [1,6], aging [7][8][9][10] and disease [11,12]. Considerable evidence shows that ultrasound measurements demonstrate acceptable intra- and interrater reliability [11,13,14,15,16]. However, the extent to which relative experience with ultrasound image acquisition and analyses influences its outcomes is not well defined.
A common scenario within a research laboratory or clinic is that its members possess varying levels of experience with a technique such as ultrasonography. This presents a challenge, as a critical aspect in longitudinal studies or patient evaluation relates to the feasibility of a single experimenter performing all ultrasound scans and analyses. The fact that relatively minor changes in probe orientation [17], pressure applied to the skin [18], or even scale calibration [16] during analysis can have marked effects on the outcomes illustrates the need to identify interrater reliability between vastly different ultrasound experience levels. Mayer et al. [11] have recently shown that following 8 h of expert-led ultrasound training, a small group of ultrasound-naïve physical therapy students had reliable ultrasound measurements compared to an expert sonographer. Importantly, the scans were performed on a group of patients within an intensive care unit, a group that had recovered from intensive care, and a healthy control group, but the group sample sizes were small (n = 6), several raters (n = 5) of varying experience levels performed acquisition, and the influence of ultrasound analysis experience on the outcomes was not determined. Clearey et al. [19] recently showed excellent interrater image analysis reliability for cross-sectional area, muscle thickness, and echo intensity in a group of novices when images were captured by the same, experienced sonographer. When examining interrater reliability for both image acquisition and analysis between novice and expert sonographers, Zaidman et al. [12] showed similar outcomes between sonographers, but these data were limited to echo intensity values.
Overall, the evidence suggests that ultrasound-derived measurements of skeletal muscle size and quality exhibit acceptable interrater reliability, yet there is insufficient data on how the relative experience of a sonographer influences both image acquisition and the analysis of muscle cross-sectional area, muscle thickness, subcutaneous adipose tissue thickness, and echo intensity.
The present experiment addresses how experience with ultrasonography influences image outcomes by identifying the interrater reliability for ultrasound image acquisition and analysis between experienced and novice sonographers and image analysts. An experienced and a novice sonographer performed image acquisition, and the subsequent analyses were performed by an experienced and a novice image analyst. Outcome measures consisted of muscle thickness, cross-sectional area, subcutaneous adipose tissue thickness, and echo intensity of three commonly studied muscles in the field of kinesiology: the biceps brachii, vastus lateralis, and medial gastrocnemius.

Study Design
A cross-sectional study design was used to examine the role of B-mode ultrasonography experience on the reliability of ultrasound-derived measurements of muscle thickness, cross-sectional area, echo intensity, and subcutaneous adipose tissue thickness. During a single visit to the University of Central Florida Institute of Exercise Physiology and Rehabilitation Science, participants underwent ultrasound imaging of the biceps brachii, vastus lateralis, and medial gastrocnemius muscles. An experienced and novice sonographer performed image acquisition and an experienced and novice image analyst performed image analysis. The order of testing between sonographers and muscles was randomized with a random number generator. Participants refrained from exercise for ≥24 h before their laboratory visit. All participants signed their Informed Consent, and this study was approved by the Institutional Review Board for Human Subjects at the University of Central Florida (IRB # STUDY00003175).

Participants
A total of 19 participants volunteered for this study and 17 were retained for analyses. The experienced and novice sonographers performed the scans together on the first two participants (one female and one male) as part of the hands-on training; data from these two participants were not retained. Exclusion criteria were limited to neuromuscular or metabolic disease, a history of stroke, cancer, or heart attack, significant musculoskeletal pain, and use of medications that may impact physical performance.

Sonographers
An experienced (M.S.) and novice (J.C.) sonographer performed all ultrasound scans with the participants on a treatment table. The novice sonographer had never performed ultrasound measurements and was completely naïve to the methods, procedures, and requisite skills necessary to acquire ultrasound images. A brief custom-made video was crafted by an experienced sonographer (G.G.) on the research team regarding the basics of ultrasound image acquisition (Supplementary Materials). On the day of data collection, before acquisition, the experienced sonographer (M.S.) provided the novice with one-on-one instruction regarding the LOGIQ-E software interface, probe orientation, and scanning tips and pitfalls for the imaged muscles. Hands-on training was then accomplished by having the experienced (M.S.) and novice (J.C.) perform the ultrasound scans together on the first two participants. Following this, the experienced (M.S.) and novice (J.C.) performed all scans independently. In total, the novice (J.C.) had less than two hours of instruction before performing the scans without guidance or instruction. At the time of the experiment, the experienced sonographer (M.S.) had approximately seven years of experience with musculoskeletal sonography in adolescents, adults, and the elderly.

B-Mode Ultrasonography Image Acquisition
All images were taken from the right side of the participants while supine for the biceps brachii and vastus lateralis imaging and prone for the medial gastrocnemius assessment. The images were recorded with a portable B-mode imaging device (GE Logiq E BT12, GE Healthcare, Milwaukee, WI, USA). A multi-frequency linear array probe (12L-RS, 5-13 MHz, 38.4 mm field of view, GE Healthcare, Milwaukee, WI, USA) was used for the vastus lateralis, whereas a wideband linear array probe (L8-18i-RS, 4.5-18 MHz, 25 mm field of view, GE Healthcare, Milwaukee, WI, USA) was used for the biceps brachii and medial gastrocnemius muscles. All settings were kept consistent (Frequency 10 MHz, Gain 55 dB, Dynamic range 72, Depth 5 cm) across and within participants; however, a depth of 6 cm was required for three participants to view the full muscle and was kept constant across sonographers. Once the site was identified for the respective muscle, it was marked on the skin surface with a permanent marker (Sharpie) before image acquisition, and both sonographers used the marked site for probe placement. For each muscle, still images were captured in the sagittal and transverse planes, and then panoramic images were captured with the panoramic function (LogiqView, GE Healthcare, Milwaukee, WI, USA). Three images were captured for each scan for every muscle. For the panoramic images, the probe was oriented in the transverse plane and was guided by a flexible high-density foam pad to allow steady transverse movement of the probe across the imaging areas. For the biceps brachii, cloth tape was used to identify the 50% distance from the acromion process to the antecubital space. Similarly, the 50% distance between the greater trochanter and the superior border of the patella was used for the vastus lateralis. The site for the medial gastrocnemius muscle was determined on an individual-by-individual basis due to the large heterogeneity of the lower limb [20,21]. This site was identified by scanning in the transverse and sagittal planes of the muscle and visually identifying the location with the largest muscle thickness. A considerable amount of water-soluble transmission gel (Aquasonic 100 ultrasound transmission gel, Parker Laboratories, Inc., Fairfield, NJ, USA) was applied to the skin for all imaging.

Image Analysts
The experienced (G.G.) and novice (C.V.) image analysts were not present during data collection and were therefore blind to the image coding and unaware of who acquired the images. A brief custom-made video was crafted by an experienced sonographer (G.G.) on the research team regarding the basics of ultrasound image analyses (Supplementary Materials). The experienced analyst (G.G.) instructed the novice on the procedures for the image analyses, addressing important steps and common challenges for the derived measurements. Hands-on training was then accomplished by having the experienced (G.G.) and novice (C.V.) perform the ultrasound analyses together on the first two participants. Following this, the experienced (G.G.) and novice (C.V.) performed all analyses independently. In total, the novice (C.V.) had less than two hours of instruction before analyzing the images without guidance or instruction. At the time of the experiment, the experienced analyst (G.G.) had approximately six years of experience with musculoskeletal ultrasound image analyses in adolescents, adults, and the elderly.

B-Mode Ultrasonography Image Analysis
The ultrasound images were exported and analyzed with ImageJ software (version 1.53k; National Institutes of Health, Bethesda, MD, USA). The experienced analyst (G.G.) visually inspected the three images taken for each muscle and site (e.g., biceps brachii sagittal) and selected the clearest image of the three for analysis. The same selected image was then analyzed by the experienced (G.G.) and novice (C.V.) image analysts. The images were first scaled from pixels to cm using the straight-line function. Muscle thickness, in both the sagittal and transverse planes, was quantified using the straight-line function at the midpoint of the muscle on the freeze-frame image. To quantify muscle cross-sectional area (cm²), the polygon function was used to outline the border of each muscle without any surrounding fascia on the panoramic image. Echo intensity was determined via gray-scale analysis using the histogram function within the same polygon used for cross-sectional area analyses. Using the same image on which muscle cross-sectional area and echo intensity were outlined, subcutaneous adipose tissue thickness was quantified using the straight-line function at three sites (medial, midpoint, lateral) from the skin to the superficial aponeurosis and calculated as the average of the three values.
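To make the measurement definitions concrete, the sketch below reproduces the same quantities outside of ImageJ: pixel-to-cm scaling from a line of known length, straight-line thickness, polygon cross-sectional area, and mean gray value inside the traced polygon. It is not ImageJ's implementation. The function names, the pixel-centre rule for deciding which pixels fall inside the polygon, and the synthetic image are assumptions chosen for illustration, so values may differ slightly from those ImageJ reports.

```python
import numpy as np


def cm_per_pixel(known_cm, p0, p1):
    """Scale factor from a line of known length (e.g., drawn on the depth ruler)."""
    return known_cm / float(np.hypot(p1[0] - p0[0], p1[1] - p0[1]))


def line_length_cm(p0, p1, scale):
    """Straight-line distance between two (x, y) pixel points, in cm."""
    return float(np.hypot(p1[0] - p0[0], p1[1] - p0[1])) * scale


def polygon_area_cm2(vertices, scale):
    """Shoelace formula on (x, y) pixel vertices, converted to cm^2."""
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    area_px = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return area_px * scale ** 2


def polygon_mask(shape, vertices):
    """Boolean mask of pixels whose centres lie inside the polygon (even-odd rule)."""
    v = np.asarray(vertices, dtype=float)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    px, py = cols + 0.5, rows + 0.5  # pixel-centre coordinates
    inside = np.zeros(shape, dtype=bool)
    for (x0, y0), (x1, y1) in zip(v, np.roll(v, -1, axis=0)):
        if y0 == y1:
            continue  # a horizontal edge cannot cross a horizontal ray
        crosses = (y0 > py) != (y1 > py)
        x_at_y = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at_y)  # toggle on each edge crossing
    return inside


def echo_intensity(image, vertices):
    """Mean gray value inside the traced polygon (0-255 for 8-bit images)."""
    return float(image[polygon_mask(image.shape, vertices)].mean())


# Synthetic 8-bit image with a brighter rectangular "muscle" (illustration only).
image = np.full((100, 200), 80, dtype=np.uint8)
image[10:60, 20:120] = 120
outline = [(20, 10), (120, 10), (120, 60), (20, 60)]
scale = cm_per_pixel(1.0, (0, 0), (0, 100))  # assume 100 px span 1 cm

print("CSA (cm^2):", polygon_area_cm2(outline, scale))
print("Echo intensity (a.u.):", echo_intensity(image, outline))
print("Thickness (cm):", line_length_cm((70, 10), (70, 60), scale))
```

Because cross-sectional area and echo intensity are both derived from the same traced outline, any disagreement between analysts about where the fascial border lies propagates into both measurements.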

Statistical Analysis
Descriptive statistics have been reported as the mean ± SD for the following five variables: (1) muscle thickness in the sagittal plane, (2) muscle thickness in the transverse plane, (3) cross-sectional area, (4) echo intensity, and (5) subcutaneous adipose tissue thickness. Paired samples t-tests were performed to examine systematic variability, with an alpha level of 0.05 used to determine statistically significant differences. To provide insight into the precision and magnitude of the estimated differences, 95% confidence intervals (CIs) and Cohen's d effect sizes were computed, respectively. Cohen's d values of 0.2, 0.5, and 0.8 were used to classify small, moderate, and large differences, respectively [22]. The method of Bland and Altman [23] was used to identify the 95% limits of agreement between the experienced versus novice sonographers and image analysts. Reliability was quantified with intraclass correlation coefficients (ICCs) computed with the 2-way random-effects model (ICC2,1) on account of its generalizability to other laboratories and testers [24,25]. The ICCs were evaluated based on a reliability scale where ICCs < 0.50 indicated "poor" reliability, ICCs of 0.50-0.75 indicated "moderate" reliability, ICCs of 0.75-0.90 indicated "good" reliability, and ICCs > 0.90 were indicative of "excellent" reliability [26]. The mean square error was used to calculate the standard error of the measurement (SEM; expressed in absolute units and as a percentage of the grand mean) and the minimal difference needed to be considered real (MD) [25].

Results
For image acquisition, Table 1 shows the mean ± SD and % mean difference values for each of the five variables across the three different muscles. Generally, the images acquired by the novice sonographer resulted in larger thickness and cross-sectional area values, with mean differences ranging from 0.38 to 26.47%. Table 2 displays the reliability statistics.
Statistically significant differences (p < 0.05) were observed for images acquired by the experienced versus novice sonographer for vastus lateralis echo intensity (p = 0.002), medial gastrocnemius cross-sectional area (p = 0.035), and biceps brachii subcutaneous tissue thickness (p = 0.002), and the associated effect sizes were considered moderate or large. All non-significant differences were associated with small or trivial effect sizes (d ≤ 0.456). Based on the ICCs, all variables were classified as demonstrating good-excellent reliability except medial gastrocnemius echo intensity (0.643), and biceps brachii echo intensity (0.437) and subcutaneous tissue thickness (0.740). Variables showing particularly high SEMs included vastus lateralis subcutaneous tissue thickness (12.38%), and biceps brachii sagittal thickness (11.97%) and subcutaneous tissue thickness (17.74%). Figure 1 shows example data, highlighting differences between sonographers.
For image analysis, Table 3 shows the mean ± SD and % mean difference values for each of the five variables across the three different muscles. Qualitatively, there were no consistent patterns of lower or greater values being demonstrated across variables or muscles. The mean differences ranged from 0.04 to 14.11%. Table 4 displays the reliability statistics. Statistically significant differences (p < 0.05) were observed for images analyzed by the experienced versus novice investigator for vastus lateralis cross-sectional area (p = 0.005) and biceps brachii transverse thickness (p = 0.029). Only vastus lateralis cross-sectional area demonstrated an effect size that was considered moderate (d = 0.514). All other effect sizes were considered small (d ≤ 0.393). Based on the ICCs, five of the variables demonstrated moderate reliability and one variable showed poor reliability. Eight out of the 15 variables demonstrated SEMs ≥ 10.0%. Figure 2 shows example data, highlighting differences between image analysts.
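For readers who wish to reproduce reliability statistics of this kind, the following is a minimal sketch of ICC2,1 from a two-way ANOVA decomposition, the SEM as the square root of the mean square error, the reliability classification, and Bland-Altman 95% limits of agreement. It is not the authors' analysis script. The function names are hypothetical, the example values are invented and are not study data, and the handling of values falling exactly on a category boundary (0.75 or 0.90) is an assumption, since the stated ranges overlap at those points.

```python
import numpy as np


def icc_2_1(scores):
    """Return (ICC2,1, SEM, SEM%) for an (n participants x k raters) array."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between participants
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    sem = np.sqrt(mse)  # SEM from the mean square error
    return icc, sem, 100.0 * sem / grand


def classify_icc(icc):
    """Poor / moderate / good / excellent; boundary handling is an assumption."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


def limits_of_agreement(rater_a, rater_b):
    """Bland-Altman bias and 95% limits: mean difference +/- 1.96 SD."""
    d = np.asarray(rater_a, dtype=float) - np.asarray(rater_b, dtype=float)
    bias, sd = d.mean(), d.std(ddof=1)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Invented muscle thickness values (cm) for illustration; NOT study data.
experienced = [2.10, 2.45, 1.98, 2.80, 2.33, 2.61]
novice = [2.18, 2.40, 2.05, 2.95, 2.30, 2.70]

icc, sem, sem_pct = icc_2_1(np.column_stack([experienced, novice]))
print(f"ICC2,1 = {icc:.3f} ({classify_icc(icc)}), SEM = {sem:.3f} cm ({sem_pct:.2f}%)")
print("Limits of agreement (lower, bias, upper):", limits_of_agreement(experienced, novice))
```

Because ICC2,1 includes the between-rater mean square in its denominator, a systematic offset between an experienced and a novice rater lowers the coefficient even when the two sets of values are highly correlated.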

Discussion
This experiment describes how the relative experience with ultrasound image acquisition and analyses influences outcomes of skeletal muscle size and quality. The findings from this study show that experience with image acquisition and analysis generally has small effects (d < 0.30) on most ultrasound outcomes, with good-excellent interrater reliability (ICC2,1: 0.80-0.98). However, significant differences and large effects between experienced and novice raters were observed for image acquisition and analysis for some of the variables. For image acquisition, medial gastrocnemius cross-sectional area, vastus lateralis echo intensity, and biceps brachii subcutaneous adipose tissue thickness were significantly different between sonographers. Despite this, experienced-novice interrater reliability for measures of muscle size was good-excellent (ICC2,1: 0.82-0.98). Similarly, for analysis, vastus lateralis cross-sectional area and biceps brachii muscle thickness were significantly different between analysts, and measures of muscle size for the vastus lateralis exhibited poor reliability (ICC2,1 < 0.65), yet interrater reliability for echo intensity was excellent (ICC2,1 > 0.90) for all muscles. The present data show that relative experience with ultrasound techniques has a task-specific influence on image outcomes that should be considered when designing and interpreting ultrasound-based assessments.
There have been limited attempts to quantify the influence of ultrasound experience on image outcomes [11,12,19]. The rationale for this comparison is that the proficiency levels within a laboratory or clinic change continually as members rotate over time. Identifying reliability between high and low experience levels is necessary for study design and coordination. The present findings generally agree with similar reliability studies that have compared individuals with extensive versus limited ultrasound experience [11,12,19]; however, we report greater systematic variability between experienced versus novice sonographers than previously shown [11,12]. A strength of the present data is that they show how ultrasound acquisition and analysis experience influences the outcomes for the variables used to determine muscle size and quality: muscle cross-sectional area, muscle thickness, subcutaneous adipose thickness, and echo intensity. The poor interrater reliability for echo intensity during image acquisition is likely an artifact of differences in angle, placement, and possibly the speed of the ultrasound probe during acquisition. In contrast, the poor interrater reliability for vastus lateralis and biceps brachii muscle size analyses is likely explained by the inability of the novice to discern fascial borders, owing to limited experience and the challenging shapes of these muscles. As such, the level of skill that is required to accurately acquire and analyze ultrasound images may be muscle and variable dependent.

Muscle Cross-Sectional Area
Ultrasound-derived measurements of muscle cross-sectional area have been cross-validated against MRI and CT imaging and show good agreement [2][3][4]. It has been suggested that sonographer proficiency is needed when collecting panoramic images with the extended field of view technique to acquire high-quality images [2]. Indeed, the degree of muscle curvature can present challenges, and data show that reliability weakens at regions with greater relative curvature, such as the distal portion of the thigh [2,3]. The present data support these observations, as the medial gastrocnemius shows greater systematic variability compared to the biceps brachii and vastus lateralis. Nevertheless, excellent interrater reliability is shown for the vastus lateralis (ICCs > 0.90) [2][3][4], the medial gastrocnemius (ICCs > 0.90) [5], and the biceps brachii (ICCs > 0.90) [11] muscles. Our findings extend these observations by showing that a novice sonographer can acquire reliable extended field of view images in these muscles, but experienced-novice disparity increases with technical demand, likely due to greater anatomical contour. Interestingly, the experienced-novice comparison for image acquisition versus image analysis shows that reliability was substantially weaker for image analysis. This is an important finding because it demonstrates that extended field of view image analysis requires sonographer proficiency in addition to the skills required for high-quality image acquisition. The present data show that the minimal differences for image acquisition (MD = 2.94, 2.54, 2.76 cm²) were smaller than those of analysis (MD = 12.97, 7.85, 3.08 cm²) for the vastus lateralis, biceps brachii, and medial gastrocnemius, respectively. There are three critical points to consider based on these data. One, the SEM for experienced-novice image acquisition (0.92-1.06 cm²) is similar to that shown for the SEM of ultrasound compared to MRI (0.87 cm²) [4] and CT (0.1-1.1 cm²) [2].
Two, the minimal differences for experienced-novice image acquisition are likely small enough to detect resistance training-induced muscle hypertrophy and disuse-induced atrophy for the lower limb [1,2,4]. Three, given the low level of interrater reliability for image analysis, those effects would likely not have been detected. Similar outcomes have recently been shown for image analysis interrater reliability in novice sonographers [19]. Despite excellent ICC values, systematic variability was evident for measures of muscle size for the vastus lateralis, rectus femoris, and first dorsal interosseus among three different novice sonographers [19]. Collectively, it seems that experience level affects extended field of view image analysis more than image acquisition.
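The relationship between the SEM and MD values discussed in this section can be checked with a short calculation. The sketch assumes the commonly used formula MD = SEM × 1.96 × √2; the text does not state the formula explicitly, but it reproduces the reported acquisition range to within rounding (SEMs of 0.92 and 1.06 cm² give MDs of about 2.55 and 2.94 cm², against reported values of 2.54 to 2.94 cm²). The function name is hypothetical.

```python
import math


def minimal_difference(sem, z=1.96):
    """Minimal difference needed to be considered real: SEM * z * sqrt(2).

    Assumes the SEM-based formula with a 95% z-score; the sqrt(2) term reflects
    that a difference between two measurements carries error from both.
    """
    return sem * z * math.sqrt(2)


# SEM range reported in the text for image acquisition (cm^2).
for sem in (0.92, 1.06):
    print(f"SEM = {sem:.2f} cm^2 -> MD = {minimal_difference(sem):.2f} cm^2")
```

Under this formula the MD is roughly 2.77 times the SEM, which is why the larger measurement error for image analysis translates directly into changes that must be several times larger before they can be considered real.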

Echo Intensity and Subcutaneous Adipose Tissue Thickness
The grayscale analysis that determines skeletal muscle echo intensity values is affected by relatively minor alterations in probe positioning mechanics [17]. Given this, it is unsurprising that interrater reliability for echo intensity during acquisition was generally poor-moderate and weaker than the interrater reliability values for image analysis. Zaidman et al. [12] suggest that minimal training is necessary to acquire reliable and clinically valid measures of echo intensity. The authors show that following only a 20-min expert-led training session, interrater reliability for echo intensity between a novice and expert was high (ICC ≥ 0.85). It is important to point out that echo intensity was derived from polygon tracing of panoramic images in the present study, as in [19], not from the rectangle function on still images [11,12]. This distinction likely explains the differing levels of interrater reliability for image acquisition between experience levels in the present study and others [11,12]. The interrater reliability values of the present study for image analysis are similar to those reported for three novice sonographers [19]. The experienced-novice difference in subcutaneous adipose tissue thickness for the biceps brachii during image acquisition is additional evidence that probe mechanics differed between sonographers, emphasizing the importance of probe pressure and alignment during image acquisition. Although the reliability statistics for echo intensity during image acquisition were weaker than those for analysis, the values for the vastus lateralis are comparable to test-retest reliability measures performed by the same, experienced sonographer [6].
Nevertheless, combining the observations of experienced-novice differences in subcutaneous thickness with the comparatively higher SEM and MD values for echo intensity during image acquisition versus analysis, it seems that a novice sonographer requires more than minimal training to acquire, but not analyze, reliable measures of echo intensity.

Muscle Thickness
Jenkins et al. [14] suggest that given the greater skill required for extended field of view imaging, transverse imaging may be a more convenient option for measurements of muscle size. In support, the authors show excellent reliability for cross-sectional area, muscle thickness, and echo intensity determined from panoramic and single transverse images of the biceps brachii, with a strong association between muscle cross-sectional area and muscle thickness (r = 0.93). In the present study, the interrater image acquisition reliability values for muscle thickness obtained in the transverse and sagittal planes show good-excellent reliability with minimal difference values sensitive enough to detect resistance training-induced increases in muscle thickness following longer (>6 weeks) training interventions [5,27,28], but likely not short-term training-induced hypertrophy [29]. The SEMs for muscle thickness measurements during acquisition are similar to those shown by a single experienced sonographer [30] and those by Mayer et al. [11] with raters of different experience levels. Like the issues encountered with muscle cross-sectional area, the interrater reliability for image analysis was considerably weaker for the vastus lateralis and biceps brachii muscles compared to image acquisition. This was not the case for the medial gastrocnemius muscle, likely due to the brightness and clarity of the fascial borders for this muscle compared to the others in this study. Figure 2 shows how the image analysts differed in their muscle size measurements for vastus lateralis cross-sectional area and biceps brachii muscle thickness. The inability of the novice image analyst to consistently identify muscle boundaries and trace faint fascial borders is likely a major factor contributing to the poor interrater reliability for muscle size during image analyses.

Ultrasound Considerations for Novice Onboarding
We recommend that laboratories use structured onboarding procedures for the novice sonographer, with the following considerations.
Laboratory standards: a video demonstrating image acquisition and analysis procedures for the specific equipment provides an accessible and convenient means to standardize laboratory procedures, techniques, and instructions for the novice (Supplementary Materials). This should be followed up with hands-on experiential learning practices on different skeletal muscles and persons with an experienced sonographer.
Demonstrations: how probe mechanics [16][17][18] influence the ultrasound image should be emphasized during acquisition training. Representative images demonstrating the fascial borders should be provided to the novice during analysis training for the skeletal muscles of interest and their surrounding anatomy, particularly challenging images with faint fascial borders. Since extended field of view scanning and analyses are more technically demanding, these scans should be emphasized and integrated.
Time: the total training time for novice onboarding is challenging to recommend. The formal training time for the present study was ~2 h, whereas ~8 h of training was performed by Mayer et al. [11], who show greater interrater reliability for image acquisition in a more heterogeneous sample. Regardless, recording the amount of time spent training for the novice provides an objective way to monitor their experience. To this point, keeping a formal time log for everyone may be a convenient method to quantify experience levels within a laboratory or clinic.

Limitations
The present data have some important limitations to consider. First, although the sonographers performed the scans independently, the scans were performed on the same site that was determined (and marked) by the first sonographer. It may be that individuals of differing experience would acquire images from different relative muscle locations, yet this was not entirely the question we were attempting to answer. Another limitation is that we did not capture and compare other ultrasound-derived measurements such as fascicle length and pennation angle between novice and experienced sonographers. However, with the rise in automated analysis methods for these measurements, comparisons between an experienced sonographer and the automated analysis are needed [31]. Lastly, although the present sample of participants varied in their anthropometry, training status, and ethnicity, they were all young adults (<27 years) free of disease, illness, and injury. Future studies are encouraged to identify interrater reliability across a longitudinal intervention (i.e., resistance training, disuse, aging) to describe whether different sonographers capture the same magnitude of the respective effect.

Conclusions
The present data show how experience with ultrasound image acquisition and analysis influences measurements of skeletal muscle size and quality. Since ultrasound imaging is a relatively simple procedure, it offers a lower barrier to entry for skeletal muscle measurements compared to other techniques. This has spurred interest in the question of how much training is required for ultrasound proficiency. Despite the level of convenience for ultrasound image acquisition and analysis, our data show that a tradeoff exists for interrater reliability between experienced and novice sonographers during image acquisition versus analyses. The experienced-novice comparisons for image acquisition show that measures of muscle size can be reliably acquired, but measurements of muscle quality cannot. In contrast, experienced-novice comparisons for image analyses indicate that measures of muscle quality are reliably analyzed, but measurements of muscle size for the vastus lateralis and biceps brachii are not. Many authors have shown that skill is required for high-quality ultrasound image acquisition, yet a critical interpretation from the present study is that ultrasound image analyses are not trivial procedures. Comparatively speaking, the experienced-novice differences were more severe for image analyses than image acquisition. Collectively, these findings suggest that ultrasound image analysis experience has a greater influence on the derived outcome variables than acquisition experience. The findings of this study have implications for laboratories that use ultrasonography and possess members of varying experience levels.