1. Introduction
Neck pain is the second most common musculoskeletal condition and the fourth leading cause of disability worldwide [1]. Its annual prevalence in Europe is 6.5% [2], and 20% of these cases progress to chronic neck pain. Chronic neck pain is more common in women, and the most affected age group is 45–54 years [3]. Altered function and morphological changes of the cervical muscles are recognized features of painful neck disorders [4]. Assessing the muscles of the cervical region presents unique anatomical and technical challenges, including small muscle size, deep location, and close proximity to vascular and neural structures, all of which may affect measurement precision. Understanding the morphology of deep cervical muscles such as the Longus Colli, Multifidus Cervicis, and Semispinalis Capitis is essential for evaluating movement dysfunction and pain in the cervical spine. These muscles belong to the local stability system, which is responsible for segmental control and for maintaining a neutral joint position [5,6,7].
Accurate and reliable examination of muscle morphology is essential for both clinical decision-making and research. Musculoskeletal ultrasound imaging has become a widely used modality for assessing muscle size because it is non-invasive, accessible, and cost-effective. It allows real-time visualization of muscle structure and is particularly useful for repeated assessments over time [8,9,10,11]. However, the utility of ultrasound depends on the consistency of its measurements, which makes establishing intra- and inter-rater reliability a critical prerequisite for its application. Recent, closely interconnected advances in ultrasound automation and AI-based segmentation are transforming medical imaging [12]. AI-driven solutions now support automated image analysis, enabling precise lesion detection across various tissues. Javanshir et al. [13] reviewed 16 studies that used ultrasonography to assess cervical muscles and concluded that the method showed great potential, although they identified important methodological considerations. They recommended using consistent landmarks, drawing on knowledge of the anatomy and function of the target muscles, and clearly defining muscle borders to obtain clearer images. They also highlighted the value of standardized subject positioning, correct transducer placement, and the use of multiple images in the statistical analysis.
Normalization improves the validity of comparisons by controlling for body size, enabling clearer interpretation of muscle health or function. The normalization of ultrasonographic measurements with body mass index (BMI) is crucial, particularly when assessing neck muscles between healthy and pathological populations. However, there is a need to explore new methods of normalization, as current approaches may not adequately account for the unique characteristics of the cervical region.
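To make the normalization step concrete, the following Python sketch divides a raw ultrasonographic measure by BMI or by neck circumference. This section does not specify the exact normalization formula, so simple ratio normalization is an assumption here; the `Participant` record, its field names, and the example values are hypothetical illustrations rather than study data.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical anthropometric record for one participant."""
    mass_kg: float
    height_m: float
    neck_circumference_cm: float


def bmi(p: Participant) -> float:
    """Body mass index in kg/m^2."""
    return p.mass_kg / p.height_m ** 2


def normalize_by_bmi(value: float, p: Participant) -> float:
    """Ratio normalization (assumed scheme): raw measure divided by BMI."""
    return value / bmi(p)


def normalize_by_neck_circumference(value: float, p: Participant) -> float:
    """Ratio normalization (assumed scheme): raw measure divided by neck circumference."""
    return value / p.neck_circumference_cm


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    p = Participant(mass_kg=70.0, height_m=1.75, neck_circumference_cm=36.0)
    csa_cm2 = 1.04  # hypothetical Longus Colli CSA
    print(f"BMI = {bmi(p):.2f} kg/m^2")
    print(f"CSA/BMI = {normalize_by_bmi(csa_cm2, p):.4f}")
    print(f"CSA/neck circumference = {normalize_by_neck_circumference(csa_cm2, p):.4f}")
```

Ratio normalization assumes that the measure scales proportionally with the anthropometric variable; allometric scaling (dividing by the variable raised to a fitted exponent) is a common alternative when that assumption does not hold.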
Therefore, the aim of this study was to (a) determine the intra-rater and test–retest reliability of ultrasonographic measurements of anatomical characteristics of the Longus Colli, Sternocleidomastoid, Multifidus Cervicis, and Semispinalis Capitis in healthy individuals and (b) test the reliability of the same measurements after their normalization with BMI and neck circumference.
4. Discussion
The aim of this study was to examine the reliability of musculoskeletal ultrasound in measuring the size of the Longus Colli, Sternocleidomastoid, Semispinalis Capitis, and Multifidus Cervicis. Intra-rater reliability was good for the CSA of the Longus Colli and excellent for the CSA of the Sternocleidomastoid, Multifidus Cervicis, and Semispinalis Capitis. For the APD, intra-rater reliability was moderate for the Multifidus Cervicis, good for the Longus Colli, and excellent for the Sternocleidomastoid and Semispinalis Capitis. For the LD, it was good for the Longus Colli, Multifidus Cervicis, and Semispinalis Capitis and excellent for the Sternocleidomastoid. Test–retest reliability of the CSA was excellent for the Semispinalis Capitis and Sternocleidomastoid and good for the Longus Colli and Multifidus Cervicis. Test–retest reliability of the APD was good for the Longus Colli, Multifidus Cervicis, and Semispinalis Capitis and excellent for the Sternocleidomastoid. LD measurements showed good test–retest reliability for all muscles.
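The reliability categories above rest on intraclass correlation coefficients (ICCs). The ICC model used in the study is not stated in this section, so the Python sketch below assumes the two-way mixed, consistency, single-measure form, ICC(3,1) in Shrout and Fleiss notation, computed from a participants × sessions matrix using two-way ANOVA mean squares. The function name and the example data are illustrative.

```python
from statistics import fmean
from typing import Sequence


def icc_3_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single measure.

    `data` has one row per participant and one column per session or trial.
    """
    n = len(data)      # number of participants
    k = len(data[0])   # number of repeated measurements per participant
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a rectangular matrix with at least 2 rows and 2 columns")

    grand = fmean(x for row in data for x in row)
    row_means = [fmean(row) for row in data]
    col_means = [fmean(row[j] for row in data) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    bms = ss_rows / (n - 1)               # between-participants mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    denom = bms + (k - 1) * ems
    if denom == 0:
        raise ValueError("data contain no variance")
    return (bms - ems) / denom


if __name__ == "__main__":
    # Hypothetical CSA values (cm^2) for four participants scanned twice.
    scans = [[1.00, 1.05], [1.20, 1.18], [0.90, 0.95], [1.10, 1.08]]
    print(round(icc_3_1(scans), 3))
```

For these illustrative numbers the function returns approximately 0.94. Because ICC(3,1) measures consistency, a constant offset between sessions does not lower it; an absolute-agreement model such as ICC(2,1) would penalize that offset.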
The observed variations in test–retest reliability may be attributable to inconsistent probe placement, as the retest procedure did not include the use of an inclinometer. Normalization had minimal effects on intra-rater reliability for most of the morphological characteristics tested. However, BMI normalization tended to provide better overall consistency for CSA measurements of the deep neck muscles, whereas neck circumference normalization had a more pronounced effect on the dimension measurements, especially for the Multifidus Cervicis. To our knowledge, this is the first ultrasonographic study to use neck circumference for normalization; our findings suggest that it can be applied successfully to the assessment of neck muscle size and may be more relevant to this specific anatomical region.
Previous research has examined the reliability of ultrasound measurements of the Longus Colli (LCo), with varying results across studies. Cross-sectional area (CSA) measurements have shown moderate to excellent reliability. One study of 27 healthy participants [17] reported moderate intra-rater reliability (ICC = 0.71, 95% CI: 0.57–0.81) when measuring at the C5–C6 level. Other studies, however, have found stronger reliability. Javanshir et al. [24] demonstrated good reliability (ICC = 0.90–0.93) in 15 subjects, and their subsequent study [25] of 20 subjects showed excellent reliability (ICC = 0.95). Further research has supported these findings, with one study [26] reporting good reliability (ICC = 0.90, 95% CI: 0.94–0.98) and Nagai et al. [27] finding good reliability (ICC = 0.84, 95% CI: 0.63–0.94) in 24 participants. All reviewed studies measured the LCo at the C5–C6 level with participants in the supine position, matching our study’s methodology. However, none of them normalized their ultrasonographic findings, so direct comparisons with our normalized data were not possible.
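The studies above attach qualitative labels (moderate, good, excellent) to ICC values. The Python sketch below implements one widely cited scheme, the Koo and Li cut-offs (below 0.5 poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, above 0.9 excellent). Whether the cited studies or the present study used exactly these cut-offs is an assumption, and the handling of values that fall exactly on a boundary is a choice made here. Classifying both bounds of the 95% CI, as Koo and Li recommend, shows how a wide interval can span several categories.

```python
from typing import Tuple


def classify_icc(icc: float) -> str:
    """Map an ICC to a qualitative label (Koo and Li cut-offs; boundary handling assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def classify_ci(lower: float, upper: float) -> Tuple[str, str]:
    """Return the labels of the lower and upper bounds of an ICC confidence interval."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return classify_icc(lower), classify_icc(upper)


if __name__ == "__main__":
    # Point estimate and 95% CI reported by Nagai et al. [27]
    print(classify_icc(0.84), classify_ci(0.63, 0.94))
```

Here the point estimate of 0.84 is labeled good, while its interval runs from moderate to excellent.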
Previous studies have also examined the APD of the Sternocleidomastoid in the supine position, but they used different procedures and scanned at different cervical levels. In a study by Pirri et al. [9], the APD of the Sternocleidomastoid was located using the carotid artery and internal jugular vein as landmarks in the middle of the neck, below the mandible. In 16 healthy participants, intra-rater reliability was good (ICC = 0.89, 0.85–0.92). In another study of healthy participants [28], the intra-rater reliability of measuring the APD of the Sternocleidomastoid at rest was good (ICC = 0.88). In that study, the scanning level was located halfway between the mastoid process and the clavicular margin, corresponding to the C5–C6 level. In a further study [29] of 30 healthy participants, the intra-rater reliability of measuring the APD of the Sternocleidomastoid at the C5 level was moderate for the right side (ICC = 0.65) and good for the left side (ICC = 0.85).
Two studies have examined the intra-rater reliability of Semispinalis Capitis measurements. The first [30] found excellent reliability (ICC = 0.93) for the APD of the Semispinalis Capitis measured at rest at the C2 level, while the second [8] found good reliability across the first three scans (ICC = 0.89, 0.75–0.95). For the intra-rater reliability of the morphological characteristics of the Multifidus Cervicis, one study [8] scanned at the C5 level and found good reliability for CSA (ICC = 0.81, 0.70–0.89) in healthy participants, whereas another [31] scanned at the C4 level and found excellent reliability (ICC = 0.97, 0.92–0.98) in healthy participants. Both studies used the prone position. In contrast, Rahnama et al. [32] measured the thickness of the Multifidus Cervicis in a sitting position and found good reliability (ICC = 0.89) in healthy participants. Discrepancies between some of these studies and our data may be explained by poor visibility of the fascial layer between the Semispinalis Cervicis and Multifidus muscles, a challenge that most of these studies also acknowledge.
In the intra-rater reliability analysis of cervical muscle characteristics, SEM values varied across muscles. For the Longus Colli, the SEM for CSA was relatively low (0.1 cm²), indicating a stable measurement, whereas the lateral dimension showed higher variability (SEM = 0.15 cm). The Sternocleidomastoid displayed relatively consistent values, with SEMs of 0.1 cm² for CSA and 0.08 cm for the lateral dimension. The Semispinalis Capitis and Multifidus Cervicis showed moderate variability in the lateral dimension, with SEMs ranging from 0.11 cm (Multifidus Cervicis) to 0.16 cm (Semispinalis Capitis). When muscle characteristics were normalized by BMI, SEM values generally decreased, reflecting improved reliability. For example, the SEM for Longus Colli CSA fell to 0.003 cm², compared with 0.1 cm² (9.61%) for the non-normalized measurement. This trend was consistent across the other muscle characteristics. In contrast, normalization by neck circumference generally yielded larger SEM values than BMI normalization. For instance, the relative SEM for Longus Colli CSA rose to 18.51% (0.005 cm²), compared with 9.61% (0.1 cm²) for the non-normalized measurement. Similar patterns were observed for the Multifidus Cervicis and Semispinalis Capitis. This suggests that although normalization by neck circumference adjusts for differences in muscle size, it introduces greater measurement variability.
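The SEM values discussed above can be derived in several ways. The Python sketch below assumes the common formula SEM = SD × √(1 − ICC), with relative SEM (SEM%) expressed as a percentage of the mean; the function names and input values are hypothetical. Because BMI- or neck-circumference-normalized values are on a different scale from the raw measurements, SEM% is the more appropriate basis for comparing error across normalization methods.

```python
import math
from statistics import fmean, stdev


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement (assumed formula): SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def sem_percent(sem: float, mean: float) -> float:
    """Relative SEM as a percentage of the mean measurement."""
    return 100.0 * sem / mean


if __name__ == "__main__":
    # Hypothetical pooled CSA values (cm^2) from all sessions.
    values = [1.00, 1.05, 1.20, 1.18, 0.90, 0.95, 1.10, 1.08]
    sd, mean = stdev(values), fmean(values)
    sem = sem_from_icc(sd, icc=0.936)
    print(f"SEM = {sem:.3f} cm^2, SEM% = {sem_percent(sem, mean):.2f}%")
```

An alternative definition takes the square root of the residual mean square from the ANOVA used to compute the ICC; the two can differ slightly.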
When applying our predefined thresholds (SEM < 10% and SDD < 30% for adequate reliability; SEM < 5% and SDD < 15% for exceptional reliability), distinct patterns emerged across muscles and measurement conditions. Only the Sternocleidomastoid achieved exceptional reliability (SEM < 5%, SDD < 15%) across all of its measured characteristics: the anteroposterior dimension (SEM: 1.78%, SDD: 4.85%), lateral dimension (SEM: 1.76%, SDD: 4.84%), CSA (SEM: 2.77%, SDD: 7.69%), and APD/LD ratio (SEM: 3.04%, SDD: 8.38%). These measurements retained exceptional reliability after normalization by BMI. Many measurements met the adequate thresholds (SEM < 10%, SDD < 30%): all Sternocleidomastoid measurements, the Semispinalis Capitis (CSA: SEM 6.24%, SDD 16.76%; APD: SEM 6.04%, SDD 16.58%; LD: SEM 4.41%, SDD 11.70%), and all Longus Colli measurements except the APD/LD ratio. The Multifidus Cervicis showed mixed results: the lateral dimension achieved adequate reliability (SEM: 4.39%; SDD: 11.33%), whereas the anteroposterior dimension exceeded the SEM threshold (SEM: 10.72%), with an SDD (28.42%) just below the 30% limit. These results show that most ultrasonographic measurements had low measurement error, supporting their use in clinical practice as a reliable method for examining neck muscle size.
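The threshold rule above can be expressed as a small function. The Python sketch below applies only the SEM% and SDD% cut-offs stated in the text; the helper that converts SEM to SDD assumes the common convention SDD = 1.96 × √2 × SEM (about 2.77 × SEM), since the study's exact computation is not described in this section. The example inputs are values reported above, and the function names are hypothetical.

```python
import math


def sdd_from_sem(sem: float) -> float:
    """Smallest detectable difference: 1.96 * sqrt(2) * SEM (assumed convention)."""
    return 1.96 * math.sqrt(2.0) * sem


def reliability_band(sem_pct: float, sdd_pct: float) -> str:
    """Apply the stated thresholds to relative SEM and SDD (both in %)."""
    if sem_pct < 5.0 and sdd_pct < 15.0:
        return "exceptional"
    if sem_pct < 10.0 and sdd_pct < 30.0:
        return "adequate"
    return "below adequate"


if __name__ == "__main__":
    reported = {
        "Sternocleidomastoid APD": (1.78, 4.85),
        "Semispinalis Capitis CSA": (6.24, 16.76),
        "Multifidus Cervicis APD": (10.72, 28.42),
    }
    for name, (sem_pct, sdd_pct) in reported.items():
        print(f"{name}: {reliability_band(sem_pct, sdd_pct)}")
```

Both conditions must hold for a band to apply, so the Multifidus Cervicis APD falls below the adequate band on its SEM alone, even though its SDD is under 30%.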
4.1. Limitations
One limitation of this study is the use of a non-blinded examiner, which may have introduced bias into the processing of the measurements. The examiner who collected the data also processed them using the ultrasound system’s measurement tools, such as the region of interest and the caliper. To limit this bias, all data processing took place at least one month after data collection, and the data were stored under code names. Another limitation is that test–retest reliability was assessed on the same day, 30 min after the initial measurements. Although test–retest intervals of one or more days are generally preferable because they reduce possible testing effects such as memory of previous responses, practice effects, and fatigue, we performed the retest session on the same day for several practical and methodological reasons. First, this approach minimized potential history effects, including changes in participants’ physical or psychological states or exposure to external events that could influence their responses. Second, same-day testing reduced the risk of participant drop-out, which could have introduced attrition bias and reduced our sample size. Although we acknowledge that this same-day approach may have inflated our reliability estimates because of recall bias and insufficient time for a true assessment of stability, we believe it represented the best balance between methodological rigor and practical feasibility given our study constraints. Furthermore, the ultrasound procedure was passive for participants, so no substantial learning or fatigue effects were expected. A further limitation was the lack of an inter-rater reliability assessment. The authors acknowledge that inter-rater reliability is crucial when multiple raters make subjective judgments, because high inter-rater reliability would demonstrate that different raters reach similar conclusions, thus strengthening the validity of the results. Finally, the sample was restricted to healthy young adults, which may limit the generalizability of the findings to clinical populations, such as people with chronic musculoskeletal pain. However, examining the reliability of musculoskeletal ultrasound for measuring cervical muscle size in healthy people was a necessary first step, because the resulting measurement error can be attributed to the procedure and the rater rather than to variability introduced by musculoskeletal pain.
4.2. Clinical Implications
The findings from this study highlight the value of musculoskeletal ultrasound for assessing the morphological characteristics of cervical muscles, providing a reliable, time-efficient tool that is easy to apply clinically. Its high intra-rater and test–retest reliability, particularly for CSA and linear dimensions, suggests that ultrasound can be used effectively to monitor changes in muscle size in clinical settings, such as rehabilitation programs for neck pain or dysfunction. Additionally, normalization techniques, particularly normalization by neck circumference, allow assessment of the relative size of cervical muscles, which is useful when comparing individuals or populations with different anthropometric characteristics. To improve reliability, the authors recommend consistent use of anatomical landmarks, a thorough understanding of the structure and function of the target muscles, and clear delineation of muscle borders to enhance image clarity. They also emphasize the importance of standardized subject positioning, correct transducer placement, and the acquisition of multiple images to strengthen the robustness of statistical analyses.
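When monitoring muscle size during rehabilitation, a common way to use these error estimates is to treat a change as real only when it exceeds the smallest detectable difference. The Python sketch below illustrates that rule; the function name, the SEM value, and the CSA readings are hypothetical, and the 1.96 × √2 multiplier is an assumed convention.

```python
import math


def exceeds_sdd(baseline: float, follow_up: float, sem: float) -> bool:
    """True if |follow_up - baseline| exceeds SDD = 1.96 * sqrt(2) * SEM (assumed)."""
    sdd = 1.96 * math.sqrt(2.0) * sem
    return abs(follow_up - baseline) > sdd


if __name__ == "__main__":
    sem_cm2 = 0.1  # hypothetical absolute SEM for a CSA measurement (SDD ~ 0.28 cm^2)
    print(exceeds_sdd(1.00, 1.35, sem_cm2))  # change of 0.35 cm^2 -> True
    print(exceeds_sdd(1.00, 1.20, sem_cm2))  # change of 0.20 cm^2 -> False
```

The absolute difference is used so that both increases (for example, hypertrophy after training) and decreases (for example, atrophy) are screened in the same way.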
4.3. Recommendations for Future Research
Our study provides clinicians with a recommended ultrasound scanning protocol for assessing three deep neck muscles (the Longus Colli, Semispinalis Capitis, and Multifidus Cervicis), as well as a practical method for examining the local CSA of the Sternocleidomastoid. This approach may be helpful in future studies comparing deep and superficial anterior neck muscles. In addition, two new approaches proved helpful during data collection: an inclinometer to support optimal probe placement and practical training on a digital scale to apply consistent force during probe positioning. Intra-rater and test–retest reliability were comparable. Based on this, we recommend that in future studies the same examiner perform all ultrasound examinations, particularly when repeated scans of each muscle are required. We also recommend using additional anatomical landmarks to identify cervical levels. Future research should further verify the reliability of ultrasound for measuring the morphological characteristics (CSA, APD, LD, and ratios) of cervical muscles, especially in clinical populations such as patients with chronic conditions. Building on current studies, reliability assessment should be extended to other key cervical muscles to better understand the generalizability and robustness of musculoskeletal ultrasound in this context. Studies that incorporate normalized muscle size measurements, particularly randomized controlled trials, would also offer valuable insights and may reveal patterns that could inform clinical practice. Investigating these aspects in depth could lead to more nuanced conclusions about the role of ultrasound in assessing cervical muscle health, with implications for improving diagnostic accuracy and treatment protocols.