1. Introduction
Magnetic resonance arthrography (MRA) of the shoulder is the minimally invasive imaging gold standard for assessing intra-articular disorders, particularly glenoid labral lesions associated with glenohumeral instability [1,2,3,4,5,6]. The technique involves direct injection of diluted gadolinium into the joint space, distending the capsule and enhancing visualization of labral tears, cartilage defects, and ligamentous abnormalities. Conventional multiplanar turbo spin-echo (TSE) sequences provide excellent sensitivity for labral pathology but are limited by prolonged acquisition times (20–30 min), susceptibility to motion artifacts, and restricted anatomical coverage [7,8,9]. These limitations may become critical when detecting subtle submillimeter labral abnormalities, while extended scan times reduce patient comfort and scanner throughput.
Recent advances in deep learning (DL) algorithms have enabled high-quality image reconstruction from undersampled datasets, facilitating substantial scan time reductions [10,11,12,13,14,15,16]. These algorithms use convolutional neural networks to map undersampled k-space data to fully sampled images, recovering missing information while suppressing aliasing artifacts. In musculoskeletal imaging, DL reconstruction has shown promise in maintaining diagnostic quality despite acceleration. Reschke et al. [17] demonstrated that DL-based knee MRI sequences achieved up to 4-fold acceleration while preserving diagnostic performance for meniscal and cartilage pathology. Three-dimensional imaging offers theoretical advantages for shoulder arthrography, including isotropic resolution enabling multiplanar reconstructions without quality loss, complete joint coverage eliminating inter-slice gaps, and improved detection of obliquely oriented tears [18,19,20]. However, 3D sequences traditionally require longer acquisition times than 2D protocols. DL acceleration could overcome this limitation by enabling rapid 3D acquisitions with submillimeter isotropic resolution [18,19,20].
Despite these potential benefits, no studies have systematically evaluated DL-reconstructed isotropic 3D sequences for labral pathology assessment in shoulder MRA. Furthermore, accelerated 3D acquisitions may compromise tissue contrast compared to standard TSE sequences, potentially affecting diagnostic accuracy for subtle intra-articular pathology where contrast enhancement is critical. The primary aim of this study was to compare the diagnostic performance of DL-accelerated isotropic 3D MRA sequences against standard 3 mm TSE and proton density fat-saturated (PD-FS) protocols for detecting glenoid labral lesions. Secondary objectives included assessing rotator cuff abnormalities, bone marrow edema detection, inter-observer agreement, and subgroup analysis based on available sequence combinations.
2. Materials and Methods
2.1. Study Population
This prospective study was conducted at a tertiary referral institution between June 2023 and April 2025. Of the 133 consecutive patients referred for shoulder MR arthrography, 128 were included after excluding 4 with incomplete datasets and 1 with severe metal artifacts. Inclusion criteria comprised a clinical indication for shoulder arthrography and acquisition of at least one DL-accelerated 3D sequence with complete standard 2D sequences. The institutional review board approved the study (Prot. n.33264; Prog. 4287CESC), and all patients provided written informed consent.
2.2. Imaging Protocol
All patients underwent fluoroscopy-guided intra-articular injection of 12–15 mL diluted gadolinium (Gd-DTPA, 1–2 mmol/L) using an anterior approach. MR imaging was performed within 10 min on a 3.0 T system (uMR Omega, United Imaging Healthcare Shanghai, Shanghai, China) with a 16-channel shoulder coil.
Standard 2D protocol included:
TSE T1-weighted: TR/TE = 650/18 ms, 3 mm slices, acquired in axial, coronal oblique, and radial planes (3:42 per plane)
PD-FS: TR/TE = 4800/46 ms, 3 mm slices, acquired in axial and coronal oblique planes (4:52 per plane)
Total acquisition time: 24–28 min
DL-accelerated 3D sequences using AI-assisted Compressed Sensing (ACS) with proprietary deep learning reconstruction (uAI):
3D TSE T1: TR/TE = 450/17.58 ms, 0.8 mm isotropic voxels, FOV = 160 × 160 × 120 mm
3D PD-FS: TR/TE = 1500/30 ms, 0.84 mm isotropic voxels, FOV = 160 × 160 × 120 mm
Acceleration factor: 2.5×, acquisition time: 3:26 per sequence
The DL reconstruction algorithm employs a multiscale convolutional neural network trained on paired fully sampled/undersampled datasets and provides near-real-time reconstruction (<1 min) through iterative k-space consistency and image-domain refinement.
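The k-space consistency idea mentioned above can be illustrated with a generic data-consistency step: after a network proposes an image, the k-space values at acquired locations are replaced with the measured samples. The following is a minimal one-dimensional sketch only. It is not the vendor's proprietary uAI implementation, which is not described in detail here; the function names, the hard-replacement rule, and the toy sampling pattern are all assumptions made for illustration.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1D sequence (O(n^2), fine for a toy example)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def enforce_kspace_consistency(image_estimate, measured_kspace, sampled):
    """Hypothetical hard data-consistency step (illustrative, not the vendor algorithm).

    image_estimate  : sequence of numbers, e.g. the output of a CNN refinement step
    measured_kspace : acquired k-space samples (any value where not sampled)
    sampled         : sequence of booleans, True where k-space was actually acquired
    """
    estimated_kspace = dft(image_estimate)
    # Keep measured samples where available; keep the network's estimate elsewhere.
    consistent = [
        measured if was_sampled else estimated
        for measured, estimated, was_sampled in zip(measured_kspace, estimated_kspace, sampled)
    ]
    return idft(consistent)


# Toy example: every other k-space sample is "acquired" (pattern chosen only for illustration).
truth = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 1.5, 2.5]
full_kspace = dft(truth)
sampled = [i % 2 == 0 for i in range(len(truth))]
measured = [v if s else 0 for v, s in zip(full_kspace, sampled)]

# Stand-in for a network output: an all-zero image.
refined = enforce_kspace_consistency([0.0] * len(truth), measured, sampled)
```

In an iterative scheme, a step of this kind would alternate with image-domain refinement, so that the final image never contradicts the data that were actually measured.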
2.3. Image Analysis
Two musculoskeletal radiologists (15 and 4 years of experience) independently evaluated all studies, blinded to clinical data. Standard 2D sequences served as the reference standard; because arthroscopic correlation was not available, the reported sensitivity and specificity values reflect agreement with standard 2D MRA rather than absolute diagnostic accuracy. After a washout period of at least 4 weeks, readers assessed the 3D sequences using multiplanar reconstructions. Findings were scored on a 4-point scale from 1 (definitely absent) to 4 (definitely present). Readers were aware of the study design, including that 2D sequences would serve as the reference standard. Several measures were implemented to minimize bias: (1) 3D and 2D interpretations were performed in separate sessions, with the washout period between them to reduce recall bias; (2) readers documented their findings using a standardized structured reporting template before any comparison; (3) cases were presented in randomized order for each reading session; and (4) readers were instructed to interpret each protocol independently, based solely on the imaging findings and without reference to their prior assessments.
The glenoid labrum was evaluated in three regions:
Labral tears were diagnosed on the basis of linear hyperintense signal, irregular morphology, or displacement. Bone marrow edema was diagnosed on the basis of increased PD-FS signal or corresponding T1 hypointensity. Rotator cuff assessment included tendinopathy, partial-thickness tears, and full-thickness tears. Each reader independently evaluated both 2D (reference) and 3D sequences, and diagnostic performance was calculated against that reader’s own 2D assessments. This design evaluates within-reader consistency between protocols rather than agreement with a single consensus reference, providing a clinically relevant assessment of each reader’s ability to maintain diagnostic consistency when transitioning to accelerated 3D protocols.
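The per-reader design described above amounts to a paired comparison of binary calls. A minimal sketch follows, assuming each reader's 2D and 3D readings have already been reduced to present/absent calls per patient; the function name and the toy data are hypothetical and are not the study data or the authors' analysis code.

```python
def binary_performance(reference_calls, index_calls):
    """Sensitivity, specificity and accuracy of index calls against reference calls.

    Both inputs are sequences of booleans (True = lesion present), one entry per
    patient and in the same patient order. Here the reference is a reader's own
    2D assessment and the index test is the same reader's 3D assessment.
    """
    if len(reference_calls) != len(index_calls):
        raise ValueError("reference and index calls must have the same length")
    pairs = list(zip(reference_calls, index_calls))
    tp = sum(1 for r, i in pairs if r and i)
    fn = sum(1 for r, i in pairs if r and not i)
    tn = sum(1 for r, i in pairs if not r and not i)
    fp = sum(1 for r, i in pairs if not r and i)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy reader: 6 lesions on 2D, 5 of them also called on 3D, no false positives.
reader_2d = [True] * 6 + [False] * 4
reader_3d = [True] * 5 + [False] * 5
print(binary_performance(reader_2d, reader_3d))
```

Because each reader is scored against their own reference, the denominators can differ between readers even though both read the same patients.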
2.4. Subgroup Analysis
Patients were stratified by available 3D sequences:
Group A (n = 46): Both T1 and PD-FS available
Group B (n = 40): T1 only
Group C (n = 42): PD-FS only
The distribution of 3D sequences across patients resulted from practical constraints during the study period. Group A (n = 46) received both 3D T1 and PD-FS sequences when scanner time permitted complete protocol acquisition. Group B (n = 40, 3D T1 only) and Group C (n = 42, 3D PD-FS only) resulted from time limitations when patient scheduling required abbreviated examinations, technical factors necessitating sequence repetition that precluded completion of both 3D acquisitions, or early protocol development phases when sequence parameters were being optimized. This pragmatic approach reflects real-world clinical implementation where complete ideal protocols are not always achievable, and the resulting subgroup analysis provides clinically relevant insights into single-sequence performance when time constraints mandate protocol abbreviation.
2.5. Statistical Analysis
Diagnostic performance metrics were calculated with 95% confidence intervals using the Wilson method. ROC analysis determined optimal thresholds via Youden’s index. Inter-observer agreement was assessed with weighted Cohen’s kappa. Subgroup comparisons employed chi-square tests with Bonferroni correction (α = 0.05/3 ≈ 0.017). Sample size calculation indicated that 124 patients were needed for an expected sensitivity of 90% with a 5% margin of error. Analyses were performed using SPSS 28.0 and R 4.3.2.
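For readers who wish to reproduce the interval estimates, the Wilson score interval for a proportion can be computed directly. The sketch below is a plain implementation of the standard formula without continuity correction (an assumption, since the exact variant used in SPSS/R is not stated); the function name is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (no continuity correction).

    z defaults to the two-sided 95% normal quantile.
    """
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("require 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Example with counts of the kind reported in the Results (89 detected of 94 reference lesions).
low, high = wilson_interval(89, 94)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Unlike the simple Wald interval, the Wilson interval stays within 0–100% and remains informative when the observed proportion is 100%, which is relevant for the specificity estimates in this study.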
3. Results
3.1. Study Population Characteristics
The initial cohort comprised 133 consecutive patients referred for shoulder MR arthrography. Following application of exclusion criteria, 128 patients (79 males, 49 females) were included in the final analysis. The mean age was 38.4 years (range 17–73 years, SD ± 12.3 years). Clinical indications for arthrography included chronic shoulder pain (n = 67, 52.3%), suspected labral tear following trauma (n = 41, 32.0%), and recurrent instability (n = 20, 15.6%).
3.2. Reference Standard Findings
Analysis of standard 2D MRA sequences revealed a high prevalence of pathology in this referred population. Glenoid labral lesions were identified in 98 of 128 cases (76.6%), distributed as follows: isolated superior labral (SLAP) lesions in 34 cases (34.7%), anteroinferior labral tears (Bankart and variants) in 38 cases (38.8%), posterior labral tears in 8 cases (8.2%), and combined/complex tears in 18 cases (18.4%). The remaining 30 patients (23.4%) showed no significant labral signal alterations. Rotator cuff evaluation revealed abnormalities in 43 of 128 cases (33.6%), including full-thickness tears (n = 12, 9.4%), partial-thickness tears (n = 19, 14.8%), and tendinopathy without a discrete tear (n = 12, 9.4%). The supraspinatus tendon was most commonly affected (n = 31, 72.1% of rotator cuff pathology), followed by subscapularis (n = 8, 18.6%) and infraspinatus (n = 4, 9.3%). Bone marrow edema was present in 50 patients (39.1%), with the following distribution: Hill-Sachs lesions with associated edema (n = 22, 44.0%), glenoid rim edema associated with labral pathology (n = 18, 36.0%), and greater tuberosity edema (n = 10, 20.0%).
3.3. Diagnostic Performance of 3D ACS Sequences
The DL-accelerated 3D sequences demonstrated excellent diagnostic performance for glenoid labral lesion detection (
Table 1). In the overall population, Reader 1 achieved sensitivity of 94.7% (89/94, 95% CI: 88.1–97.8%) and specificity of 100% (34/34, 95% CI: 89.8–100%), yielding an accuracy of 96.1% (123/128, 95% CI: 91.1–98.5%). Reader 2 showed comparable performance with sensitivity of 95.1% (78/82, 95% CI: 88.0–98.4%) and specificity of 100% (46/46, 95% CI: 92.3–100%), resulting in accuracy of 96.9% (124/128, 95% CI: 92.2–98.9%). The difference in lesion identification between readers (94 vs. 82 labral lesions on 2D reference) reflects expected inter-reader variability for subtle labral pathology. Importantly, both readers demonstrated excellent internal consistency between their respective 2D and 3D assessments. Inter-reader agreement for 3D interpretation was excellent (κ = 0.812). ROC curve analysis confirmed the excellent discriminative ability of 3D ACS sequences with an AUC of 0.894 (95% CI: 0.842–0.946,
p < 0.001). The optimal diagnostic threshold identified by Youden’s index was a Likert score ≥ 3, which corresponded to “probably present” or a higher confidence level (
Figure 1). For bone marrow edema detection, 3D sequences achieved an overall sensitivity of 82.9% (34/41, 95% CI: 68.7–91.5%) with perfect specificity of 100% (87/87, 95% CI: 95.8–100%). The slightly lower sensitivity for edema detection likely reflects the inherent contrast characteristics of accelerated 3D sequences compared to dedicated fluid-sensitive 2D imaging. Rotator cuff assessment revealed limitations of 3D sequences for subtle tendon pathology. Overall sensitivity was 75.0% (21/28, 95% CI: 56.6–87.3%) for Reader 1 and 63.0% (17/27, 95% CI: 44.2–78.5%) for Reader 2, though specificity remained at 100% (95% CI: 96.3–100%) for both readers. The lower sensitivity was primarily attributable to missed small partial-thickness tears and subtle tendinopathy. Accuracy results are displayed in
Figure 1 and
Figure 2.
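The threshold selection by Youden's index can be sketched as follows. This is a minimal illustration, assuming per-case 3D Likert scores (1–4) and binary reference calls; the function name and the toy data are hypothetical and do not reproduce the study data or the authors' SPSS/R analysis.

```python
def youden_optimal_threshold(scores, reference_positive, thresholds=(2, 3, 4)):
    """Return (threshold, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is called positive when its score is >= threshold. On ties, the
    lowest threshold reaching the maximum J is kept.
    """
    n_pos = sum(1 for r in reference_positive if r)
    n_neg = len(reference_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative reference case")
    best = None
    for t in thresholds:
        tp = sum(1 for s, r in zip(scores, reference_positive) if r and s >= t)
        tn = sum(1 for s, r in zip(scores, reference_positive) if not r and s < t)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (t, j, sensitivity, specificity)
    return best


# Hypothetical toy data (not study data): five reference-positive and five reference-negative cases.
toy_scores = [4, 4, 3, 3, 2, 1, 1, 2, 2, 1]
toy_reference = [True] * 5 + [False] * 5
print(youden_optimal_threshold(toy_scores, toy_reference))
```

For this toy data the selected cut-off is a score ≥ 3, where sensitivity is 0.8 and specificity is 1.0; lower cut-offs gain sensitivity at the cost of false positives, and higher cut-offs lose sensitivity without any gain in specificity.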
3.4. Subgroup Analysis by Protocol Type
Comparative analysis across the three protocol groups revealed statistically significant differences in diagnostic performance (χ² = 8.74, p = 0.013). Group A, with access to both 3D sequences (T1 and PD-FS), demonstrated superior performance with a sensitivity of 96.8% (30/31, 95% CI: 83.8–99.4%) and accuracy of 97.8% (45/46, 95% CI: 88.7–99.6%). This represented the highest diagnostic yield among all groups (
Table 2). Group B (3D T1 only) maintained high performance with sensitivity of 95.0% (19/20, 95% CI: 76.4–99.1%) and accuracy of 97.5% (39/40, 95% CI: 87.1–99.6%). Group C (3D PD-FS only) showed significantly lower performance with a sensitivity of 87.5% (28/32, 95% CI: 71.9–95.2%) and accuracy of 90.5% (38/42, 95% CI: 77.9–96.2%). The difference in accuracy between Groups A and C was statistically significant (difference 7.3 percentage points, 95% CI: 1.2–13.4,
p = 0.012 after Bonferroni correction). For secondary endpoints, Group A demonstrated a trend toward superior performance in bone marrow edema detection with a sensitivity of 88.9% (16/18, 95% CI: 67.2–96.9%) compared to 78.3% (18/23, 95% CI: 58.1–90.3%) for Group C, though this difference did not reach statistical significance (
p = 0.382). Interestingly, for rotator cuff evaluation, Group C showed paradoxically higher sensitivity of 83.3% (10/12, 95% CI: 55.2–95.3%) compared to Group A at 63.6% (7/11, 95% CI: 35.4–84.8%), suggesting that PD-FS weighting may be particularly valuable for tendon assessment (
Table 3).
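The relationship between the reported overall test statistic, its p-value, and the Bonferroni-adjusted threshold can be checked with a short calculation. The sketch below assumes the overall chi-square test has 2 degrees of freedom (three groups by a binary outcome), which is not stated explicitly in the text; for exactly 2 degrees of freedom the upper-tail probability has the closed form exp(−χ²/2). Function names are hypothetical.

```python
import math


def chi2_upper_tail_df2(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees of freedom."""
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.exp(-statistic / 2)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


p_overall = chi2_upper_tail_df2(8.74)      # assumes df = 2
alpha_adjusted = bonferroni_alpha(0.05, 3)  # three pairwise group comparisons
print(f"p = {p_overall:.3f}, adjusted alpha = {alpha_adjusted:.3f}")
```

Under the stated assumption, χ² = 8.74 corresponds to p ≈ 0.013, and dividing α = 0.05 across three comparisons gives the 0.017 threshold used for the pairwise tests.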
3.5. Inter-Observer Agreement
Analysis of inter-observer agreement demonstrated excellent concordance between readers for all evaluated structures. For glenoid labral assessment, weighted Cohen’s kappa was 0.834 (95% CI: 0.754–0.914) using standard sequences and 0.812 (95% CI: 0.728–0.896) with 3D ACS sequences, both indicating excellent agreement. The minimal decrease in agreement with 3D sequences suggests that these acquisitions provide consistent diagnostic information across readers with different experience levels. Bone marrow edema evaluation showed the highest inter-observer agreement with κ = 0.887 (95% CI: 0.812–0.962) for standard sequences and κ = 0.865 (95% CI: 0.785–0.945) for 3D sequences. Rotator cuff assessment demonstrated κ = 0.845 (95% CI: 0.759–0.931) with standard sequences and κ = 0.798 (95% CI: 0.701–0.895) with 3D sequences, the latter at the boundary between substantial and excellent agreement. Overall diagnostic concordance, assessed by Kendall’s coefficient of concordance (W), was 0.912 (
p < 0.001), indicating excellent agreement between readers regardless of protocol type. McNemar’s test revealed no significant differences in diagnostic performance between readers of different experience levels (
p = 0.453), suggesting that 3D ACS sequences can be reliably interpreted by radiologists with varying expertise. Agreement results are shown in
Figure 3.
Figure 4, Figure 5 and Figure 6 show three illustrative cases.
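Weighted Cohen's kappa gives partial credit when two readers' ordinal scores differ by only one category. A minimal sketch for the 4-point scale used here follows. The weighting scheme (linear versus quadratic) is not specified in the text, so linear weights are assumed by default; the function name and example ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories=4, quadratic=False):
    """Weighted Cohen's kappa for two raters scoring the same cases on 1..n_categories.

    Disagreement weights are |i - j| / (k - 1) (linear, assumed default) or
    their square (quadratic).
    """
    k = n_categories
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Joint distribution of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a - 1][b - 1] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            if quadratic:
                w = w * w
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical ratings: the readers differ by one category on the last case only.
print(weighted_kappa([1, 2, 3, 4], [1, 2, 3, 3]))
```

Identical ratings give κ = 1, and agreement no better than chance gives κ = 0; a one-category disagreement lowers κ less than a disagreement between "definitely absent" and "definitely present".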
3.6. Analysis of Diagnostic Confidence
Reader confidence scores, based on the 4-point Likert scale, were analyzed to assess the subjective interpretability of 3D sequences. Readers assigned definite scores (1 or 4) in 78.3% of cases with 3D sequences compared with 85.7% with standard sequences. The difference was most pronounced for subtle pathology, where intermediate confidence scores (2 or 3) were more frequent with 3D imaging.
3.7. Time Efficiency Analysis
The implementation of 3D ACS sequences resulted in substantial time savings. Standard protocol acquisition required 24–28 min, while a complete 3D protocol (both T1 and PD-FS) required only 6 min 52 s, a reduction in scan time of 75.5% relative to the 28 min upper bound of the standard protocol (71.4% relative to 24 min). Even accounting for post-processing and reconstruction time, the total time from acquisition start to image availability was reduced by approximately 70%.
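The scan-time figures can be checked from the reported acquisition times. A minimal sketch follows, assuming the 24–28 min range of the standard protocol as the reference and two 3D sequences of 3:26 each; the helper names are hypothetical.

```python
def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string such as '3:26' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_reduction(new_seconds, reference_seconds):
    """Percentage reduction of new_seconds relative to reference_seconds."""
    return 100.0 * (1 - new_seconds / reference_seconds)


three_d_total = 2 * to_seconds("3:26")  # 3D T1 + 3D PD-FS = 412 s (6 min 52 s)
for reference_min in (24, 28):
    reduction = percent_reduction(three_d_total, reference_min * 60)
    print(f"vs {reference_min} min standard protocol: {reduction:.1f}% shorter")
```

The reduction is 75.5% against the 28 min upper bound of the standard protocol and 71.4% against the 24 min lower bound.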
4. Discussion
This prospective study demonstrates that deep learning-accelerated 3D sequences substantially reduce acquisition time in shoulder MR arthrography while maintaining high diagnostic agreement with standard 2D MRA for glenoid labral pathology. The principal finding of 94.7–95.1% sensitivity with 100% specificity supports the clinical applicability of these advanced reconstruction techniques. Our findings are consistent with previous reports of high diagnostic performance of AI-accelerated sequences in assessing rotator cuff (RC) tears, bone marrow (BM) lesions, and glenoid labral pathology in shoulder MRI [9,15,21]. Specifically, both readers achieved high sensitivity and specificity for labral lesions and a large area under the ROC curve (AUC 0.894); these findings are in line with those of a recently published study of shoulder MRI [15]. In that study, the 2-fold abbreviated protocol showed a sensitivity of 98–100% and specificity of 99–100% for rotator cuff evaluation, while the 4-fold protocol maintained a sensitivity of 95–98% and specificity of 99–100%. However, that study used standard FSE sequences, and the whole protocol was repeated with 2 different compression levels; in addition, the DL protocols achieved very high accuracy for bone marrow edema (BME), but data on labral abnormalities were limited [
15]. The excellent diagnostic performance for labral lesion detection can be attributed to the isotropic voxel size (0.8 mm), enabling high-quality multiplanar reconstructions without step-ladder artifacts. This capability proves particularly valuable for the curved glenoid labrum geometry and obliquely oriented tears incompletely visualized on standard orthogonal planes [
18,
19,
20]. The T1 shortening effect of diluted gadolinium creates a high signal within the joint space, providing excellent labral delineation that remains preserved despite accelerated acquisition and DL reconstruction. The higher accuracy of the combined 3D protocol (Group A: 97.8%) compared with PD-FS alone (Group C: 90.5%, p = 0.012) suggests that T1 and PD-FS weightings provide complementary information. T1-weighted sequences optimize arthrographic effect and labral morphology, while PD-FS enhances bone marrow edema detection and extra-articular tissue contrast. This 7.3 percentage point accuracy improvement supports acquiring both sequences despite the modest time penalty of approximately 3.5 min. The observed limitations for rotator cuff evaluation (63.0–75.0% sensitivity) align with theoretical expectations. Reduced contrast resolution between pathological and normal tendon substance, combined with the inability of arthrographic contrast to outline bursal-sided tears, likely explains the lower sensitivity. The paradoxically higher sensitivity of PD-FS alone (Group C: 83.3%) for tendon evaluation suggests that fluid-sensitive contrast remains important for detecting subtle tendinopathy and partial tears, although the subgroups were small. The near-real-time DL reconstruction (<1 min) represents a critical workflow advantage. The proprietary uAI algorithm’s multiscale convolutional neural network, specifically trained on shoulder imaging data, preserves relevant anatomical details despite the 2.5× acceleration factor. This rapid reconstruction ensures immediate image availability, maintaining clinical efficiency while achieving an approximately 75% reduction in acquisition time compared to standard protocols. The high inter-observer agreement with 3D sequences (κ = 0.798–0.865) between readers of different experience levels (15 versus 4 years) suggests that established diagnostic criteria translate well to DL-accelerated 3D sequences. Our hypothesis of using 3D sequences either to complement or replace 2D FSE sequences is also supported by recent trends in the literature. A growing number of studies, particularly those focused on the knee, highlight the use of 3D imaging, with some exploring advanced technologies such as compressed sensing to significantly reduce acquisition times [
22,
23,
24].
A critical consideration for clinical implementation concerns the implications of the slightly reduced sensitivity (94.7–95.1%) observed with DL-accelerated 3D sequences relative to conventional protocols. While this represents excellent overall performance, the approximately 5% rate of missed labral tears corresponds to 5 of 94 (Reader 1) and 4 of 82 (Reader 2) undetected lesions in our cohort. In the context of MR arthrography, where direct intra-articular gadolinium injection has been performed, repeating the examination using a conventional protocol if a tear is suspected but not detected is clinically impractical. Therefore, we advocate for a nuanced approach to clinical implementation. In high-risk instability patients or those with strong clinical suspicion for labral pathology, conventional multiplanar 2D protocols may remain preferable to minimize diagnostic uncertainty. Conversely, DL-accelerated 3D imaging may be most appropriate for: (1) screening examinations where clinical suspicion is lower, (2) patients with claustrophobia or difficulty tolerating prolonged scan times, (3) follow-up assessments where baseline lesions have been established, and (4) high-volume practices where the approximately 75% time reduction significantly improves patient access. A hybrid approach, combining rapid 3D screening with selective targeted 2D sequences based on 3D findings, warrants prospective evaluation as a potential optimization strategy.
Study limitations include the absence of arthroscopic correlation, preventing the determination of absolute diagnostic accuracy. The reported sensitivity and specificity values therefore reflect agreement with standard 2D MRA interpretation rather than true diagnostic performance against surgical findings. The single-center design using one vendor’s platform may limit generalizability, as different DL algorithms might yield varying results. Deep learning reconstruction algorithms are vendor-specific and trained on proprietary datasets, meaning that performance characteristics may differ substantially across platforms. Multi-center validation studies across different vendors are essential before these results can be generalized to broader clinical practice. Institutions considering the implementation of DL-accelerated protocols should ideally conduct site-specific validation to confirm diagnostic equivalence with their particular hardware and software configurations. The lack of comparison with non-accelerated 3D sequences prevents isolation of the DL acceleration effect from inherent 2D versus 3D differences. As a result, it is difficult to disentangle the diagnostic contribution of isotropic 3D imaging itself from that of the deep learning reconstruction.
Furthermore, our study design focused on the detection (presence/absence) of pathology rather than characterization of lesion extent or grading. Although the extent of bone marrow edema may not be critical for clinical decision-making, the extent of labral and rotator cuff tears is highly relevant for determining surgical candidacy. Future studies should incorporate structured grading systems for lesion extent to determine whether DL-accelerated 3D sequences can provide equivalent characterization to conventional protocols for pre-operative planning.
The difference in lesion identification between readers (94 vs. 82 labral lesions, a difference of 12 lesions or 14.6% relative to Reader 2) reflects expected inter-reader variability for subtle labral pathology, consistent with literature reports (κ typically 0.60–0.85 for labral assessment). Our per-reader analytical design, where each reader’s 3D performance was compared to their own 2D reference, provides a clinically meaningful assessment of diagnostic consistency when transitioning to accelerated protocols. However, definitive determination of whether the 12 discordant lesions represent true pathology would require arthroscopic correlation. This inter-reader variability underscores the inherent subjectivity in labral assessment and the importance of reader experience, particularly for borderline cases.
The relatively lower sensitivity for bone marrow edema (82.9%) likely reflects compromised T2 weighting from the shorter TR/TE values of the accelerated sequences (1500/30 ms for 3D PD-FS vs. 4800/46 ms for 2D PD-FS). This limitation should be considered when ordering examinations where bone marrow edema detection is clinically important, such as the evaluation of occult fractures or of rotator cuff abnormalities with associated findings.
The approximately 75% time reduction may improve patient comfort, reduce motion artifacts, and increase scanner throughput, potentially improving access to advanced imaging.
Future research should address arthroscopic correlation, multi-center validation across different platforms, and investigation of higher acceleration factors. Application to non-arthrographic shoulder MR and integration with automated diagnostic algorithms warrant exploration. The high-quality isotropic datasets could serve as ideal inputs for machine learning systems designed to detect shoulder pathology.