Inter-Rater Agreement for Diagnosing Adenomyosis Using Magnetic Resonance Imaging and Transvaginal Ultrasonography

Our aim was to compare the inter-rater agreement of transvaginal ultrasonography (TVS) with that of magnetic resonance imaging (MRI) for diagnosing adenomyosis and for assessing various predefined imaging features of adenomyosis in the same set of women. The study cohort included 51 women who were prospectively and consecutively recruited based on a clinical suspicion of adenomyosis. MRIs and TVS videoclips and 3D volumes were retrospectively assessed by four experienced radiologists and five experienced sonographers, respectively. Each rater subjectively evaluated the presence or absence of adenomyosis, as well as imaging features suggestive of adenomyosis. Fleiss kappa (κ) was used to reflect inter-rater agreement for categorical data, and the intraclass correlation coefficient (ICC) was used to reflect the reliability of quantitative data. Agreement between raters for diagnosing adenomyosis was higher for TVS than for MRI (κ = 0.42 vs. 0.28). MRI had a higher inter-rater agreement in assessing wall asymmetry, irregular junctional zone (JZ), and the presence of myometrial cysts, while TVS had a better agreement for assessing globular shape. MRI showed a moderate to good reliability for measuring the JZ (ICC = 0.57–0.82). For TVS, the JZ was unmeasurable in >50% of cases, and the remaining cases had low reliability (ICC = −0.31–0.08). We found that inter-rater agreement for diagnosing adenomyosis was higher for TVS than for MRI, despite the fact that MRI showed a higher inter-rater agreement in most specific features. Measurements of JZ in the coronal plane with 3D TVS were unreliable and thus unlikely to be useful for diagnosing adenomyosis.


Introduction
Adenomyosis is a benign uterine disease characterized by the presence of ectopic endometrial glands and stroma surrounded by hypertrophic myometrium [1]. The symptomatology of adenomyosis includes heavy menstrual bleeding and pelvic pain [2][3][4]. Traditionally, the diagnosis of adenomyosis was only possible through histological examination after hysterectomy. However, advancements in medical imaging techniques, specifically magnetic resonance imaging (MRI) and transvaginal ultrasonography (TVS), have enabled the non-invasive diagnosis of adenomyosis.
MRI has been used as a diagnostic tool for adenomyosis. Common criteria are an increased thickness of the junctional zone (JZ), with a suggested cut-off of 12 mm [5], and the presence of myometrial cysts [6]. A principal limitation of MRI is the absence of a definable junctional zone on imaging, which occurs in 20% of premenopausal women [5].
Later studies have suggested additional features, such as an irregular appearance of the JZ, and its relationship with the thickness of the entire myometrium [7][8][9][10]. Another diagnostic feature is the presence of small punctate cystic foci located within the JZ [7,11]. The diagnosis of adenomyosis by MRI has been reported to be highly reproducible [9,12], with high inter-rater agreement for various JZ measurements [9].
TVS, on the other hand, utilizes two-dimensional (2D) and three-dimensional (3D) imaging techniques to assess adenomyosis. The sonographic features used for diagnosing adenomyosis with TVS are well described [13][14][15][16]. A recent consensus paper by the MUSA (Morphological Uterus Sonographic Assessment) consortium distinguishes between direct and indirect features of adenomyosis [17]. Direct features, which indicate the presence of ectopic endometrial tissue in the myometrium, include myometrial cysts, hyperechogenic islands, and echogenic subendometrial lines and buds. Indirect features reflect changes in the myometrium secondary to the presence of endometrial tissue in the myometrium, i.e., globular uterus, asymmetrical myometrial thickening, fan-shaped shadowing, translesional vascularity, irregular junctional zone, and interrupted junctional zone [17]. The JZ is poorly visualized with 2D TVS. Three-dimensional (3D) TVS offers the possibility of assessing the coronal plane and of using VCI (Volume Contrast Imaging, i.e., thick slice) to improve visualization of the JZ, which facilitates the evaluation of its thickness and irregularity [14,18,19]. Reproducibility studies have shown a high level of agreement when evaluating the presence or absence of adenomyosis using TVS [17][20][21][22] but less agreement when assessing different ultrasound features of the disease [20,21].
The aim of this study was to compare the inter-rater agreement of TVS with MRI regarding the diagnosis of adenomyosis and regarding the assessment of predefined imaging features. By assessing both modalities in the same set of women, we aim to provide insights into their respective diagnostic capabilities and agreement levels.

Study Design and Eligibility Criteria
This prospective study included consecutive fertile women with heavy regular menstrual bleeding and suspected adenomyosis on clinical examination (including TVS), who were recruited in a private gynecologic clinic in Stockholm. All women were examined and recruited by the same gynecologist (JA) between January 2014 and December 2016 and referred for MRI and expert TVS as part of the diagnostic workup. Women with adenomyoma, known endometriosis, uterine leiomyomas >4 cm, multiple leiomyomas (>3), or current use of intrauterine devices or hormonal contraception were not eligible for the study.
In total, 67 women were recruited and referred to MRI and TVS. MRI was missing in 3 women (declined examination, n = 2; wrong identification number, n = 1) and TVS in 13 women (images were not pseudonymized, n = 2; 3D volumes, n = 6; or 2D video sequences, n = 5, were not recorded or of insufficient quality). Both MRI and TVS images were available in 51 women and included in the study. After the patient recruitment was finalized, the MRIs and TVS videoclips and the 3D volumes were retrospectively assessed by four experienced radiologists and five experienced sonographers, respectively. The assessment of the images was performed between December 2019 and February 2020. The raters were blinded to the clinical history, physical examination, and the evaluation of MRI and TVS images made by other raters.

MRI Examination and Assessment
MRIs of the pelvis were performed in an outpatient facility on a 1.5 T system (Optima MR450w, GE Healthcare, Waukesha, WI, USA, or Siemens Magnetom Symphony Tim, Siemens Healthineers, Erlangen, Germany). The minimum protocol included the following sequences: T2-weighted Fast Relaxation Fast Spin Echo (FRFSE) or Turbo Spin Echo (TSE) in the axial, sagittal and coronal plane (slice thickness 4-5 mm; gap: 10-20%); T1-weighted Fast Spin Echo (FSE) or a Gradient Echo (GRE) in the axial and coronal plane (slice thickness 5 mm; gap 10-20%). All examinations were performed with a phased array coil. The women were asked to fast for 4 h before the examination. Antispasmodic drugs were not administered.
The images were pseudonymized and evaluated on a Picture Archiving and Communication System (PACS; IDS7 version 21.1, Sectra AB, Linköping, Sweden) at the Karolinska University Hospital by four experienced radiologists. Raters could save their assessments and resume them at a later time to reduce the risk of fatigue. With regard to the presence or absence of adenomyosis, each rater based the assessment on their subjective evaluation of the radiological features. There were no standardized criteria given on when to make the diagnosis. The predefined MRI features assessed are listed in Table 1 and shown in Figure 1.

Table 1. MRI features included in the assessment.

JZmax: Maximal JZ thickness (mm). Thickest part of the JZ in the midsagittal plane.
JZmin: Minimal JZ thickness (mm). Thinnest part of the JZ in the midsagittal plane.
Myometrial thickness: The thickness (mm) of the uterine wall at the same level as JZmax.
JZmax/Myometrial thickness: The extent of the uterine wall that is affected by adenomyosis (%). JZmax divided by myometrial thickness measured at the same level.
Diagnostic confidence: Examiner confidence in their assessment (Low/Medium/High).
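The ratio listed in Table 1 is a simple derived quantity. A minimal sketch of how it could be computed is given below; the function name and the example measurements are hypothetical and are not taken from the study data.

```python
def jz_ratio_percent(jz_max_mm, myometrial_thickness_mm):
    """Return JZmax as a percentage of the myometrial thickness.

    Both values are assumed to be measured in mm at the same level
    of the uterine wall, as described in Table 1.
    """
    if myometrial_thickness_mm <= 0:
        raise ValueError("myometrial thickness must be positive")
    return 100.0 * jz_max_mm / myometrial_thickness_mm


# Hypothetical measurements, for illustration only (not study data).
ratio = jz_ratio_percent(jz_max_mm=12.0, myometrial_thickness_mm=24.0)
print(f"JZmax/myometrial thickness = {ratio:.0f}%")
```

The guard against a non-positive denominator is a design choice of this sketch, not something described in the study.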

TVS Examination and Assessment
All women underwent ultrasound examination by a single expert examiner (EE) at the Karolinska University Hospital using a high-end ultrasound system Voluson E10 or E8, GE Healthcare (GE Medical Systems, Zipf, Austria) with a 5-9 MHz 3D transvaginal probe. Two-dimensional grayscale videoclips and 3D-VCI grayscale volumes including the whole uterine body were saved. The GE 4D View software (GE Healthcare, Wood Dale, IL, USA) was used to assess the 3D-VCI volumes. The pseudonymized videoclips and 3D volumes for each case were downloaded to memory sticks and sent to five experienced ultrasonographers. The raters used their own personal computers to assess the volumes. They were encouraged to use high-resolution computer screens and to perform the assessments in a dark room to avoid glare on the screen. The raters could save their assessments and resume later to reduce the risk of fatigue. The volumes were saved in the VCI format with a 2 mm thickness and with a grey mix of 70% X-ray/30% surface smooth. The raters could modify the volume during the analysis to optimize the assessment (remove the VCI function, change slice thickness or grey mix, and rotate the volume in any plane). With regard to the presence or absence of adenomyosis, each rater based the assessment on their subjective evaluation of different ultrasonographic features using pattern recognition [13]. The measurement of the anterior and posterior wall thickness was carried out in a longitudinal plane and the measurement of the junctional zone (JZmax and JZmin) in the reconstructed coronal plane from the 3D volume. The predetermined TVS features assessed are listed in Table 2 and shown in Figure 2A,B.

Statistical Analysis
Each rater entered the assessments into the Research Electronic Data Capture (REDCap®, Vanderbilt University) data entry and management program [23,24] hosted at the Karolinska Institutet. REDCap is a secure, web-based software platform designed to support data capture for research studies, providing automated export procedures for data downloads to common statistical packages. Statistical analysis was performed using SPSS, version 26 (IBM Corporation, Armonk, NY, USA).
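As stated in the abstract, Fleiss kappa was used for categorical ratings. The study analysis was run in SPSS; the block below is only an illustrative sketch of the standard Fleiss kappa formula in plain Python. The function names and the example ratings are hypothetical and do not reproduce the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must have the same number of ratings.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_obs = sum(
        sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the overall category proportions.
    total = n_subjects * n_raters
    n_categories = len(counts[0])
    p_exp = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_categories)
    )
    if p_exp == 1.0:
        return float("nan")  # all ratings in one category: kappa undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


def counts_from_binary(ratings):
    """Convert per-subject lists of 0/1 ratings into [absent, present] counts."""
    return [[row.count(0), row.count(1)] for row in ratings]


# Hypothetical ratings from five raters on six cases (1 = adenomyosis present).
ratings = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
kappa = fleiss_kappa(counts_from_binary(ratings))
print(f"Fleiss kappa = {kappa:.2f}")
```

The same function applies to features with more than two categories, provided each case is rated by the same number of raters.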

Results
The inter-rater agreement results for diagnosing adenomyosis and for the individual imaging features are presented in Tables 3 and 4. The five raters of ultrasound images classified adenomyosis as present in 51%, 49%, 39%, 37%, and 76% of the cases, and the four raters of MRI classified the disease as present in 88%, 43%, 49%, and 61% of the cases, respectively.
For other continuous data assessed with MRI, ICC showed good reliability for myometrial thickness and JZ differential (JZdiff) and a moderate reliability for JZmin and Ratio JZmax/Myometrium.
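Verbal labels such as "moderate" and "good" depend on the interpretation scale applied, which is not stated in this section. The sketch below assumes two commonly used scales, the Landis and Koch bands for kappa and the Koo and Li bands for the ICC; both the choice of scales and the function names are assumptions made for illustration.

```python
def kappa_label(kappa):
    """Descriptive band for kappa, assuming the Landis and Koch scale."""
    if kappa < 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


def icc_label(icc):
    """Descriptive band for the ICC, assuming the Koo and Li scale."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Values reported in the abstract of this paper.
print(kappa_label(0.42), kappa_label(0.28))  # TVS vs. MRI diagnosis
print(icc_label(0.57), icc_label(0.82))      # range of MRI JZ reliability
```

Under these assumed scales, the reported κ values of 0.42 and 0.28 fall in the "moderate" and "fair" bands, and the MRI ICC range of 0.57–0.82 spans "moderate" to "good", which matches the wording used in the abstract.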

Discussion
We found that the inter-rater agreement for diagnosing adenomyosis was higher for TVS than for MRI, despite the fact that the inter-rater agreement for most individual imaging features was higher for MRI than for TVS. Moreover, MRI showed clearly higher reliability than TVS for continuous variables. Since MRI had a higher agreement for most individual imaging features that were assessed by both TVS and MRI, it is remarkable that the inter-rater agreement for diagnosing adenomyosis was lower for MRI than for TVS. The lack of standardized criteria for diagnosing adenomyosis with MRI is the most likely reason. Since ICC for JZmax was good, the agreement for diagnosing adenomyosis with MRI may have improved if a cut-off for JZmax had been used as the criterion. However, using the JZ as the only criterion for adenomyosis has been questioned [9,10]. Also, the absence of cysts may have affected the result. Intramyometrial cysts are pathognomonic for the disease but are found in only one-third to half of affected women [7,8,10]. Some of the participating radiologists may have excluded the diagnosis when no cysts were found, even when altered JZ features were present. The limited agreement for cysts may have negatively affected the agreement for diagnosis. A recent meta-analysis and review on the performance of various objective criteria for diagnosing adenomyosis using MRI concluded that most parameters have a relatively low sensitivity and a relatively high specificity [28]. JZ characteristics remain the most widely used and investigated, with acceptable diagnostic accuracy. Specific research is needed into how these objective measures of adenomyosis can be correlated to clinical outcomes.
For TVS, the inter-rater agreement was fair or moderate for most of the individual ultrasound features and moderate for diagnosing adenomyosis. This is in line with the results of other studies [18,20] showing a good reproducibility in the diagnosis of adenomyosis using 2D TVS pattern recognition. When an ultrasound diagnosis of adenomyosis is made using pattern recognition, all the different ultrasound features are taken into account. This may explain why agreement regarding diagnosis was higher than agreement for most individual features. When the features are combined, the agreement on the subjective overall judgment of whether adenomyosis is present becomes higher than the agreement for the individual variables. The inter-rater agreement for cysts in the myometrium was poor. Small cysts in the myometrium may be difficult to detect with TVS, especially when other features, such as shadowing, are present. In this study, the globular uterus shape and wall asymmetry showed the highest agreement between raters, whereas other studies have reported the highest agreement for irregular JZ [20,21].
With 3D TVS, it is possible to visualize the JZ. However, JZ measurements with 3D TVS have been shown to have limited reproducibility [20][21][22]. In our study, there were missing data from >50% of the TVS cases since cases were excluded from the ICC analysis if one rater classified the JZ as "not assessable", even though the other raters measured the JZ. Moreover, results were unreliable in the cases where all the TVS raters measured the JZ, indicating low diagnostic value in clinical practice. One reason may be the low quality of 3D volumes in women with adenomyosis, as the abundant scattering of the ultrasound beam results in poor image quality in the reconstructed coronal plane, making assessment difficult. Moreover, the coronal plane may not be the correct plane to measure JZ. A recent update to the MUSA consensus paper suggests assessing the regularity of the junctional zone in multiple planes (transversal, longitudinal, coronal) using 3D ultrasound, since a regular JZ can rule out adenomyosis, while measurements of the JZ were dismissed because of a lack of evidence of the clinical relevance of this measurement [29].
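The exclusion rule described above (a case drops out of the ICC analysis as soon as one rater marks the JZ as "not assessable") can be sketched as follows. The ICC form used in the study is not specified in this section, so the sketch assumes ICC(2,1), i.e., a two-way random-effects model with absolute agreement and single measures. The function names and the JZmax values are hypothetical.

```python
def complete_cases(measurements):
    """Keep only cases where every rater provided a measurement (no None)."""
    return [row for row in measurements if all(v is not None for v in row)]


def icc_2_1(rows):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    rows[i][j] is the measurement of case i by rater j (no missing values).
    """
    n, k = len(rows), len(rows[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two cases and two raters")
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical JZmax values (mm) from three raters; None = "not assessable".
jz_max = [
    [8.0, 9.5, 7.0],
    [12.0, None, 11.0],
    [6.5, 6.0, 8.0],
    [None, None, 10.0],
    [14.0, 12.5, 15.0],
    [9.0, 11.0, 8.5],
]
usable = complete_cases(jz_max)
print(f"{len(usable)} of {len(jz_max)} cases usable")
print(f"ICC(2,1) = {icc_2_1(usable):.2f}")
```

In this hypothetical example, two of the six cases are lost even though at least one rater measured the JZ in each of them, which illustrates how quickly listwise exclusion shrinks the analysable sample when several raters are involved.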
A recent meta-analysis showed that MRI and TVS had an adequate performance with regard to diagnosing adenomyosis, with a pooled sensitivity and specificity of 75% and 81% for TVS and 69% and 80% for MRI, p = 0.75, when hysterectomy was used as a gold standard [30]. Still, both modalities have shown unsatisfactory inter-rater agreement for the diagnosis, indicating that adenomyosis is challenging even for experts. Clear definitions and criteria, along with more standardized interpretation models for imaging are needed for both MRI and TVS to make the diagnosis more reliable.
A strength of the study is the large number of cases examined with both imaging modalities and the enrollment of raters from different centers. For TVS, expert raters from different centers in Europe were included, which increases the generalizability of the study. Unfortunately, this was not possible for MRI.
Although the TVS images were assessed offline and a dedicated "state-of-the-art" MRI was not applied, the majority of raters considered the quality of the images to be medium or good and were confident in their evaluation. Although the image quality was suboptimal, it still appears acceptable for assessing the required parameters and reflects the reality in a clinical set-up.
The use of the REDCap database reduced human error in the handling of data to a minimum and ensured complete datasets. All questions were mandatory and thus had to be answered before the raters moved on to the next case. Missing data were only present for JZ measurements, in cases where it was not possible to identify the JZ, and thus it was classified as "not assessable" and consequently not measurable.
A limitation of the study is the use of stored offline TVS videos and volumes instead of real-time examinations. It is well known that 2D TVS image quality is often reduced by the presence of adenomyosis, as it gives rise to shadowing and distorts the normal endometrial-myometrial border. The effect of poor image quality in the 3D rendering plane further reduces the quality in the reconstructed coronal plane, hampering assessment in low-quality 3D volumes. However, it would be impossible to carry out reproducibility studies with multiple raters in TVS without using recorded material. Even for MRI, the quality of the images was not optimal. The examinations were performed in a private radiological outpatient center where neither an abdominal belt nor an antispasmodic agent was used to reduce motion artefacts caused by small-bowel peristalsis. Several MRIs were affected by artefacts, thus hampering the quality of the images. However, consensus guidelines suggesting technical protocols for MR imaging of endometriosis were published after the enrollment of the study subjects [31,32]. Furthermore, no oblique axial T2-weighted sequence perpendicular to the long uterine axis was included in the MRI protocol, which is useful for the assessment of adenomyosis [8,10]. Finally, it is important to point out that we included consecutive women with a clinical suspicion of adenomyosis, representing a real-life setting where the histological outcome was not available in the majority of women, as they underwent medical treatment. We find it acceptable not to have a gold standard for comparison, as our aim was to assess inter-rater agreement and not to correlate TVS and MRI to histology.

Conclusions
The inter-rater agreement for diagnosing adenomyosis was higher for TVS than for MRI despite MRI manifesting higher inter-rater agreement in most variables, in particular variables related to measuring the junctional zone. The measurement of JZ thickness in the coronal plane with 3D TVS could only be performed in fewer than half of the women and was found to be unreliable in the rest, and it is therefore unlikely to be useful for diagnosing adenomyosis.

Institutional Review Board Statement: The study was approved by the Swedish Ethical Authority "Etikprövningsmyndigheten" (Dnr 2016/1751-32).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Database available on request; images not available.