Discrepancies between Radiology Specialists and Residents in Fracture Detection from Musculoskeletal Radiographs
Diagnostics 2023, 13, 3207

(1) Background: The aim of this study was to compare the competence in appendicular trauma radiograph image interpretation between radiology specialists and residents. (2) Methods: In this multicenter retrospective cohort study, we collected radiology reports from radiology specialists (N = 506) and residents (N = 500) during 2018–2021. As a reference standard, we used the consensus of two subspecialty-level musculoskeletal (MSK) radiologists, who reviewed all original reports. (3) Results: A total of 1006 radiograph reports were reviewed by the two subspecialty-level MSK radiologists. Out of the 1006 radiographs, 41% were abnormal. In total, 67 radiographic findings were missed (6.7%) and 31 findings were overcalled (3.1%) in the original reports. Sensitivity, specificity, positive predictive value, and negative predictive value were 0.86, 0.92, 0.91 and 0.88 respectively. There were no statistically significant differences between radiology specialists’ and residents’ competence in interpretation (p = 0.44). However, radiology specialists reported more subtle cases than residents did (p = 0.04). There were no statistically significant differences between errors made in the morning, evening, or night shifts (p = 0.57). (4) Conclusions: This study found a lack of major discrepancies between radiology specialists and residents in radiograph interpretation, although there were differences between MSK regions and in subtle or obvious radiographic findings. In addition, missed findings found in this study often affected patient treatment. Finally, there are MSK regions where the sensitivity or specificity is below 90%, and these should raise concerns and highlight the need for double reading and should be taken into consideration in radiology education.


Introduction
Health care is based on high-quality patient treatment, and to ensure this quality, the competence of health-care professionals needs to be systematically evaluated [1]. In medical imaging, the radiological report plays an important role in patient treatment [2] and helps general practitioners treating the patient. Radiographs are important in evaluating patients with upper- or lower-extremity trauma [3,4]. Thus, the radiology report based on radiographs has an important role in patient treatment.
Extremity fractures are the second-most missed diagnosis when reporting on radiographs [5]. This is especially relevant now that increased cross-sectional imaging represents a growing proportion of the teaching material during radiology residency training. Missed findings in radiographs may result in several complications for the patient [6]. Identifying mistakes made in radiograph interpretation is an important way to improve interpretation competence [7]. Up to 80% of diagnostic errors in radiology are classified as perceptual errors, where the abnormal finding is not seen [2,8]. These errors are more frequent during evening and nighttime [9–11]. In skeletal radiology, most malpractice claims against radiologists are related to errors in fracture interpretation [12–14].
In summary, radiographs are still used as first-line studies to evaluate patients with possible fractures. Therefore, interpretation competence should constantly be evaluated. Interpretation errors in radiographs are frequently related to worse patient outcomes. There are still limited data on the diagnostic performance in MSK radiograph interpretation between specialists and residents, especially with regard to time of day and subtle vs. obvious findings. In this study, we evaluated only different upper and lower MSK regions due to their frequency and the limited number of imaging outcomes (e.g., fracture or no fracture).
The purpose of this study was to determine radiology specialists' and residents' performance in radiograph interpretation and the rate of discrepancy between them. We hypothesized that (1) radiology specialists' performance is superior compared to residents' performance, (2) residents have more missed findings in subtle radiology findings compared to specialists, and (3) missed findings increase during evening and night.

Materials and Methods
This retrospective cross-sectional study received ethical approval from the Ethics Committee of the University of Turku (ETMK Dnro: 38/1801/2020). This study complied with the Declaration of Helsinki and was performed according to ethics committee approval. Because of the retrospective nature of the study, the need for informed consent was waived by the Ethics Committee of the Hospital District of Southwest Finland.
This retrospective study reviewed appendicular radiographs (N = 1006) interpreted by radiology specialists (n = 506) and residents (n = 500) between 2018 and 2021. This type of study design allowed us to collect the reports at one study point and was less time-consuming than a longitudinal or prospective study design. Different MSK body parts were included, and the same number of patient cases was included in every MSK region for both radiology specialists and trainees. Cases were selected with the following inclusion criteria: (a) trauma indication, (b) original radiology report made by either radiology specialists or residents, and (c) primary radiographs. The exclusion criteria were (a) nontrauma indication, (b) no original report found in the picture archiving and communication system (PACS), and (c) control study. All radiographs were interpreted by two subspecialty-level MSK radiologists with 20 and 25 years of experience. Double (dual) reading was used, which has been shown to be an effective but also time-consuming way of finding discrepancies in radiology reports [15]. The radiologists did not know the original report or whether the original report was made by radiology specialists or residents. Consensus between the two radiologists was evaluated against the original report. All radiographs were viewed in a PACS and with diagnostic monitors. To improve the generalizability of the results, data from various imaging devices were included.
Interpretation error was defined as disagreement between the original report and the two subspecialty-level MSK radiologists. Each interpretation error was evaluated and subcategorized. In addition, interpretation errors and their implications for patient treatment were classified based on the severity of the interpretation error. Implications were classified based on the consensus of the two subspecialty-level MSK radiologists as follows: Grade 1, no clinical importance; Grade 2, unable to know whether the error had clinical importance; and Grade 3, clear clinical effect on patient treatment. In addition, all abnormal radiographs were labeled as being subtle (n = 103) or obvious (n = 310) based on the two subspecialty-level MSK radiologists' consensus. Patient age, sex, time of interpretation, and date of interpretation were recorded. Data were collected and managed using REDCap (Research Electronic Data Capture) electronic data capture tools hosted at Turku University.
Patients were divided into three age groups (Table 1) to represent pediatric (1-16), adult, and elderly (>65). There were no statistically significant differences between patient age groups (p = 0.66) or sex (p = 0.53) when radiology specialists' and residents' interpretations were compared. In addition, time of interpretation was classified to represent morning, evening, and night shifts. Categorical variables were summarized with counts and percentages, and age, a continuous variable, with mean and range. Associations between two categorical variables were evaluated with chi-squared or Fisher's exact test (Monte Carlo simulation used if needed). p-values less than 0.05 (two-tailed) were considered statistically significant. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated together with their 95% confidence intervals (CIs).
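The four accuracy measures follow directly from a 2×2 table of original report versus reference standard. The following is a minimal sketch in Python, not the SAS code used in this study. The counts are hypothetical, the function names are illustrative, and the Wilson score interval is an assumption, because the CI method is not specified in the text.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z = 1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return {metric: (estimate, (lower, upper))} from 2x2 counts.

    tp/fn: abnormal by reference standard, reported abnormal/normal.
    tn/fp: normal by reference standard, reported normal/abnormal.
    """
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts for illustration only (not study data).
example = diagnostic_accuracy(tp=90, fp=10, fn=15, tn=85)
for name, (est, (lower, upper)) in example.items():
    print(f"{name}: {est:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

In this framing, a missed finding is a false negative and an overcall is a false positive, so sensitivity falls with misses and specificity falls with overcalls.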
The data analysis for this paper was generated using SAS software version 9.4 for Windows (SAS Institute Inc., Cary, NC, USA).

Overall Findings
Out of the 1006 radiographs, 41% were abnormal. In total, 67 radiographic findings were missed (6.7%) and 31 findings were overcalled (3.1%). Among the missed fractures, 18% were found in children, 60% in adults, and 22% in the elderly. Among the overcalls, 29.0% were found in children, 48.4% in adults, and 22.6% in the elderly. The most common reason for interpretation error was fracture (58%). Interpretation error was most likely to happen in wrist (18%) or foot (17%) interpretation.

Discrepancies between Radiology Specialists and Residents
No statistically significant differences (p = 0.44) were found in interpretation errors between radiology specialists and residents. Radiology specialists missed 5.7% and residents 7.6% of findings. On the other hand, radiology specialists made 2.8% overcalls and residents 3.4% overcalls. Sensitivity, specificity, positive predictive value, and negative predictive value were 0.86, 0.92, 0.91 and 0.88, respectively (Table 3). Patient age was similar (p = 0.29) in the correct diagnosis group and in the interpretation error group. However, there were variations in competence between different MSK regions and radiology specialists or residents. Diagnostic accuracy varied widely between MSK regions (Table 4). The highest sensitivity (0.95), specificity (0.97), negative predictive value (0.97) and positive predictive value (0.95) were found in pelvis interpretation, while the lowest sensitivity (0.83), specificity (0.82), negative predictive value (0.80) and positive predictive value (0.85) were found in wrist interpretation. Overall, the lowest specificity (0.78) was found in foot interpretation. For the shoulder, radiology specialists made 95% correct diagnoses compared to 83% by residents, and in the knee, radiology specialists made 89% correct diagnoses compared to 97% by residents. However, there were no statistically significant differences between radiology specialists and residents in different MSK regions.
Radiology specialists interpreted more radiographs as having subtle findings compared to residents (p = 0.04). Different age groups did not differ (p = 0.89) between subtle or obvious cases. Radiology specialists missed correct diagnoses in subtle and obvious radiographs in 33% and 4.9%, respectively. In contrast, residents missed correct diagnoses in subtle (Figure 2) and obvious (Figure 3) radiographs in 51% and 8.4%, respectively.
Of all missed findings in radiographs, 70% (n = 44) were interpreted as having an impact on patient care (p = 0.02), but this did not differ between radiology specialists and residents. Findings missed by radiology specialists (Figures 4 and 5) affected patient care in 71% of cases and overcalls in 31% of cases. Findings missed by residents (Figure 6) affected patient care in 69% of cases and overcalls in 50% of cases. Of all overcalls in radiographs, 41% (n = 12) seemed to have an impact on patient care. The most common impact on patient care was lack of a necessary control study (40%), followed by an unnecessary control study (14%). Interpretation error rarely led to unnecessary operative treatment (1%).
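As an illustration of the chi-squared comparison between the two reader groups, the sketch below tests a 2×3 table (group by missed, overcall, correct). It is a sketch under stated assumptions, not the authors' SAS analysis. The counts are back-calculated from the reported percentages and group sizes, so they are approximate, and the text does not state which contingency table produced p = 0.44, so the result is not expected to reproduce that value exactly.

```python
import math

# Approximate counts back-calculated from the reported percentages
# (an assumption, not taken from the study tables):
# specialists (n = 506): 5.7% missed, 2.8% overcalled
# residents   (n = 500): 7.6% missed, 3.4% overcalled
observed = {
    "specialists": {"missed": 29, "overcall": 14, "correct": 463},
    "residents": {"missed": 38, "overcall": 17, "correct": 445},
}


def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a group-by-outcome table."""
    groups = list(table)
    outcomes = list(table[groups[0]])
    row_totals = {g: sum(table[g].values()) for g in groups}
    col_totals = {o: sum(table[g][o] for g in groups) for o in outcomes}
    n = sum(row_totals.values())
    stat = 0.0
    for g in groups:
        for o in outcomes:
            expected = row_totals[g] * col_totals[o] / n
            stat += (table[g][o] - expected) ** 2 / expected
    dof = (len(groups) - 1) * (len(outcomes) - 1)
    return stat, dof


stat, dof = chi_squared_statistic(observed)

# With exactly 2 degrees of freedom, the chi-squared upper-tail
# probability has the closed form exp(-x / 2), so no statistics
# library is needed for this particular table shape.
assert dof == 2
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.2f}")
```

The back-calculated counts sum to the 67 misses and 31 overcalls reported overall, and the resulting p-value is well above 0.05, in line with the reported absence of a significant difference.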

Overall Findings
We found similar rates of misses and overcalls in reading radiographs between radiology specialists and residents, both groups having lower specificity compared to sensitivity, yet there were differences in competence among different MSK regions. Neither day nor time of the day showed a statistically significant difference in interpretation competence. These results highlight that there are no major differences between radiology specialists and residents in MSK radiograph interpretation. However, there are MSK regions that need more attention in the future regarding competence in radiograph interpretation. This will have direct implications for resident training programs. Importantly, there were no statistically significant group differences in the age distribution between the resident and specialist groups, suggesting that the main conclusions are not biased by age.
For upper and lower extremities, we found a sensitivity of 0.92 and specificity of 0.86, which are lower than reported in previous studies [16]. In contrast to previous studies [16,17], we did not find any statistically significant increase in radiology specialist or resident interpretation errors for evening or night shifts compared to daytime. However, we did find that residents, who can be more prone to fatigue-related errors [18,19], made more interpretation errors during the night shift compared to the morning or evening shift. Radiology specialists are also prone to fatigue-related problems in interpretation [17], and in this study, we found that 18% of missed diagnoses occurred between 15:00 and 17:00, which highlights the fatigue-related errors in interpretation. Most missed diagnoses in this study were related to missed fractures, similar to previous studies [20–22]. The prevalence of abnormality in our study was 41%, which is in line with prevalence in clinical practice [23] and does not overestimate ability to detect abnormal cases [24].

Discrepancies between Radiology Specialists and Residents
We found that the overall interpretation errors for radiology specialists and residents varied from 0 to 10% and 0 to 12%, respectively, showing slightly lower competence levels compared to previous studies [1,7,21,25–28]. Earlier studies show that when evaluated with normal and abnormal cases, interpretation errors for radiology specialists range from 0.65% [1] to 5% [29,30]. There are differences between individual radiology specialists' interpretation competence, which can increase interpretation errors even to 8% [31]. One of the largest studies showed a radiology specialist interpretation error rate between 3% and 4% [1].
We did not find any statistically significant differences between radiology specialists and residents, which is in contrast to previous studies, where radiology specialists showed better diagnostic accuracy compared to residents (p = 0.02) [32]. However, there are also studies showing no significant differences between radiology specialists and residents [1,20,25]. In addition, we did not find statistically significant differences in interpretation of subtle or obvious radiology findings, in contrast to previous studies [32]. In this study, the radiology specialists had higher rates of detection and higher diagnostic accuracy for subtle findings compared to the residents, which is consistent with previous studies [18]. Because we excluded reports initially signed by both a trainee and a specialist (a signal of consultation), the potential bias from specialists affecting trainee reports is probably low. In addition, we did not find statistically significant differences between radiology specialists and residents in different MSK regions, as in previous studies [33]. In previous studies [16,30], ankle interpretation showed the highest sensitivity (0.98) and specificity (0.95). In this study, the ankle sensitivity (0.93) and specificity (0.83) were lower. Furthermore, in this study, specificity was lower compared to sensitivity in all MSK regions except the pelvis. This is well recognized in the field of radiology and can be related to litigation in missed findings [34].
Diagnostic accuracy in the wrist had the lowest sensitivity and specificity among MSK regions. This is worrying, because the wrist is the most often injured MSK region [35,36], and missed findings can lead to complications such as nonunion, osteonecrosis and osteoarthritis [6]. Radiology specialists and residents had nearly the same miss rate (9.5% and 9.7%, respectively), but radiology specialists had fewer overcalls than residents (3.2% and 8.1%, respectively). These miss and overcall rates in the wrist are higher than reported in previous studies [37]. Foot injuries are also very common, and diagnostic accuracy can have serious implications for patient care [38]. In our study, foot interpretation showed the lowest sensitivity and specificity in the lower extremity. These findings should prompt radiology departments to pay special attention to these MSK regions in resident training. We found that most interpretation errors affected patient care, regardless of whether the radiograph was interpreted by a radiology specialist or resident.

Limitations
First, due to the retrospective nature of the study, we were unable to verify the level of clinical competence of the radiology specialist (e.g., years in practice), or the resident (e.g., year of residency). However, we might reasonably assume that every radiology specialist or resident has the required clinical competence when they dictate radiological reports for guiding patient treatment. Second, there is a possibility of undetected selection bias. Different types of fracture tend to occur during different times of the year in Finland. To diminish this selection bias, data collection spanned several time periods. Finally, follow-up studies were not obtained to verify the possible missed fractures unless the patient had had follow-up assessment at the same hospital and it could be found in the PACS. Our gold standard in this study was a consensus of two MSK radiology specialists, and possible errors in their interpretations potentially affect the results of this study also.

Conclusions
In conclusion, this study found a lack of major discrepancies between radiology specialists and residents in radiograph interpretation, although there were differences between MSK regions and in subtle or obvious radiographic findings. In this study, the interpretation of pelvic radiographs yielded the highest sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV), whereas the interpretation of wrist radiographs yielded the lowest values for these performance metrics. Moreover, no statistically significant differences were observed between the interpretations made by radiology specialists and trainees during evening or night shifts, despite radiology specialists showing fewer interpretation errors. In addition, missed findings found in this study often affected patient treatment. Finally, there are MSK regions where the sensitivity or specificity is below 90%, and these should raise concerns and highlight the need for double reading and be taken into consideration in radiology education. Further prospective studies are needed in these specific MSK regions. In addition, future studies where artificial image interpretation is compared between radiology specialists and residents could be undertaken to highlight possible differences.

Figure 1. Total number and percentage of abnormal radiographs, missed diagnoses and overcalls, and subtle and obvious findings presented in three different timeframes.

Figure 2. Subtle radiographic finding in patient with scaphoid fracture (arrow) that was initially missed by the resident.

Figure 3. Patient with ankle trauma. Multiple obvious findings (arrows) in radiographs that were all missed by the resident.

Figure 4. Posterior dislocation initially missed by the radiology specialist. The treating physician later suspected GH dislocation on clinical inspection, and a CT was ordered where posterior dislocation was detected.

Figure 5. Patient with anterior shoulder dislocation. The radiology specialist missed a Hill-Sachs lesion (arrow) that resulted in delay in patient treatment.

Figure 6. Patient with pelvic trauma radiographs. Two findings (arrows) initially missed by the resident were later revealed on CT done for other indications.

Table 1. Patient demographics in different subsets.

Table 2. Diagnostic accuracy of radiology specialists' and residents' interpretations at different times.

Table 4. Diagnostic accuracy of radiology specialists' and residents' interpretations at different MSK regions.