The Added Value of Radiographs in Diagnosing Knee Osteoarthritis Is Similar for General Practitioners and Secondary Care Physicians; Data from the CHECK Early Osteoarthritis Cohort

Objective: The purpose of this study was to evaluate the added value of radiographs for diagnosing knee osteoarthritis (KOA) by general practitioners (GPs) and secondary care physicians (SPs). Methods: Seventeen GPs and nineteen SPs were recruited to evaluate 1185 knees from the CHECK cohort (presenters with knee pain in primary care) for the presence of clinically relevant osteoarthritis (OA) during follow-up. Experts were required to make diagnoses independently, first based on clinical data only and then on clinical plus radiographic data, and to provide certainty scores (ranging from 1 to 100, where 1 was “certainly no OA” and 100 was “certainly OA”). Next, experts held consensus meetings to agree on the final diagnosis. With the final diagnosis as gold standard, diagnostic indicators were calculated (sensitivity, specificity, positive/negative predictive value, accuracy and positive/negative likelihood ratio) for all knees, as well as for clinically “certain” and “uncertain” knees, respectively. Paired Student's t-tests compared certainty scores. Results: Most diagnoses of GPs (86%) and SPs (82%) were “consistent” after assessment of radiographic data. Diagnostic indicators improved similarly for GPs and SPs after evaluating the radiographic data, but only improved relevantly in clinically “uncertain” knees. Radiographs added some certainty to “consistent” OA knees (GP 69 vs. 72, p < 0.001; SP 70 vs. 77, p < 0.001), but not to the consistent no OA knees (GP 21 vs. 22, p = 0.16; SP 20 vs. 21, p = 0.04). Conclusions: The added value of radiographs is similar for GPs and SPs, in terms of diagnostic accuracy and certainty. Radiographs appear to be redundant when clinicians are certain of their clinical diagnosis.


Introduction
In routine clinical practice, the diagnosis of knee osteoarthritis (KOA) is usually made based on the clinician's expertise, and radiographs are frequently used to confirm clinical suspicion of KOA [1,2]. However, there are insufficient data on the necessity and the potential role of radiographs in the diagnostic process.
The European League Against Rheumatism (EULAR) recommendations reported that three symptoms (knee pain, morning stiffness less than 30 min and functional limitation) combined with three clinical signs (crepitus, restricted range of motion and bone enlargement) could predict the presence of radiographic KOA in 99% of cases [3]. Similarly, recent studies showed that clinical manifestations, such as knee pain, crepitus, joint line tenderness, bony swelling and pain on flexion/extension, could be used for identifying radiographic KOA [4][5][6]. Current recommendations advise against imaging in patients with typical OA presentations, but these recommendations were mainly based on expert opinion [7,8].
As a common and chronic disease [8][9][10], KOA is usually diagnosed both by general practitioners (GPs) and secondary care physicians (SPs). The added diagnostic value of radiographs may differ between these two groups of clinicians, given their different clinical expertise. However, there is no scientific literature on this aspect.
In this study, we recruited both GPs and SPs with osteoarthritis (OA) expertise to assess clinical vignettes of potential KOA patients taken from the CHECK cohort study (a longitudinal cohort study of primary care patients with knee complaints suggestive of early-stage KOA, followed for 10 years) and to provide diagnoses based on either clinical data alone, or clinical combined with radiographic data. The aim of this study was to evaluate the added value of radiographs above clinical findings in diagnosing KOA and to see whether this differed between GPs and SPs.

Clinical Experts
The protocol was approved by the Ethical Committee of UMC Utrecht (protocol number 02/017-E). We recruited experts who fulfilled one of the following criteria: (i) had held a degree in general practice, orthopedics, rheumatology or sports medicine for 2 or more years; (ii) were in training for a degree in one of these specialties combined with a PhD in OA research.
We recorded experts' characteristics by querying them on the number of OA patients treated per week, experience in OA treatment (years), and their perception of the importance of radiographs in making the diagnosis.

Clinical and Radiographic Data
For the present study, we included all patients from the CHECK cohort [11][12][13]. The CHECK cohort recruited patients between October 2002 and September 2005, and all patients were to be followed for 10 years. Patients whose medical records and radiographs were available for the 5- up to 10-year follow-up were included in this study.
Clinical data, obtained at the 5-, 8- and 10-year follow-ups, consisted of demographics (including sex, age, BMI (body mass index), racial background, marital status, menopausal status, educational level, chronic diseases, occupation, smoking status and alcohol usage), physical examinations (presence of knee pain, morning stiffness in knee, knee warmth, bony tenderness, crepitus, knee pain on extension and flexion, range of motion (extension and flexion)), WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index) subscales of pain, function, and stiffness, NRS (numeric rating scale) pain scores and incidence of other diseases (quadriceps tendinitis, intra-articular fracture, Baker's cyst, ligament or meniscus damage, osteochondritis dissecans, plica syndrome and septic arthritis).
Radiographic data consisted of scores from centralized reading by trained readers evaluating standard weight-bearing (posterior-anterior fixed flexion view) knee radiographic films at the 5-, 8- and 10-year follow-ups (for details see [12]). The scores included information on tibial attrition, femoral/tibial sclerosis, joint space narrowing, femoral/tibial osteophytes and Kellgren and Lawrence grades. Both posterior-anterior fixed flexion and lateral films were also made available to the experts. Table S1 summarizes all clinical and radiographic data presented to the experts. All data were stored and presented in special software (built in-house) for optimal presentation. The software recorded actual access to the radiographic films.

Obtaining Diagnoses
Before starting the diagnostic process, all experts received written information and completed two example patients to become familiar with the procedures and software. We obtained expert diagnoses between June 2018 and January 2019.
Experts were divided into pairs; each pair consisted of one GP and one SP, where possible. The diagnostic process is presented in Figure 1. Each pair assessed the same subset of knees (40-50 patients). First, the longitudinal clinical data of each patient were presented. Each expert evaluated these independently and, for each knee, chose between "yes, clinically relevant OA has developed" and "no, clinically relevant OA has not developed". In addition, the experts had to rate their certainty on a 1 to 100 scale (integer values), where 1 was "certainly no OA" and 100 was "certainly OA". If a knee was diagnosed as "OA", the certainty score had to be between 51 and 100, with a higher score expressing greater certainty; if the knee was diagnosed as "no OA", the certainty score had to be between 1 and 49, with a lower score expressing greater certainty. Next, access to the longitudinal radiographic data and films was activated. Experts were asked the same questions and had to provide new certainty scores. At this stage, experts had read-only access to the clinical data and their corresponding diagnoses.
After individually finishing all these evaluations, knees assigned certainty scores >30 and <70 were defined as "uncertain", the remainder as "certain". Where the two experts agreed (yes/no OA, regardless of certainty) the diagnosis automatically became final. Each pair held a consensus meeting to re-assess the knees where the individual diagnoses were discrepant, except if both experts were "uncertain". At that meeting the expert pair evaluated both clinical and radiographic data of the discrepancies, as done when evaluating these individually, and made a final diagnosis together. Knees where no consensus could be reached and those for which the experts disagreed after the individual scoring, but both were "uncertain", were all labeled as "consensus based uncertain".
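As a minimal sketch (in Python, with a hypothetical function name), the diagnosis and uncertainty cutoffs described above can be expressed as:

```python
def classify_knee(certainty: int) -> tuple[str, str]:
    """Map a 1-100 certainty score to a diagnosis and a certainty label.

    Per the study's rules, scores of 51-100 imply an "OA" diagnosis and
    1-49 imply "no OA"; scores strictly between 30 and 70 are "uncertain".
    """
    diagnosis = "OA" if certainty >= 51 else "no OA"
    status = "uncertain" if 30 < certainty < 70 else "certain"
    return diagnosis, status
```

For example, a score of 75 maps to ("OA", "certain") and a score of 40 to ("no OA", "uncertain").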

Statistics
Categorical variables were presented as counts and percentages and normally distributed continuous data as mean ± standard deviation. Experts' characteristics were compared with the Mann-Whitney U test or Wilcoxon test, where appropriate. The numbers of consistent and amended diagnoses after assessment of the radiographic data were presented for GPs and SPs, split by the no OA and OA diagnoses obtained when evaluating the clinical data only. Chi-square tests were used to compare diagnoses before and after viewing radiographic data for GPs/SPs. We calculated sensitivity, specificity, positive/negative predictive value (PPV/NPV), accuracy, positive/negative likelihood ratio (LR+/−) and their 95% CIs (confidence intervals) for the GP and SP diagnoses separately, with the consensus-based final diagnosis as gold standard. Next, we split all knees into clinically "certain" (individual certainty scores ≤30 or ≥70, based on clinical data only) and clinically "uncertain" (individual certainty scores >30 and <70, based on clinical data only) and calculated the same diagnostic indicators within both groups. The primary objective of the present study was to assess the clinically relevant value of radiographs. Considering that statistical tests could be overly sensitive to minor differences in such a large sample, we did not apply statistical tests to compare the above-described diagnostic indicators. As no comparable results have been reported before, outcomes were deemed exploratory.
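For illustration, the listed diagnostic indicators all derive from a 2×2 table of expert diagnosis against the gold-standard final diagnosis. The following is a generic sketch, not the study's actual analysis code, and the counts in the example are hypothetical:

```python
def diagnostic_indicators(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic indicators from a 2x2 table of
    true/false positives (tp, fp) and false/true negatives (fn, tn)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "lr_pos": sensitivity / (1 - specificity),  # LR+
        "lr_neg": (1 - sensitivity) / specificity,  # LR-
    }
```

With hypothetical counts tp=40, fp=10, fn=5, tn=45, this yields a sensitivity of 40/45 and a specificity of 45/55.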
For the analysis of certainty scores, the knees were divided into four subgroups: "consistent OA" (the clinical diagnosis of OA was retained after viewing radiographic data), "amended to no OA" (clinical diagnosis OA amended to no OA after viewing radiographic data); and likewise, "consistent no OA", and "amended to OA". Paired t-tests assessed whether diagnostic certainty was improved with radiographic information, in "consistent OA" and "consistent no OA" knees. To assure robustness of the results, a sensitivity analysis compared certainty scores between left knees only, with a paired t-test.
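The paired t-test on certainty scores amounts to a one-sample t-test on the per-knee before/after differences. A stdlib-only sketch of the t statistic (a hypothetical helper; in practice the p-value would come from the t distribution, e.g., via statistical software):

```python
import math
import statistics

def paired_t_statistic(before: list[float], after: list[float]) -> tuple[float, int]:
    """t statistic and degrees of freedom for paired samples:
    mean of the pairwise differences divided by its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n-1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

For instance, certainty scores [10, 20, 30, 40] before and [12, 21, 33, 44] after give differences [2, 1, 3, 4], hence t ≈ 3.87 on 3 degrees of freedom.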
Analysis was performed with SPSS version 25.0 (IBM, Chicago, IL, USA); the significance level was 0.05 using a 2-sided p value for all tests.

Experts and Patients
A total of 36 experts were recruited, 17 GPs and 19 SPs (8 orthopedists, 9 rheumatologists and 2 sports physicians). Seventeen pairs of one GP and one SP were formed, and the remaining pair included one orthopedist and one rheumatologist.
Expert characteristics are shown in Table 1. Among all the characteristics, only the perceived importance of radiography differed significantly between GPs and SPs (p < 0.001). The study included 761 patients with 1185 symptomatic knees; 79% were female, with a mean (SD) age of 56 (5) years. The pairs formed by one GP and one SP evaluated 1106 knees, and the pair consisting of two SPs evaluated 79 knees. During the diagnostic process, GPs viewed the actual films for 45% of the knees and SPs for 69% of the knees.

General Practitioners
GPs diagnosed 42% of knees as OA based on the clinical data only and 44% after viewing radiographic data. In total, 86% of diagnoses were consistent after viewing radiographic data; 6% of OA knees were amended to no OA, and 8% of no OA knees to OA (Figure 2). Of the 14% amended diagnoses, 8% were deemed correct compared with the final diagnosis (Table 2). In general, the changes in diagnoses were statistically significant (p < 0.001) and all diagnostic indicators were somewhat improved after viewing radiographic data (Table 3).
GPs' diagnostic certainty improved somewhat in the "consistent OA" knees (69 vs. 72, p < 0.001), but was essentially unchanged in the "consistent no OA" knees (21 vs. 22, p = 0.16). GPs were uncertain about 41% of their clinical diagnoses. They were much more likely to amend uncertain diagnoses than certain diagnoses (23% of uncertain vs. 7% of certain diagnoses).

Secondary Care Physicians
SPs diagnosed 39% of knees as OA based on clinical data only and 49% after viewing radiographic data. In total, 82% of diagnoses were consistent after viewing radiographic data; 4% of OA knees were amended to no OA and 14% of no OA knees to OA (Figure 2). Of the 18% amended diagnoses, 9% were deemed correct compared with the final diagnosis (Table 2). In general, the changes in diagnoses were statistically significant (p < 0.001) and all diagnostic indicators were somewhat improved after viewing radiographic data (Table 3).
SPs' diagnostic certainty improved somewhat in the "consistent OA" knees (70 ± 12 vs. 77 ± 15, p < 0.001), while certainty for the "consistent no OA" knees changed only minimally, though statistically significantly (20 vs. 21, p = 0.04). SPs were uncertain about 36% of their clinical diagnoses. They were much more likely to amend uncertain diagnoses than certain diagnoses (27% of uncertain vs. 14% of certain diagnoses). Furthermore, the rate of correct amendments in clinically uncertain diagnoses was 3 times higher than that in certain diagnoses (Table 4).

Discussion
In this study, we showed that radiographs added to the diagnostic ability of both GPs and SPs only for clinically "uncertain" diagnoses. Overall, diagnostic ability, diagnostic certainty and the added value of radiographs were very similar for GPs and SPs.
Both GPs and SPs amended some of their diagnoses after viewing the radiographic data, but the majority of diagnoses remained the same. As a time-consuming, costly and potentially radiation-hazardous examination, radiography therefore seems redundant in most cases suspected of KOA. The diagnostic abilities of GPs and SPs, without access to radiographic data, were already comparable to findings in other chronic musculoskeletal diseases, such as lumbar spinal stenosis [14] and lumbar disc herniation [15]. Therefore, for clinically "certain" knees, diagnostic performance based on clinical data only should be considered good enough, in contrast to clinically "uncertain" knees. Our results support expert recommendations and the results of previous studies [3,4][6][7][8], where diagnoses based on clinical findings were found to be reliable and where radiographs were deemed unnecessary for diagnosing typical KOA.
On the other hand, after viewing radiographic data, diagnostic indicators for both GPs and SPs improved substantially in clinically "uncertain" knees. Likelihood ratios, calculated from sensitivity and specificity, directly reflect how strongly a test shifts the probability of OA/no OA [16,17]. An LR+ is deemed clinically meaningful if greater than 10 and an LR− if lower than 0.1. In this study, radiographs improved the LR+ from 2 to 5 in "uncertain" knees which, according to a previous report [17], corresponds to an increase of about 15% in the probability of a correct OA diagnosis. The LR− improved from 0.4 to 0.05, increasing the probability of a correct no OA diagnosis by about 25%. Hence, we believe the improvements in "uncertain" knees are clinically meaningful, and radiographs could be considered in these cases.
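The quoted probability gains follow from converting a pre-test probability to a post-test probability via odds and the likelihood ratio. A small sketch, using a hypothetical 50% pre-test probability purely for illustration:

```python
def post_test_probability(pre_prob: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio:
    convert to odds, multiply by the LR, convert back to a probability."""
    pre_odds = pre_prob / (1 - pre_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)
```

With a 50% pre-test probability, raising the LR+ from 2 to 5 moves the post-test probability of OA from about 67% to about 83%, a shift consistent with the roughly 15% gain quoted above.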
Both GPs and SPs seemed to be more certain of their radiographically confirmed OA ("consistent OA") diagnoses. However, as other joint diseases were excluded from the CHECK cohort at baseline [11,13] and the incidence of these diseases during follow-up was quite low (3%), all the abnormalities presented in radiographic data would direct experts to an OA diagnosis, rather than to other conditions. In other words, our results could be inflated compared to real practice. Furthermore, because this is the first study of its kind, it remains unclear whether the certainty improvements are clinically relevant. On the other hand, our results did not support the strategy of using radiographs for improving certainty of no OA diagnoses. On average, the experts were already "fairly" certain (certainty scores < 30) about clinically no OA diagnoses and neither GPs nor SPs became more certain after viewing radiographic data in consistent no OA knees.
In this study, we provided standardized radiographic scores to the experts, which should help diminish bias from differences in image-reading skills among experts. Even though the actual films were also available on request, not all films were viewed by the experts. SPs seemed more interested in the actual films than GPs in this study. This aligns with their characteristics and can also be explained by differences in image-interpretation skills, which correlate with image exposure in daily clinical work [18,19].
Since the major aim of this study was to evaluate the added value of radiographs above clinical findings in diagnosing KOA, we did not perform specific statistical analyses comparing the diagnostic results of GPs and SPs. Generally, SPs amended slightly more diagnoses than GPs after viewing radiographic data (18% vs. 14%), which could be explained by the expert characteristics, as SPs place more emphasis on radiographs, but the rate of correct amendments was similar (9% vs. 8%). Furthermore, there was no obvious difference in diagnostic indicators between GPs and SPs either before or after viewing radiographic data. Similar results were found for certainty scores. Therefore, we believe the added value of radiographs should be considered similar for GPs and SPs.
This study has limitations. First, there is likely some incorporation bias when comparing the GPs' and SPs' diagnoses to the consensus-based final diagnoses, because individual diagnoses on which both experts agreed were automatically incorporated into the final diagnosis [20]. This means the absolute values of the diagnostic indicators in these comparisons are potentially overestimated. Decary et al. reported that the amount of overestimation of sensitivity and specificity caused by incorporation bias depends on the true specificity of the test method [21]. It was impossible to quantify the overestimation in the current study, owing to the lack of a true specificity of expert diagnoses. Second, 424 patients with bilateral knee complaints were included in this study. Two knees from the same patient share the same demographic data and WOMAC scales, so in principle it is inappropriate to view them as fully independent observations. However, our sensitivity analysis limited to left-knee data only yielded results similar to the main analysis, suggesting this is not a problem in our dataset. Third, standard radiographic scores (i.e., Kellgren and Lawrence grade) as well as actual films were provided to the experts in this study, which differs from the scenario of routine clinical work. Even so, most clinical diagnoses remained the same after viewing both the scores and films. This, to some degree, supports our conclusion that radiographs seem to be redundant in most cases. Fourth, we did not provide the skyline view to the experts, so some patellofemoral joint OA might have been missed. However, we believe any influence on our conclusions is limited, because the prevalence of patellofemoral joint OA in the CHECK cohort was quite low (4.6%) [22], and its presence would also have been suggested by the lateral radiograph findings as well as clinical history and physical examination, e.g., knee crepitus [23].
Fifth, the process of obtaining the final diagnosis could have been influenced by authority, since SPs likely carry more authority than GPs. In that case, the diagnostic indicators of SPs would have been higher than those of GPs. As the diagnostic indicators were quite similar between GPs and SPs, we believe this is not a major issue.
In conclusion, radiography could be of importance in cases where the clinical diagnosis of KOA is uncertain. Radiographs helped to improve the certainty of OA diagnoses, but the clinical relevance of this improvement is unclear. Overall, all results were similar for GPs and SPs.