The Detection of Dental Pathologies on Periapical Radiographs—Results from a Reliability Study

(1) Background: Caries, periapical lesions, periodontal bone loss (PBL), and endo-perio lesions are common dental findings that require an accurate diagnostic assessment to allow appropriate disease management. The purpose of this reliability study was to compare the inter- and intra-rater reliability for the detection of the above-mentioned pathologies on periapical radiographs. (2) Methods: Fourteen dentists (three with more than two years and eleven with less than two years of work experience) participated in a training workshop prior to data acquisition. A total of 150 radiographs were assessed by all raters in two rounds. Cohen’s Kappa (CK) values and a binary logistic regression were calculated. (3) Results: The reliability was found in a moderate and substantial range of agreement: caries (mean inter-rater CK value/first round 0.704/mean inter-rater CK value/second round 0.659/mean intra-rater CK value 0.778), periapical lesions (0.643/0.611/0.768), PBL (0.454/0.482/0.739) and endo-perio lesion (0.702/0.689/0.840). The regression model revealed a significant influence of the clinical experience, and furthermore, periapical pathologies and PBL were identified less reliably in comparison to caries and endo-perio lesions. (4) Conclusions: The dentist’s ability to detect the chosen pathologies was linked with significant differences. Periapical lesions and PBL were identified less reliably than caries and endo-perio lesions.


Introduction
Dental caries and periodontitis affect the majority of the world's population, and the available epidemiological data demonstrated the widespread prevalence of the diseases [1][2][3][4]. Periapical periodontitis is also among frequent dental findings, as indicated by the fact that half of the global population has at least one tooth with a periapical inflammation [5]. Therefore, an accurate detection and assessment of the mentioned dental pathologies is essential to indicate appropriate disease management [6]. Aiming at reliably detecting and evaluating the existing range of dental pathologies, dental radiography seems to be the adjunct method of choice following the clinical examination [3,7,8].
When analysing the available literature regarding inter- and intra-rater reliability among dental professionals, it appeared that several author groups investigated this issue. Interestingly, the overwhelming majority of scientific reports studied the most commonly applied radiographic projection technique for each pathology and its diagnostic accuracy without considering the inter- and intra-rater reliability. For example, bitewing radiographs are the method of choice for caries detection [6][9][10][11][12][13], and periapical and panoramic radiographs for detecting and classifying periapical lesions or PBL [14][15][16][17][18][19][20][21][22][23][24][25][26][27]. In contrast, little is known about the inter- and intra-rater reliability for detecting caries [28] and periapical lesions [14][15][16] on periapical radiographs, although assessing this image type is part of the routine in daily dental practice. In addition, there is limited data available on
A group of 14 dentists with different levels of clinical experience took part in this investigation. Three dentists who were employed at the dental clinic based in Munich had more than two years of work experience; eleven dentists were young professionals and had not worked clinically for longer than two years. Prior to the data collection, the entire study group participated in a 2-day training workshop under the guidance of Prof. Dr. Kühnisch (JK). During the workshop, the study protocol was explained, all pathologies and their detection and classification were described, and numerous X-rays were assessed by the study group and by each participant individually. After the workshop, the set of periapical radiographs was evaluated twice by all participants. At least four weeks had to pass between the two assessments to reduce memory bias as far as possible [32]. All dentists were urged to complete the evaluation independently and without assistance.

Set of Periapical Radiographs
A set of completely anonymised periapical radiographs was chosen. All images were recorded at the dental school of the LMU Munich, utilising dental X-ray machines with a 203 mm tube (Heliodent DS, Sirona, Bensheim, Germany), an X-ray field restriction (30 × 40 mm) and a charge-coupled device (CCD) sensor (Intraoral II, sensor size 30.7 × 40.7 mm, Sirona, Bensheim, Germany). The exposure time was set at 0.06-0.08 s using a cathode voltage of 60 kV and a current of 7 mA. A sensor-holding device (XPP-DS Advanced Sensor Holders for Sirona, Dentsply Rinn, Elgin, IL, USA) was available and used where applicable.
The following criteria were chosen for the inclusion of a radiographic image: (1) Periapical radiographs from lower and upper incisors, canines and molars were selected. (2) The image had to show the full extent of the main teeth. (3) All X-rays required correct levels of exposure, contrast and brightness. Images of teeth with quality defects such as distortions, prominent superimposing effects or others were not selected. (4) At least one radiographic finding (dentin caries, periapical pathology, PBL or endo-perio lesion) had to be visible on each image. Finally, 150 periapical radiographs that met the inclusion criteria were identified, and each was assigned a unique identification number.

Diagnostic Standards
Typical examples for caries (Figure 1a), periapical lesions (Figure 1b), PBL (Figure 1c) and endo-perio lesions (Figure 1d) are shown in Figure 1. All diagnostic standards were defined prior to the data collection as follows. The criterion dentin caries was diagnosed according to internationally accepted recommendations [33][34][35]. When proximal or occlusal caries lesions extended beyond the enamel-dentin junction or even up to the inner dentin and pulp area, they were registered as dentin caries (Figure 1a). Proximal caries lesions limited to the enamel were not considered. Furthermore, no distinction between different grades of progression was made.
Periapical lesions (Figure 1b) were recorded when a radiolucency around the tooth apex was present. Here, the periapical index by Orstavik et al. [36] classifies periapical pathologies into five different lesion stages. In the present study, periapical inflammation was detected when a radiolucency with at least twice the width of the periodontal ligament was visible in the area around the apex of at least one tooth [37,38]. A differentiation between lesion stages was not performed.
To assess the PBL (Figure 1c), the cemento-enamel junction, limbus alveolaris and apex were considered. In detail, the radiographic PBL (cemento-enamel junction-limbus alveolaris) was estimated in relation to the root length (cemento-enamel junction-root apex). If the vertical or horizontal radiographic PBL reached at least the second half of the coronal third of the root length (15-33%), then periodontitis was assumed to be established [39][40][41]. A detailed differentiation under consideration of the exact extent of the radiographic PBL, e.g., into coronal, middle or apical root third [39][40][41] was not made.
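The PBL criterion described above can be written as a simple ratio check. The following is a minimal sketch, not the procedure used in the study: the raters estimated the ratio visually, whereas the sketch assumes that two distances have been measured in millimetres. The function names, the parameter names and the example distances are hypothetical; only the 15% threshold is taken from the text.

```python
def pbl_fraction(cej_to_crest_mm, cej_to_apex_mm):
    """Radiographic bone loss as a fraction of the root length.

    cej_to_crest_mm: distance cemento-enamel junction to limbus alveolaris.
    cej_to_apex_mm:  distance cemento-enamel junction to root apex.
    Both measurements are hypothetical inputs for illustration.
    """
    if cej_to_apex_mm <= 0:
        raise ValueError("root length must be positive")
    if cej_to_crest_mm < 0:
        raise ValueError("bone loss distance must not be negative")
    return cej_to_crest_mm / cej_to_apex_mm


def pbl_present(cej_to_crest_mm, cej_to_apex_mm, threshold=0.15):
    """Yes/no decision: bone loss reaches at least 15% of the root length."""
    return pbl_fraction(cej_to_crest_mm, cej_to_apex_mm) >= threshold


# Hypothetical example: 3 mm of bone loss on a 14 mm root is about 21%.
example_positive = pbl_present(3.0, 14.0)
# Hypothetical example: 1.5 mm on the same root is about 11%.
example_negative = pbl_present(1.5, 14.0)
```

Because the study recorded only a yes/no decision, the sketch returns a boolean and does not grade the bone loss into coronal, middle or apical root third.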
Endo-perio lesions (Figure 1d) were detected when the periapical inflammation on a tooth included pulpal and periodontal structures. The classification of endo-perio lesions by Simon et al. [42] served as a reference for this project. Essential radiographic findings included the existence of an inflammation (or radiolucency) that involves the periodontal ligament from the periapical region to the sulcus gingivae.

Consensus Decision (Reference Standard)
Following the two screening assessments, the entire working group re-evaluated all radiographs in an additional meeting to reach a consensus agreement on each X-ray. Again, a yes/no decision (0 or 1) was taken for the four selected criteria. Once one participant presented a deviating result, the team members re-evaluated the concerned radiograph and discussed until agreement was achieved. This decision was set as a reference.

Data Management and Statistical Analysis
An Excel spreadsheet (Excel 2019, Microsoft, Redmond, WA, USA) was used to collect the outcome from the 14 dentists, the 2 assessments (N = 2), and the reference standard. Before analysis, the spreadsheet was checked for plausibility. Using Excel and SPSS, descriptive and exploratory data analysis was performed (SPSS Statistics 27, 2020, IBM Corporation, Armonk, NY, USA). The statistical analysis covered the inter- and intra-rater reliability of all examiners as well as their agreement with the reference standard. Cohen's Kappa (CK) was determined as an exploratory measure. To obtain a total value, the arithmetic mean of these estimates was computed. The following interpretation was used for CK values: 0.00 to 0.20, slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, (almost) perfect agreement [43][44][45]. Additionally, a binary logistic regression analysis with backward elimination was carried out, with the correct/incorrect diagnostic choice with regard to the reference standard as the dependent variable. The analysis considered the diagnostic category (caries, periapical lesion, PBL, endo-perio lesion), the evaluation round (first vs. second round), and the dentists' experience (</> 2 years of dental work experience) as independent variables and potential confounders.
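The kappa calculation and the agreement bands described above can be illustrated with a short sketch. It is a plain Python illustration under stated assumptions and not the SPSS procedure used in the study: the function names are hypothetical, the ratings are invented toy data, and values below zero are simply placed in the lowest band because the text lists no separate category for them.

```python
from itertools import combinations
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of 0/1 ratings."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's proportion of "1" ratings.
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both raters used one identical category throughout: undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map a kappa value onto the agreement bands quoted in the text.

    Assumption: negative values are also reported as "slight".
    """
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "(almost) perfect"


def mean_pairwise_kappa(ratings_by_rater):
    """Arithmetic mean of kappa over all rater pairs."""
    pairs = combinations(ratings_by_rater.values(), 2)
    return mean(cohen_kappa(a, b) for a, b in pairs)


# Invented yes/no decisions of three raters on eight radiographs.
toy = {
    "rater_1": [1, 1, 0, 0, 1, 0, 1, 0],
    "rater_2": [1, 1, 0, 0, 1, 0, 0, 0],
    "rater_3": [1, 0, 0, 0, 1, 0, 1, 1],
}
kappa_1_2 = cohen_kappa(toy["rater_1"], toy["rater_2"])
overall = mean_pairwise_kappa(toy)
```

Intra-rater reliability follows the same pattern: the two sequences passed to `cohen_kappa` are then the first-round and second-round decisions of the same rater.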

Results
The following tables contain the detailed inter- and intra-rater reliability CK values for all participating dentists for the identification of dentin caries (Table 1), periapical pathology (Table 2), PBL (Table 3), and endo-perio lesion (Table 4). Table 5 provides a summary of the mean CK values. In particular, the lowest inter-rater CK value in the first evaluation round was found for PBL, which was in the moderate agreement range (0.454), followed by periapical pathology (0.643). The mean CK values for caries (0.704) and endo-perio lesion (0.702) were nearly identical. The mean inter-rater CK values of both rounds were in the same value range (Table 5). All diagnostic categories demonstrated substantial to almost perfect intra-rater agreement (0.739 to 0.840; Table 5). In relation to the reference standard, the reliability results were in moderate to substantial agreement for the selected categories in both evaluation rounds (Table 5). The lowest values were documented for periapical lesions (0.435), followed by PBL (0.575). Both numbers represent a moderate level of agreement.
Table 1. Values of inter- and intra-rater reliability for caries detection, calculated among all dentists (N = 14) and in relation to the reference standard (green coloured boxes). Grey coloured boxes illustrate inter-rater CK values of the first assessment and blue coloured boxes those of the second assessment. White fields indicate intra-rater CK values. * Skilled study members with more than two years of experience in dental practice.
Table 5. CK means of the inter- and intra-rater reliability for the four dental pathologies.

A binary logistic regression model was applied to further explore the data set (Table 6). The results demonstrated that the variable "evaluation round" had no significant impact on reliability (aOR 0.99, p-value < 0.711). In contrast, the diagnostic categories differed significantly when caries was set as the reference value in the model. Here, the detection of periapical inflammation (aOR 0.34, p-value < 0.001) and PBL (aOR 0.57, p-value < 0.001) on the selected X-rays was found to be associated with lower reliability, whereas endo-perio lesions (aOR 1.54, p-value < 0.001) were diagnosed with higher reliability. Furthermore, dentists with longer clinical experience (>2 years) evaluated the set of periapical radiographs more consistently (aOR 1.98, p-value < 0.001).
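To make the adjusted odds ratios easier to read, the sketch below converts an odds ratio into a probability of a correct decision for an assumed baseline. The baseline probability of 0.90 for caries is purely hypothetical and was not reported in the study; the function name is likewise an assumption. Only the odds ratios are taken from the text.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    odds = baseline_probability / (1 - baseline_probability) * odds_ratio
    return odds / (1 + odds)


# Hypothetical baseline: 90% correct decisions for caries (reference category).
baseline = 0.90

# Odds ratios as reported in the regression results above.
reported_aor = {
    "periapical inflammation": 0.34,
    "PBL": 0.57,
    "endo-perio lesion": 1.54,
}

implied = {name: apply_odds_ratio(baseline, aor) for name, aor in reported_aor.items()}

for name, probability in implied.items():
    print(f"{name}: {probability:.3f}")
```

Under this assumed baseline, an odds ratio below 1 pulls the probability of a correct decision below 0.90 and an odds ratio above 1 pushes it above. The size of the shift depends on the baseline, which is why odds ratios should not be read as ratios of probabilities.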

Discussion
This diagnostic study evaluated the inter- and intra-rater reliability on periapical radiographs for the detection of caries, periodontitis in terms of PBL, periapical inflammations and endo-perio lesions. Generally, the observed CK values were found to be in moderate to substantial agreement (Tables 1-5). The binary logistic regression model (Table 6) showed significant differences between the chosen dental pathologies. Additionally, the rater's clinical experience had a significant impact in the present investigation. Therefore, the initially formulated hypothesis that there is no difference between the diagnostic categories and participating dentists must be rejected.
When discussing the results in detail, caries data should be considered first. Here, we found substantial inter-rater agreement for the first and second evaluation rounds between all raters (mean CK value of 0.704 and 0.659, Table 5) and for the first and second evaluations when the individual decision was related to the reference standard (mean CK value of 0.748 and 0.724, Table 5). The same order of magnitude was registered for the intra-rater reliability (mean CK of 0.778; Table 5). The comparison of our reliability results to another study with the same methodology [28] indicated basically similar outcomes on periapical radiographs. However, it should be noted that detailed inter- and intra-rater agreement data cannot be taken from this article [28], which hinders further thorough comparisons. Therefore, diagnostic reliability studies for caries detection on bitewing radiographs should be mentioned. Here, numerous studies were published that mostly registered similar CK values for inter- and intra-rater reliability [12,13,29,30]. This conclusion is also supported by recently published data from two systematic reviews and meta-analyses on occlusal and proximal caries detection [9,10].
The reliability results for periapical pathologies in the present study registered substantial inter-rater agreement for both evaluation rounds between all raters (mean CK value of 0.643 and 0.611, Table 5) and moderate agreement for the first and second evaluations when comparing the individual data to the reference standard (mean CK value of 0.435 and 0.442, Table 5). Substantial agreement was recorded for the intra-rater reliability (mean CK of 0.768; Table 5). Although periapical radiographs are the projection technique of choice for detecting periapical lesions, there are few publications available that allow comparisons [14][15][16][17]. A study by Patel et al. [14] aimed at detecting periapical pathologies under different viewing conditions by 50 observers. An interesting methodological detail was that the tooth crowns were obscured on all periapical radiographs in order to exclude potential diagnostic bias from the rater's decision, which was not considered in our investigation. As a result, the author group documented fair inter-rater agreement (CK 0.26) among the participating dental students and dentists [14]. This finding is in line with data published by Saunders et al. [17], who found fair to moderate Kappa values for the detection of periapical lesions. A similar observation was made by Sebring et al. [18], who documented the greatest variability of diagnostic decisions among teeth with periapical pathologies, which also resulted in fair to moderate agreement; it should be noted that this study was carried out on panoramic X-rays. In contrast to these less-encouraging findings, Patel and co-workers [15] documented an almost perfect agreement for periapical lesion detection (CK 0.878) for two trained endodontists who assessed 30 periapical radiographs. However, the documented reliability data from this study for the detection of periapical inflammation indicate mostly a moderate level of agreement.
With respect to the data from logistic regression analysis (Table 6), it needs to be pointed out that in the present study, the detection of periapical pathologies was found to be less reliable in comparison to all other included variables.
When viewing the data in the category PBL, the inter-rater CK values in the two assessments amounted to 0.454 and 0.482 (Table 5), respectively, which represent moderate agreement between raters. Similarly, the inter-rater reliability data in relation to the reference standard showed moderate agreement for the first and second evaluation rounds (mean CK value of 0.575 and 0.598, Table 5). In contrast, the intra-rater CK value was found to be substantial (0.739, Table 5). In principle, the same reliability data were also documented for endo-perio lesions (Table 5). Surprisingly, we found no comparable investigation that evaluated the ability of dental professionals to detect PBL or endo-perio lesions on periapical radiographs; this limits the discussion and indicates future research needs. However, several publications have assessed the reliability of PBL detection between different radiographic projection techniques or under inclusion of mainly clinical parameters [22][23][24][25][26][40][46][47][48]. When viewing the data from the logistic regression analysis (Table 6), opposing results were observed for the detection of PBL and endo-perio lesions. While PBL was found to be associated with significantly less-reliable readings (OR 0.57, Table 6), endo-perio lesions were detected with a significantly higher detection probability (OR 1.54, Table 6). This can be explained by the simple fact that the latter pathology represents a major finding on periapical radiographs, which is easy to identify even for less-experienced observers.
This research has strengths and limitations that should be discussed. The comprehensive study design, which included 4 diagnostic criteria, 14 raters, and 150 periapical radiographs, must be highlighted as a strength of this reliability study. The four selected criteria represent common dental pathologies that a dentist needs to routinely diagnose in daily practice on this type of image. Here, it can be further argued that the simultaneous detection of different pathologies is close to the setting in dental practice, where multiple possible findings on X-rays need to be screened and identified correctly if present. In addition, the radiographs and dental structures were fully visible. This may have especially simplified the detection of periapical inflammations, which are frequently associated with profound caries or extensive restorations. This may explain the better CK values in this investigation compared to Patel et al. [14], where the crowns of the teeth on periapical radiographs were obscured to exclude any detection bias. There are also several limitations that must be mentioned. Under daily dental routines, many different factors can be considered, leading to the correct diagnostic decision on radiographic images [8,14,40]. In this study, however, periapical radiographs were evaluated without having access to the clinical background data and the indications that justified their prescription. This may have negatively influenced the reference standard in particular. Interestingly, no significant difference was observed between the first and second evaluation round (Table 6). This finding could be attributed mainly to the extensive training before measuring the reliability. In consequence, learning effects during the study were potentially reduced. Furthermore, all included pathologies were rated by all raters as a yes or no decision, a choice that was made exclusively because of the study extent and the workload for each participant.
This simplified recording most likely led to more favourable results in comparison to a study design with multiple scores in each diagnostic category. Besides this, other, probably less frequent, pathologies have to be noted and should be known by the dentists, e.g., resorptions, cysts, trauma-induced findings or radio-opacities/radiolucencies. Such pathologies were not considered in the present study project. A similar report was published recently that analysed dentists' reliability in identifying restorations on periapical radiographs [32] and, interestingly, the CK values for detecting different restorative procedures were found to be better. This finding indicates that different radiographic findings are linked to varying reliability data. In this context, it also has to be mentioned that the dentist's clinical experience had a significant influence on the outcome in the present study. Here, experienced dentists detected pathologies on periapical radiographs more reliably, which simply underlines the need for under- and post-graduate training. With respect to the imbalance in the group of participating dentists (3 experienced vs. 11 inexperienced dentists) and the varying level of clinical experience, this finding might potentially be biased and, therefore, should not be overrated. Nevertheless, future teaching or training programs should consequently address categories with less reliable findings, e.g., periapical inflammation and PBL. Another limitation might be the fact that the viewing conditions were not standardized and differed between participants, which might potentially cause variations during decision making. Here, different computer displays with varying sizes, brightness and contrast settings may have influenced the visibility and detection ability of the participating dentists. This is a well-known influencing factor [14,49] which remained uncontrolled in this investigation.

Conclusions
It can be concluded that this reliability study for detecting caries, periapical pathologies, PBL and endo-perio lesions on periapical radiographs documented moderate to substantial reliability data in terms of Cohen's Kappa values. Interestingly, the dentist's ability to detect the chosen pathologies was linked with significant differences. Periapical lesions and PBL were identified less reliably than caries and endo-perio lesions. Further, more experienced dentists made more reliable decisions.