Near-Infrared Transillumination for Occlusal Carious Lesion Detection: A Retrospective Reliability Study

The aim of this study was to assess the reliability of three diagnostic methods (near-infrared transillumination (NIRT), bitewing radiographs (BW), and clinical images (CI)) to detect occlusal carious lesions in a low caries-risk population. This retrospective analysis included one hundred and eighty-eight occlusal surfaces, scored as sound surface, early lesion, or distinct lesion. We evaluated the agreement between and within the methods over time. Kappa statistics were used to assess the agreement between the methods. Examiners detected early occlusal lesions more frequently with visual examination and NIRT, and the same lesions were confirmed at the 2-year follow-up. Within the limitations of this study, we were able to establish that early occlusal lesions can be detected and monitored over time using NIRT and visual examination, while BW scores showed mostly sound surfaces at both examinations. NIRT combined with clinical examination can be considered appropriate to detect and monitor early enamel caries on the occlusal surface in low caries-risk populations.


Introduction
Although dental caries is one of the most common chronic diseases worldwide, the prevalence, extent, and severity of dental caries have declined significantly over the past decades in most populations [1]. The development of a carious lesion is a dynamic process that can be arrested if diagnosed before symptoms appear. Early caries diagnosis is essential in daily practice because detecting lesions long before symptoms appear allows the carious process to be controlled through prevention or early intervention. The search for better methods of detecting caries at its different stages is still ongoing. Ideally, a diagnostic device should allow the detection of all stages of the carious process, including early and non-cavitated carious lesions [2]. Existing diagnostic methods have been found to be insufficient in detecting early lesions, and the need for additional methods has long been acknowledged [3].
The task becomes more challenging when it comes to occlusal surfaces. This is due to the accumulation of debris and plaque that are difficult to remove because of the complex morphology of the pits and fissures. It is also assumed that the occlusal lesion is initiated on the fissure walls and is therefore concealed by superimposed sound tissue [4,5]. In addition, the extensive use of fluoride causes superficial remineralization of enamel, delaying the cavitation process while dentine caries remains hidden below [6]. This leads to the development of lesions that could only be perceptible radiographically and are known as "hidden caries" or lesions that are even more challenging to detect if they are not yet visible
Dental records of eighteen final-year dental students were examined and collected after obtaining their oral consent for the anonymous use of the data.
BW radiographs had been taken by different practitioners with an interval of two years (±3 months) (exposure parameters were 70 kV, 7 mA, 0.16–0.20 s). Several size 2 digital imaging plates and a CS 7600 scanning system (Carestream Health, Rochester, NY, USA) were used.
DIAGNOcam 2170U (Kavo, Biberach, Germany) and the reusable tip for adults were used to acquire the NIRT images, with the KaVo integrated desktop (KID) software V 2.4.2.
Intraoral photographs were taken with a digital camera (Nikon D5300, Nikkor 105 micro).

Data Collection
Inclusion criteria: final-year dental students (18 years and older) with bitewing radiographs available at an interval of around 2 years (3 months less or more were tolerated), NIRT images from both examinations, and clinical images from the first examination.
Exclusion criteria: students with incomplete data (missing X-rays, photographs or NIRT images). The primary analysis excluded six subjects from eighteen due to incomplete or missing data in the digital records. Thus, this retrospective study collected data from twelve subjects whose ages varied from 22 to 32 years old.
Clinical intraoral photographs were available for the first examination only. BW and NIRT images were available from the initial examination and the two-year follow-up assessment.
A PowerPoint file was prepared for each subject. The clinical photographs, BW and NIRT images were displayed for each quadrant on a slide (i.e., four slides per subject). One slide was lacking for one subject because the follow-up NIRT assessment was missing. After removing the patient's initials from the slides, the forty-seven slides were mixed, randomly reorganized, and then numbered, granting complete and irreversible anonymity. Each of the final forty-seven slides with the clinical images, BW, and NIRT images of both examinations was then redistributed to six slides: Visual 1, BW 1, BW 2, NIRT 1, NIRT 2, and all data combined. The final data set included 188 occlusal surfaces that were assessed and scored. Due to the retrospective study design, no sample size calculation was possible prior to the study.

Examiners
Examiner 1 (MA) has been working with NIRT and teaching and researching new diagnostic methods at the University of Geneva since 2012.
Examiner 2 (LV) is the head of the radiology department and has been working and teaching at the university for over 20 years.

Training
Examiners followed the ICDAS e-learning program developed by the ICDAS Foundation, a 90-minute course used to introduce the criteria to new users. The ICDAS e-learning program has been shown to improve the diagnostic skills of students in the detection of occlusal carious lesions [33].

Data Scoring and Interpretation
Each diagnostic method was scored independently. For NIRT and BW, each year was also scored separately following the scoring system shown in Table 1 [34]. Each score was recorded in an Excel file for final analysis. One month after the first analysis, sixty randomly selected surfaces were scored again using the same procedure to determine intra-examiner agreement.

Diagnostics 2023, 13, x FOR PEER REVIEW 4 of 12
Table 1. Scoring System of the three detection methods used in this study (adapted from Gomez et al. 2015 [34]). Visual evaluation using ICDAS based on clinical images (CI), near-infrared transillumination (NIRT), and bitewing radiographs (BW) were scored into 5 categories as described above. Score 1 was considered as early lesions while scores 2, 3 and 4 were considered as distinct lesions.
Score 0 (sound). NIRT: No shadow. BW: No radiolucency.
Score 1 (early). NIRT: Thin grey shadow into enamel. BW: Radiolucency in outer half of enamel.
Score 2 (distinct). NIRT: Wide grey shadow into enamel. BW: Radiolucency in inner half of enamel +/- enamel dentin junction.
Score 3 (distinct). NIRT: Shadow less than 2 mm in dentine.
Score 4 (distinct). CI: Underlying dentinal shadow. NIRT: Shadow more than 2 mm in dentine. BW: Radiolucency passed 1/3 of dentine.
Recently published Standard reporting of caries detection and diagnostic studies (STARCARDDS) were followed whenever applicable [35].

Data Analysis
A total of 188 occlusal surfaces were scored twice based on the following diagnostic techniques: BW, NIRT, and Visual evaluation of clinical images. The Visual method was used only for the first assessment.
Possible scores were: 0, 1, 2, 3, 4 as well as "Missing", "Not interpretable" and "Filled". When measuring inter- and intra-examiner reliability, the teeth whose score was "Missing", "Not interpretable", or "Filled" were excluded. A score of 0 was recorded as "Sound", a score of 1 was recorded as "Early", and scores of 2, 3, and 4 were recorded as "Distinct". The choice to combine codes 2, 3 and 4 was made because codes 3 and 4 were rare within the available pool of cases (young adults at low caries risk) and most scores were 1 and 2.
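The recoding rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example scores are hypothetical, and the study itself recorded scores in a spreadsheet and analysed them in R.

```python
# Labels that exclude a surface from the reliability analysis (from the text).
EXCLUDED = {"Missing", "Not interpretable", "Filled"}


def collapse_score(score):
    """Map a raw score to 'Sound', 'Early' or 'Distinct'; None if excluded."""
    if score in EXCLUDED:
        return None
    if score == 0:
        return "Sound"
    if score == 1:
        return "Early"
    if score in (2, 3, 4):
        return "Distinct"
    raise ValueError(f"unexpected score: {score!r}")


def paired_categories(first, second):
    """Keep only surfaces that are interpretable in both ratings."""
    pairs = []
    for a, b in zip(first, second):
        ca, cb = collapse_score(a), collapse_score(b)
        if ca is not None and cb is not None:
            pairs.append((ca, cb))
    return pairs


# Hypothetical scores for five surfaces (not study data).
exam1 = [0, 1, 2, "Filled", 4]
exam2 = [0, 1, 1, 0, "Missing"]
print(paired_categories(exam1, exam2))
```

In this sketch a surface is dropped as soon as either of its two ratings is excluded, which mirrors the rule that surfaces with a missing, filled, or non-interpretable image were left out of the agreement analysis.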
Differences in kappa coefficients between the two examinations were reported for the NIRT and BW methods. In addition, differences in kappa coefficients between each pair of diagnostic techniques for the first examination and between NIRT and BW for the follow-up examination were reported. This was done for each diagnostic method and each examination with Cohen's kappa coefficient, which assumes the score is a categorical variable. The coefficient is bounded above by one. A value of one implies perfect agreement and a negative value indicates that agreement was less than would be expected just by chance [36].
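For readers unfamiliar with the statistic, unweighted Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from the marginal frequencies. The following Python function is a minimal sketch of that definition, not the R "psych" routine used in the study; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of surfaces given the same category twice.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement computed from the marginal frequencies of each rating.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: both ratings use one single category
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of four surfaces (not study data).
first = ["Sound", "Sound", "Early", "Early"]
second = ["Sound", "Early", "Early", "Early"]
print(cohen_kappa(first, second))  # 0.5
```

Here three of four surfaces agree (p_o = 0.75) and the marginals give p_e = 0.5, so kappa is 0.5. Completely reversed ratings on two categories would give a negative value, which is the "less than chance" case mentioned above.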
Intra-examiner reliability (the reliability of measurements of the initial assessment and the follow-up assessment by a given examiner with a fixed diagnostic method) was measured. This was done for each diagnostic method with Cohen's kappa coefficient. Differences in kappa coefficients between NIRT and BW were reported. Landis and Koch (1977) [36] classified values as follows: <0 as indicating no agreement, 0-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1 as almost perfect agreement.
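The Landis and Koch bands quoted above can be written as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and treating each band's upper bound as inclusive (so that a value between 0.20 and 0.21 falls in the higher band) is an assumption, since the published bands are given to two decimals.

```python
def landis_koch_label(kappa):
    """Return the Landis and Koch agreement label for a kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "no agreement"
    # Upper bounds of the bands quoted in the text (assumed inclusive).
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Kappa values of the size reported in this paper.
for value in (0.03, 0.3, 0.61, 0.85):
    print(value, landis_koch_label(value))
```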
All kappa coefficients and differences were reported along with their 95% confidence intervals. The confidence intervals of the kappa coefficients were based upon the variance estimates, whereas those for the differences were obtained by the bootstrap. Bootstrapping refers to any test or metric that uses random sampling with replacement (i.e., mimicking the sampling process) and falls under the broader class of resampling methods.
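A bootstrap interval for the difference between two kappa coefficients could be sketched as follows. The paper does not describe the resampling details, so several choices here are assumptions: surfaces are resampled as whole units, a simple percentile interval is used, and resamples in which kappa is undefined are skipped. All names and the example rows are hypothetical.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa; NaN when both ratings use a single category."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return float("nan") if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def bootstrap_kappa_difference(rows, n_boot=2000, alpha=0.05, seed=0):
    """rows: one (nirt1, nirt2, bw1, bw2) tuple of categories per surface.

    Returns (observed difference, lower, upper) for kappa(NIRT) - kappa(BW),
    using a percentile interval over surface-level resamples.
    """
    rng = random.Random(seed)

    def difference(sample):
        n1, n2, b1, b2 = zip(*sample)
        return cohen_kappa(n1, n2) - cohen_kappa(b1, b2)

    observed = difference(rows)
    diffs = []
    for _ in range(n_boot):
        # Draw as many surfaces as in the data, with replacement.
        sample = [rows[rng.randrange(len(rows))] for _ in rows]
        d = difference(sample)
        if d == d:  # skip resamples where a kappa is undefined (NaN)
            diffs.append(d)
    if not diffs:
        raise ValueError("no usable bootstrap resamples")
    diffs.sort()
    lower = diffs[int((alpha / 2) * (len(diffs) - 1))]
    upper = diffs[int((1 - alpha / 2) * (len(diffs) - 1))]
    return observed, lower, upper


# Hypothetical ratings (not study data): NIRT exam 1, NIRT exam 2, BW exam 1, BW exam 2.
rows = [
    ("Early", "Early", "Sound", "Sound"),
    ("Early", "Early", "Sound", "Early"),
    ("Sound", "Sound", "Sound", "Sound"),
    ("Distinct", "Distinct", "Early", "Sound"),
    ("Early", "Sound", "Sound", "Sound"),
    ("Sound", "Sound", "Early", "Early"),
] * 5
print(bootstrap_kappa_difference(rows, n_boot=500))
```

Resampling whole surfaces keeps the ratings of one surface together, which preserves the dependence between methods that were applied to the same teeth.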
The p values for testing the equality of two kappa coefficients were obtained by a permutation test. Permutation tests work by repeatedly rearranging the observed data to build the distribution of the test statistic under the null hypothesis, from which the p-value is determined.
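One way such a permutation test could be set up is sketched below. The paper does not state the permutation scheme, so the one used here is an assumption: under the null hypothesis that both methods have the same kappa, the method labels are treated as exchangeable within each surface, and the NIRT and BW rating pairs of a surface are swapped with probability 1/2. All names and the example rows are hypothetical.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa; NaN when both ratings use a single category."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return float("nan") if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


def kappa_difference(rows):
    """kappa(NIRT exam 1 vs 2) minus kappa(BW exam 1 vs 2)."""
    n1, n2, b1, b2 = zip(*rows)
    return cohen_kappa(n1, n2) - cohen_kappa(b1, b2)


def permutation_p_value(rows, n_perm=2000, seed=0):
    """Two-sided p-value for H0: kappa(NIRT) equals kappa(BW)."""
    rng = random.Random(seed)
    observed = abs(kappa_difference(rows))
    if observed != observed:
        raise ValueError("observed kappa difference is undefined")
    extreme = 0
    usable = 0
    for _ in range(n_perm):
        # Swap the method labels within each surface with probability 1/2.
        shuffled = [
            (b1, b2, n1, n2) if rng.random() < 0.5 else (n1, n2, b1, b2)
            for n1, n2, b1, b2 in rows
        ]
        d = kappa_difference(shuffled)
        if d != d:  # skip permutations where a kappa is undefined (NaN)
            continue
        usable += 1
        if abs(d) >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (usable + 1)


# Hypothetical ratings (not study data): NIRT exam 1, NIRT exam 2, BW exam 1, BW exam 2.
rows = [
    ("Early", "Early", "Sound", "Sound"),
    ("Early", "Early", "Sound", "Early"),
    ("Sound", "Sound", "Sound", "Sound"),
    ("Distinct", "Distinct", "Early", "Sound"),
    ("Early", "Sound", "Sound", "Sound"),
    ("Sound", "Sound", "Early", "Early"),
] * 5
print(permutation_p_value(rows, n_perm=500))
```

The p-value is the share of permuted data sets whose absolute kappa difference is at least as large as the observed one.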
Statistical analyses were performed using R (R Foundation for Statistical Computing, Vienna, Austria) with the package "psych" for the computation of kappa coefficients.

Results
This retrospective study analyzed data from twelve subjects whose ages varied between 22 and 32 years; a total of 188 occlusal surfaces were scored. Table 2 provides an overview of the frequency (%) distribution of all scored surfaces by the experienced examiners including missing, not interpretable, and filled surfaces. The clinical images showed that 53 (28.2%) of the surfaces were restored (filled).
Table 2. Frequency (%) distribution of scores given by the experienced dentists for each diagnostic method for both assessments (clinical images (CI), near-infrared transillumination (NIRT), and bitewing radiographs (BW)), including missing, filled, and non-interpretable surfaces. The most relevant results are in bold.
Table 3 illustrates the frequency distribution of the scores for each diagnostic method (clinical images (CI), near-infrared transillumination (NIRT), and bitewing radiographs (BW)) for both assessments. It can be observed that for the BW method, "Sound" is the most frequent rating, whereas for NIRT and Visual, "Early" is the most frequent one. The BW showed that only 3% of the surfaces examined had occlusal carious lesions.
Table 3. Frequency (%) distribution of scores for each diagnostic method for both assessments (clinical images (CI), near-infrared transillumination (NIRT), and bitewing radiographs (BW)) excluding missing, filled, and non-interpretable surfaces. The most relevant results are in bold.

Agreement between and within Methods
The inter-rater reliability test for the clinicians, done on 60 surfaces, was substantial to almost perfect (kappa between 0.61 and 0.85) depending on the method.
Measures of agreement between methods and within methods based on the scoring were assessed with Cohen's kappa coefficients (Table 4). The coefficients showed no agreement between BW and NIRT, moderate agreement for BW, and substantial agreement for NIRT in the two examinations.
Table 4. Cohen's kappa coefficients with 95% confidence intervals, between methods and within methods (near-infrared transillumination (NIRT) and bitewing radiographs (BW)). The most relevant results are in bold. Surfaces with one missing, filled, or non-interpretable image at any point were excluded.
The results from this study show a low correlation between BW and visual examination, which further confirms that early occlusal lesions cannot be detected by BW. Kappa values were higher for CI vs. NIRT (0.3) compared to CI vs. BW (0.03-0.04) in the current study, confirming that NIRT is a superior method to bitewings to detect early enamel caries on the occlusal surface. Table 5 below shows the distribution of scores of surfaces with full data (no missing data) from the experienced examiners for the 3 methods on both assessments; the agreement within methods (BW 1 vs. BW 2, NIRT 1 vs. NIRT 2) can be clearly seen.
Table 5. Cross-tabulations for 109 occlusal surfaces with full data of experienced examiners (clinical images (CI), near-infrared transillumination (NIRT), and bitewing radiographs (BW)). The agreement of both BW assessments is mostly on sound surfaces while NIRT assessments are on early caries. While most of the early lesions detected on clinical images (CI) were scored sound on BW, NIRT was able to detect more early lesions than CI. The most relevant results are in bold.

Discussion
In the present clinical study, ICDAS scores of occlusal surfaces based on clinical photographs were compared to scores based on NIRT images and digital intra-oral radiographs. The results show that more early occlusal lesions were detected using NIRT followed by clinical images, while BW scores showed mostly sound surfaces at both examinations (first and second assessments) as shown in Table 3.
A similar in vivo study [37] that compared the three methods found that most carious lesions were detected using visual examination followed by NIRT and then BWs. Our finding might be explained by the stricter criteria we used for NIRT images. We considered any visible changes in the occlusal fissure system as an early sign of demineralization based on some of our unpublished in vitro work on extracted teeth with occlusal lesions.
Another reason might be the fact that we used photographs for the visual examination. Advanced technologies have made the use of intraoral photographs in clinical examination simpler, easier, and relatively cheap. Clinical images allow archiving, remote scoring, scoring by multiple examiners, and longitudinal analysis [38]. They could be especially useful in identifying occlusal carious lesions. It has been reported that the assessment of clinical images as a method of detection of occlusal dentine caries had higher sensitivity than visual examination using histology as a reference standard [38]. Clinical images also showed good inter- and intra-examiner reliability that was similar to visual examination [38-40].
The low percentage of carious lesions detected on BWs in our study corroborates the finding of previous studies concerning occlusal carious lesion detection on radiographs.
When occlusal lesions are detected on radiographs, they have usually reached the middle third of the dentine [41]. Previous studies have shown that BWs have negligible diagnostic value for detecting enamel carious lesions and superficial occlusal dentine carious lesions [42,43]. NIRT could therefore be a good adjunct to clinical examination for detecting early occlusal caries. Several studies have suggested that combining visual examination with another method could improve the accuracy of occlusal carious lesion detection [44,45], which is consistent with the findings of our study.
The population of this study is considered at very low risk for caries development: students in dental medicine are well informed about oral hygiene and rarely develop new caries. This was confirmed by the lack of lesion progression observed between the two assessments two years apart.
The results discussed above indicate that occlusal carious lesion detection using NIRT combined with visual assessment is more reliable than detection on BW radiographs.
A previous study has shown that NIRT lesion detection is closest to clinical results in occlusal carious lesions and superior to other detection methods like dyes, laser fluorescence, and X-rays. Thus, NIRT would be the most useful method as an adjunct to visual inspection [26].
The results from this study show low agreement between BW and visual examination, which further confirms previous studies reporting that early occlusal carious lesions cannot be detected by BW. A limited number of studies have compared NIRT with the visual method on the occlusal surface [25,26]. One study showed a very high correlation between clinical examination and NIRT, with a kappa value of 0.99 [25], while the other found that NIRT also correlated the most with visual examination, with r = 0.51 [26]. The high concordance between the visual and NIRT methods reported by Lara-Capi et al. [25] could be due to the simpler scoring system used, in which all dark occlusal shadows were scored as 1. In our study, we used a more comprehensive scoring system from 0 to 4, where scores 1 and 2 indicated thin and wide grey shadows in the enamel, respectively, and scores 3 and 4 indicated shadows extending less than and more than 2 mm into the dentine, respectively. Without a gold standard reference, the only statement that can be made when comparing NIRT to visual examination is that NIRT was able to detect more occlusal lesions.
Further development of the monitoring concept using NIRT images is required to enable more precise follow-up of occlusal and proximal lesions. Some reports showed possible longitudinal monitoring for occlusal and proximal carious lesions using NIRT [23,32]. Although our study did not look into the management of early occlusal caries, we must emphasize the slow progression of these lesions in this sample of a low caries risk population and how this impacts their management. Our NIRT and BW scores showed that 90% of scored lesions remained stable in a 2-year period indicating a good possibility of their arrest or reversal. This is a valid argument for conservative management of early and questionable occlusal carious lesions. To our knowledge, not many studies discussed the characteristics and progression rate of these types of lesions, and a consensus is required on how to properly monitor and manage them [7]. One study following up on questionable occlusal carious lesions for twenty months found that 90% of lesions at the end of the period only required monitoring and no invasive treatment was required [7].
The advantages of NIRT over BW are that NIRT can overcome overlapping of the enamel, is non-ionizing, and can indicate the relative position of the lesion [29]. It can also provide images for analysis in real time. NIRT is most useful for early detection and for patient follow-up and monitoring [23]. However, to monitor lesions using NIRT images, it is necessary to obtain comparable images at each recall. Detection and monitoring require further development and automation. Advances in using artificial intelligence to detect and monitor carious lesions on X-rays and NIRT images seem promising [46].
Many questions remain regarding the criteria for deciding when NIRT rather than BW is needed. Examiners should make that decision based on many factors. In the meantime, NIRT can reduce the use of BW [47]. Other studies are consistent with our conclusion that NIRT can be a valid alternative, especially for incipient lesions on the occlusal surface [22].
Because this is a retrospective study, its design does not allow histological validation of the observed carious lesions. Enamel carious lesions cannot be validated in in vivo studies because they are managed through preventive measures, unlike dentinal carious lesions, which can be validated when they are restored. In vitro studies can validate lesions via histological methods. Many studies have recognized the problem of in vivo validation of enamel carious lesions [20,26,47]. The issue of gold standards and of how to validate carious lesions in vivo is not new and has been recognized since the early 1990s [6]. To overcome these limitations, one study validated carious lesions by performing in vivo detection on third molars scheduled for extraction, followed by histological examination after extraction. That study showed NIRT to have the best correlation with histology and the highest sensitivity and accuracy for detecting early occlusal carious lesions [48]. Kuhnisch et al. have also noted the difficulty of creating an appropriate in vitro setup because the optical properties of the different embedding materials are not equivalent to those of the periodontal tissue or anatomy.
The population of this study was a small, low caries-risk, and highly dental hygiene-conscious group. This can be considered a limitation, as the results may not be fully generalizable to the whole population.
Sensitivity and specificity values were not calculated in this retrospective study because of the validation problem described above. Accurate sensitivity and specificity values cannot be obtained when novel methods like NIRT are compared to imperfect "gold standards" such as the visual method and BW. However, several prior studies showed that NIRT has higher sensitivity than BW for detecting occlusal caries [22,27]. This does not exclude the possibility of more false positives with NIRT imaging. When a false positive is suspected, and before any therapeutic measures are taken, a lesion can be differentiated from discoloration by cleaning the occlusal surface with sandblasting to remove the discoloration. Even in the case of a false positive diagnosis of an early lesion, the treatment would consist of increasing preventive measures or sealing the surface if the patient is at risk. These minimally invasive interventions are harmless and painless and would only provide extra protection for that surface. It must be stressed that using new technologies such as NIRT while practicing old-fashioned dentistry, in which any detected lesion requires a restoration, can be problematic and harmful to patients and is considered over-treatment. Such tools require a modern caries management approach based on risk assessment, close monitoring, and the use of minimally invasive therapeutic measures such as sealing and infiltration of early lesions [49][50][51].
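The effect of an imperfect reference standard on sensitivity and specificity, mentioned above, can be illustrated with a minimal sketch. All values below are hypothetical and are not study data. The function name `sensitivity_specificity`, the ten surfaces, and the assumption that the reference misses half of the true early lesions are chosen purely for illustration.

```python
def sensitivity_specificity(test, reference):
    """Return (sensitivity, specificity) of binary test calls against a binary reference."""
    pairs = list(zip(test, reference))
    tp = sum(1 for t, r in pairs if t == 1 and r == 1)  # test and reference both positive
    tn = sum(1 for t, r in pairs if t == 0 and r == 0)  # test and reference both negative
    fp = sum(1 for t, r in pairs if t == 1 and r == 0)  # test positive, reference negative
    fn = sum(1 for t, r in pairs if t == 0 and r == 1)  # test negative, reference positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative surfaces")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ten surfaces: 1 = early lesion present, 0 = sound.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]       # unknowable in vivo without histology
reference = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # imperfect reference missing two lesions
index_test = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # index test that finds all four lesions

true_sens, true_spec = sensitivity_specificity(index_test, truth)
apparent_sens, apparent_spec = sensitivity_specificity(index_test, reference)
print(f"Against truth:     sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"Against reference: sensitivity {apparent_sens:.2f}, specificity {apparent_spec:.2f}")
```

In this toy case the index test is perfect against the true status, yet its apparent specificity against the imperfect reference drops to 0.75. The two lesions missed by the reference are counted as false positives of the index test.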
It would have been interesting to have the clinical images for a visual evaluation for the second assessment too and to ensure a standardized way for taking the images with properly dried surfaces, but as a drawback of a retrospective study, only the use of available data is possible.
A strength of this study is the monitoring of the same lesions longitudinally so the second reading could be regarded as a validation method for the first reading. NIRT scores showed that over 90% of scores remained the same, signifying slow or no change amongst the low-risk sample in this study and indicating that NIRT could be a useful tool for monitoring and personalized caries management.
Further research on a population with higher caries risk can further confirm the monitoring potential; in our low caries risk population, the monitoring served more as a confirmation and validation for the lesion presented.

Conclusions
Within the limitations of this study, we were able to establish that more early occlusal carious lesions were observed using NIRT and visual examination, while BW scores showed mostly sound surfaces at both examinations, performed two years apart. NIRT can be considered appropriate to detect and monitor initial enamel carious lesions on the occlusal surface in low caries-risk populations. This method has the benefit of being X-ray free and noninvasive, and it can be repeated as often as necessary.