Diagnostic Accuracy of Confocal Laser Endomicroscopy for the Diagnosis of Oral Squamous Cell Carcinoma: A Systematic Review and Meta-Analysis

Background: Advances in treatment approaches for patients with oral squamous cell carcinoma (OSCC) have been unsuccessful in preventing frequent recurrences and distant metastases, leading to a poor prognosis. Early detection and prevention enable an improved 5-year survival and better prognosis. Confocal Laser Endomicroscopy (CLE) is a non-invasive imaging instrument that could enable an earlier diagnosis and possibly help in reducing unnecessary invasive surgical procedures. Objective: To present an up to date systematic review and meta-analysis assessing the diagnostic accuracy of CLE in diagnosing OSCC. Materials and Methods. PubMed, Scopus, and Web of Science databases were explored up to 30 June 2021, to collect articles concerning the diagnosis of OSCC through CLE. Screening: data extraction and appraisal was done by two reviewers. The quality of the methodology followed by the studies included in this review was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. A random effects model was used for the meta-analysis. Results: Six studies were included, leading to a total number of 361 lesions in 213 patients. The pooled sensitivity and specificity were 95% (95% CI, 92–97%; I2 = 77.5%) and 93% (95% CI, 90–95%; I2 = 68.6%); the pooled positive likelihood ratios and negative likelihood ratios were 10.85 (95% CI, 5.4–21.7; I2 = 55.9%) and 0.08 (95% CI, 0.03–0.2; I2 = 83.5%); and the pooled diagnostic odds ratio was 174.45 (95% CI, 34.51–881.69; I2 = 73.6%). Although risk of bias and heterogeneity is observed, this study validates that CLE may have a noteworthy clinical influence on the diagnosis of OSCC, through its high sensitivity and specificity. Conclusions: This review indicates an exceptionally high sensitivity and specificity of CLE for diagnosing OSCC. Whilst it is a promising diagnostic instrument, the limited number of existing studies and potential risk of bias of included studies does not allow us to draw firm conclusions. A conclusive inference can be drawn when more studies, possibly with homogeneous methodological approach, are performed.

allowing effective analysis of the surgical margins and ensuring improved precision in the determination of tumour resection margins and preventing recurrence. This medical imaging modality enables an in vivo diagnosis and images can be acquired almost indefinitely whilst avoiding any iatrogenic harm to the patient.
To articulate comprehensive and conversant evidence-based recommendations for the coherent use of CLE, a systematic review and meta-analysis was designed, to assess the precision in the diagnosis of OSCC using histopathology as the reference standard.

Materials and Methods
A systematic review and meta-analysis was performed and the results were described according to the standard Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement [51]. The review protocol was registered in the PROSPERO International Prospective Register of Systematic Reviews (CRD42021278405).

Study Objective and Definition of Reference Standard
The search strategy followed the PICO (population, intervention, comparison, and outcome) guidelines [52]. The population comprised of OSCC patients; OSCC can be defined as squamous cell carcinomas in the oropharynx and oral cavity including tongue, palate, etc. The intervention used was the use of CLE for diagnosis of OSCC. True positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) were the outcomes measured while detection of OSCC using histology were the reference standard. A diagnosis subsequent to histopathological analysis of a biopsy specimen (incisional or excisional) was considered. The study design had no limitations, as long as original data were specified. The main objective of this systematic review and meta-analysis was to assess the accuracy of CLE for the diagnosis of OSCC.
To evaluate the applicability of studies, two reviewers (S.S. and X.J.) independently screened all retrieved articles by titles first followed by abstracts to determine their relevance. Possibly eligible articles were analysed after full-text recovery. Incongruities were discussed with a third reviewer (LJ).

Inclusion and Exclusion Criteria
The established eligibility criteria were: (1) the CLE instrument used in the study should be based on the principle of Fluorescent Laser Endomicroscopy only; (2) the examined lesions were OSCCs (well differentiated, moderately differentiated, and poorly differentiated histopathological subtypes); (3) histopathological diagnosis of OSCC was the reference standard, following the examination of the biopsy specimen (incisional or excisional); (4) only human studies were included; (5) no case reports or reviews were included; (6) articles written in English were exclusively included.
We excluded from the analysis: (1) all studies using Reflectance Confocal Microscopy; (2) all basal cell carcinomas or corneal/optical epithelial tumours; (3) all studies based on cytological smears; (4) no animal-based models or trials were included; (5) studies where full-text and recovery was unlikely, including a search of the relevant databases and attempting communication with the corresponding authors. Studies with overlapping populations found in studies were also excluded, including a single study with the most representative data. Moreover, the reference list of all the papers were reviewed to ascertain additional appropriate studies that may have been unnoticed whilst initial screening was undertaken.

Data Extraction
A reviewer (S.S.) extracted the data from the included studies, which was additionally validated by another reviewer (X.J.). The variables extracted included: the name of the first author, country, year of publication, tumour sites (percent frequency), number of reviewers validating the CLE device, fluorescent agent used, total patients and lesions, patient sex and age (mean/median, years), sensitivity of instrument, specificity of instrument, and confocal criteria employed for the diagnosis of OSCC.

Risk of Bias
Risk of bias and quality of included studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies) QUADAS-2 checklist for primary studies assessing the diagnostic accuracy in four KEY domains; Patient Selection, Index Test, Reference Standard, and Flow and Timing [53]. All four domains are appraised for risk of bias, and only the first three domains are appraised for applicability concerns. The risk of bias was assessed to be either high, low, or unclear on the basis of the following questions: (1) if the reference standard was more likely to correctly diagnose OSCC (2) if a threshold for index-test was respecified; (3); whether the patient selection was a sequential or random sample of patients enrolled; (4) if there was an appropriate interval between index test and reference standard.

Statistical Analysis and Meta-Analysis
Tables were created for each CLE-based diagnosis of OSCC with histopathology diagnosis of incisional or excisional biopsy specimens. The sensitivity, specificity, and corresponding 95% confidence intervals were represented using forest plots. Diagnostic accuracy in terms of sensitivity, specificity, and diagnostic odds ratio with 95% confidence intervals was evaluated on the basis of TP, FP, TN, and FN extracted from each of the included studies.
Sensitivity was estimated as the proportion of patients, correctly identified by CLE as having OSCC, and specificity as the proportion of patients, correctly identified on radiology as not having OSCC. The diagnostic odds ratio was defined as the odds of the CLE diagnosis being positive for a patient having OSCC relative to the odds of the CLE test being positive for patients not having OSCC. Meta-analysis of sensitivity and specificity were determined by designing a bivariate model (hierarchical logistic regression) [54] and the Hierarchal summary receiver operating characteristic curve (HSROC) was created. The HSROC curve represents sensitivity vs. specificity graphically and provides evidence concerning the general test performance across different thresholds. Within study and between studies variability was determined by this model.
Heterogeneity is calculated as Higgins I 2 , where I 2 = 0% indicates no observed heterogeneity and I 2 > 50% is categorised as substantial heterogeneity. Heterogeneity is a prominent attribute observed in almost all meta-analyses of diagnostic accuracy tests, which can be explained by variation in the index test efficiency due to varied suggestive diagnostic thresholds. Statistical analysis of the sources of heterogeneity was not performed as the size of the subgroups were insignificant (two or three studies per group).

Literature Search Results
The preliminary database search recognized a total number of 2095 of articles. After removal of duplicates, only 1554 articles persisted. After title and abstract assessment, 1509 articles were disqualified and 45 were selected for full-text recovery and further analysis. Thirty-nine articles were excluded after full-text analysis illustrated in Figure 1. Six studies totaling a number of 361 lesions in 213 patients were included in the final analysis [25,[55][56][57][58][59]. The study characteristics are tabulated in Table 1.   Two out of the six included studies did not specify the patient details of mean age, number of males and females. The average percent of males and females across the remaining four studies were 62.2% and 37.8% respectively. The manufacturer of the CLE devices CellVizio, Vivascope 2500, FIGH-300S and CONVIVO was Mauna Kea Technologies (Paris, France), Lucid Inc. (Lucid Technologies, Henrietta, NY, USA), Fujikura or HDIG (Sumita) and Carl Zeiss AG (Oberkochen, Germany), respectively. The majority of studies were carried out in Germany. One study did not specify the site of OSCC in oral cavity [56]. Confocal criteria for OSCC diagnosis varied considerably between studies (Table 2). Risk of bias and quality assessment of study reports. The outcome of the methodological quality assessment of the included studies is explained in Table 3. The included studies exhibited low or unclear risk for bias and applicability concerns in all domains. One study (16.66%) had an unclear (n = 1) risk of bias regarding patient selection with an indeterminate patient selection procedure. Five studies completely described the patient selection protocol. One study presented high and uncertain applicability concerns due to restrictions applied to the studied population (including lesions of suspected premalignancy and malignancy) and inclusion of the contralateral oral mucosa of patients, which overlooks the concept of field cancerization and applicability of the tool.
Two out of the six included studies had a high risk of bias or unclear risk of bias concerning the index test being mostly attributed to the blinding of investigators to patient characteristics. Most of the studies had low applicability concerns in the index test area due to clear demarcation of malignant changes in the observed epithelial cells.
Five of the included studies had a low risk of bias concerning the use of the reference standard due to clearly defined histopathological and confocal criteria, only one study had no clear diagnostic criteria and were at high risk of bias due to inadequate referencing standards. With respect to applicability concerns of the reference standard, only one study had a high risk due to the reference standard determined by an expert clinical diagnosis, whilst one study did not specify the assessors' experience level.
According to the QUADAS-2 tool, in the domain of flow and timing, three studies had a high risk of bias as these studies comprised a fair number of all benign and suspected dysplastic lesions.

Diagnostic Accuracy of CLE and Meta-Analysis
Whilst there are limitations to the conclusions that be drawn from a meta-analysis, the meta-analysis does give a general overview of the calculated sensitivity and specificity of any confocal microscope in different situations. Although, the studies used dissimilar tools in different settings, the actual point estimates of sensitivity and specificity depict 'threshold effect' in diagnostic clinical research.
All six studies were included in the meta-analysis. The results of the meta-analysis are reported but with its limitations, and attention against variation and potential biases. Sensitivity ranged from 71.4% to 99.3% and specificity ranged from 80% to 100%. The analysis revealed a pooled sensitivity and specificity of 95% (95% CI, 92.9-97%; I 2 = 77.5%) and 93% (95% CI, 90-95%; I 2 = 68.6%). The plot of CLE sensitivity and specificity with the summary values in the diagnosis of OSCC in the involved studies is illustrated in Figure 2. Five of the included studies had a low risk of bias concerning the use of the reference standard due to clearly defined histopathological and confocal criteria, only one study had no clear diagnostic criteria and were at high risk of bias due to inadequate referencing standards. With respect to applicability concerns of the reference standard, only one study had a high risk due to the reference standard determined by an expert clinical diagnosis, whilst one study did not specify the assessors' experience level.
According to the QUADAS-2 tool, in the domain of flow and timing, three studies had a high risk of bias as these studies comprised a fair number of all benign and suspected dysplastic lesions.

Diagnostic Accuracy of CLE and Meta-Analysis
Whilst there are limitations to the conclusions that be drawn from a meta-analysis, the meta-analysis does give a general overview of the calculated sensitivity and specificity of any confocal microscope in different situations. Although, the studies used dissimilar tools in different settings, the actual point estimates of sensitivity and specificity depict 'threshold effect' in diagnostic clinical research.
All six studies were included in the meta-analysis. The results of the meta-analysis are reported but with its limitations, and attention against variation and potential biases. Sensitivity ranged from 71.4% to 99.3% and specificity ranged from 80% to 100%. The analysis revealed a pooled sensitivity and specificity of 95% (95% CI, 92.9-97%; I 2 = 77.5%) and 93% (95% CI, 90-95%; I 2 = 68.6%). The plot of CLE sensitivity and specificity with the summary values in the diagnosis of OSCC in the involved studies is illustrated in Figure  2.   The form of the HSROC curve in Figure 3 and the estimated area under the curve (AUC) was 0.97, which suggested the absence of a threshold effect. The summary ROC curve in Figure 3 describes that effect, which explains most of the heterogeneity in these studies. Additionally, the random-effects method considered the variation between studies. The shape of the prediction region is a graphic illustration of the amount of heterogeneity between studies. It is reliant on the conjecture that the data follows a normal distribution and should therefore not be over-interpreted. The positive likelihood ratio ranged from 4.28 (95% CI, 0.73-25.06) to 98.5 (95% CI, 6.24-1554.1) and the negative likelihood ratio extended from 0.008 (95% CI, 0.001-0.055) to 0.3 (95% CI, 0.14-0.78). The pooled positive likelihood ratio and negative likelihood ratios were 10.85 (95% CI, 5.4-21.7; I2 = 55.9%) and 0.08 (95% CI, 0.03-0.2; I2 = 83.5%). The estimated diagnostic odds ratio (DOR) ranged from 14 The form of the HSROC curve in Figure 3 and the estimated area under the curve (AUC) was 0.97, which suggested the absence of a threshold effect. The summary ROC curve in Figure 3 describes that effect, which explains most of the heterogeneity in these studies. Additionally, the random-effects method considered the variation between studies. The shape of the prediction region is a graphic illustration of the amount of heterogeneity between studies. It is reliant on the conjecture that the data follows a normal distribution and should therefore not be over-interpreted.

Heterogeneity Analysis
With reference to heterogeneity analysis, a Spearman correlation coefficient of 0.314 (p = 0.544) was calculated which proposed the lack of a threshold effect. Statistical analysis of the sources of heterogeneity was not performed as subgroups were too small (two or three studies per group).
The foundations of bias include variation in (i) sites within the oral cavity assessed; (ii) type of CLE device; and (iii) level of CLE training of the reviewers.
As per the general methodology rules specified for Cochrane reviews [60], "tests for funnel plot asymmetry (to assess publication bias) should be used only when there are at least 10 studies included in the meta-analysis, because when there are fewer studies the

Heterogeneity Analysis
With reference to heterogeneity analysis, a Spearman correlation coefficient of 0.314 (p = 0.544) was calculated which proposed the lack of a threshold effect. Statistical analysis of the sources of heterogeneity was not performed as subgroups were too small (two or three studies per group).
The foundations of bias include variation in (i) sites within the oral cavity assessed; (ii) type of CLE device; and (iii) level of CLE training of the reviewers.
As per the general methodology rules specified for Cochrane reviews [60], "tests for funnel plot asymmetry (to assess publication bias) should be used only when there are at least 10 studies included in the meta-analysis, because when there are fewer studies the power of the tests is too low to distinguish chance from real asymmetry." As six studies have been included in the meta-analysis, a funnel plot was not performed.
Based on latest epidemiological data [4], the global prevalence of OSCC is 2%. Using existing data along with the results of this study, the total number of true positives, false positives, true negatives, and false positives can be projected in a hypothetical cohort of 1000 patients. This means that 20 patients in this cohort would have OSCC. If we use CLE as a diagnostic tool for OSCC (sensitivity of 95% and a specificity of 93%), one of these 20 OSCCs would go undetected, while 69 patients would be needlessly treated (Figure 4). power of the tests is too low to distinguish chance from real asymmetry." As six studies have been included in the meta-analysis, a funnel plot was not performed.
Based on latest epidemiological data [4], the global prevalence of OSCC is 2%. Using existing data along with the results of this study, the total number of true positives, false positives, true negatives, and false positives can be projected in a hypothetical cohort of 1000 patients. This means that 20 patients in this cohort would have OSCC. If we use CLE as a diagnostic tool for OSCC (sensitivity of 95% and a specificity of 93%), one of these 20 OSCCs would go undetected, while 69 patients would be needlessly treated (Figure 4).

Discussion
CLE is a potentially useful diagnostic tool that is non-invasive and allows real-time cellular imaging of the epithelium of the upper layers of the epithelium at resolutions comparable to histology. The criteria for CLE diagnosis of OSCC are easy to learn and even non-experts in the field of CLE have been able to make a precise diagnosis of OSCC by using these criteria [56]. Previous research has indicated the efficient and precise ability of CLE to envisage dysplastic head and neck squamous cell mucosa, with close reproducibility of the histopathological diagnosis [61,62].
This systematic review and meta-analysis compares histopathological diagnosis from in/ex vivo specimens to the diagnostic precision of CLE by means of analysing the results of 6 studies which comprised a total of 361 lesions. The search strategy used wideranging keywords in various relevant databases to find as many studies as possible.

Discussion
CLE is a potentially useful diagnostic tool that is non-invasive and allows real-time cellular imaging of the epithelium of the upper layers of the epithelium at resolutions comparable to histology. The criteria for CLE diagnosis of OSCC are easy to learn and even non-experts in the field of CLE have been able to make a precise diagnosis of OSCC by using these criteria [56]. Previous research has indicated the efficient and precise ability of CLE to envisage dysplastic head and neck squamous cell mucosa, with close reproducibility of the histopathological diagnosis [61,62].
This systematic review and meta-analysis compares histopathological diagnosis from in/ex vivo specimens to the diagnostic precision of CLE by means of analysing the results of 6 studies which comprised a total of 361 lesions. The search strategy used wide-ranging keywords in various relevant databases to find as many studies as possible.
The outcomes of the meta-analysis indicate a sensitivity of 95% and a specificity of 93% when using CLE for diagnosis of OSCC. However, care must be taken while interpreting these extraordinary values of both sensitivity and specificity. The substantial heterogeneity indicates the direct assessment of the diagnostic accuracy of CLE amongst the included studies improbable. CLE sensitivity for the diagnosis of OSCC ranged between 71.4% to 99.3%, and its specificity ranged between 80% and 100%. Though statistically insignificant (most likely due to inadequate statistical power), the discrepancies could still be explained by the diverse confocal criteria and dissimilar experimental designs (in vivo vs. ex vivo), but also investigator skill, and most likely other indefinite heterogeneity sources. Even when utilising similar diagnostic criteria, the experience of an investigator could have an influence on the diagnostic accuracy as it has been demonstrated that there is a strong correlation of expertise level and the interpretation accuracy [57].
Of the six studies included in our review, three are ex vivo [55,57,58] in design and three are in-vivo [25,56,59] in design. However, when comparing the CLE images for freshfrozen and formalin-fixed tissue from the patients, only marginal differences (regarding the range of brightness) in morphology were observed, with no variation in the resolution of the images. Thus, it can be stipulated that the images on ex vivo specimens would be largely reproducible in an in vivo situation [55].

Clinical Relevance
The efficiency and acceptability of this instrument has been addressed in the study by Nathan C et al. [25], in which they explain the advantage of CLE imaging as it decreases the sampling errors encountered during tissue biopsy or could lead to the decision of avoiding a biopsy altogether and leading to real-time management decisions.
The gold standard treatment for oral dysplastic lesions is excision or laser ablation, and lower grade suspicious lesions are usually kept under observation, due to possible chances of regression (9-45%) [63]. However, this decision is highly criticised due to higher rates of recurrence, metastasis, and incipient malignant progressions, and it is argued that clinical examinations and palpations are insufficient in determining the malignant potential of a visually low suspicious presentation of a lesion [64]. Hence, there is the potential utilization of CLE as a surveillance tool, which could aid in diagnosing which lesions can be observed instead of being resected and which lesions demand enhanced management.
In vivo CLE could develop into a very valuable and beneficial instrument in the diagnosis of OSCC; but, to be regarded as an adjunct to the gold standard reference of histopathology, this non-invasive technique ought to have the capacity to distinguish between the histopathological grades of OSCC [65]. Although the treatment strategies for OSCC are largely based on the TNM staging of the tumour [66], the histopathological grade is also of critical importance when strategizing the therapeutic approach to treat an OSCC patient.

Strengths and Limitations
The included studies had some limitations such as the improper visualization of the dorsal surface of the tongue due to the keratinized filiform papillae and the limited accuracy of detection of lesions below the superficial mucosa [25,55]. Most of the included studies had a small sample size and the reproducibility of the results are not reliable enough to make concrete diagnosis and treatment decisions [25,58,59]. Due to ex vivo experiment design, one study is limited by the unavailability of fresh-frozen tissues for all included patients [55] as this limits the reciprocation of their results. This particular study highlights the accuracy of a non-invasive sensitive imaging modality which can be used as a screening instrument. Early detection of OSCC can greatly improve the prognosis and thus help in reducing the current burden. Although it is still not a verified replacement of histopathology, its non-invasive nature does ensure a large screening based program, with early detections and treatments. However, we would like to acknowledge the fact that precancerous lesions were not recruited or considered for inclusion in this study and there is no evidence to support the fact that it may be successful to evaluate precancerous lesions. Given the accuracy and highly sensitive reproduction of cellular details, it can only be speculated that it will reproduce similar results for precancerous lesions and can be used for screening purposes.
One of the included studies notes that no study so far has investigated the clinical and surgical utility of CLE, especially intraoperative visualization of distinct cancerous and non-cancerous tissue margins, to limit the extent of surgical invasions [55].
To enable homogeneity, future studies could consider reporting the number of lesions analyzed, the skilled experience of the investigator, and attendance of skill development courses. Development and validation of a standardized confocal criteria for OSCC diagnosis via international agreement is desirable. Global consensus is essential for this instrument to begin its journey towards replacing the invasive surgical and histopathological techniques for screening purposes

Future Directions
Regarding the future scope of CLE, transference of the first tentative results of CLE in the human oral cavity into an effective and evidence based clinical setting will be a crucial step. Recent research has shown an evolution of artificial intelligence algorithms and the utilization of computational methods for an accurate diagnosis and prognosis of head and neck cancers [67]. Artificial intelligence science along with precision-based optical imaging systems such as confocal microscopy greatly improve the prospects of improving screening and prognostic outcomes of OSCC.
Further studies exploring the diagnostic accuracy of in vivo confocal laser endomicroscopy for OSCC are expected in future. To help comparability of the results it is recommended that the histopathological assessment of the excisional biopsy specimen be utilized as a reference standard.

Conclusions
Confocal laser endomicroscopy is a technique that may assist in the diagnosis of oral squamous cell carcinoma. A decisive conclusion relies on an increased number of studies investigating this technique that follow a homogeneous methodological approach, which will allow for a comparable assessment. Institutional Review Board Statement: Not applicable, as this study did not involve humans or animals.

Informed Consent Statement:
Not applicable, as this study did not involve humans.

Data Availability Statement:
This study did not report any data.
Conflicts of Interest: R.A.M. is a co-founder and director of Miniprobes Pty Ltd., a company that develops novel optical imaging systems. Miniprobes Pty Ltd did not contribute to this study. The authors declare no other conflicts of interest.