A Retrospective Study of the Diagnostic Accuracy of In Vivo Reflectance Confocal Microscopy for Basal Cell Carcinoma Diagnosis and Subtyping

Current national and European guidelines recommend distinct management approaches for basal cell carcinoma (BCC) based on tumor location, size, and histopathological subtype. In vivo reflectance confocal microscopy (RCM) is a non-invasive skin imaging technique that may change the diagnostic pathway for BCC patients. This study aimed to determine the sensitivity and specificity of RCM for BCC diagnosis, assess the predictive values of several confocal criteria in correctly classifying BCC subtypes, and evaluate the intraobserver reliability of RCM diagnosis for BCC. We conducted a retrospective study in two tertiary care centers in Bucharest, Romania. We included adults with clinically and dermoscopically suspected BCCs who underwent RCM and histopathological examination of excision specimens. RCM examinations were performed with the VivaScope 1500, and histopathology of the surgical excision specimen served as the reference standard. Of the 123 cases included in the analysis, BCC was confirmed in 104 and excluded in 19 cases. RCM showed both high sensitivity (97.1%, 95% CI (91.80, 99.40)) and specificity (78.95%, 95% CI (54.43, 93.95)) for detecting BCC. Several RCM criteria were highly predictive for BCC subtypes: cords connected to the epidermis for superficial BCC, big tumor islands, peritumoral collagen bundles and increased vascularization for nodular BCC, and hyporefractile silhouettes for aggressive BCC. Excellent intraobserver agreement (κ = 0.909, p < 0.001) was observed. These data suggest that RCM could be used for preoperative diagnosis and BCC subtype classification in patients with suspected BCCs seen in tertiary care centers.


Introduction
Basal cell carcinoma (BCC) is the most prevalent skin cancer worldwide. In Europe, BCC incidence has been constantly rising by approximately 5% annually over recent decades [1], causing a major burden on healthcare systems [2,3]. Adding to this, an abrupt increase in BCC incidence in the young

Materials and Methods
A retrospective multicenter study was performed at two sites: the Dermatology Research Laboratory, "Carol Davila" University of Medicine and Pharmacy, Bucharest, Romania and the Department of Dermatology at Medas Medical Center, Bucharest, Romania. Patient data were collected retrospectively by searching the electronic archives of the participating centers for patients registered between 1 May 2017 and 31 October 2018.
We included consecutively identified patients older than 18 years with a clinical and dermoscopic suspicion of previously untreated BCC, whose medical records included medical history, clinical, dermoscopic, and RCM images as well as a histopathologic report of the excisional biopsy of the lesion. We excluded patients with missing or incomplete data, patients with lesions that were reported to be recurrences, previously treated lesions, or lesions extending to mucosal surfaces. Immunocompromised patients were not excluded from the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the "Carol Davila" University of Medicine and Pharmacy Bucharest (Project Number 185, approved on 26.12.2018). All participants gave written informed consent as part of their investigation and treatment procedures, at the time of their registration.
In both centers, RCM examination was conducted using the same commercially available confocal microscope (VivaScope 1500®; Caliber ID, Henrietta, NY, USA; MAVIG GmbH, München, Germany). RCM imaging at the Medas Medical Center was performed by ML and by CC at the "Prof. N. Paulescu" National Institute of Diabetes, Nutrition, and Metabolic Diseases.
The VivaScope 1500 uses an 830 nm laser diode, reaching a maximum output power of 20 mW at the skin level, allowing for skin imaging without causing injury to investigated tissues. A dermoscopic image captured using the VivaCam serves as a surface map to guide confocal imaging. Five level cubes (30 µm increments, vertically), including the corneal, granular/spinous, dermal-epidermal junction, and papillary dermis, are acquired in the center of the lesion. Each level is a mosaic with a minimum surface of 4 × 4 mm and a maximum of 8 × 8 mm. Individual stacks (4.5 µm increments, vertically) are also acquired in one or more areas of interest, up to a depth of 200 to 250 µm. Individual images of cellular and tissue architecture are also obtained. Only patients with lesions investigated following this RCM imaging protocol were included in the study. Adherence of each RCM image set to the protocol was verified by inspecting the log file generated for each confocal examination.
Prior to the study, ML was trained in RCM use and interpretation during a one-week confocal laser scanning microscopy course organized by MAVIG GmbH (distributor of the VivaScope® device) at the University of Modena and Reggio Emilia in Italy. ML had more than three years of RCM experience prior to the start of the study. CC had more than nine years of RCM experience.
Imaging, at the time of patient evaluation, was not conducted in a blinded fashion as patient history and clinical examination had to be conducted as part of the standard clinical care. However, the database of static RCM images was analyzed in a blinded fashion by ML immediately after completion and locking, and four weeks after, to document the presence or absence of BCC and of the aforementioned criteria.
All lesions included in the study were surgically treated with margins of 3 to 5 mm. Histopathologic confirmation of BCC presence and subtype, together with inspection of the excision margins on hematoxylin and eosin-stained bread-loafed sections, was defined as the reference standard. The reporting of histopathological findings was performed by experienced pathologists. During assessment of the reference standard, the pathologist was masked to the findings of the RCM examination, but not to the clinical description of the lesions and patients' clinical history.
We recorded the following characteristics of participants and tumors and summarized them with descriptive statistics: age, gender, tumor topographic location, and tumor histopathological subtype. A distinction was made between superficial, nodular, and aggressive (micronodular, infiltrative, and basosquamous) BCC growth patterns. For the purposes of this study sclerodermiform/morpheaform BCC was considered equivalent to infiltrative BCC. In the case of mixed-type histopathological diagnosis, defined as two or more growth patterns, the most aggressive component was taken into account for analysis.
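The rule for mixed-type lesions described above can be sketched in a few lines of Python. The category labels and, in particular, the numeric aggressiveness ranking (superficial below nodular below aggressive) are assumptions made for illustration; the text states only that the most aggressive component was used.

```python
# Mapping from histopathological growth pattern to study category.
# Sclerodermiform/morpheaform BCC is treated as infiltrative, as stated in the text.
CATEGORY = {
    "superficial": "sBCC",
    "nodular": "nBCC",
    "micronodular": "aBCC",
    "infiltrative": "aBCC",
    "basosquamous": "aBCC",
    "sclerodermiform": "aBCC",
    "morpheaform": "aBCC",
}

# ASSUMPTION: ordering of categories by aggressiveness (higher = more aggressive).
RANK = {"sBCC": 0, "nBCC": 1, "aBCC": 2}


def classify_lesion(growth_patterns):
    """Return the study category of the most aggressive reported component."""
    if not growth_patterns:
        raise ValueError("at least one growth pattern is required")
    categories = [CATEGORY[pattern.lower()] for pattern in growth_patterns]
    return max(categories, key=RANK.__getitem__)
```

Under these assumptions, a lesion reported as superficial and nodular would be analyzed as nBCC, and one reported as nodular and infiltrative as aBCC.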
The primary objective was the agreement between the index test (RCM) and reference standard (histopathology of the excision specimen) in correctly determining BCC presence. The secondary outcomes were estimating the accuracy of predefined confocal criteria in correctly classifying BCC subtypes and determining the intraobserver agreement of preoperative BCC diagnosis through RCM.
One rater (ML) reviewed all RCM images of de-identified cases twice, at a four-week interval. The rater was blinded to clinical and dermoscopic images, histopathological report, and to his previous interpretation. Between evaluations, RCM case numbers were shuffled and recoded by an online software-based algorithm (available at https://www.graphpad.com/quickcalcs/) to prevent identification. Evaluation data were recorded in a standardized manner for BCC presence (yes or no) and for the presence of each of the 14 selected criteria (yes or no).
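The shuffling and recoding step can be illustrated with a short Python sketch. This is not the GraphPad tool used in the study; the function name, the case-ID format, and the seeds are hypothetical and serve only to show the idea of assigning each case a fresh random code before every reading round.

```python
import random


def recode_cases(case_ids, seed=None):
    """Assign each original case ID a new code drawn from a random permutation."""
    rng = random.Random(seed)
    new_codes = list(range(1, len(case_ids) + 1))
    rng.shuffle(new_codes)
    return dict(zip(case_ids, new_codes))


# Hypothetical case IDs for the 123 lesions; one recoding key per reading round.
case_ids = [f"case-{i:03d}" for i in range(1, 124)]
key_round_1 = recode_cases(case_ids, seed=1)
key_round_2 = recode_cases(case_ids, seed=2)
```

For blinding to hold, the keys linking original IDs to the new codes would have to be kept by someone other than the rater until both rounds are complete.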
Lesions in which subsequent surgical excision was not performed (reasons were recorded) were excluded. According to Shinkins et al. [52], the ideal approach to including and analyzing inconclusive valid test results (n = 8) is to treat them as if in a clinical scenario. We have, therefore, considered these inconclusive valid results as RCM positive cases, and included the cases which had received the reference standard (n = 2) in the analysis. The numbers of true and false positives and negatives were recorded. We established the sensitivity, specificity, positive and negative likelihood ratios, and positive and negative predictive values for BCC diagnosis by RCM using 2 × 2 contingency table analysis. To calculate the overall diagnostic accuracy, the following formula was used: Overall diagnostic accuracy = sensitivity × prevalence + specificity × (1 − prevalence) [53,54]. We used binomial logistic regression to determine the odds ratio (OR) of the predefined confocal criteria for each individual BCC histological subtype. Confidence intervals were calculated at the 95% level and a p value of <0.05 was considered significant. Intraobserver agreement was defined as the degree to which the assessment of selected RCM images is identical for repeated measurements by the same person on different occasions [55]. Cohen's kappa was used to describe intraobserver agreement. Statistical analysis was performed using SPSS version 22.0 (IBM, New York, USA).
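As an illustration of the 2 × 2 contingency table analysis and the overall accuracy formula, the following Python sketch computes the measures listed above. The cell counts (101 true positives, 3 false negatives, 15 true negatives, 4 false positives) are not quoted from a table in this text; they are inferred from the 104 BCC and 19 non-BCC cases and the reported sensitivity and specificity, and should be read as an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy measures from the four cells of a 2 x 2 table."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    prevalence = (tp + fn) / n
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are undefined when specificity is exactly 0 or 1.
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Formula given in the text:
        # accuracy = sensitivity * prevalence + specificity * (1 - prevalence)
        "accuracy": sensitivity * prevalence + specificity * (1 - prevalence),
    }


# ASSUMPTION: counts inferred from the reported percentages, not from a published table.
metrics = diagnostic_metrics(tp=101, fp=4, fn=3, tn=15)
```

With these inferred counts the formula reduces to (TP + TN) / N, that is, the proportion of lesions that RCM classified correctly.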

Participants
An electronic database search and chart review from the two participating centers identified 184 potentially eligible BCC cases. After evaluating each case for inclusion and exclusion criteria, we excluded three cases due to the poor quality of RCM images. Out of the 181 eligible BCC cases, 58 had not received the index test or the reference standard, hence they were excluded. The two inconclusive RCM cases with histopathological analysis were treated as test positives, leaving a total number of 123 lesions from 87 patients for further analysis (Figure 1). Eighty-seven patients (36 males and 51 females) with a mean age of 68.1 ± 12.17 years and median disease duration of 2 years were included in the study. Most lesions were of the nodular subtype, with 11 aggressive BCCs (aBCCs) represented (7 infiltrative BCCs, 3 basosquamous BCCs, and one micronodular BCC), which is consistent with the natural incidence of BCC subtypes. The distribution of the 104 BCCs in terms of subtype and the histopathological diagnoses for the remaining 19 lesions are summarized in Table 1. Most lesions were located in the head and neck area (n = 72), followed by the trunk (n = 30), lower extremities (n = 10), upper extremities (n = 8), and abdomen (n = 3). The number of BCCs in our study (n = 104) is sufficient to confidently calculate sensitivity and specificity with a maximum error of estimation of 6% and 14.1%, respectively, at a confidence level of 1 − α = 0.95 (95%). The average time between RCM examination and surgical treatment was 50.99 days.
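The precision of a sensitivity or specificity estimate can be illustrated with an exact (Clopper-Pearson) binomial confidence interval. The sketch below uses only the Python standard library and finds the bounds by bisection. Whether the study's intervals were computed with this method is an assumption, and the counts in the example (101 of 104 BCCs detected, 15 of 19 non-BCCs correctly excluded) are inferred from the reported percentages rather than quoted from the text.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(increasing_fn, target, iterations=100):
    """Find p in [0, 1] with increasing_fn(p) == target, for an increasing function."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if increasing_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion k/n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _bisect(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# ASSUMPTION: counts inferred from the reported sensitivity and specificity.
sens_ci = clopper_pearson(101, 104)
spec_ci = clopper_pearson(15, 19)
```

The much wider interval for specificity reflects the small number of non-BCC lesions (n = 19) compared with the 104 BCCs.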

Evaluation of RCM Criteria According to BCC Subtype
In superficial BCCs (sBCCs), RCM examination revealed the presence of cords connected to the epidermis (13/24) with peripheral palisading (19/24) (Figure 2). accuracy of preoperative RCM for detection of BCC in this case was only slightly higher, at 95.87% (95% CI 90.62, 98.64).

Keratinocyte atypia, epidermal streaming, ulceration, and inflammation were observed with comparable frequencies in all tumor subtypes. The analytic descriptive results of the confocal image analysis are summarized in Table 2.

Logistic Regression Analysis for RCM Criteria in BCC Subtyping
We used both univariate and multivariate logistic regression to model the influence of RCM criteria on BCC subtype classification (odds ratios in Table 3 correspond to each BCC subtype). We entered all 14 RCM criteria in a multivariate logistic regression analysis with backward elimination according to likelihood ratios and a classification cutoff of 0.5. Three separate models were created, one for each BCC subtype. The nBCC model correctly classified 66.3% of cases before including regression criteria and 81.7% after adding predictors, gaining a substantial increase in percentage accuracy in classification (PAC). Nodular BCC was more likely in the presence of peritumoral collagen bundles (OR = 11.454, 95% CI (1.636, 80.188), p = 0.014), increased vascularization (OR = 4.359, 95% CI (1.071, 17.730), p = 0.04), and if cords connected to the epidermis were absent (10.41 times lower odds; p = 0.008). For sBCC, the constant model correctly classified 77.9% of cases and 87.5% with the predictors added. Superficial BCC was the most common diagnosis if cords connected to the epidermis were observed (6.794-fold higher odds; p = 0.017). For aBCC, the change in PAC after adding the RCM criteria was smaller (1%). Aggressive BCC was most common in the presence of hyporefractile silhouettes (OR = 16.92, 95% CI (1.915, 149.499), p = 0.01) and the absence of big tumor islands (4.4-fold lower odds; p = 0.048).
Even though big tumor islands and peritumoral collagen bundles were strongly associated with nBCC in the univariate analysis, this effect was diminished by the influence of other variables in the multivariate statistical model. In aBCC, hyporefractile silhouettes remained a potent predictor in the univariate, but even more so in the multivariate model.
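To make the reported odds ratios easier to interpret, the Python sketch below recovers the regression coefficient, its standard error, and a two-sided p value from an odds ratio and its 95% confidence interval. It assumes Wald-type intervals that are symmetric on the log-odds scale, which is an assumption about how the statistical software produced them; the function names are illustrative.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_summary(odds_ratio, ci_low, ci_high):
    """Recover coefficient, standard error, z statistic and p value from an OR and its 95% CI."""
    beta = log(odds_ratio)                          # coefficient on the log-odds scale
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)  # CI half-width on the log scale / z
    z = beta / se
    p_value = erfc(abs(z) / sqrt(2))                # two-sided p from the standard normal
    return {"beta": beta, "se": se, "z": z, "p": p_value}


def lower_odds_to_or(times_lower):
    """Express 'k times lower odds' as an odds ratio below 1."""
    return 1 / times_lower


# Values reported in the text for peritumoral collagen bundles in nBCC.
collagen = wald_summary(11.454, 1.636, 80.188)
# "10.41 times lower odds" for nBCC when cords connected to the epidermis are present.
cords_or = lower_odds_to_or(10.41)
```

Under the Wald assumption, the geometric mean of the interval limits should reproduce the odds ratio and the recovered p value should sit close to the reported one, which offers a quick consistency check on figures of this kind. The wide intervals correspond to large standard errors, as expected with few cases per subtype.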

Intraobserver Agreement
The intraobserver agreement for BCC presence calculated from the cross-tabulation was 97.56%. Cohen's kappa was run to determine the intraobserver agreement between the two evaluations. The analysis showed that there was excellent agreement between the two evaluations, κ = 0.909 (95% CI 0.807, 1), p < 0.001.
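The difference between raw percentage agreement and Cohen's kappa can be shown with a short Python sketch. The 2 × 2 table in the example is purely hypothetical and is not the study's cross-tabulation, which is not reproduced in this text.

```python
def cohens_kappa(table):
    """Observed agreement and Cohen's kappa for a square agreement table.

    table[i][j] is the number of cases given rating i in the first evaluation
    and rating j in the second evaluation.
    """
    size = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance, from the marginal totals of both evaluations.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# HYPOTHETICAL table for illustration only (rows: first reading, columns: second reading).
observed, kappa = cohens_kappa([[40, 5], [10, 45]])
```

In this hypothetical table the two readings agree on 85% of cases, yet half of that agreement would be expected by chance alone, so kappa is 0.7. This is why kappa is reported alongside the raw percentage.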

Adverse Events for Index Test and Reference Standard
There were no adverse events after performing RCM. Adverse reactions after surgical excision included five patients with post-operative wound infections. All cases were successfully treated with oral antibiotics, without the need for hospitalization. There were no serious adverse reactions.

Discussion
There is considerable dermoscopic pattern variability among different BCC histologic subtypes and, recently, dermoscopy has been shown to accurately discriminate between superficial BCC and all other histopathologic subtypes in approximately 80% of cases, based on a series of criteria [24], meaning that the remaining 20% of tumors that do not fit these criteria require histopathologic examination for subtyping. However, this study included only histopathologically proven BCCs, thus the validity of the criteria for differentiating BCC from other diseases was not assessed. Furthermore, differences in the incidence and frequency of various BCC subtypes among different populations should be taken into account [57].
Our study reveals significant differences in the confocal patterns among BCC subtypes, confirming that RCM provides additional morphologic information and suggesting that RCM enhances the preoperative diagnosis of BCC as well as its subtype classification. This aspect is particularly important in clinical practice since the therapeutic approach of BCC is largely determined by its histopathological subtype.
Our findings confirm previously reported data on RCM findings in BCC and associate specific criteria with different BCC subtypes. First and foremost, tumor cords connected to the epidermis strongly and significantly predicted sBCC, thus supporting previous data [49]. Epidermal streaming is described as one of the most important RCM criteria for the diagnosis of BCC [50,67]; however, in our study, epidermal streaming was found in only 37.7% of sBCCs and was not statistically significant in predicting histotype. This result is in accordance with a previous study [49], which reports a 50% frequency of epidermal streaming in sBCC and 50% increased odds of sBCC, although without statistical significance. This may be connected, as the authors have observed, to the increased degree of subjectivity that comes into play when assessing this parameter. Nodular BCC was typified by the presence of big tumor islands and peritumoral collagen bundles, confirming the findings of Longo et al. [49]. Although increased vascularization was detected in all tumoral subtypes in our dataset, in multivariate analysis, this parameter was a predictor only for nBCC. In Nori et al.'s study [50], increased vascularization had a sensitivity of 83.9% and 95.7% and a specificity of 53.6% and 53.6% for nodular and superficial BCC, respectively. Cords connected to the epidermis were, in our study, a negative predictor for nBCC, their presence resulting in approximately 10-times lower odds for this subtype. Previous results also show 93% lower odds for nBCC in the presence of cords connected to the epidermis [49]. Aggressive BCC was characterized by the presence of hyporefractile silhouettes (63.6%), while others have found these structures in 77.3% of infiltrative BCCs [49]. Furthermore, aBCC was the most common diagnosis in the absence of big tumor islands, a finding corroborated by others [49].
However, due to the particular appearance of hyporefractile silhouettes, their recognition might require substantial experience with confocal microscopy. Histopathologically, these structures correspond to non-pigmented tumor islands. While previous studies [49] report a high frequency (95.5%) of collagen fibers surrounding tumor islands in infiltrative BCCs, although without statistical significance, in our study, this criterion was present in only 36.4% of aBCCs and was not a statistically significant predictor.
Keratinocyte atypia and inflammation were present in the majority of tumors of all subtypes, while ulceration, onion-like structures, and dendritic structures inside tumor islands were less frequently seen, results also confirmed by previous findings [49,66]. Nori et al. [50] found the sensitivity of pleomorphic epidermis (keratinocyte atypia) for sBCC and nBCC to be 56.5% and 65.5%, while specificity was 63.8% for both subtypes.
In our study, the intraobserver agreement was assessed based on static de-identified RCM images. We report an intraobserver agreement for BCC presence of 97.56%, thus confirming previous findings of the reliability of RCM in correctly and consistently diagnosing BCC [56,68]. However, we believe this simple method of assessing agreement is flawed because it does not take into account chance agreement [69]. Therefore, Cohen's kappa was run, showing excellent agreement (κ = 0.909, 95% CI (0.807, 1), p < 0.001). We consider this to be one of the strengths of our study, along with the adherence to the STARD guidelines [51]. Using predefined RCM criteria and the same generation VivaScope 1500 device at both participating centers helped prevent heterogeneity of the results.
Limitations of our study include the retrospective design, which is subject to recall and observer bias. These have been addressed by the use of de-identified RCM images and the shuffling of images between evaluations. The use of static RCM images brings external validity issues into discussion, as there are significant differences between diagnosing and subtyping BCCs using blinded static images and real-time RCM combined with clinical information and dermoscopy [70]. We believe that in a real clinical scenario, where real-time RCM is typically used as an adjunct imaging tool to patient history, clinical examination, and dermoscopy, diagnostic accuracy measures of this complementary approach could be even higher. However, this assumption needs to be corroborated through further, preferably prospective, studies. Secondly, our study included a limited number of patients and our findings need to be confirmed prospectively to more precisely determine the sensitivity and specificity of these diagnostic criteria. Thirdly, our sample did not include any melanomas, one of the biggest differential diagnostic concerns of clinicians evaluating skin tumors, making this another limitation. Two of the biggest challenges RCM users face in accurately discriminating between BCC subtypes are the relative lack of studies reporting the reliability of subtype-specific confocal criteria and the limited depth of imaging of the RCM device (approximately 200-250 µm). The latter is already being addressed by several ongoing studies [71,72].
The use of RCM to avoid skin biopsy in selected cases could lead to a significant cost reduction if we consider that RCM requires one user and one confocal imaging device, while a skin biopsy necessitates a minimum of four persons: dermatologist, nurse, histopathology laboratory technician, and pathologist. However, prospective cost-effectiveness studies of RCM versus skin biopsy should be conducted in order to determine if there is a financial advantage to be gained. Previous diagnostic RCM studies have focused on sensitivity and specificity for diagnosing BCC and its histopathological subtypes; however, other aspects, such as time between diagnosis and treatment, should also be considered. The average time period between RCM and surgery in our study was 50.99 days, although this was due mostly to patient-related factors. In our experience, RCM imaging only takes about 10 to 15 min per lesion, thereby optimizing patient flow from presentation to the operating room. Thus, one of the main advantages of using RCM is on-the-spot diagnosis and treatment of BCCs compared to painful procedures, such as skin biopsies, with all the delays this implies. In the future, RCM could potentially replace the skin biopsy before Mohs micrographic surgery procedures, saving time, funds, and an avoidable and painful procedure. Moreover, by using the more flexible hand-held VivaScope 3000 (VivaScope 3000; Caliber ID, Henrietta, NY, USA), clinically suspicious lesions can be evaluated even faster. In addition, selected cases of sBCC patients could potentially benefit from completely non-invasive management [73].

Conclusions
In conclusion, our study shows that RCM is reliable in correctly diagnosing BCC and identifies specific confocal criteria associated with BCC subtypes. If accurate subtyping is achieved, RCM could play a key role in BCC management; therefore, additional prospective studies are required to investigate whether the combination of dermoscopy and RCM would help increase the accuracy of preoperative BCC subtype classification.
Author Contributions: M.L., C.C., S.Z., and C.G. contributed to the conception of this study and performed the preliminary documentation. All authors participated in the design of the study and implemented the research. M.L., C.C., D.B., and S.Z. were responsible for the data acquisition, selection and analysis, and clinical interpretation of the data. M.L., I.M.P., V.M.V., D.B., C.C., S.Z., and C.G. participated in the statistical analysis and contributed to the interpretation of the results as well as the manuscript drafting and writing of the study. M.L., C.C., S.Z., and C.G. critically revised the manuscript for important intellectual content. All authors reviewed and approved the final manuscript.