Development of a Single Molecule Counting Assay to Differentiate Chromophobe Renal Cancer and Oncocytoma in Clinics

Simple Summary We previously reported a gene signature, “Chromophobe-Oncocytoma Gene Signature” (COGS), to differentiate chromophobe renal cell carcinoma from oncocytoma. Current clinical workflow with histology and immunohistochemistry can fail to distinguish these two renal cancers. We have evaluated the potential of COGS genes to classify these two renal tumors, on a single-molecule counting platform by measuring the expression of the COGS genes in archival tissue at Augusta University Medical Center. We show the expression level difference of the COGS signature in these tumors is able to classify these cancers accurately using machine learning. The assay has the potential to be instituted in clinically complex cases to differentiate these renal tumors. Abstract Malignant chromophobe renal cancer (chRCC) and benign oncocytoma (RO) are two renal tumor types difficult to differentiate using histology and immunohistochemistry-based methods because of their similarity in appearance. We previously developed a transcriptomics-based classification pipeline with “Chromophobe-Oncocytoma Gene Signature” (COGS) on a single-molecule counting platform. Renal cancer patients (n = 32, chRCC = 17, RO = 15) were recruited from Augusta University Medical Center (AUMC). Formalin-fixed paraffin-embedded (FFPE) blocks from their excised tumors were collected. We created a custom single-molecule counting code set for COGS to assay RNA from FFPE blocks. Utilizing hematoxylin-eosin stain, pathologists were able to correctly classify these tumor types (91.8%). Our unsupervised learning with UMAP (Uniform manifold approximation and projection, accuracy = 0.97) and hierarchical clustering (accuracy = 1.0) identified two clusters congruent with their histology. We next developed and compared four supervised models (random forest, support vector machine, generalized linear model with L2 regularization, and supervised UMAP). Supervised UMAP has shown to classify all the cases correctly (sensitivity = 1, specificity = 1, accuracy = 1) followed by random forest models (sensitivity = 0.84, specificity = 1, accuracy = 1). This pipeline can be used as a clinical tool by pathologists to differentiate chRCC from RO.


Introduction
Chromophobe renal cell carcinoma (chRCC) and renal oncocytoma (RO) are renal tumor types, each generally termed as oncocytic renal neoplasms [1] that arise from collecting ducts and constitute about 8-13% of all renal tumors [1,2]. ChRCC is a slow-growing malignant tumor that, once metastasized, has a poor prognosis [3]. RO is a benign tumor that does not require treatment [3]. Due to prognostic differences, it is essential to differentiate these tumors to guide treatment. Both of these tumors are diagnosed using hematoxylin and eosin (H&E) and immunohistochemistry (IHC) staining [4]. Histologically these tumors are similar; however, features such as binucleate or multinucleated cells, nuclear wrinkling, perinuclear clearing, presence of mitotic figures, and cytoplasmic invaginations are distinctive to chRCC and can help differentiate chRCC from RO. However, given substantial morphological overlap, especially in biopsies, the distinction between the two may be challenging despite the aforementioned distinctive morphologic characteristics [4]. Due to their similarity in histology and lack of reliable IHC markers, the distinction between chRCC and RO can be challenging.
Immunohistochemical detection of CK7 is used in laboratories with variable sensitivity and specificity [4][5][6]. Other studies showed that the inclusion of CD117, vimentin, and S100A1 in clinical diagnosis distinguishes chRCC from RO [4,6,7]. Several recent studies [7][8][9] have suggested using genetic and molecular assays as well as deep learningbased image detection to classify these tumors. Throughout the literature, smaller sample sizes have been the main limitation leading to variation in marker sensitivity and specificity [6,[8][9][10]. In an effort to resolve interobserver variability and misdiagnosis, comprehensive guidelines and additional markers for renal tumor differentiation are needed.
We previously reported the Chromophobe-Oncocytoma-related gene signature (COGS) and a bioinformatics pipeline that can differentiate chRCC from RO and be implemented as a clinical tool [11]. Here we present our results on a retrospective validation of the gene signature at Augusta University Medical Center (AUMC). We developed a diagnostic workflow using a single-molecule counting assay to use formalin-fixed paraffinembedded (FFPE) samples to differentiate these tumors. We demonstrate the robustness of the gene-signature and bioinformatics pipeline for the classification of chRCC from RO with unsupervised and supervised models.

Human Subjects and Study Design
We identified patients (n = 82) who attended AUMC between 2003 and 2018. These patients underwent tumor excision or needle biopsies to diagnose renal masses identified in abdominal images. The inclusion criterion was a histological diagnosis of chRCC or RO. Core/fine needle biopsy samples were excluded. Power analysis showed that six samples per group were required to identify gene expression differences between chRCC and RO for COGS genes (β = 1.65, 1 − β = 0.8, α = 0.05). Based on inclusion criteria.
We retrieved archived formalin-fixed paraffin-embedded (FFPE) tissues diagnosed as chRCC (n = 17) or RO (n = 15) from the Department of Pathology at AUMC. The tissue samples (n = 32) had been surgically excised from renal tumors and embedded in paraffin after fixing with formalin (FFPE); 5µm thick sections were taken for histological diagnosis (Figure 1).
Three AUMC Pathologists examined the selected sample set and identified the tumor region on FFPE blocks while also evaluating the matching H&E slides.
Demographic and clinical data were collected using the Augusta University Cancer Registry and validated through the patients' electronic health records. The study was conducted according to the Declaration of Helsinki (1997, revised in 2013) and approved by the Institutional Review Board at Augusta University (Biomarkers and Therapeutics in Cancer, 611205-49). Three AUMC Pathologists examined the selected sample set and identified the tumor region on FFPE blocks while also evaluating the matching H&E slides.
Demographic and clinical data were collected using the Augusta University Cancer Registry and validated through the patients' electronic health records. The study was conducted according to the Declaration of Helsinki (1997, revised in 2013) and approved by the Institutional Review Board at Augusta University (Biomarkers and Therapeutics in Cancer, 611205-49).

Coring, RNA Extraction, and Quality Control
Tumor-rich region marked by the pathologist was cored using a 2 mm biopsy punch (Integra Miltex, York, PA, USA). Paraffin was removed from the tissue core by treatment with Citrisolve (Fisher Scientific, Piscataway, NJ, USA) at 65 °C. The tissue was then homogenized in 10 mM Tris HCl-EDTA (pH 7.4) by mechanical disruption. According to the manufacturer's instructions, RNA from lysates was extracted using an RNA extraction kit designed for FFPE tissue (RNEasy, Qiagen, Hilden, Germany). RNA quality, quantity, and concentration were checked using Qbit and TapeStation 2200 (Supplemental Figure S1).

Quantification of Gene Expression Data
Custom nCounter assay consisting of gene expression code set for 33 genes (including three housekeeping genes) was synthesized (NanoString Technology Inc., Seattle, WA, USA) [12]. Two hundred nanograms of RNA/sample were hybridized into the reporter probe. Data capture was performed on a NanoString nCounter Digital Analyzer (NanoString Technology Inc., Seattle, WA, USA) and exported as reporter code count (RCC) files. These RCC files were analyzed using nSolver analysis software (Version 4.0.70, NanoString Technology Inc., Seattle, WA, USA) for quality control purposes (imaging, binding density, positive spike in control, and limit of detection).
The output from nSolver software was read into R for data pre-processing. The data was initially normalized by their concentration, followed by background thresholding. It was normalized first by the geometric mean of the code set's internal positive controls (Supplemental Figure S2A,B) and then the geometric mean of the housekeeping genes included in our assay (HPRT1, LDHA, and TBP) (Supplemental Figure S2C,D). The data was then log2-transformed and subjected to our machine learning models described below. We calculated the optimum cut point for each gene by calculating sensitivity plus specificity at each percentile. The value at which sensitivity plus specificity is maximum

Coring, RNA Extraction, and Quality Control
Tumor-rich region marked by the pathologist was cored using a 2 mm biopsy punch (Integra Miltex, York, PA, USA). Paraffin was removed from the tissue core by treatment with Citrisolve (Fisher Scientific, Piscataway, NJ, USA) at 65 • C. The tissue was then homogenized in 10 mM Tris HCl-EDTA (pH 7.4) by mechanical disruption. According to the manufacturer's instructions, RNA from lysates was extracted using an RNA extraction kit designed for FFPE tissue (RNEasy, Qiagen, Hilden, Germany). RNA quality, quantity, and concentration were checked using Qbit and TapeStation 2200 (Supplemental Figure S1).

Quantification of Gene Expression Data
Custom nCounter assay consisting of gene expression code set for 33 genes (including three housekeeping genes) was synthesized (NanoString Technology Inc., Seattle, WA, USA) [12]. Two hundred nanograms of RNA/sample were hybridized into the reporter probe. Data capture was performed on a NanoString nCounter Digital Analyzer (NanoString Technology Inc., Seattle, WA, USA) and exported as reporter code count (RCC) files. These RCC files were analyzed using nSolver analysis software (Version 4.0.70, NanoString Technology Inc., Seattle, WA, USA) for quality control purposes (imaging, binding density, positive spike in control, and limit of detection).
The output from nSolver software was read into R for data pre-processing. The data was initially normalized by their concentration, followed by background thresholding. It was normalized first by the geometric mean of the code set's internal positive controls (Supplemental Figure S2A,B) and then the geometric mean of the housekeeping genes included in our assay (HPRT1, LDHA, and TBP) (Supplemental Figure S2C,D). The data was then log2-transformed and subjected to our machine learning models described below. We calculated the optimum cut point for each gene by calculating sensitivity plus specificity at each percentile. The value at which sensitivity plus specificity is maximum is called the optimum cut point. This data was then used as the "test" dataset in the machine learning models.

Discovery Data, Train-Validation-Test Set, Supervised Models
COGS was developed from a meta-analysis dataset from multiple GEO studies containing 53 chRCC and 36 RO arrays [11]. This dataset was split randomly by 80:20 into train and validation sets for unsupervised and supervised learning with the caret R package [13]. "Test" dataset contains the entire NanoString dataset for COGS genes, 30 samples Unsupervised learning models were developed with UMAP and Hierarchical clustering. UMAP was implemented as a feature extraction algorithm to develop two components from COGS expression in the training dataset [14]. As UMAP is a stochastic algorithm, it was checked with 100 iterations by visual inspection of cluster formation, each with a different random seed. The hierarchical model was developed using the "Euclidean" distance and "ward.D2" clustering method. For the supervised UMAP model, a supervised embedding was developed using the train set to identify the internal structure of the training dataset. This supervised embedding was developed as a form of feature engineering and classification models were built on that new space. Test data was passed to this embedding for the classification of the samples.
Multiple supervised models (random forest, generalized linear model with L2 regularization, and support vector machine) were trained on COGS training data [11] with three repeats of 10-fold cross-validation and tested on Test data. These models were selected based on their performance in previous gene expression studies in the literature [15][16][17]. The discovery data introduced a UMAP manifold for supervised UMAP models and tested on NanoString data. NanoString data's UMAP projection was used to cluster the data. A contingency matrix was created for all models' results and performance metrics (Sensitivity, specificity, positive and negative predictive value) were calculated. The p-value for the metrics was estimated by bootstrap distribution where applicable. Classification metrics, i.e., sensitivity, specificity, positive predictive value, negative predictive value, and accuracy, were calculated by constructing a confusion matrix for each model's true positive, false positive, true negative, and false-negative rate.

Statistical Methods
All statistical analyses were performed using the R language and environment for statistical computing (v4.1.2; R Foundation for Statistical Computing, Indianapolis, IN, USA). Software packages were used for cutpointr, caTools, UMAP, ComplexHeatmap, Caret, and tidy models in the R environment. Continuous data are presented as mean and standard deviation, and differences in the group means were tested using Student's t-test. Categorical variables are presented as count and percentages and differences in count data were evaluated using Chi-square tests. All p-values were two-sided and a p < 0.05 was considered significant.

Human Subject Demographics
The demographic information on subjects diagnosed with chRCC (n = 17) and RO (n = 15) is presented in Table 1. The mean age of onset of chRCC and RO was 57.8 (SD = 8.51) and 64.3 (SD = 13.9) years. Patients with RO patients significantly are older than chRCC (p-value = 0.003). There is a higher number of females in RO; overall, there were no significant differences in the distribution of males and females in both tumor types ( Table 1). All chRCC tumors were staged. The majority of the cases were Stage I (64%), whereas Stage II and III were 17% each. RO cases were not staged. The race breakdown for chRCC was 70% Caucasian and 29% African American, while RO was 60% Caucasian and 33% African American.

Univariate Analysis
We performed a univariate AUC analysis for differentiating chRCC and RO with count difference. The sensitivity was 1 for nine genes (AQP6, The maximum is 1 (AQP6) for accuracy, and the median is 0.90. Amongst all the COGS genes, AQP6 has the highest sensitivity, specificity, AUC, and accuracy. The full table is presented in Table 2.

Unsupervised and Supervised Machine Learning Models
As single markers can fail to differentiate chRCC and RO, we implemented multivariate models using unsupervised learning. We implemented UMAP and hierarchical clustering to see the combined markers' ability to differentiate these tumors. UMAP analysis on NanoString count data showed two clusters, and 29/30 samples were correctly identified (Supplemental Figure S3). Then we developed a supervised UMAP model using COGS discovery data as the training set by setting parameters to n_neighbors = 15, epoch = 200, min_dist = 0.1, init = "spectral", metric = "Euclidean" (Figure 2A) [11]. This trained model was tested on the AUMC cohort. This trained UMAP model created two clusters congruent with their histology (30/30 samples are correctly classified) ( Figure 2B). Hierarchical clustering showed 100% congruency with their histological classification ( Figure 2C).

Unsupervised and Supervised Machine Learning Models
As single markers can fail to differentiate chRCC and RO, we implemented multivariate models using unsupervised learning. We implemented UMAP and hierarchical clustering to see the combined markers' ability to differentiate these tumors. UMAP analysis on NanoString count data showed two clusters, and 29/30 samples were correctly identified (Supplemental Figure S3). Then we developed a supervised UMAP model using COGS discovery data as the training set by setting parameters to n_neighbors = 15, epoch = 200, min_dist = 0.1, init = "spectral", metric = "Euclidean" (Figure 2A) [11]. This trained model was tested on the AUMC cohort. This trained UMAP model created two clusters congruent with their histology (30/30 samples are correctly classified) ( Figure 2B). Hierarchical clustering showed 100% congruency with their histological classification ( Figure  2C).  Although unsupervised models are a powerful tool for identifying data structure in high dimensional space, they are not ideal for testing a new sample. Therefore, we compared four supervised learning models, viz., random forest, support vector machine, and a generalized linear model with L2 regularization trained with COGS' discovery data with 10-fold cross-validation with three repeats. The four models were compared (Table 3) by sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Supervised UMAP showed to be most accurate in classifying the chRCC and RO based on their COGS expression (Sensitivity = 1.0, Specificity = 1.0, PPV = 1.0, NPV = 1.0 and Accuracy = 1.0). As Supervised UMAP did not have any misclassification, the p-value is 0, and the 95% confidence interval cannot be calculated. Random forest models performed better than the other SVM and GLMNet models. For random forest models, a minimum of 250 trees were needed to achieve an accuracy of 1 ( Figure 3A). A sample tree is presented in Figure 3B. Although support vector machines are popular in classification analysis, they did not reach statistical significance in our dataset ( Figure 3C, Table 3). Coefficients for the generalized linear model with ridge regression are presented in Figure 3D. All the supervised models were used to create a confusion matrix, and sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated (Table 3). All the p-values were calculated by accuracy over no information rate for the models.

H&E Scoring between Pathologists
Three independent AUMC pathologists evaluated H&E for the selected sample set (n = 32). Two of the three pathologists had one discrepancy in diagnosis from the original clinical diagnosis, while the third pathologist had two discrepancies. We developed a linear mixed model with clinical diagnosis as the fixed effects and H&E and scorer as the random effects. The model showed that H&E accounts for 0.918 (p-value <0.001) of the variation in the diagnosis of these cases. The student t-test between the clinical diagnosis and pathologist impression or between the pathologist impression did not show any difference (p-value > 0.05). All the raw data is provided in Supplemental Table S1. Cancers 2022, 14, x 8 of 12

H&E Scoring between Pathologists
Three independent AUMC pathologists evaluated H&E for the selected sample set (n = 32). Two of the three pathologists had one discrepancy in diagnosis from the original clinical diagnosis, while the third pathologist had two discrepancies. We developed a linear mixed model with clinical diagnosis as the fixed effects and H&E and scorer as the random effects. The model showed that H&E accounts for 0.918 (p-value <0.001) of the variation in the diagnosis of these cases. The student t-test between the clinical diagnosis and pathologist impression or between the pathologist impression did not show any difference (p-value > 0.05). All the raw data is provided in Supplemental Table S1.

Discussion
In this study, we tested the potential of COGS to differentiate chRCC and RO renal tumor types (Supplemental Table S2). We also developed a single-molecule counting assay in a single institutional cohort. Our univariate analysis demonstrates that the majority of the genes in the COGS signature shows expression level difference between the tumor types and can serve as potential biomarkers to differentiate these tumors ( Table 2). We showed that the COGS profile between these tumors is distinct in unsupervised models (UMAP and hierarchical clustering) ( Figure 2B,C). To classify future samples, we developed and compared four different supervised models (random forest, support vector machine, generalized linear model, and supervised UMAP) ( Figure 3A,D) and identified supervised UMAP outperforming all the other models in classifying chRCC and ROs (Table  3). This workflow can be implemented in clinical settings to quantify and correctly classify future samples suspected of chRCC or RO.

Discussion
In this study, we tested the potential of COGS to differentiate chRCC and RO renal tumor types (Supplemental Table S2). We also developed a single-molecule counting assay in a single institutional cohort. Our univariate analysis demonstrates that the majority of the genes in the COGS signature shows expression level difference between the tumor types and can serve as potential biomarkers to differentiate these tumors ( Table 2). We showed that the COGS profile between these tumors is distinct in unsupervised models (UMAP and hierarchical clustering) ( Figure 2B,C). To classify future samples, we developed and compared four different supervised models (random forest, support vector machine, generalized linear model, and supervised UMAP) ( Figure 3A,D) and identified supervised UMAP outperforming all the other models in classifying chRCC and ROs (Table 3). This workflow can be implemented in clinical settings to quantify and correctly classify future samples suspected of chRCC or RO.
The current clinical workflow to differentiate these tumors is based on H&E impressions and certain IHC markers. H&E with or without IHC suffers in sensitivity and specificity because there is a lacking of definitive features on H&E or the overlapping expression of the immunohistochemical markers. This is even more difficult in biopsy cases where there is limited tissue. CK7 is the most widely used marker specific for chRCC with variable sensitivity (0.73-1.0) and specificity (0.84-1.0) [18,19]. S100A1 is the specific IHC marker for RO with variable sensitivity (0.87-1.0) and specificity (0.7-1.0) [20,21]. The variability in these markers can leave up to 10% of cases to be misdiagnosed by traditional clinical workflow. IHC staining can be subjective and requires optimization as the reading is dependent on the quality of the staining [22]. In comparison, gene expression workflows follow a strict protocol, have high quality-control measures, and require a smaller amount of tissue. In this study, we used a 2 mm punch biopsy core (~3-4 mm in length) to collect tumor tissue which is a smaller amount of tissue than core biopsies used in the current clinical workflow [23]. Our workflow requires 72 h from tissue to RNA quantification, which is comparable to pathologist evaluation, IHC processing, and optimization. Our NanoString validation pipeline correctly identified 100% of the cases analyzed in this study. This limits the possibility of inconclusive biopsy results, repeat biopsies [24], as well as inadequate treatment for chRCC or unnecessary surgical intervention for RO cases. This workflow can overcome the obstacles such as inconclusive biopsy results or interobserver variability between pathologists with the integration of supervised learning. This will significantly improve the patient's prognosis and lower the economic burden as it will lower unnecessary surgical intervention, i.e., partial or total nephrectomy. AQP6, HOOK2, AP1M2, ESPR1, and CLDN from the COGS panel were already suggested in the previous studies as potential biomarkers [25,26]. We report AQP6, AP1M2, LIMS1, PRDX3, SPINT2, BSPRY, ELMO3, ESPR1, HOOK2, ITGB3 (sensitivity > 0.93, specificity > 0.93, AUC > 0.96, and accuracy > 0.93) as top-performing genes in our study and can serve as potential transcriptomic biomarkers between these tumors.
A limitation of using bulk transcriptomics is that the expression profile includes tumor cells and stromal cells, immune cells, etc. Future studies are needed to identify the differences between tumor cells themselves in chRCC and RO and incorporate stroma and immune-specific genes into the signature. Patients with RO were older in age compared to chRCC as the peak age of detection of RO is the seventh decade, whereas peak chRCC detection is the sixth decade [27]. Another limitation was due to the FFPE RNA being highly fragmented, which resulted in some genes not showing the observed differences in our discovery data, such as LAMA1 and MSH2. To work around the fragmentation issues, multivariate models were implemented as they account for such variation due to technical or biological reasons. Although SVM models are one of the most well-understood machine learning algorithms, in this study, they did not reach statistical significance. This could have stemmed from the smaller sample size (n = 30) collected from a single institution. Therefore, an additional study is needed to evaluate these findings in a larger sample set collected from multiple sites.
The strength of this study is the direct clinical application. The supervised model training was on bulk transcriptomic data from multiple GEO studies previously validated in microarray and RNASeq platform, which are now translated into the NanoString platform. Therefore, quantifying the expression of COGS genes assayed on any platform can be conducted and followed up by the implementation of the machine learning models. This workflow can be performed directly on biopsies by extracting RNA and quantifying it using the NanoString platform. Our method has shown an overall greater accuracy (Supervised UMAP, accuracy = 1.0) in comparison to the current clinical workflow [18,20,21,28].
We have shown that the single-molecule counting on the nCounter platform working in conjunction with our machine-learning pipeline can be a viable clinical assay for differentiation of chRCC from RO. This approach has been shown to predict therapeutic outcomes in breast cancer [29], diagnosis of small B-cell lymphoid neoplasm [30] cancer, and classification of gliomas [31].

Conclusions
We report a pipeline that uses gene expression profiling by single-molecule counting to differentiate chRCC from RO, which can be an important clinical tool in renal cancer diagnosis.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/cancers14133242/s1, Figure S1: RNA quality control on Tapestation 2200. A: Gel images showing RNA quality. The bottom number represents RIN (RNA integrity number). B: Showing a representative sample for RNA fragments. The green horizontal line represents the target range for hybridization. The RNA concentration was calculated for this range for each sample; Figure S2: Positive control and housekeeping gene normalization using geometric mean. A: Before normalization of the positive control. B: After positive control normalization. C: Before housekeeping gene normalization. D: After housekeeping gene normalization; Figure S3: Uniform manifold approximation and projection (UMAP) analysis showing two clusters. Raw count data was normalized with internal controls and log2 was transformed prior to UMAP analysis. Histology 1 represents chRCC and 2 represents RO samples; Table S1: Comparison of original hematoxylin-eosin stained sections at the time of diagnosis with a re-review of the slides by three independent pathologists. The identity and diagnosis were blinded to be reviewed by pathologists; Table S2: Functional description for the COGS (Chromophobe-Oncocytoma Gene Signature) genes. Funding: This work is supported by an institutional grant for the Biomarkers and Therapeutics for Cancers (BAT) study. P.M.H.T. was supported by the NIH/NIDDK fellowship (F30DK121461). Institutional funds also provided Article-processing charges for publishing this article.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Augusta University (protocol code 611205-37 and approved on 3 April 2020).

Informed Consent Statement:
The tissue blocks and the phenotype data collected were de-identified. Therefore, the patient consent was waived by the IRB.