Assessing Various Control Samples for Microarray Gene Expression Profiling of Laryngeal Squamous Cell Carcinoma

Selection of optimal control samples is crucial in expression profiling of tumor samples. To address this issue, we performed microarray expression profiling of control samples routinely used in head and neck squamous cell carcinoma studies: human bronchial and tracheal epithelial cells, squamous cells obtained by laser uvulopalatoplasty, and tumor surgical margins. Using multidimensional scaling and hierarchical clustering, we compared these controls with tumor samples and laryngeal squamous cell carcinoma cell lines. A general observation from our study is that the analyzed cohorts separated according to two dominant factors: “malignancy”, which separated controls from malignant samples, and “cell culture-microenvironment”, which reflected the differences between cultured and non-cultured samples. In conclusion, we advocate the use of cultured epithelial cells as controls for gene expression profiling of cancer cell lines. In contrast, comparisons of gene expression profiles of cancer cell lines versus surgical margin controls should be treated with caution, whereas fresh frozen surgical margins seem to be appropriate for gene expression profiling of tumor samples.


Introduction
A major hurdle in the analysis of laryngeal squamous cell carcinoma (LSCC) is the proper selection of non-tumor controls for comparative analysis. This malignant neoplasm derives from the squamous epithelium of the upper aerodigestive tract; therefore, cells of this origin are routinely used as controls in LSCC expression profiling. However, biopsying healthy individuals to obtain control tissues is unfeasible for ethical reasons; hence, the collection of such tissues is limited to post mortem biopsies or surgical approaches. The latter include non-tumor oral and oropharyngeal epithelial tissues obtained via uvulopalatopharyngoplasty (UPPP) of patients with obstructive sleep apnea [1][2][3], wisdom tooth extraction, or frenectomy. Moreover, a widely used control source for LSCC expression studies is tumor-free surgical margins, obtained during the treatment of cancer patients [4][5][6][7].
Another type of control sample, offered by several companies, comprises epithelial cell cultures obtained by bronchial brushings or by isolation of epithelial cells from cadaveric donations, as well as non-tumor oral keratinocytes immortalized by transduction with retroviral vectors containing telomerase reverse transcriptase (hTERT) [8,9]. The immortalized keratinocyte lines derive from the oral epithelium, such as gingiva or buccal mucosa, which may be obtained during routine dental surgeries. Importantly, gene expression profiling of these cells carries a bias due to long-term cell culture in artificial conditions.
To address the problem of control selection, we used multidimensional scaling (MDS) [10] and hierarchical clustering [11]. MDS allows the detection of latent variables from a previously obtained distance matrix, thereby revealing potential similarities and differences within a dataset that are not directly observed. In contrast, the basic idea of hierarchical clustering is to create specific groups (clusters) based on the similarity of the analyzed samples. The obtained clusters are then used to build a hierarchy, which may be visualized as a dendrogram. Using these methods, we compared four types of control samples that are frequently chosen in LSCC gene expression profiling: human bronchial epithelial cells, human tracheal epithelial cells, normal squamous cells and tumor surgical margins. These controls were compared with primary tumor samples obtained from LSCC patients by surgical resection, as well as with LSCC cell lines. Our findings are potentially helpful in the selection of the most suitable controls in gene expression profiling of head and neck tumors.
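The way MDS recovers a low-dimensional configuration from a distance matrix can be illustrated with a minimal Python sketch of classical (Torgerson) MDS. This is only an illustration under stated assumptions, not the analysis pipeline of this study, which relied on R. The study does not state which MDS variant was applied, and the function name classical_mds, the power-iteration eigen-solver and the toy distance matrix are hypothetical choices made for this example.

```python
import math
import random


def classical_mds(dist, n_dims=2, n_iter=500, seed=0):
    """Embed n items in n_dims dimensions from an n x n distance matrix.

    Assumption: the leading eigenvalues of the double-centred matrix are
    positive and dominant, as they are for Euclidean-like distances.
    """
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        # Power iteration for the dominant eigenpair of the current matrix
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next eigenpair
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


if __name__ == "__main__":
    # Toy example: four hypothetical samples at the corners of a rectangle
    points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
    toy_dist = [[math.dist(p, q) for q in points] for p in points]
    for row in classical_mds(toy_dist):
        print([round(x, 3) for x in row])
```

The recovered coordinates are defined only up to rotation, reflection and translation, so it is the pairwise distances between embedded samples, not the axes themselves, that carry the information.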

RNA Isolation and Microarray Analysis
Total RNA was isolated using the Trizol reagent as described previously [16]. After removing the culture medium, the cells were immediately suspended in Trizol and lysed. Phase separation was performed by adding chloroform, and RNA was precipitated with isopropanol and resuspended in water with diethyl pyrocarbonate (DEPC). RNA integrity was measured using the Agilent RNA 6000 Nano chip and the Agilent 2100 Bioanalyzer; only samples with an RNA integrity number (RIN) > 7.2 were analyzed. Total RNA was shipped to ATLAS Biolabs (Berlin, Germany) for profiling on the GeneChip Human Genome U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA, USA). Microarray experiments were performed in three runs, and the resulting CEL files were normalized together using the MAS5 algorithm (mas5 function from the affy R package) [17].

Bioinformatics
Global comparison of the expression profiles was performed using the MDS method with Manhattan distance calculation and hierarchical clustering using Ward's method. The analysis was performed using the R packages base, stats and reshape2 [18,19], and the plots were prepared using the ggplot2 R package [20]. To establish the distances between the centroids of the analyzed cohorts, Euclidean distances (Ed) were calculated. These values served as a numerical representation of the level of similarity among the studied groups. For clarity, all Ed values are presented in a simplified format, without the scientific-notation suffix e+06 (i.e., a reported Ed of 3 corresponds to 3e+06).
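The two distance calculations named above can be sketched in Python as follows. This is a hedged illustration rather than the code used in the study, which was written in R. All function names are hypothetical, the toy coordinates are invented, and the text does not state whether the centroids were computed in the MDS plane or in the full expression space, so the sketch accepts coordinates of any dimension.

```python
import math


def manhattan(a, b):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(profiles):
    """Pairwise Manhattan distances, usable as the input matrix for MDS."""
    return [[manhattan(p, q) for q in profiles] for p in profiles]


def centroid(points):
    """Coordinate-wise mean of a group of samples."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def centroid_distances(coords, labels):
    """Euclidean distance (Ed) between the centroids of each pair of cohorts."""
    groups = {}
    for point, label in zip(coords, labels):
        groups.setdefault(label, []).append(point)
    centres = {name: centroid(pts) for name, pts in groups.items()}
    names = sorted(centres)
    return {(a, b): math.dist(centres[a], centres[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


if __name__ == "__main__":
    # Invented coordinates for two hypothetical cohorts
    toy_coords = [[0.0, 0.0], [2.0, 0.0], [4.0, 3.0], [4.0, 5.0]]
    toy_labels = ["control", "control", "tumor", "tumor"]
    print(centroid_distances(toy_coords, toy_labels))
```

Dividing the returned values by 1e6 would give numbers in the simplified format used for Ed in this section, assuming the raw distances are of that magnitude.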

Results and Discussion
A general observation emerging from our study is that the analyzed cohorts separated according to two dominant factors, which we called "malignancy" and "cell culture". The former separated controls from malignant samples along dimension one, whereas the latter reflected the differences between cultured and non-cultured samples along dimension two. However, the separation along dimension two is probably also caused by the "microenvironment", which separated samples composed almost completely of epithelial cells (human bronchial and tracheal epithelial cells; HBEpiC and HTEpiC) from samples with an admixture of other cell types (surgical margins, tumor samples and, to some extent, the squamous cells obtained by laser uvulopalatoplasty, LAUP).
As expected, the gene expression profiles of the LSCC tumors differed significantly from those of all control groups, and the tumors formed a distinct population in the MDS analysis (Ed range 14.6-24) (Figures 1-3). Interestingly, LSCC cell lines and tumors were relatively distant (Ed = 15.8), which further stresses the necessity to use adequate controls for each of these sample types.
An interesting finding of our study is the level of heterogeneity observed among the LSCC cell lines and tumor samples. The heterogeneity within the LSCC cell lines exceeds that observed in the control epithelial cells (HBEpiC, HTEpiC), and the heterogeneity within the tumor samples exceeds that observed in the surgical margins. This result is consistent with the known high heterogeneity of LSCC tumors.
As expected, the bronchial and tracheal epithelial cells showed a high level of similarity in terms of gene expression profiles (Ed = 3). However, the LAUP samples only partially overlapped with the area occupied by HTEpiC and HBEpiC and, importantly, were located much closer to the surgical margins and tumor samples (Ed = 15.7; 22) than were the HTEpiC and HBEpiC controls. This is probably caused by the admixture of fibroblasts and other non-epithelial cells in the LAUP samples. Importantly, there was a strong dissimilarity in gene expression profiles between cultured epithelial cells (HBEpiC, HTEpiC, LAUP) and surgical margins, indicating that the latter, although clearly different from the tumor samples and LSCC cell lines (Ed = 19.3; 24), are also a distinct entity from the other tested epithelial controls. Surgical margins were also closest to malignant samples along dimension one. This finding is interesting, especially given that several authors have reported that apparently tumor-cell-free margins can harbor cancer cells and that epithelial cells in the margin can harbor epigenetic changes predisposing them to tumor formation [21,22]. Additionally, the LAUP controls are shifted towards "malignancy" along dimension one, which might reflect, for example, a smoking signature that mimics that of neoplastic cells. Primary tumors, in turn, formed a group that was significantly separated from all other samples but clearly characterized by the shortest distances to LSCC cell lines (Ed = 15.8) and surgical margins (Ed = 19.3), indicating that features of both of these gene expression profiles may be found within this group. Moreover, the observed similarity in gene expression profiles of tumor samples and surgical margins along dimension two, in addition to the fact that neither sample type was cultured, might also reflect the influence of the microenvironment that HBEpiC, HTEpiC and cell lines lack.

Conclusions
Based on the presented results, we advocate the use of cultured HBEpiC and HTEpiC cells, and presumably also other types of cultured epithelial cells, as the ideal controls for gene expression profiling of LSCC cell lines. HBEpiC and HTEpiC controls are composed of a pure population of epithelial cells and as such, may be recognized as the normal counterparts of LSCC cells. Additionally, our data suggest that comparisons of gene expression profiles of cancer cell lines versus surgical margins should be treated with caution. Conversely, fresh frozen surgical margins and other controls obtained by surgical intervention seem to be appropriate for gene expression profiling of tumor samples as they allow one to eliminate the cell culturing bias. Moreover, the application of matched non-cancerous tissue seems more appropriate than a cohort of surgical margins collected from non-matched donors.

Informed Consent Statement:
The Institutional Ethical Review of the University of Medical Sciences approved tissue collection (no. 904/06), and informed consent was obtained from the patients.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.