Head and neck squamous cell carcinoma (SCC) is currently ranked as the 6th most common cancer in the world, with an estimated 650,000 incident cases and over 300,000 resultant deaths annually [1]. Human papillomavirus (HPV) has increasingly been recognized as a major risk factor in this type of cancer, especially in oropharyngeal squamous cell carcinoma. Two high-risk strains of HPV (16 and 18) are oncogenic and are responsible for 12.8%–59.9% of head and neck squamous cell carcinomas [2]. Removal of malignant tissue requires a careful balance: remove too much and the patient’s quality of life may suffer; remove too little and the cancer will return. Therefore, accurate identification of cancer margins, ideally in vivo, is crucial to achieving this balance and optimizing treatment. Currently, a pathologist is required to determine whether the surgical margins are positive or negative, which takes a significant amount of time.
In an effort to improve margin detection and patient outcomes, optical diagnostics are becoming more widely used [4]. Spectroscopic techniques such as fluorescence [5] and Raman spectroscopy [7] have emerged as potent cancer diagnostic tools. Raman spectroscopy, a label-free approach, does not require special dyes or other potentially toxic probes (e.g., quantum dots and carbon nanotubes [11]) to elicit a signal. Although Raman scattering signals can be weak, they offer molecularly specific detection in the vicinity of the probe.
Raman spectroscopy has been used extensively as a cancer diagnostic tool for ex vivo and minimally invasive in vivo analyses of affected tissues for more than two decades [12]. Despite the widespread applicability of Raman spectroscopy, our focus is specifically on cancers of the head and neck. Improvements and innovations in instrumentation and computational approaches for discrimination, coupled with the growing incidence of head and neck cancers, have resulted in numerous studies in recent years [13]. Much of this work has focused on the conventional fingerprint region (400–1800 cm⁻¹), corresponding to biochemically relevant spectral signatures, although there have been efforts that have examined higher wavenumber Raman shifts (>2000 cm⁻¹) for additional discriminatory signatures. In particular, the high wavenumber Raman shifts offer reduced autofluorescence background signals while providing additional biochemical signatures useful in tissue characterization [20]. Multivariate analyses have proven adept at enhancing spectral differences, allowing for good separation between healthy and diseased tissue classes. However, classification may be complicated by patient factors, such as tobacco and alcohol use, which can lead to variations in the observed spectra.
We have been investigating optical techniques that may be used for in vivo identification of cancer margins [21]. The present work analyzes the Raman signatures from tissue samples (healthy and diseased) from patients who have undergone treatment at Mount Sinai Hospital in New York City. The Raman spectra were analyzed using principal component analysis in order to demonstrate discrimination between the cancerous samples and the healthy controls. Results from this study indicate that although the majority of the spectral signatures are nearly identical in our cancerous tissue and healthy controls, there are several peaks, including several previously unreported peaks in the 130–400 cm⁻¹ Raman shift region, that proved useful in differentiating between the healthy and malignant samples. In addition, we observe systematic intensity differences among the spectra, where the intensity of the Raman signal (peaks + baseline) in the cancerous samples is identifiably higher than in their corresponding controls. In what follows, we introduce the experimental arrangement (Section 2), briefly discuss the multivariate analysis (Section 3), and present our results (Section 4).
The results from this study indicate that one can readily differentiate between cancerous and noncancerous tissue samples by analyzing the spectroscopic data obtained through Raman spectroscopy. Figure 2 shows representative spectra from healthy and malignant tissue obtained with the apparatus (Figure 1), taken across many days and corroborated with different probe configurations. The probe employed for these experiments collected a small fraction of the scattering, producing weak signals that required some post-processing. A 5-point smoothing function was applied to the data shown, as well as a normalization to the raw value at 1000 cm⁻¹. The signal from our samples was nearly the same over the measured 4300 cm⁻¹ range.
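As an illustration only, the post-processing described above could be sketched as follows. The moving-average form of the smoothing, the choice to divide by the unsmoothed intensity at the point nearest 1000 cm⁻¹, the function names, and the synthetic spectrum are all assumptions made for this sketch; they are not the processing code used in this study.

```python
import numpy as np

def smooth_5pt(intensity):
    """5-point moving average (assumed form of the smoothing function).

    The ends are padded by repeating the edge values so that the output
    has the same length as the input.
    """
    kernel = np.ones(5) / 5.0
    padded = np.pad(np.asarray(intensity, dtype=float), 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def normalize_to_raw(shift_cm1, raw, smoothed, ref_shift=1000.0):
    """Divide the smoothed spectrum by the raw value nearest ref_shift."""
    idx = int(np.argmin(np.abs(np.asarray(shift_cm1) - ref_shift)))
    return smoothed / raw[idx]

# Hypothetical spectrum: flat baseline, one band near 780 cm^-1, plus noise.
rng = np.random.default_rng(0)
shift = np.linspace(130.0, 4300.0, 835)
raw = (100.0
       + 20.0 * np.exp(-((shift - 780.0) / 15.0) ** 2)
       + rng.normal(0.0, 1.0, shift.size))

processed = normalize_to_raw(shift, raw, smooth_5pt(raw))
```

With this convention the processed spectrum is close to 1 near 1000 cm⁻¹, so spectra recorded with different overall signal levels can be compared on a common scale.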
There are some notable differences in the two spectra that enable discrimination between the healthy and malignant tissue samples. The Raman peaks that played a key role in separating the tissue classes are marked with arrows in Figure 2. Some of the marked peaks have been identified in the literature. Notably, within the conventional fingerprint region (400–1800 cm⁻¹), the band at 500 cm⁻¹ is associated with glycogen, and the peak at 780 cm⁻¹ is due to nucleic acids, specifically the nucleobase adenine [8]. Note that the specific locations of the Raman shifts will vary slightly depending on the local environment (e.g., water content). The high frequency and low frequency peaks have not been identified, but are observed to undergo larger intensity changes and therefore provide greater discriminatory capability.
To accentuate the variations in the observed Raman spectra from the different tissue sample classes, a representative difference spectrum was generated from the two spectra presented in Figure 2. Figure 3 shows the result. There is a general suppression across the entire Raman shift spectrum (i.e., the normalized healthy tissue samples show slightly greater intensity across the spectrum, consistent with previous measurements on adenocarcinoma [27]), but more importantly the differences of the discriminatory peaks are more noticeable. The two distinct lower frequency peaks at 130 cm⁻¹ as well as a broad peak near 200 cm⁻¹ are of interest because they are previously unreported, lie close to the fingerprint region, and proved useful in our statistical analysis. We will return to these peaks later. In addition, smaller, less distinctive differences are observed at 870 cm⁻¹, which have been previously reported as arising from tryptophan and C–C stretching, respectively [28].
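A difference spectrum of this kind can be sketched in a few lines. The sign convention (malignant minus healthy), the function name, and the synthetic band positions and heights below are assumptions chosen for illustration and are not measured values.

```python
import numpy as np

def difference_spectrum(malignant, healthy):
    """Point-by-point difference of two normalized spectra on the same axis."""
    return np.asarray(malignant, dtype=float) - np.asarray(healthy, dtype=float)

# Hypothetical normalized spectra on a common Raman-shift axis (cm^-1).
shift = np.arange(130.0, 4300.0, 5.0)
band_500 = np.exp(-((shift - 500.0) / 20.0) ** 2)

healthy = 1.00 + 0.30 * band_500     # stronger band, slightly higher baseline
malignant = 0.98 + 0.10 * band_500   # weaker band, slightly lower baseline

diff = difference_spectrum(malignant, healthy)

# Raman shift at which the two spectra differ the most.
largest_change = float(shift[np.argmax(np.abs(diff))])
```

In this synthetic case the difference is negative everywhere (a general suppression) and largest in magnitude at the band that changes between the two classes, which is the pattern a difference spectrum is meant to expose.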
Figure 4A shows a scatter plot of the first two principal component (PC) scores for the Raman spectra obtained from the patients listed in Table 1, using the full spectral signature. The PCA results show separation between the healthy and diseased classes. The diseased tissue class comprises two sub-classes: squamous cell carcinoma (SCC) and SCC specifically from tonsil tissue. These classes are represented in Figure 4 as green triangles (healthy controls), black squares (SCC), and red circles (tonsil SCC). There is a distinct separation between the cancerous samples and their adjacent controls on this scatter plot, indicating that one can differentiate cancerous samples from non-cancerous samples by analyzing the PC scores of the spectroscopic data. For comparison, we also plot the first two PC scores using a subset of the spectral data constrained to the conventional fingerprint region (Figure 4B). The same convention is used for labeling the classes. The results from an analysis of this region also show good separation of the healthy and diseased tissue classes.
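For readers unfamiliar with PC scores, the following is a minimal sketch of how such scores can be computed from a matrix of spectra (one spectrum per row) using a singular value decomposition. The function name, the use of NumPy, and the synthetic two-class data are assumptions for illustration; the sketch does not reproduce the software or the data used in this work.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Return PC scores, loadings, and explained-variance fractions."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each channel
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]                 # one row per component
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical data: two classes that differ only in the height of one band.
rng = np.random.default_rng(0)
shift = np.linspace(400.0, 1800.0, 200)
band = np.exp(-((shift - 500.0) / 20.0) ** 2)
baseline = 1.0 + 0.5 * np.exp(-((shift - 1000.0) / 50.0) ** 2)

healthy = baseline + 0.30 * band + rng.normal(0.0, 0.005, (10, shift.size))
diseased = baseline + 0.10 * band + rng.normal(0.0, 0.005, (10, shift.size))

scores, loadings, explained = pca_scores(np.vstack([healthy, diseased]))
pc1_healthy, pc1_diseased = scores[:10, 0], scores[10:, 0]
```

Plotting `scores[:, 0]` against `scores[:, 1]` with one marker style per class gives a scatter plot of the kind described above; in this synthetic example the two classes fall on opposite sides of the first PC axis.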
For the analysis of both the full spectrum and the conventional fingerprint spectrum, shown in Figure 4, there are five data points from healthy tissue resected from patient 5120 (Table 1) that have been associated with the diseased tissue class. The remaining tissue samples from patient 5120 showed clear distinction between the healthy and diseased tissue. We suspect this particular tissue sample, despite being labeled a healthy control, is in fact diseased, or at least in the early stages of disease, though we cannot rule out mislabeling.
For the full spectral analysis, the first PC represents most of the spectral information. This appears to be the result of a strong Rayleigh signature in the low Raman shift region. Because of this, we also plot the second and third PC scores for the full spectrum (Figure 5A) and the conventional fingerprint region (Figure 5B). These classes are again represented as green triangles (healthy controls), black squares (SCC), and red circles (tonsil SCC). Notably, the full spectrum analysis still shows a clear separation between the classes of healthy and diseased tissue. The same five data points from patient 5120 are readily seen to be associated with the diseased class in Figure 5A. However, when only the conventional fingerprint region is analyzed using PCs 2 and 3, no clear distinction is observed among the tissue classes.
The separation observed in the full spectrum analysis has significant contributions from the peaks identified in the difference spectrum (Figure 3). This can be seen by examining the principal component loadings. Figure 6 shows plots of the loadings for the first three principal components. Evident in all three plots are the peaks observed in Figure 3, whose strength differs between the healthy controls and the diseased tissue samples. While the loadings for the first PC are dominated by a large Rayleigh signal, the signature of these peaks is evident. We are currently improving the apparatus to eliminate stray light entering the spectrometer and to minimize the Rayleigh peak.
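The way a strong, variable low-shift feature can dominate the first PC while a weaker class-dependent band appears in a later PC can be illustrated with a small synthetic example. The exponential "Rayleigh-like" tail, its amplitudes, and the single band at 500 cm⁻¹ are hypothetical stand-ins chosen for this sketch, not a model of the measured spectra.

```python
import numpy as np

shift = np.arange(130.0, 1800.0, 10.0)
tail = np.exp(-(shift - 130.0) / 60.0)          # Rayleigh-like tail (assumed shape)
band = np.exp(-((shift - 500.0) / 20.0) ** 2)   # class-dependent band

# 20 hypothetical spectra: the tail amplitude varies strongly from sample to
# sample in both classes, while the band height depends only on the class.
amplitudes = np.tile(np.linspace(5.0, 10.0, 10), 2)
heights = np.repeat([0.30, 0.10], 10)
rng = np.random.default_rng(1)
spectra = (amplitudes[:, None] * tail
           + heights[:, None] * band
           + rng.normal(0.0, 0.001, (20, shift.size)))

# PCA via SVD of the mean-centered data; rows of Vt are the loadings.
centered = spectra - spectra.mean(axis=0)
_, s, Vt = np.linalg.svd(centered, full_matrices=False)
loadings = Vt[:3]
explained = s ** 2 / np.sum(s ** 2)

# Raman shift carrying the largest absolute weight in each loading vector.
top_shift = [float(shift[np.argmax(np.abs(row))]) for row in loadings]
```

In this toy case the first loading vector peaks at the low-shift edge and carries nearly all of the variance, while the second peaks at the class-dependent band. This is why inspecting loadings beyond the first PC can be informative when a strong background feature is present.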
A check of our results was performed to verify that the H & E stain was not impacting the spectroscopic discrimination, since the analysis shown in both Figure 4 and Figure 5 utilized both stained and unstained samples to increase the size of the data set. We were confident that we could analyze the mixed samples because H & E stain has low absorption at our operating wavelength (785 nm) and therefore should produce little fluorescence to interfere with the measured Raman spectra [29]. Furthermore, Andronie et al. concluded “that hematoxylin and eosin does not interfere in the tissue Raman signal”, further bolstering our hypothesis that diagnostic information is still available [23]. Performing the same preprocessing and PC analysis on the full spectrum with only the unstained samples resulted in the scores plot shown in Figure 7A. The associated loadings are shown in Figure 7B. As can be seen from these results, even though staining and fixation do alter the chemical composition of the tissue, clear separation between the healthy controls and diseased tissue samples is observed.
While the low frequency peaks appear to have diagnostic capability, we chose to reanalyze the data due to the presence of the Rayleigh peak that dominates
. Eliminating the low frequency region (<400 cm⁻¹), we performed the same principal component analysis on the full data set (stained and unstained samples) as well as on only the unstained samples in the region from 400
. This removes a significant portion of the Rayleigh peak while retaining the peaks observed in the 3500 cm⁻¹ region. Figure 8 shows the results for the unstained tissue samples. The scores for PCs 1 and 2 are shown in Figure 8A and the corresponding loadings plots are shown in Figure 8B. As before, the healthy and diseased tissue show a clear separation. In the loadings for
, a weak tail of the Rayleigh signal can be seen, but this appears to play a smaller role in this latest analysis than in the earlier one. Figure 9 shows the results for all the data (stained and unstained). The scores for PCs 1 and 2 are shown in Figure 9A and the corresponding loadings plots are shown in Figure 9B. The separation of healthy and diseased tissue is still apparent; however, along the
axis, the distance between the classes appears narrower. This is likely due to the inclusion of the stained tissue samples and the consequent chemical alteration due to staining and fixation. The same weak tail of the Rayleigh signal can be seen in the loadings for
, but does not appear to affect the outcome.
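Restricting the analysis to a spectral window, as described above, amounts to masking the Raman-shift axis before the PCA step. The sketch below uses the 400 cm⁻¹ lower bound from the text; the function name, the upper bound (taken here as the end of a hypothetical measured axis), and the synthetic spectra are assumptions for illustration.

```python
import numpy as np

def select_region(shift_cm1, spectra, lo, hi):
    """Keep only the channels with lo <= Raman shift <= hi (in cm^-1)."""
    shift_cm1 = np.asarray(shift_cm1, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mask = (shift_cm1 >= lo) & (shift_cm1 <= hi)
    return shift_cm1[mask], spectra[:, mask]

# Hypothetical spectra with a strong Rayleigh-like tail at low shifts.
shift = np.arange(130.0, 4300.0, 10.0)
tail = np.exp(-(shift - 130.0) / 60.0)
spectra = np.vstack([1.0 + 1.0 * tail,
                     1.0 + 2.0 * tail])

# Drop everything below 400 cm^-1; the upper bound here is an assumption.
sub_shift, sub_spectra = select_region(shift, spectra, 400.0, shift.max())
```

The cropped matrix `sub_spectra` can then be passed to the same PCA routine as the full spectra. Because most of the low-shift tail lies below the cut, its contribution to the leading component is much reduced, although a weak residual tail remains just above 400 cm⁻¹.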
Finally, we analyzed only the high frequency shift regime (1800
) for the unstained samples. Figure 10A shows the scores plot while Figure 10B plots the corresponding loadings. Here the vast majority of the information is contained in
. Despite this, the separation among the diseased and healthy tissue is observed. However, it is noteworthy that the discrimination is largely a consequence of
. From the loadings plot, some small peaks are observed for
loadings are dominated by the peaks in the 3500 cm⁻¹ region, i.e., those high frequency peaks noted in the difference spectrum (Figure 3), thus supporting our contention of the discriminating capability of these peaks. The same analysis was repeated on the data set including all (stained and unstained) tissue samples. The scores plot is shown in Figure 11A. The corresponding loadings are shown in Figure 11B and are dominated by the high frequency Raman shift peak in the region from 3500
. The separation between the diseased and healthy tissue classes is clear, although the data are reoriented from the unstained case (Figure 10) because both PCs have loadings that weight the high frequency peaks. Regardless, the mixing of stained and unstained tissue samples has not impacted the diagnostic capability despite any chemical alterations to the tissue due to staining and fixation.
In all cases, the same five data points addressed earlier score among the diseased tissue; however, our contention is that this sample was mislabeled. As we anticipated, the results of our analysis of only the unstained tissue samples are consistent with those presented earlier for the data set comprising both stained and unstained samples.
5. Summary and Conclusions
We reported on the use of Raman spectroscopy at 785 nm to discriminate between healthy and cancerous tissue samples from patients with head and neck cancer, and on the role that previously unreported signatures played. The observed spectral signatures were processed with a principal component algorithm to yield a clear delineation between the tissue classes. The discrimination was dominated by several previously unreported peaks in both the low and high frequency regions. The low frequency peaks are of particular interest because of their proximity to the fingerprint region.
The observed Raman spectra demonstrate distinct and previously unreported peaks in the low frequency shift region (130–400 cm⁻¹), which lies outside the conventional “fingerprint” region (400–1800 cm⁻¹); these peaks were nevertheless important in discriminating between the cancerous and healthy tissue samples. The precise origin of these low frequency shifts remains unknown. While the 130 cm⁻¹ peak lies close to the cut-on wavelength of the filter, it was observed to change between healthy and diseased tissue in numerous measurements. The observed changes seem to be indicative of a change within the tissue sample as it transitions from a healthy to a cancerous state. Low-frequency Raman shift signatures are typically associated with crystalline structures and are used in the pharmaceutical industry for dosage analysis [31]. Morphological vibrations of whole viruses have been of interest to researchers and also result in low-frequency Raman shifts [32]; however, these shifts are extremely low and generally fall outside the region of our observations. While these peaks are not likely due to HPV itself, the differences in the observed peaks could be indicative of the integration of viral DNA into the cell and the expression of oncogenes. One possible explanation for these peaks is that they correspond to nucleic acids; however, such a designation requires further study.
The observed signature in the 130–400 cm⁻¹ range is likely due to the changes the cells in the tissue sample undergo as the cancer develops. Measurements of both healthy and diseased cervical tissue show similar degradation of signal, particularly the glycogen signature, as the tissue becomes cancerous [25]. Although the glycogen signature lies within the conventional fingerprint window and did play a role in this discrimination, there is no reason to believe that such changes do not also occur at lower Raman shift frequencies, as other glyco- compounds do appear in this region [35].
Further study of this new spectral signature for the differentiation of healthy and malignant tissue is ongoing. Despite its unknown origin, the spectral differences between the tissue samples are sufficiently clear that multivariate statistical analyses demonstrate excellent discrimination capability and offer the promise of a real-time tool for in vivo diagnostics, which may ultimately improve surgical outcomes and the patient’s quality of life.