# Raman Spectroscopy of Head and Neck Cancer: Separation of Malignant and Healthy Tissue Using Signatures Outside the “Fingerprint” Region

## Abstract

## 1. Introduction

## 2. Experimental Setup

### 2.1. Tissue Samples

### 2.2. Apparatus

## 3. Principal Component Analysis

The data matrix **X** (M × N) is reduced and may be expressed as

$$\mathbf{X} = \mathbf{T}\mathbf{P} + \mathbf{E},$$

where **T** (M × A) is the scores matrix, **P** (A × N) is the loadings matrix, and **E** is the error matrix. The principal components were computed using the `princomp` function in MATLAB. In general, for discrimination purposes, we retained the principal components that together contained 99% of the information content.
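The decomposition described in this section can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function name `pca_decompose`, the synthetic input, and the use of a singular value decomposition are assumptions made for illustration. The sketch also assumes that the data are mean-centered before decomposition (so the relation holds for the centered matrix) and that "information content" means cumulative explained variance.

```python
import numpy as np


def pca_decompose(X, retain=0.99):
    """Split mean-centered X (M spectra x N shift bins) into T @ P + E.

    Returns scores T (M x A), loadings P (A x N), residual E (M x N) and the
    explained-variance fraction of each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centering is an assumption of this sketch

    # Thin SVD: Xc = U @ diag(s) @ Vt; rows of Vt are the principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)

    # Smallest A whose cumulative explained variance reaches `retain`.
    A = int(np.searchsorted(np.cumsum(explained), retain)) + 1
    A = min(A, s.size)  # guard against round-off when retain is close to 1

    T = U[:, :A] * s[:A]  # scores
    P = Vt[:A, :]         # loadings
    E = Xc - T @ P        # error (residual) matrix
    return T, P, E, explained[:A]


# Usage on synthetic data (hypothetical: 9 "spectra" with 500 bins each).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(9, 500))
T, P, E, ratio = pca_decompose(X_demo)
print(T.shape, P.shape, E.shape, ratio.sum())
```

Plotting one column of `T` against another would give a scores plot, and each row of `P` plotted against the Raman shift axis would give the corresponding loadings plot.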

## 4. Results

## 5. Summary and Conclusions

**Figure 1.** Schematic of the experimental setup showing the 785 nm laser directed into the Raman probe via the 10× objective lens. The probe illuminates the tissue sample and collects the scattered light. The elastically scattered signal is removed via a long pass filter in the filter/lens assembly before the light is transmitted into the Maya Pro 2000 NIR spectrometer for dispersion and storage.

**Figure 2.** Representative Raman spectra from two different samples, one healthy and one cancerous, over the entire ∼4000 ${\mathrm{cm}}^{-1}$ Raman shift range. The arrows indicate the peaks that differ between the healthy tissue and the cancerous tissue.

**Figure 3.** The difference spectrum obtained by subtracting the cancerous spectrum from the healthy spectrum in Figure 2. This spectrum highlights the peaks indicated by the arrows in Figure 2.

**Figure 4.** Plot of the first two principal component scores for (**A**) the full spectrum (100 ${\mathrm{cm}}^{-1}$–4300 ${\mathrm{cm}}^{-1}$) and (**B**) the conventional fingerprint region (400 ${\mathrm{cm}}^{-1}$–1800 ${\mathrm{cm}}^{-1}$). Both plots show good separation between the healthy controls and the malignant tissue samples (both tonsil squamous cell carcinoma and squamous cell carcinoma). The numbers in parentheses represent the information content associated with each principal component.

**Figure 5.** Plot of the second and third principal component scores for (**A**) the full spectrum (100 ${\mathrm{cm}}^{-1}$–4300 ${\mathrm{cm}}^{-1}$) and (**B**) the conventional fingerprint region (400 ${\mathrm{cm}}^{-1}$–1800 ${\mathrm{cm}}^{-1}$). The full-spectrum analysis reveals a distinct boundary between the healthy and diseased tissue; however, no obvious separation of the data is observed in the conventional fingerprint region. This increased separation is due to the peaks observed outside the conventional fingerprint region. The numbers in parentheses represent the information content associated with each principal component.

**Figure 6.** Plots of the loadings for the first three principal components. The loadings for the first principal component are dominated by a large Rayleigh peak near 0 ${\mathrm{cm}}^{-1}$ shift, but small peaks can be seen at larger shifts. The loadings for the second and third principal components clearly show these peaks and contribute strongly to the discrimination capability of the spectra.

**Figure 7.** (**A**) Plot of the first two principal component (PC) scores for only the unstained tissue samples. The analysis was performed on the full spectral data and shows good separation between healthy and diseased tissue. The numbers in parentheses represent the information content associated with each principal component. (**B**) Corresponding loadings plots for the PCs shown in (**A**).

**Figure 8.** (**A**) Plot of the first two PC scores for only the unstained tissue samples. The analysis was performed in the Raman shift region from 400 ${\mathrm{cm}}^{-1}$–4300 ${\mathrm{cm}}^{-1}$ and shows good separation between the healthy and diseased tissue classes. The numbers in parentheses represent the information content associated with each principal component. (**B**) Corresponding loadings plots for the PCs shown in (**A**).

**Figure 9.** (**A**) Plot of the first two PC scores for only the unstained tissue samples. The analysis was performed in the Raman shift region from 400 ${\mathrm{cm}}^{-1}$–4300 ${\mathrm{cm}}^{-1}$ and shows good separation between the healthy and diseased tissue classes. The numbers in parentheses represent the information content associated with each principal component. (**B**) Corresponding loadings plots for the PCs shown in (**A**).

**Figure 10.** (**A**) Plot of the first two PC scores for the unstained tissue samples. The analysis was performed in the Raman shift region from 400 ${\mathrm{cm}}^{-1}$–4300 ${\mathrm{cm}}^{-1}$ and shows good separation between the healthy and diseased tissue classes. The numbers in parentheses represent the information content associated with each principal component. (**B**) Corresponding loadings plots for the PCs shown in (**A**).

**Figure 11.** (**A**) Plot of the first two PC scores for all the tissue samples (stained and unstained). The analysis was performed in the Raman shift region from 1800 ${\mathrm{cm}}^{-1}$–4300 ${\mathrm{cm}}^{-1}$ and continues to show good separation between the healthy and diseased tissue classes despite the inclusion of the stained tissue samples. The numbers in parentheses represent the information content associated with each principal component. (**B**) Corresponding loadings plots for the PCs shown in (**A**).
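The region-restricted analyses named in the captions (full spectrum, conventional fingerprint region, and the region above 1800 ${\mathrm{cm}}^{-1}$) amount to keeping only the columns of the spectral matrix whose Raman shift falls inside a chosen window before running the principal component analysis. A minimal Python sketch of that selection step follows. The helper name `select_shift_window`, the inclusive window bounds, and the synthetic axis are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def select_shift_window(shift_cm1, spectra, lo, hi):
    """Keep only the spectral columns with lo <= Raman shift <= hi (cm^-1).

    shift_cm1 : 1-D array of Raman shifts, one per spectral bin
    spectra   : 2-D array, one spectrum per row
    """
    shift_cm1 = np.asarray(shift_cm1, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    if spectra.ndim != 2 or spectra.shape[1] != shift_cm1.size:
        raise ValueError("spectra must have one column per shift value")

    mask = (shift_cm1 >= lo) & (shift_cm1 <= hi)  # inclusive bounds (assumed)
    return shift_cm1[mask], spectra[:, mask]


# Usage on a synthetic axis and random "spectra" (hypothetical data).
rng = np.random.default_rng(0)
shift_demo = np.linspace(100.0, 4300.0, 421)     # 10 cm^-1 spacing
spectra_demo = rng.normal(size=(9, shift_demo.size))

fingerprint_axis, fingerprint = select_shift_window(shift_demo, spectra_demo, 400, 1800)
high_axis, high_region = select_shift_window(shift_demo, spectra_demo, 1800, 4300)
print(fingerprint.shape, high_region.shape)
```

The windowed matrix can then be passed to any PCA routine in place of the full spectral matrix.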

Patient | Tissue ID | Sex/Age | Diagnosis |
---|---|---|---|
005094 | S01A | F/53 | Control |
005094 | S01B | F/53 | Tonsil SCC |
005112 | S01A | M/49 | Tonsil SCC |
005118 | S01A | M/49 | Tonsil SCC |
005120 | S01A | M/70 | Control |
005120 | S01B | M/70 | Control |
005120 | S01C | M/70 | SCC |
005120 | S01D | M/70 | SCC |
005120 | S01E | M/70 | SCC |

