Biochemical Profiles of In Vivo Oral Mucosa by Using a Portable Raman Spectroscopy System

Most oral injuries are diagnosed by histopathological analysis of invasive and time-consuming biopsies. This analysis and conventional clinical observation cannot identify biochemically altered tissues predisposed to malignancy if no microstructural changes are detectable. With this in mind, detailed biochemical characterization of normal tissues and of the features that differentiate them in healthy individuals is important in order to recognize biomolecular changes associated with early tissue predisposition to malignant transformation. Raman spectroscopy is a label-free method for characterization of tissue structure and specific composition. In this study, we used Raman spectroscopy to characterize the biochemistry of in vivo oral tissues of healthy individuals. We investigated this biochemistry based on the vibrational modes related to Raman spectra of four oral subsites (buccal, gingiva, lip and tongue) of ten volunteers, as well as on principal component (PC) loadings for the difference between the four types of oral subsites. We thereby determined the biochemical characteristics of each type of healthy oral subsite and those corresponding to differentiation of the four types of subsites. In addition, we developed a spectral reference of healthy oral tissues of individuals in the Brazilian population for future diagnosis of early pathological conditions using real-time, noninvasive and label-free techniques such as Raman spectroscopy.

and L.F.d.C.e.S.d.C.; visualization, M.S.N. and L.F.d.C.e.S.d.C.; supervision, M.S.N. and L.F.d.C.e.S.d.C.; project administration, M.S.N. and L.F.d.C.e.S.d.C.; funding acquisition, L.F.d.C.e.S.d.C.


Introduction
Currently, there is an increasing need for techniques capable of providing biochemical characterization of tissues in real time. A range of applications requires development of those techniques in order to improve the accuracy of tissue identification, disease detection and surgical guidance. One of these applications is oral cancer diagnosis. During the progression of cancer, biochemical changes occur within the cancer cells [1], altering the levels of nucleic acids, lipids and carbohydrates that can serve as biomarkers for monitoring diseases [2–9].
Although histopathological examination is currently the most accurate and reliable method of diagnosis, it has several limitations. For example, surgical biopsies are invasive, require sample preparation and take a long time to analyze, which can cause anxiety and discomfort to patients and delay treatment. In addition, histopathological analysis is associated with interobserver variability. For all these reasons, a non-invasive, real-time point-of-care method to detect and accurately diagnose cancer and premalignancies at early stages could benefit patients and reduce oral cancer incidence and mortality. One of the most cost-effective methods to diagnose cancer is the optical biopsy [1,3,10–18]. The term optical biopsy is widely used in optical spectroscopy [6,14,19–24], which assists in the diagnostic process and analyzes optical properties [19,25–27] associated with tissue biochemistry [3–9,14,16,23–25]. Among optical spectroscopic techniques, Raman spectroscopy stands out as one of the most molecular-specific methods, and it does not suffer interference from water absorption.
Raman spectroscopy is a non-invasive optical technique based on inelastic light scattering (Raman scattering), which changes the wavelength of the incident light depending on the structure of the vibrational energy levels of tissue biomolecules such as lipids, proteins and nucleic acids [11,12,52–54]. Raman spectroscopy can extract molecular-specific information on tissue constituents, their functional groups and their molecular conformations based on the vibrational modes of tissue biomolecules [13,52,55]. This spectroscopic technique can provide a molecular-level signature of the biochemical composition and cell structure with submicrometric spatial resolution and can be useful for monitoring changes in composition for the early, non-invasive diagnosis of cancer in ex vivo and in vivo tissues. The qualitative and quantitative analysis of Raman spectra allows rapid detection of subtle biochemical changes during the onset of diseases (e.g., early tissue predisposition to malignant transformation) which cannot be identified with conventional clinical observation and other methods relying on tissue microstructural alterations [13,52,55]. Furthermore, early biochemical changes without microstructural manifestation may even be overlooked by the gold-standard histopathological analysis [56,57].
In our previous work [42], which used the same raw data as this paper, we characterized the biochemical content of each type of healthy oral subsite and built a tissue classifier for comparison of these subsites based on Raman spectra, in order to identify the correct tissue location for future comparison with potentially malignant tissues. However, the biological sources of tissue differentiation cannot be clearly identified from the average quantities of biochemical compounds reported in our previous study. Understanding these sources requires multivariate analysis specifically designed for feature extraction, which we exploit in the present study.
With the above in mind, our aim in the present study is to contribute to the elucidation of the biochemical components identified from Raman spectra of in vivo normal oral tissues. In particular, we have identified the biochemical compounds associated with Raman vibrational modes most responsible for differentiation among buccal mucosa, lip, gingiva and tongue tissues. We believe that this article will serve as a basis for future studies using Raman spectroscopy to diagnose oral lesions.

Clinical Protocol and Research Ethics
The study was approved by the Research Ethics Committee of Universidade do Vale do Paraíba (UNIVAP) via submission to Plataforma Brasil, Brazil (number 1132237-2015). Informed consent was obtained from all participants in the study. All methods involving human participants were carried out in accordance with relevant guidelines and regulations, including the ethical standards of the institutional and/or national research committee, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Our study included 10 healthy volunteers. In these volunteers, we collected three Raman spectra of each oral subsite, including lip, buccal mucosa, tongue and gingiva. Therefore, the dataset of this study consists of 120 spectra in total (30 spectra per oral subsite).

Raman Spectroscopy Equipment and Data Collection
Our Raman spectroscopic measurements were performed using a laser emitting at 785 nm (60 mW of power) coupled to a fiber optic probe (EMVision, Loxahatchee, FL, USA), which delivered the excitation light to the biological tissues and sent the collected backscattered light to a Raman spectrometer (Kaiser Optical Systems imaging spectrograph Holospec, f/1.8i-NIR, Ann Arbor, MI, USA). The probe comprised a central 100 µm fiber with a band-pass filter at its tip for the 785 nm laser excitation, surrounded by six 100 µm fibers with a long-pass filter at the probe tip for collection of the diffusely reflected light. The excitation and collection fibers were separated in a Y configuration so that the diffusely reflected light that had propagated through a tissue volume was sent to the detection module. In this module, the light passed through a dichroic mirror and a holographic notch filter before being focused on the entrance aperture of the spectrometer, so that only the filtered Raman scattered light reached the spectrometer. In the spectrometer, the Raman signal was recorded with a CCD detector (Andor iDus 420 Series) with a quantum efficiency of approximately 95%. All components of our Raman spectroscopy system, including the dichroic mirror, holographic notch filter, band-pass filter and customized fiber optic probe, were part of the commercial Holospec Raman spectrometer system and Andor Solis detector. The Raman spectroscopy instrumentation used to collect the tissue Raman spectra of this study is shown in Figure 1. Each Raman spectrum was acquired as the average of 20 iterations (repetitions) of 2 s each.
Optics 2021, 2, FOR PEER REVIEW
In this study, Raman spectra collected from gingiva tissue included signals from alveolar bone, teeth, periodontal ligament and gingiva, as illustrated in Figure 2.
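The averaging of 20 repetitions per spectrum described above can be illustrated with a minimal sketch. It assumes that each repetition carries independent, zero-mean noise, in which case averaging N repetitions lowers the noise standard deviation by roughly the square root of N. The synthetic band, the noise level and the number of detector channels are hypothetical values chosen only for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

n_repeats = 20       # repetitions averaged per spectrum (from the text)
n_channels = 1024    # hypothetical number of detector pixels

# Hypothetical noiseless signal: a single Gaussian band on a zero background.
pixels = np.arange(n_channels)
signal = 100.0 * np.exp(-0.5 * ((pixels - 512) / 10.0) ** 2)

# Assumed noise model: independent zero-mean Gaussian noise in every repetition.
noise_sd = 5.0
repeats = signal + rng.normal(0.0, noise_sd, size=(n_repeats, n_channels))

# The averaged spectrum is the channel-wise mean of the repetitions.
averaged = repeats.mean(axis=0)

# Compare residual noise of a single repetition with that of the average.
single_noise = np.std(repeats[0] - signal)
averaged_noise = np.std(averaged - signal)
noise_reduction = single_noise / averaged_noise  # close to sqrt(20) under this model
```

Real detector noise (shot noise, read noise, cosmic rays) does not follow this simple model exactly, so the factor above should be read as an idealized reference point.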

Data Analysis
Once the Raman spectra were collected, their baseline was removed by subtracting a fifth-order polynomial fitted to the local minima of each spectrum. Next, the spectra were smoothed using a Savitzky-Golay filter (5th order, frame size 7). The wavenumber range between 800 and 1730 cm⁻¹ was chosen for analysis, as the Raman background of silica/quartz fibers is strong from 600 cm⁻¹ up to the quartz peak at 800 cm⁻¹ [59,60], and the range 1730-1800 cm⁻¹ was considered irrelevant for differentiation of tissues in this study. In fact, Raman vibrational modes are tabulated only up to 1756 cm⁻¹ by Talari et al. and Movasaghi et al. [61]. We used the data tabulated by Talari et al. and Movasaghi et al. to assign the Raman vibrational modes in the principal component (PC) loadings obtained from the principal component analysis (PCA) used in this study. Briefly, PCA creates a new set of linearly independent variables based on linear combinations of the original variables (wavenumbers) so that the maximum variance of the dataset is explained. The new variables (principal components or PCs) are ordered from the highest to the lowest explained variance (relative to the total variance of the dataset). With this in mind, the first components (e.g., PC1, PC2 and PC3) represent the dimensions along which the Raman tissue data are most "spread". Since the Raman data vary mostly across these dimensions, the first components contain the information about the wavenumbers at which the tissues of all oral subsites (lip, buccal mucosa, tongue and gingiva) are most spread and thus have a higher chance of being differentiated. In order not to bias the analysis toward wavenumbers where the amplitude of the Raman signal is higher, the data were scaled by z-scoring the Raman intensity at each wavenumber, that is, the intensities at each wavenumber were centered to have mean 0 and scaled to have standard deviation 1.
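The preprocessing steps described above (baseline removal, smoothing and selection of the 800-1730 cm⁻¹ range) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the strict definition of a local minimum, the edge padding used by the smoothing filter, the order of the steps and the synthetic spectrum are all hypothetical choices.

```python
import numpy as np

def local_minima_indices(y):
    """Indices of interior points that are lower than both neighbours."""
    return np.where((y[1:-1] < y[:-2]) & (y[1:-1] < y[2:]))[0] + 1

def remove_baseline(wavenumber, intensity, order=5):
    """Subtract a polynomial (order 5 by default) fitted to the local minima."""
    idx = local_minima_indices(intensity)
    # Rescale the axis to [-1, 1] so the polynomial fit is well conditioned.
    span = wavenumber.max() - wavenumber.min()
    x = 2.0 * (wavenumber - wavenumber.min()) / span - 1.0
    coeffs = np.polyfit(x[idx], intensity[idx], order)
    return intensity - np.polyval(coeffs, x)

def savgol_smooth(y, window=7, order=5):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    vander = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse evaluates the fitted polynomial at the window centre.
    weights = np.linalg.pinv(vander)[0]
    padded = np.pad(y, half, mode="edge")  # assumed edge handling
    return np.convolve(padded, weights[::-1], mode="valid")

def crop(wavenumber, spectra, lo=800.0, hi=1730.0):
    """Keep only the columns of a (n_spectra, n_wavenumbers) array inside [lo, hi]."""
    keep = (wavenumber >= lo) & (wavenumber <= hi)
    return wavenumber[keep], spectra[:, keep]

# Hypothetical synthetic spectrum: one band near 1447 cm^-1 on a sloping background.
rng = np.random.default_rng(0)
wavenumber = np.linspace(600.0, 1800.0, 601)
band = 50.0 * np.exp(-0.5 * ((wavenumber - 1447.0) / 8.0) ** 2)
background = 200.0 + 0.1 * (wavenumber - 600.0)
raw = band + background + rng.normal(0.0, 1.0, wavenumber.size)

corrected = remove_baseline(wavenumber, raw)
smoothed = savgol_smooth(corrected)
wn_roi, roi = crop(wavenumber, smoothed[np.newaxis, :])
```

With a fifth-order polynomial in a seven-point window, the filter follows the data closely and removes little noise, which is consistent with a conservative smoothing choice that preserves narrow Raman bands.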
Once PCA was performed, the new coordinates of each sample (PC scores) were composed of a weighted sum of Raman intensities at each wavenumber and could be used to check the differentiation among oral subsites. The weight (loading) of each original variable (wavenumber) in the composition of a PC determined the importance given to that wavenumber in that PC. The combination of weights (PC loadings) therefore showed the wavenumber ranges with the greatest influence on the variance in the dataset. By excluding the wavenumbers outside the spectral region of interest (800-1730 cm⁻¹), we could evaluate indicators of tissue differentiation using vibrational modes within this range by looking at the PC scores and the cumulative variance explained by the PCs. A flowchart illustrating the steps of Raman spectral analysis is shown in Figure 3.
Figure 4 suggests that the average Raman spectra of buccal mucosa, lip, gingiva and tongue have similar characteristics to those found in previous studies [62,63]. Prominent characteristics include the peaks at 938 cm⁻¹ and 1130 cm⁻¹, most prominent in tongue tissues, the phosphate peak at 960 cm⁻¹ in gingiva and the peaks at 1271 cm⁻¹, 1303 cm⁻¹, 1447 cm⁻¹ and 1657 cm⁻¹ in buccal mucosa and lip. On the other hand, the characteristic peaks of certain tissues may not be useful for understanding tissue differentiation in terms of biochemical content due to the biological intra- and inter-patient variability.
This variability is one of the sources of confusion when discriminating diseased and healthy tissues. Insights into the biological variability or heterogeneity of the healthy oral subsites investigated can be drawn from the progression of cumulative explained variance upon consideration of increasing numbers of PCs. The relatively slow progression of the cumulative variance explained as we increase the number of considered PCs (Figure 5) suggests high biological variability among the oral subsites considered in this study. The first 3 PCs (PC1, PC2 and PC3) explained 68.9% of the variance of the dataset, whereas 81.7% was explained with 7 PCs, 85.1% with 10 PCs, 90% with 20 PCs and 95% with 40 PCs.
Still, the PC scores plot (Figure 6) indicates that PC1, PC2 and PC3 lead to a clear separation of buccal mucosa, gingiva and tongue tissues, whereas lip may be confused with the other three tissues due to its high heterogeneity.
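The z-scoring, PCA and cumulative explained variance discussed above can be sketched with NumPy alone. This is an illustrative sketch, not the authors' code: PCA is computed here through a singular value decomposition, the function names are hypothetical, and the data are random numbers generated only to exercise the functions (120 rows to mirror the number of spectra in the study, 50 columns chosen arbitrarily).

```python
import numpy as np

def zscore_columns(spectra):
    """Centre each wavenumber column to mean 0 and scale it to standard deviation 1."""
    mean = spectra.mean(axis=0)
    sd = spectra.std(axis=0, ddof=1)
    return (spectra - mean) / sd

def pca_svd(z):
    """PCA of an already centred matrix via singular value decomposition."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                          # coordinates of each spectrum on each PC
    loadings = vt                           # row k holds the weights (loadings) of PC k+1
    explained = s ** 2 / np.sum(s ** 2)     # fraction of total variance per PC
    return scores, loadings, explained

def n_components_for(explained, target):
    """Smallest number of leading PCs whose cumulative explained variance reaches target."""
    cumulative = np.cumsum(explained)
    return int(np.searchsorted(cumulative, target) + 1)

# Hypothetical data: three latent sources mixed into 50 variables, plus noise.
rng = np.random.default_rng(0)
n_spectra, n_wavenumbers = 120, 50
latent = rng.normal(size=(n_spectra, 3))
mixing = rng.normal(size=(3, n_wavenumbers))
spectra = latent @ mixing + 0.5 * rng.normal(size=(n_spectra, n_wavenumbers))

z = zscore_columns(spectra)
scores, loadings, explained = pca_svd(z)
cumulative = np.cumsum(explained)
n_for_90 = n_components_for(explained, 0.90)
```

A slowly rising `cumulative` curve, as reported for Figure 5, means that many components are needed to reach a given fraction of the variance. The scores equal the z-scored data projected onto the loadings (`z @ loadings.T`), which is the weighted sum of intensities described in the Data Analysis section.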

Raman Vibrational Modes Leading to Differentiation of Healthy Oral Subsites
Accurate characterization of each oral subsite regardless of biological variability requires a complete understanding of the biochemical compounds leading to tissue differentiation among subsites. Therefore, we identified the peaks of the absolute values of the PC loadings, which correspond to the wavenumbers leading to the largest variance in the analyzed dataset and, as shown in Figure 6, to the highest differentiation among buccal mucosa, gingiva, lip and tongue tissues. Figure 7 shows the loadings of PC1 (32.9% of the total variance of the dataset) as a function of wavenumber. Here, we show both positive and negative loadings in order to retain fidelity to which wavenumber ranges were considered independent when calculating the PCs. Vibrational modes were assigned to peaks of absolute amplitude of PC1 loadings according to Table 1. The relationship between the vibrational modes assigned to peaks of PC1, PC2 and PC3 loadings and oral biology and biochemistry is addressed in the Discussion section.
In contrast to the PC1 loadings, Figure 8 indicates that the loadings of PC2 (21.4% of the total variance of the dataset) as a function of wavenumber were mostly positive. As can be observed, the bands corresponding to tissue differentiation occur in completely different wavenumber ranges compared to those of the loadings of PC1 and PC3, which is consistent with the PCs being independent and containing complementary information for that differentiation.
Vibrational modes were assigned to peaks of absolute amplitude of PC2 loadings according to Table 2.
[Table fragment; column headers and leading rows not recovered]
(wavenumber not recovered): Glucose, triglycerides, C-C (lipid)
1335 cm⁻¹: CH₃CH₂ wagging or twisting; collagen or nucleic acids
1495 cm⁻¹: C-C stretching in benzenoid ring
1680 cm⁻¹: Bound and free NADH
Compared to the loadings of PC1 and PC2, those of PC3 (14.5% of the total variance of the dataset) have a much higher frequency of variation and narrower peaks (Figure 9), which suggests that a large range of biomolecules contributes to small variations of the Raman signal of healthy oral subsites. Vibrational modes were assigned to peaks of absolute amplitude of PC3 loadings according to Table 3.
[Table 3 fragment; column headers and leading rows not recovered]
(wavenumber not recovered): In-plane vibrations of the conjugated -C=C-; carotenoids
1627 cm⁻¹: Cα=Cα stretch and amide C=O stretching absorption; β-form polypeptide films
1688 cm⁻¹: Disordered structure; non-hydrogen bonded; Amide I
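The identification of peaks in the absolute values of PC loadings, followed by assignment against tabulated vibrational modes, can be sketched as below. The sketch is illustrative only: the function names, the ±5 cm⁻¹ matching tolerance and the synthetic loading curve are hypothetical, and the small lookup table contains only wavenumber and compound pairs quoted in this article rather than the full tables of Talari et al. and Movasaghi et al.

```python
import numpy as np

def top_abs_loading_peaks(wavenumber, loading, n_peaks=5):
    """Return the n_peaks largest local maxima of |loading|, strongest first."""
    magnitude = np.abs(loading)
    is_peak = (magnitude[1:-1] > magnitude[:-2]) & (magnitude[1:-1] > magnitude[2:])
    idx = np.where(is_peak)[0] + 1
    strongest = idx[np.argsort(magnitude[idx])[::-1][:n_peaks]]
    # Signed loadings are returned so the direction of each contribution is kept.
    return wavenumber[strongest], loading[strongest]

def nearest_assignment(peak_wn, table, tolerance=5.0):
    """Closest tabulated entry within +/- tolerance (cm^-1), or None if there is none."""
    centers = np.array(sorted(table))
    nearest = centers[np.argmin(np.abs(centers - peak_wn))]
    if abs(nearest - peak_wn) <= tolerance:
        return table[float(nearest)]
    return None

# Partial lookup table built from values quoted in this article (not exhaustive).
table = {
    962.3: "calcium phosphates and apatites",
    1133.0: "palmitic acid",
    1335.0: "collagen or nucleic acids",
    1525.0: "carotenoids",
}

# Hypothetical loading curve: a negative band at 1133 and a positive band at 1525.
wavenumber = np.arange(800.0, 1731.0, 1.0)
loading = (-0.8 * np.exp(-0.5 * ((wavenumber - 1133.0) / 6.0) ** 2)
           + 0.5 * np.exp(-0.5 * ((wavenumber - 1525.0) / 6.0) ** 2))

peak_wn, peak_val = top_abs_loading_peaks(wavenumber, loading, n_peaks=2)
assignments = [nearest_assignment(w, table) for w in peak_wn]
```

Working on absolute values selects both strongly positive and strongly negative loadings, while keeping the signed values alongside preserves the information shown in the loading plots.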

Discussion
Clinically, healthy gingival mucosa is pale pink, with a firm, soft consistency and a dotted surface similar to an orange peel. According to Lascala et al. [64], from a histological point of view, the periodontium consists of connective tissue covered by stratified parakeratinized squamous epithelium, whose degree of keratinization can vary (Figure 9). When the gingiva has abnormalities in its structure, there may be an increase in the thickness of the epithelium.
Berkovitz et al. [65] stated that the buccal (jugal) mucosa is composed of stratified non-keratinized squamous epithelial tissue, containing cells rich in glycogen, with loose connective tissue underlying the epithelium. In the lingual mucosa, by contrast, we find connective tissue with blood and lymph vessels, nerve ganglia, nerves, adipose tissue and lymphoid tissue, as well as filiform, fungiform and circumvallate papillae. The portion of the tongue facing the palate is called the lingual dorsum and the portion facing the floor of the mouth is called the lingual belly. Berkovitz et al. [66] also indicated that in this region, the lining epithelium is a keratinized stratified squamous (pavement) epithelium. In the connective tissue below, we find hair, sweat glands and sebaceous glands. The intermediate portion, known as the red zone of the lip, has a slightly keratinized stratified squamous epithelium (1133 cm⁻¹; Table 1), whose adjacent connective tissue (1212 cm⁻¹, 1245 cm⁻¹, 1335 cm⁻¹, 1641 cm⁻¹, 1688 cm⁻¹; Tables 1-3) is rich in capillaries. Finally, the inner lining of the lips and cheeks (mucous membrane of the oral cavity) is covered by the buccal mucosa. In this case, the epithelium is stratified and non-keratinized, with a lamina of loose connective tissue.
Considering the results obtained in this research, in the epidermis, the lipids (1064 cm⁻¹, 1074 cm⁻¹, 1305 cm⁻¹, 1380 cm⁻¹ and 1440 cm⁻¹; Tables 1-3) that make up the barrier cell membranes consist mainly of cholesterol, free fatty acids and ceramides [67]. The palate epithelium and gingiva appear to be more similar to the epidermis: both areas are keratinized and produce flat scales on the surface, and there are particles of membrane lining in their nucleated cells [65,68].
Cytosine is an important part of DNA and RNA (813.6 cm⁻¹, 824 cm⁻¹, 906 cm⁻¹, 970.3 cm⁻¹, 1335 cm⁻¹ and 1428 cm⁻¹; Tables 1-3), as it is one of the nitrogenous bases which encode the genetic information of these molecules, and it may be modified into different bases to carry epigenetic information. In DNA, adenine and thymine are present in the same percentages and are always paired with each other. Watson and Crick showed that the DNA molecule is a double helix made up of two paired strands held together by weak chemical bonds, known as hydrogen bonds, with the nucleotide sequence of each strand (adenine, thymine, cytosine and guanine, referred to as A, T, C and G) complementing that of the other. That is, adenine is paired with thymine and cytosine with guanine.
Currently, more than 600 carotenoids (1525 cm⁻¹, Table 3) have been identified, structurally classified into seven different types and distributed in various isomeric forms [72]. The name "carotenoids" is derived from the scientific name of the carrot. According to Krinsky et al. [72], carotenoids in the human body are partially converted to vitamin A (retinol), playing an important nutritional role in addition to exerting other actions. In this way, carotenoids can reduce the risk of chronic non-communicable diseases, prevent cataract formation and reduce age-related macular degeneration. In addition, carotenoids play a fundamental role in protection against photooxidation.
Mesquita et al. [73] suggested that B-type natriuretic peptide (BNP) and the amino-terminal fraction of proBNP (NT-proBNP) are considered standard biomarkers in decompensated heart failure. Some materials related to calcium phosphate have attracted the interest of researchers. This interest is motivated by the chemical compatibility and similarity between minerals (calcium phosphates and apatites (962.3 cm⁻¹; Table 3)) and different parts of the human body, such as bone and dental tissues [64].
Schnieders et al. [74] discussed the porous morphology of calcium phosphates (962.3 cm⁻¹; Table 3), which offers the possibility of incorporating drugs on their surface. Upon drug adsorption on the surface of calcium phosphates, it is possible to generate a biomaterial that can be used in denture coating and even as a cement material in dental restoration procedures.
Kuroki et al. [75] carried out a comparative study and showed that palmitic acid (1133 cm⁻¹; Table 1) is essential as energy storage for maintaining the proliferation of human cells. The epithelium of the oral mucosa (both basal and suprabasal layers) showed a significantly higher percentage composition of palmitic acid than the epidermis, but no difference in its distribution between the two layers. These results suggested a much higher energy metabolism in the oral mucosa. The percentage composition of palmitic acid was significantly higher in keratinocytes (1133 cm⁻¹; Table 1) of the oral mucosa (non-keratinization; 28.58 ± 5.25) and the gingiva (parakeratinization; 23.00 ± 1.40) than in the epidermis (orthokeratinization; 17.54 ± 0.37).
Finally, it is worth mentioning that Raman spectroscopy could be combined with other optical techniques enabling qualitative tissue evaluation through structural analysis. One of these techniques is optical coherence tomography, which could potentially be used to ensure the Raman signals are captured only from the tissue of interest [76,77].

Conclusions
In this study, we have analyzed the vibrational modes of peaks of absolute loading amplitudes of principal components (PCs) of Raman spectra in order to determine the biochemical compounds leading to the differentiation of buccal mucosa, lip, gingiva and tongue tissues. In addition, we have provided insight into the biological variability and heterogeneity of healthy oral tissues, as well as the biochemical characteristics for differentiation and accurate characterization of the four types of oral subsites (buccal mucosa, lip, gingiva and tongue). Upon definition of the tissue biochemistry of healthy oral subsites, we developed a spectral reference of healthy oral tissues of individuals in the Brazilian population for future diagnosis of early pathological conditions using real-time, noninvasive and label-free techniques such as Raman spectroscopy.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because the study and data analysis are being expanded, possibly leading to commercial applications.