Vibrational Spectroscopy Saliva Profiling as Biometric Tool for Disease Diagnostics: A Systematic Literature Review

Saliva is a biofluid that can be considered a “mirror” reflecting the body’s health status. Vibrational spectroscopy, both Raman and infrared, can provide a detailed salivary fingerprint that can be used for disease biomarker discovery. We present a systematic literature review based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to evaluate the potential of vibrational spectroscopy to diagnose oral and general diseases using saliva as a biological specimen. Literature searches were conducted in May 2020 through the MEDLINE-PubMed and Scopus databases, without date limitation. In total, 18 publications spanning a period of 10 years were included, reporting on 10 diseases (three oral and seven general diseases), with very high diagnostic performance rates in terms of sensitivity, specificity, and accuracy. Thirteen articles were related to six different cancers of the following anatomical sites: mouth, nasopharynx, lung, esophagus, stomach, and breast. The other diseases investigated and included in this review were periodontitis, Sjögren’s syndrome, diabetes, and myocardial infarction. Moreover, most articles focused on Raman spectroscopy (n = 16/18) and more specifically surface-enhanced Raman spectroscopy (n = 12/18). Overall, vibrational spectroscopy appears promising as a rapid, label-free, and non-invasive diagnostic salivary biometric tool. Furthermore, it could be adapted to investigate subclinical diseases, although further developmental studies are required.


Introduction
Translation of precision medicine into mainstream clinical care is being prioritized worldwide and is increasingly being advanced as the future paradigm for more effective medical management. Precision medicine, also coined as P4 medicine by Hood and Friend [1], who characterized it as being "predictive", "preventive", "personalized", and "participatory", embraces a systems approach to understanding underlying disease pathophysiology coupled with individually tailored healthcare informed by an individual's genes, lifestyle, and environment [2,3]. The search for biomarkers can then be beneficial in various clinical situations for patient management: screening of patients at risk of the disease or with the disease at an early stage, differential diagnosis of the disease from other conditions, prognosis of the disease independently of the treatment, prediction of the response to treatment, and monitoring of disease evolution [4].
In this context of the search for diagnostic markers, vibrational spectroscopy (VS), comprising infrared absorption (IR) and Raman scattering spectroscopies, appears to be a promising alternative approach for developing new modalities that aim to improve patient healthcare through better diagnosis, prognosis, and surveillance. VS modalities hold such promise because the "molecular fingerprint" they provide is a snapshot of the biomolecular composition of the sample, and variations therein can be exploited to identify disease status [4]. The diagnostic potential of the VS approach has been reported, although in the vast majority of cases as proof-of-concept studies, mainly in malignant tumors and on various types of biosamples, such as biofluids [5], cells [6], or tissues [7]. Biofluids seem particularly suitable for the detection of many types of diseases, because they are in direct connection with organs of the human body and are generally easily collected [4,8].
In a healthy individual, the daily salivary secretion is estimated to be between 0.5 and 1.5 L. Its collection is easy, non-invasive, painless, and low-cost with minimal risks of exposure to infectious agents [13]. Studies with different techniques of proteomics, metabolomics, transcriptomics, or microbiomics have shown the potential interest of using saliva in the diagnosis of oral diseases (such as periodontitis or oral cancer), but also of systemic diseases (such as breast cancer, diabetes, and Sjögren's syndrome) [9,13].
Thus, the aim of this systematic literature review was to assess the potential of VS to diagnose oral and general diseases using saliva. Literature searches were conducted in May 2020, without date limitation, through the MEDLINE-PubMed and Scopus databases, according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines.

Results
The initial search using keyword combinations returned 172 articles on PubMed and 164 articles on Scopus. In the first phase, duplicate articles were removed. Titles and abstracts of the remaining 267 papers were reviewed, and 227 of them were excluded because they did not meet the inclusion criteria. The remaining 40 full-text articles were thoroughly scrutinized, and one additional article was found by examining the references of these retained papers.
All 41 articles were assessed for eligibility, and 23 were finally excluded. These full-text articles were excluded because they had fewer than 20 patients in either group (n = 14), they did not use saliva (n = 3), the aim of the study was not the diagnosis of a disease (n = 3), the article was not an original document (n = 1), it was not written in English (n = 1), or there was no control group (n = 1). Finally, 18 key papers were included in this systematic literature review (Figure 1).
Although this review aimed to investigate the use of vibrational spectroscopies (infrared and Raman) as a tool for disease diagnostics, most articles used Raman spectroscopy (n = 16/18), and more specifically surface-enhanced Raman spectroscopy (SERS) (n = 12/18). Only two articles focused on Fourier transform infrared spectroscopy (FTIR). This is summarized in Figure 2A. The diagnostic performances in terms of sensitivity, specificity, and accuracy were evaluated by different algorithms. Figure 2B shows that the model based on PCA-LDA followed by LOOCV was the most widely used.
Among those 18 articles, 10 different diseases were studied: three oral and seven general diseases. Moreover, cancers remain the most studied pathology, with 13 articles out of 18. The total number of subjects included in the studies of this review was 2082, comprising 1226 patients with diagnosed diseases (n = 925) or premalignant disorders/intermediate stage (n = 301) and 856 healthy volunteers (Table 1).
The total number of subjects included in these cancer studies was 1736, of whom 747 had cancer (82 squamous cell carcinoma, 264 nasopharyngeal cancer, 82 lung cancer, 49 esophageal adenocarcinoma, 104 gastric cancer, and 166 breast cancer), 271 had premalignant disorders (115 oral, 123 esophageal, and 33 breast lesions), and 718 were healthy subjects.
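The selection flow and the cohort totals reported above can be cross-checked with simple arithmetic. The short Python sketch below only re-adds the figures quoted in the text; the variable names are illustrative, and the number of duplicates is derived here rather than stated in the review.

```python
# Record counts quoted in the Results text (variable names are illustrative).
pubmed, scopus = 172, 164
screened = 267                            # records left after duplicate removal
duplicates = pubmed + scopus - screened   # derived here, not stated in the text

# 227 records excluded on title/abstract, 1 article added from reference lists.
full_text = screened - 227 + 1

exclusions = {
    "fewer than 20 patients in a group": 14,
    "saliva not used": 3,
    "aim was not diagnosis": 3,
    "not an original document": 1,
    "not written in English": 1,
    "no control group": 1,
}
included = full_text - sum(exclusions.values())

# Totals across the 18 included studies.
diseased, premalignant, healthy = 925, 301, 856
total_subjects = diseased + premalignant + healthy

# Cancer studies only.
cancers = {"oral SCC": 82, "nasopharyngeal": 264, "lung": 82,
           "esophageal adenocarcinoma": 49, "gastric": 104, "breast": 166}
cancer_premalignant = {"oral": 115, "esophageal": 123, "breast": 33}
cancer_total = sum(cancers.values()) + sum(cancer_premalignant.values()) + 718

print(duplicates, full_text, included, total_subjects, cancer_total)
```

Under these figures the flow is internally consistent: the six exclusion reasons account for all 23 excluded full-text articles, and the per-site counts add up to the stated cancer and premalignant subtotals.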

Oral Cancer
Saliva is directly in "contact" with oral squamous cell carcinoma, the most common oral malignancy. In 2016, Jaychandran et al. presented a study based on conventional Raman spectroscopy, evaluating saliva for discrimination of oral squamous cell carcinoma (50 patients) from oral premalignant disorders (87 patients) and healthy controls (21 subjects) [14]. They also compared their results on saliva with other 'liquid biopsies' (blood and urine) and a conventional tissue biopsy. Spectroscopic data analysis was performed by principal component analysis (PCA) followed by linear discriminant analysis (LDA). Raman peaks for discrimination between malignant, premalignant, and normal groups were observed for pyrimidine, amide, mucin, hemocyanin, and carotenoids (Table 2). Results showed that PCA-LDA was able to discriminate spectra from cancer patients versus non-cancer subjects with an accuracy of 93.1%. Moreover, accuracy was better with saliva samples than with blood or urine (78% and 90.5%, respectively), although the best results were obtained with tissue samples. Mean sensitivities and specificities were not described.
Rekha et al. also used Raman spectroscopy and PCA-LDA, but with leave-one-out cross-validation (LOOCV) [15]. They analyzed saliva samples only and compared 23 healthy volunteers with 28 patients with oral submucous fibrosis (premalignant group) and 32 patients clinically diagnosed with oral squamous cell carcinoma.
Raman peaks showing differences between the patient groups corresponded to various amino acids, such as histidine, valine, and proline, as well as amide I, nucleic acid, lactic acid, and lipids (Table 2).
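Since PCA-LDA with LOOCV recurs throughout the included studies, a minimal sketch of that pipeline may help make it concrete. The example below is written with NumPy only and runs on synthetic two-class spectra; the band position, noise level, group sizes, number of components, and all function names are assumptions chosen for illustration, not the settings of any cited study. PCA and LDA are refitted inside each leave-one-out fold so that the held-out spectrum never influences the model.

```python
import numpy as np

def make_synthetic_spectra(n_control=25, n_disease=30, seed=0):
    """Hypothetical spectra: a shared baseline plus one band that is
    stronger in the disease group. Not real saliva data."""
    rng = np.random.default_rng(seed)
    shift = np.linspace(400, 1800, 120)             # Raman shift axis, cm^-1
    baseline = 0.5 + 0.0002 * (shift - 400)
    band = np.exp(-((shift - 1004) / 15.0) ** 2)    # illustrative band position
    y = np.array([0] * n_control + [1] * n_disease)
    amplitude = np.where(y == 1, 1.6, 1.0)[:, None]
    X = baseline + amplitude * band + rng.normal(0, 0.05, (len(y), len(shift)))
    return X, y

def fit_pca_lda(X, y, n_components=3):
    """Fit PCA (via SVD) and then a two-class LDA on the PCA scores."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the mean-centred training spectra.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = vt[:n_components]
    scores = (X - mean) @ axes.T
    mu0 = scores[y == 0].mean(axis=0)
    mu1 = scores[y == 1].mean(axis=0)
    # Pooled within-class covariance, shared by both classes (LDA assumption).
    centred = np.vstack([scores[y == 0] - mu0, scores[y == 1] - mu1])
    pooled = centred.T @ centred / (len(y) - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    prior1 = y.mean()
    b = -0.5 * w @ (mu0 + mu1) + np.log(prior1 / (1 - prior1))
    return mean, axes, w, b

def predict(model, x):
    """Classify one spectrum: 1 = disease, 0 = control."""
    mean, axes, w, b = model
    return int(((x - mean) @ axes.T) @ w + b > 0)

def loocv_predictions(X, y, n_components=3):
    """Leave-one-out: refit PCA and LDA without sample i, then predict it."""
    preds = np.empty(len(y), dtype=int)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = predict(fit_pca_lda(X[keep], y[keep], n_components), X[i])
    return preds

def summarize(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(y_true)}

X, y = make_synthetic_spectra()
y_pred = loocv_predictions(X, y)
metrics = summarize(y, y_pred)
print(metrics)
```

The synthetic groups are deliberately well separated, so the resulting figures say nothing about real saliva; the point is only the order of operations, with dimensionality reduction and classifier training both nested inside the cross-validation loop.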

Nasopharyngeal Cancer
Other cancers, such as cancers of the nasopharynx or lung, can contribute endogenously to the composition of saliva via nasal and bronchial secretions. Nasopharyngeal carcinoma (NPC) is by far the most common cancer of the nasopharynx [32].
Feng's group published three articles between 2014 and 2017 on this subject. In the first, Feng et al. carried out a SERS analysis on purified proteins from saliva samples from 62 patients with diagnosed nasopharyngeal cancer and from 30 healthy donors [16]. They used the PCA-LDA model with LOOCV. The authors attributed the discriminating SERS peaks to phenylalanine, tyrosine, tryptophan, proline, certain proteins, collagen, phospholipids, and amide I (Table 2). A receiver operating characteristic (ROC) curve was used to compute the area under the curve (AUC), a measure of aggregate classification performance. With this model, the AUC was 92.4%, and the sensitivity, specificity, and accuracy were 98.4%, 73.3%, and 90.2%, respectively, indicating that the approach is promising.
In 2016, Qiu et al. presented a complementary study with an identical protocol (SERS analysis, spectral processing with PCA-LDA-LOOCV, and calculation of a ROC curve) and a similar population (32 patients with nasopharyngeal carcinoma versus 30 non-cancer volunteers) [17]. The only difference was that the analysis was done on the whole saliva of the patients without prior purification of the proteins. Major differences in peak intensities between the cancer group and the control group were highlighted. These peaks were attributed, among others, to adenine, nucleic acids, collagen, phenylalanine, glycogen, and fatty acids (Table 2). The results obtained here were very slightly lower than the previous ones: the AUC of the ROC curve was 91.8%, and the classification accuracy was 83.9%, with a sensitivity of 86.7% and a specificity of 81.3%.
In 2017, the same group, Lin X. et al., published another report with a larger cohort (170 patients with nasopharyngeal carcinoma and 71 controls) [18]. The spectral analysis was performed on purified saliva proteins, and the rest of the protocol remained strictly identical. Specific SERS peaks were also identified between cancer and control groups, particularly corresponding to phenylalanine, proline, valine, proteins, and collagens (see Table 2). The performance of the prediction model was, however, lower than in the two previous studies, with an AUC of the ROC curve of 0.795 and a sensitivity, specificity, and classification accuracy of 70.7%, 70.3%, and 70.5%, respectively.
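The sensitivity, specificity, and accuracy quoted for these studies all derive from the four cells of a confusion matrix. As a minimal sketch, the Python example below uses the cohort sizes of the first study (62 patients, 30 healthy donors [16]); the split into correct and incorrect calls is an assumption, back-calculated here so that the rounded percentages match the reported ones, and is not taken from the original article.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)    # diseased subjects correctly identified
    specificity = tn / (tn + fp)    # healthy subjects correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Cohort sizes from the text: 62 patients and 30 healthy donors.
# The counts below are ASSUMED (back-calculated), not reported values.
tp, fn = 61, 1
tn, fp = 22, 8

sens, spec, acc = diagnostic_metrics(tp, fn, tn, fp)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")   # 98.4% 73.3% 90.2%
```

Because accuracy is a weighted average of sensitivity and specificity, with weights set by the group sizes, a cohort with roughly twice as many patients as controls yields an accuracy closer to the sensitivity than to the specificity, as seen here.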

Lung Cancer
In 2012, Li et al. used SERS on saliva samples taken from 21 clinically diagnosed lung cancer patients and from 20 healthy subjects [19]. Major changes in peaks between these two groups were assigned to amino acids and nucleic acid bases (Table 2). After multivariate analysis with PCA combined with LDA, the study resulted in a sensitivity, specificity, and accuracy of 78%, 83%, and 80%, respectively.
In 2018, Qian et al. used SERS to discriminate saliva samples of 61 lung cancer patients from those of 66 non-cancer controls [20]. Twelve peaks that varied significantly between the two groups were identified and attributed mainly to changes in protein residues and in the content of nucleic acid molecules (Table 2). Chemometric analysis was performed using two algorithms: support vector machine (SVM) and random forest (RF). Differences between the saliva of lung cancer patients and that of healthy participants were highlighted by SVM after LOOCV. Slightly better results were achieved with the RF method, with an optimal sensitivity and specificity of 96.7% and 100%, respectively, although the SVM method performed almost as well, with a sensitivity of 95.1% and a specificity of 100%.

Esophageal Cancer
Furthermore, other cancers, such as esophageal and gastric cancers, can contribute endogenously to the composition of saliva through gastroesophageal reflux. Maitra et al. published two articles in 2019 and 2020 about esophageal adenocarcinoma, which is the most common esophageal cancer in the developed world [21,22]. In these studies, they collected samples of four different biofluids (plasma, serum, urine, and saliva) from six categories of subjects: patients with a diagnosed esophageal adenocarcinoma (OAC), with high grade dysplasia (HGD), with low grade dysplasia (LGD), with Barrett's esophagus (a premalignant lesion of esophageal adenocarcinoma), with esophageal inflammation, and healthy volunteers. The two studies differed in the technique used: attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR) for the first, and conventional Raman spectroscopy for the second. Several predictive models were built using different supervised classification algorithms (principal component analysis quadratic discriminant analysis, PCA-QDA; successive projections algorithm quadratic discriminant analysis, SPA-QDA; genetic algorithm quadratic discriminant analysis, GA-QDA).
With ATR-FTIR, the best results were achieved with SPA-QDA [21]. Category-distinguishing wavenumbers obtained for SPA-QDA and GA-QDA models corresponded to regions of phosphodiester, polysaccharides, pectin, phosphate II, phenyl vibrations, amide I, guanine, and lipids (Table 2). SPA-QDA correctly classified the different categories of subjects, in particular those with more than 20 subjects: normal (n = 38), Barrett's esophagus (n = 27), and OAC (n = 25), with accuracy values between 88.8% and 96.3%. With SPA-QDA, sensitivity was between 95.4% and 100%, and specificity was between 62.5% (healthy) and 100% (Table 1).

Gastric Cancer
Chen et al. [23] performed a study using SERS on saliva samples from 84 patients with late gastric cancer, 20 patients with early gastric cancer, and 116 healthy volunteers. They built on studies demonstrating that certain metabolites, such as amino acids, could be used as cancer biomarkers [33,34]. They first assayed the 10 amino acids that were the most concentrated in saliva (taurine, glycine, glutamine, ethanolamine, histidine, alanine, glutamic acid, hydroxylysine, proline, and tyrosine) and whose concentrations varied the most between the three categories of subjects (Table 2). Using a ROC curve, they found that the combination of these 10 amino acids allowed them to distinguish patients with gastric cancer from healthy subjects. After this first step, they developed a protocol to detect these amino acids with SERS, before using it on the saliva samples. Spectroscopic data analysis was performed using PCA to discriminate the spectra according to the different groups. With this approach, they achieved a sensitivity of 87.7% and a specificity of 80% in discriminating advanced gastric cancer from non-cancer subjects. Results for early gastric cancer were comparable, with a sensitivity and a specificity of 80.0% and 88.8%, respectively. Mean accuracy was not described, but a negative predictive value of 94.5% was given for controls [23].

Breast Cancer
Three studies have investigated breast cancer, although saliva is not in direct contact with this organ.
The first study on breast cancer screening using VS of saliva samples was published in 2015 by Feng et al. [24]. They included 31 patients with proven cancer, 33 patients with a benign tumor, and 33 healthy volunteers. Spectral analysis was performed after saliva protein purification, and the main observed peaks were attributed to phenylalanine, tryptophan, tyrosine, hydroxyproline, proline, amide I and III, collagen, and lipids (Table 2). Partial least squares discriminant analysis (PLS-DA), a regression extension of PCA, was employed in combination with LOOCV to analyze and discriminate between the saliva protein SERS spectra of the three groups of participants. The authors achieved a sensitivity between 72.7% and 75.8%, a specificity between 81.3% and 93.8%, and an overall accuracy between 78.4% and 87.6% in discriminating breast cancer, benign breast lesions, and controls (Table 1).
In 2017, Hernández-Arteaga et al. published a study on breast cancer diagnosis based on a SERS assay of salivary sialic acid [25], building on the previous observations of Ozturk et al., who demonstrated that sialic acid concentration was significantly higher in breast cancer patients than in control subjects [35]. Using a calibration set of sialic acid (SA), the concentration of SA correlated well with three peak intensities at 1002, 1237, and 1391 cm⁻¹, corresponding to pyranose, amide III, and carboxyl, respectively (Table 2). Results showed that salivary SA concentration was significantly higher in the cancer group (n = 100) than in the control group (n = 106), at 18.5 mg/dL and 3.5 mg/dL, respectively. Moreover, the authors concluded that the discriminating threshold concentration of sialic acid was 7 mg/dL. This method showed a sensitivity of 94%, a specificity of 98%, and an accuracy of 92%.
In 2019, the same group conducted a similar SERS study with 35 breast cancer patients and 129 healthy subjects [26]. The ROC curve defined a threshold value of salivary SA of 12.5 mg/dL, for which sensitivity and specificity were 80.6% and 93.1%, respectively. In addition, the area under the curve (AUC) calculated from the ROC curve reached 0.95, indicating that this diagnostic test is very promising.
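To illustrate how a ROC curve turns a single continuous marker into a diagnostic cut-off, the Python sketch below computes the AUC and a candidate threshold from a small set of concentrations. The values are hypothetical and are not data from [25] or [26]. Selecting the cut-off by maximizing Youden's index (sensitivity + specificity − 1) is also an assumption made here for illustration, since the text does not state how the published thresholds were chosen.

```python
def split_by_label(values, labels):
    positives = [v for v, lab in zip(values, labels) if lab == 1]
    negatives = [v for v, lab in zip(values, labels) if lab == 0]
    return positives, negatives

def roc_points(values, labels):
    """(threshold, sensitivity, specificity) for every candidate cut-off.
    A sample is called positive when its value is >= the threshold."""
    positives, negatives = split_by_label(values, labels)
    points = []
    for t in sorted(set(values)):
        sens = sum(v >= t for v in positives) / len(positives)
        spec = sum(v < t for v in negatives) / len(negatives)
        points.append((t, sens, spec))
    return points

def auc(values, labels):
    """AUC as the probability that a random positive exceeds a random
    negative, with ties counted as one half."""
    positives, negatives = split_by_label(values, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def youden_threshold(values, labels):
    """Cut-off maximizing sensitivity + specificity - 1 (an assumed rule)."""
    return max(roc_points(values, labels), key=lambda p: p[1] + p[2] - 1)

# HYPOTHETICAL sialic acid concentrations in mg/dL, for illustration only.
controls = [2.1, 3.0, 3.5, 4.2, 5.0, 6.1, 8.0, 9.0]
patients = [6.5, 11.0, 13.5, 15.0, 18.5, 21.0, 24.0, 30.0]
values = controls + patients
labels = [0] * len(controls) + [1] * len(patients)

threshold, sens, spec = youden_threshold(values, labels)
print(f"AUC = {auc(values, labels):.3f}")
print(f"cut-off = {threshold} mg/dL, sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A cut-off chosen this way depends on the cohort from which the ROC curve was built, which is one possible reason why two studies of the same marker can arrive at different thresholds.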

Other Diseases
Among the 18 selected articles, five dealt with four diseases unrelated to cancer: two oral diseases directly "in contact" with saliva (periodontitis and Sjögren's syndrome) and two systemic diseases (diabetes and myocardial infarction).

Periodontitis
Periodontitis is a multifactorial infectious disease with an inflammatory component that affects almost 50% of the population. It affects the supporting tissues of the tooth and can lead to tooth loss in its most advanced stages [36].
In 2019, Hernandez-Cedillo et al. reported a study using SERS to discriminate periodontitis from control samples [27]. This work aimed to quantify sialic acid in the saliva of 93 subjects: 33 with periodontitis, 30 with gingivitis (superficial periodontal inflammation), and 30 healthy volunteers. For this purpose, they used the same protocol they had developed in 2017 for breast cancer [25]. Their results showed that patients with periodontitis could be discriminated from healthy volunteers, but not from those with gingivitis. Moreover, they determined a "threshold" concentration of 12 mg/mL of sialic acid above which periodontitis could be diagnosed. For this concentration, the test had a sensitivity of 69.6% and a specificity of 100%, and the AUC of the ROC curve was 88.8%.

Sjögren's Syndrome
Sjögren's syndrome (SjS) is a systemic autoimmune disease that is characterized by lymphoid infiltration of the salivary and lacrimal glands [28]. In 2019, Stefancu et al. used SERS of saliva and serum samples to discriminate SjS (n = 29) from control (n = 21) subjects [28]. For both saliva and serum, SERS spectra showed some similar bands that were attributed mainly to purine metabolites, such as uric acid, xanthine, and hypoxanthine (Table 2). The supervised classification model based on PCA-LDA followed by LOOCV resulted in a sensitivity, specificity, and accuracy of 96.5%, 90.5%, and 94% for saliva and 96.5%, 100%, and 98% for serum, respectively. In 2020, the same group, Moisoiu et al., conducted a SERS analysis on saliva from SjS (n = 31) and control (n = 22) subjects [29]. Spectral analysis and processing remained strictly identical. Mean sensitivity and specificity were 77% and 74%, respectively, whereas the overall accuracy was 75%. A distinguishing feature of this study was the association of SERS with a two-dimensional ultrasonic elastography technique, which improved sensitivity, specificity, and accuracy to 80%, 81%, and 81%, respectively.

Diabetes
Diabetes is a multifactorial metabolic disease characterized by chronic hyperglycemia and disturbances in the metabolism of carbohydrates, lipids, and proteins. It is caused by a deficiency in insulin secretion (type I), the action of insulin (type II), or both [30,37,38].
The only publication on the saliva-based diagnosis of diabetes dates from 2010. In this work, Scott et al. performed an FTIR analysis of saliva samples from 39 diabetic patients and 22 control subjects [30]. LDA was employed to identify the six discriminant spectral regions that best differentiate diabetic from control subjects, attributable to glycation products, proteins, and amino acids (Table 2). An accuracy of 88.2% was obtained with LDA and cross-validation on the test set. Sensitivity and specificity values were not provided.

Acute Myocardial Infarction
Acute myocardial infarction (AMI) is a myocardial necrosis caused by ischemia and persistent hypoxia related to obstruction of a coronary artery [39].
In 2015, Cao et al. studied saliva samples from 46 AMI patients and 43 healthy volunteers [31]. A conventional Raman spectroscopy analysis was performed, and data were processed by PCA-LDA, followed by LOOCV. Prominent Raman peaks were identified and assigned to cysteine, phenylalanine, tyrosine, tryptophan, hydroxyproline, nucleic acids, proteins, and amide I and II (Table 2). A ROC curve was constructed from the results obtained using the predictive model. The sensitivity and specificity were 80.4% and 81.4%, respectively, while the calculated AUC was 0.855. This study suggested Raman spectroscopy as a potential diagnostic tool.
To summarize and visually compare the diagnostic performances of the different prediction models used in the 18 studies concerning VS of saliva, the values are displayed in Figure 3.


Discussion
In this work, we assessed the potential of vibrational spectroscopy as a biometric tool to diagnose oral and general diseases using saliva as a biological specimen. This systematic literature review was conducted using MEDLINE-PubMed and Scopus databases, according to the PRISMA guidelines.
The selected studies show the promising potential of saliva-based vibrational spectroscopy as a non-invasive and rapid diagnostic tool. The majority of research in the field of biomedical vibrational spectroscopy has been based on the analysis of human tissues, both healthy and cancerous, such as breast, lung, colon, prostate, oral, liver, kidney, and many others. In the past decade, there has been growing interest in a multitude of biofluids, also termed "liquid biopsies", such as blood-derived serum and plasma, but also bile, urine, and saliva, because of their availability in comparison with solid biopsies. Indeed, 267 publications were identified as using VS of saliva for diagnostics, but a large majority of them are pilot or proof-of-concept studies that include a small number of patients.
It is important to note that specific characteristics, such as disease prevalence (which affects positive and negative predictive values) and sample numbers, must be considered in order to evaluate the clinical utility of this tool. Hence, in the context of this study, we defined a minimum of 20 patients per group as one of the inclusion criteria, following a report by Bell et al. [40], who showed in 2018 that with this number, the size effect seems to have a limited impact (small standardized difference) on the results of pilot studies compared to the main trial. In addition, the size effect has more impact below n = 20, where the sensitivity and specificity values were highly variable, although the essential statistical parameters were considered in the evaluation for clinical utility. Consequently, other VS studies using saliva were not included in this review because they had fewer than 20 patients per group; these concerned other diseases, such as burning mouth syndrome [41], asthma [42], Alzheimer's disease [43], chronic renal failure [44], ovarian cancer [45], various infections, e.g., influenza [46] or pseudomonas [47], and cystic fibrosis [48]. Therefore, out of the 267 studies, only 18 satisfied the criteria according to the PRISMA guidelines.
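The dependence of predictive values on prevalence follows directly from Bayes' rule. As a minimal sketch, the Python example below takes the sensitivity and specificity reported for acute myocardial infarction by Cao et al. [31] and evaluates the positive and negative predictive values at three prevalence levels; the prevalence values are hypothetical and chosen only to show the trend.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' rule."""
    tp = sensitivity * prevalence                # true positives (as a fraction)
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Sensitivity and specificity as reported in the text for AMI [31].
# The prevalence values are HYPOTHETICAL.
results = {}
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.804, 0.814, prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With fixed sensitivity and specificity, the positive predictive value falls as the disease becomes rarer, while the negative predictive value rises. Performance obtained in balanced case-control cohorts therefore cannot be transferred directly to a screening setting.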
In this review, results from all selected publications were very promising with interesting accuracy values of 70-80% for three studies, 80-90% for five studies, and >90% for five studies. Five studies presented no accuracy value. It is noteworthy that for both cancer and non-cancer pathologies, the performance of diagnostic tests via vs. was satisfactory with relatively high accuracy values. Interestingly, different groups have shown similar results with similar spectral and analytical methods. For oral cancer, Jaychandran et al. [14] and Rekha et al. [15] obtained a diagnostic accuracy of 93.1% and 89.1%, respectively, using the same pre-analytical conditions and PCA-LDA.
In contrast and surprisingly enough, for Sjögren's syndrome (SjS), the same group found different results with an equivalent number of patients, the same technique (SERS), and an identical prediction model (PCA-LDA followed by LOOCV). Indeed, in 2019 Stefancu et al. obtained a sensitivity of 96.5%, specificity of 90.5%, and accuracy of 94% [28], while in 2020 Moisoiu et al. obtained a sensitivity of 77%, specificity of 74%, and accuracy of 75% [29]. These different performances could be related to the sample preparation method (deproteinized saliva) [28] or not [29]. Furthermore, the same group of researchers, in 2014, Feng et al. [16] obtained an accuracy value of 90.2% and Qiu et al. in 2016 [17], and accuracy value of 83.9% for nasopharynx cancer, despite an identical procedure for statistical analysis (PCA-LDA with LOOCV) and using the same approach as SERS. Again, the differing results could probably be explained by a change in the saliva preparation, the first group using total purified protein saliva, the second, frozen total saliva without cells, although the difference in the sample size may also impact on the accuracy values.
In addition, the salivary composition can be influenced by the collection time (as cortisol secretion peak between 06:00 to 08:00 a.m.), collection methods (stimulated or not), current medical treatments, the presence of comorbidities or oral inflammation [11,12]. In this review, for example, the time of sampling is not always specified in the included studies (n = 10/18), as well as the volume of saliva that was collected, varying from 1 to 4 mL when specified. The pre-analytical parameters are important to consider and can constitute a methodological bias. To our knowledge, there are no publications or guide specifying the importance of the pre-analytical preparation of saliva used in VS. However, for blood sampling, it has been shown that storage at −80 • C and in plastic tubes has no effect on generated spectra, whereas variations in the drying and the storage of samples (fresh or frozen, avoid freeze-thaw cycles) can have an effect [49]. Therefore, standardization of saliva sample handling (collection, processing, and storage protocols) is crucial to ensure reproducibility and consistency in the results obtained as suggested by other reports [4,18,50].
The concept of confounding factors is also very important, and yet, in all selected studies of this review, it is very inadequately addressed. It is difficult to know whether biases in patient selection exist, for example, with respect to age, sex, comorbidities, tobacco and alcohol consumption, drug uptake before/after saliva collection, as these factors may strongly influence the saliva content, and hence, its resulting spectral profile. Derruau et al. recently showed using saliva-based IR spectroscopy, that the periodontal diseases, a multifactorial inflammatory, and infectious oral disease could influence the salivary infrared spectra in the spectral range of lipids and proteins absorption (2800-3000 cm −1 ) [11] and could become a confounding factor in the detection of other multifactorial inflammatory diseases. Indeed, for Hernandez et al., in three different reports published in 2017 and 2019 [25][26][27], periodontal disease and breast cancer were discriminated by SERS using the same marker bands of sialic acid. Thus, one pathology becomes a confounding factor of the other, and vice versa. In Hernandez's first publication on breast cancer, the patient inclusion criteria were based on information, such as 'Patients had no oral complaints'. In the second, patients should not have periodontitis or gum bleeding. However, in both of these, there is no mention of who and how were these criteria evaluated. No specialists in oral diseases are in the author list. The implication of clinicians in connection with the development of clinical tests and particularly with respect to the concerned pathology is obvious to establish the validity of these clinical tests (e.g., in the assessment of patient inclusion criteria). Twenty-two percent of the publications cited in this review (n = 4/18) do not present any clinicians related to the studied pathology among the authors.
Other criteria must also be taken into account, such as the type of VS used as the analytical method. In this review, selected studies on saliva using IR, Raman, and SERS gave accuracy values in the ranges 88.2-88.8%, 89.1-95.6%, and 70.5-94%, respectively. ATR-FTIR spectroscopy is very applicable to the routine monitoring of biofluids [51], but drying of the sample is necessary, which lengthens the pre-analytical preparation time and may limit its clinical application. As water is a weak Raman scatterer, Raman microspectroscopy is little affected by aqueous media, permitting in vivo and live-cell imaging [52], and is particularly amenable to saliva samples. In this review, this could explain the higher number of Raman studies (n = 16) compared to IR (n = 2).
However, it is difficult to conclude that one technique performs better than the other, although in the case of IR, the drying process can introduce chemical and physical inhomogeneities in the sample, due to the so-called "coffee ring" effect and to cracking and gelation patterns, that could affect reproducibility and sensitivity [53-55]. The comparison between these complementary VS techniques was difficult in this review as the included studies were hardly comparable: (i) only one study per disease was found for diabetes [30], gastric cancer [33], acute myocardial infarction [31], or periodontitis [27]; (ii) the number of included patients is low; and (iii) only one group, Maitra et al., has used both techniques on the same set of saliva samples, in the case of esophageal cancer. If the criterion n ≥ 20 is taken into account, only the control, Barrett's esophagus, and OAC groups are potentially useful, with accuracy values of 88.8% and 95.6% for IR and Raman, respectively, using SPA QDA data processing.
In addition, in 2010, Scott et al. obtained IR results for diabetes quite similar to those of Maitra et al. [21], with an accuracy of 88.2%. On the other hand, the 2020 study by Caixeta et al. on rats (n = 21) achieved an accuracy of 95.2% and a sensitivity of 100% [56]. The publication of robust, larger studies is paramount for a proper comparison of the different techniques. More recently, Parachalil et al. undertook a similar comparison of Raman and ATR-FTIR spectroscopy of plasma, using identical sample preparation and analysis protocols, to quantitatively monitor diagnostically relevant changes in glucose [57]. They demonstrated that liquid Raman spectroscopy can perform at least as well as ATR-FTIR, which requires a drying step.
Based on the adsorption of molecules of the sample on metallic nanoparticles (NPs), the SERS approach has been developed to significantly increase the signal intensity (by a factor of the order of 10⁵ to 10⁶), as well as to decrease any sample autofluorescence [58]. By exploiting NPs such as silver (Ag) or gold (Au), enhanced spectra are generated that allow a better characterization, detection, and identification of biomolecular analytes in a shorter timeframe. This explains the higher number of SERS studies (n = 12/18) in the selected articles. Yet, the intensity and shape of a SERS spectrum strictly depend on the combination of many experimental conditions, and the SERS effect can be influenced by many factors, such as laser power, temperature, solvent, SERS substrate, and the ratio between the total number of nanoparticles and the volume of biofluid, as well as the stability of the NPs, the state of the analyte, and the exposure time. All these factors could explain the disparities in the accuracy results obtained, between 70.5% and 94%, whatever the pathology studied. These values are not very different from those obtained with conventional Raman spectroscopy in the selected articles (from 89.1% to 95.6%).
Furthermore, the choice of the data pre-processing and processing methods is also an important factor to be considered, as these methods form part of the pre-analytical parameters. In 2014 and 2017, Feng et al. and Lin et al., from the same research group, reported accuracy values of 90.2% for 62 cancer and 30 healthy patients and 70.5% for 170 cancer and 71 healthy individuals, using SERS, PCA-LDA-LOOCV for data analysis, and the same saliva preparation. These results again show the importance of sample size, but also that data processing with a small number of patients may lead to an overestimation of the performance of the classification model. These models often require splitting the data into a training set, a validation set, and an independent prediction set, thus requiring a sufficiently high number of patients initially.
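The leave-one-out cross-validation (LOOCV) step mentioned above can be illustrated with a minimal sketch. It is not the PCA-LDA pipeline used in the cited studies: a simple nearest-centroid classifier is swapped in so that the example needs only the Python standard library, and the data are synthetic. All function names, cohort sizes, and distribution parameters are assumptions made for illustration.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean training vector is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y):
    """Leave-one-out: each sample is predicted by a model fitted on all the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def synthetic_cohort(n_per_class, n_features=5, shift=1.0, seed=0):
    """Hypothetical data: two Gaussian classes whose means differ by `shift` per feature."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, 0.0), (1, shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


# Same underlying distributions, two hypothetical cohort sizes.
small_X, small_y = synthetic_cohort(n_per_class=10)
large_X, large_y = synthetic_cohort(n_per_class=100)
small_acc = loocv_accuracy(small_X, small_y)
large_acc = loocv_accuracy(large_X, large_y)
print(f"LOOCV accuracy, n = 20:  {small_acc:.3f}")
print(f"LOOCV accuracy, n = 200: {large_acc:.3f}")
```

Re-running the small cohort with different seeds gives a sense of how much a cross-validated accuracy can fluctuate when few patients are available. LOOCV alone does not replace an independent prediction set.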
Another important aspect is ROC curve analysis, which is widely considered the most objective and statistically valid method for evaluating biomarker performance, particularly in the context of clinical test development [16]. However, out of these 18 publications, only eight studies, published by three teams, used ROC curves, with accuracy values ranging from 70.5% to 92%. exploited only three specific wavenumbers corresponding to sialic acid that is considered as a saliva biomarker for oral cancer [59]).
In recent years, single-molecule studies using SERS have been developed with the aim of quantifying the molecule implicated in the pathology studied. Indeed, in 2017, Hernandez et al. demonstrated the ability of SERS to measure concentrations of sialic acid in human saliva and to discriminate healthy from breast cancer patients with an accuracy of 92% [25]. Yet, in 2019, Hernandez et al. used the same technique and the same discriminant frequencies to separate control patients from those with periodontitis, with an accuracy of 88.8% [26]. Although the good accuracy values indicate a high-performance classifier, these results raise several remarks. For instance, the same three marker bands related to sialic acid identified two different pathologies at two different anatomical sites: breast cancer and an inflammatory disease of infectious origin (periodontitis). It appears that sialic acid does not represent a discriminating molecule in the evaluation of these two pathologies by SERS.
Furthermore, Stefenelli et al. reported that sialic acid levels are increased in the serum of patients with uterus, lung, colon/rectum, stomach, or prostate cancer. An increase may therefore indicate the presence not only of breast cancer, but also of other types of cancer and/or cancer-unrelated severe inflammatory conditions [60]. Sialic acid used alone can be a confounding factor, which confirms the importance of selecting the included patients on the basis of their history and of precise clinical data. A clinical test based on a number of discriminating bands should be more specific for the pathology sought.
In the majority of the selected studies (n = 15/18), tentative assignments suggest the presence of several bands corresponding to proteins and amino acids (n = 7/18), lipids (n = 5/18), and DNA and/or RNA bases (n = 4/18). Lipid-rich features in normal conditions and prominent protein features in tumors and other pathological conditions have been described. Across these studies, some molecules appear more often as discriminants between healthy and diseased patients. In particular, the amide I band was assigned as a discriminant in 14 out of 18 studies, phenyl vibrations in 10 studies, amide III, phenylalanine, and tyrosine in eight studies, and tryptophan and proline in seven studies. Further, several research works have reported on the amino acid profile of human saliva and its use in disease diagnosis [61,62]. The varying levels of these amino acids in saliva, related to the degradation of proteins present in saliva, represent interesting markers for pathology detection [15].
In fact, access to multiple biomarkers rather than a single or a few biomarkers would be better for patient care, in particular for screening patients at high risk or at an early stage of the disease [4]. Indeed, some selected studies in this review include not only healthy or diseased patients, but also patients with precancerous lesions (oral and esophageal; n = 4) or benign lesions (breast; n = 1), or differentiate between early and late stages of cancer (gastric; n = 1) or between superficial and deep damage (periodontal; n = 1). Early detection and prevention are the key strategies for managing cancer: intervening at an early stage significantly reduces morbidity and mortality from the malignant disease [32]. There is a continuing effort in the search for new technologies that can detect early biochemical signs of malignancy and thereby meet these objectives. Using Raman spectroscopy, Jaychandran's study achieved an efficient classification, with an accuracy of 93.1%, of saliva samples from normal, precancerous, and oral squamous cell carcinoma patients [14]. The biological components pyrimidine, glycoproteins (especially mucin), oxygenated hemocyanin, and carotenoids showed differences between the three groups of saliva: normal (n = 21), premalignant (oral leukoplakia and oral submucous fibrosis; n = 87), and malignant (oral squamous cell carcinoma; n = 50). In particular, the peak at 752 cm⁻¹ of oxygenated hemocyanin and the peaks at 1158 and 1525 cm⁻¹ of carotenoids in saliva show variations between the three groups. In 2017, Rekha et al. also succeeded, using Raman spectroscopy, in separating controls from precancerous and from malignant lesions of the same cancer type, with performances of 82.4% and 89.1%, respectively, in cross-validated groups [15]. The intensity of the amide I band was higher for malignancies than for pre-malignancies or normal patients, while the opposite was observed for the lipid bands (1128, 1310, and 1742 cm⁻¹).
However, the accuracy value calculated from a normal group versus premalignant and malignant groups was only 55.4%. Premalignant patients were grouped with malignant patients in 25% of cases and with controls in 21.4%, while malignant patients were grouped with premalignant ones in 37.5% of cases. The study by Maitra et al. is even more revealing of the problem of the choice and number of patients per group, as it failed to discriminate between premalignant and normal stages [21]. This finding underscores the need for larger-scale studies or for alternative spectral data processing methods.
For the validation of a new clinical diagnostic method, long-term follow-up and correlation with gold-standard endpoints are essential. To be accepted in routine practice, sensitivity, specificity, and accuracy values for disease diagnosis need to be exceptional, as does the ability to detect emerging or progressive diseases. The use of saliva-based VS could become complementary to radiological, biological, and histological investigations.
For breast core biopsy, the histopathological examination has reported sensitivity values between 90.1 and 93% and is more operator-dependent, while publications referred to in this review report accuracy values of 78.4 to 92% depending on the analytical methods used. For oral cancers, the diagnostic gold standard is clinical examination followed by a biopsy for histopathological confirmation, with accuracy values ranging from 75 to 90% [63,64].
This range can be explained, among other things, by the method of taking the biopsy, which is also very operator-dependent. In this review, for this same pathology, Jaychandran et al. and Rekha et al. obtained accuracy values of 93.1 and 89.1%, respectively, using Raman spectroscopy [14,15]. In addition, for lung cancers, the initial gold standard is the clinical examination associated with the chest X-ray, with an accuracy of 81% according to Quekel et al. (1999) [63]. In this review, for the diagnosis of lung cancer by salivary VS, the accuracy value obtained was 80% [19].
In summary, the accuracy values obtained by saliva-based VS are quite comparable with those of gold-standard diagnostics for cancer, and similar results were found for non-cancer pathologies such as SjS [28,29,65]. Overall, the reported diagnostic measures for saliva-based VS are promising in comparison with existing diagnostic modalities. Although diagnostic accuracy levels are high for VS, it is difficult to replace gold standards. However, saliva-based VS has numerous advantages over, for example, biopsies, which are invasive, painful, and carry a risk of complications, and over radiography and CT, which involve irradiation. Saliva-based VS is non-invasive, painless, label-free, quick, and easy to perform, requires minimal or no sample preparation, and can be carried out extemporaneously. In addition, saliva is a complex biofluid reflecting the physiological and pathological state of the body, due to the presence of numerous biocomponents. Saliva-based VS could become a diagnostic tool complementary to the gold standard by detecting and potentially quantifying metabolites induced by the pathology and its evolution. Therefore, this technology could be used across a large range of clinical situations: screening of patients at risk of the disease or with the disease at an early stage, differential diagnosis of the disease from other conditions, prognosis of the disease independently of the treatment, prediction of the response to treatment, and monitoring of disease progress [4,66].
The saliva metabolome is also comparable to the human serum metabolome in terms of chemical complexity and abundance of metabolites [9,12]. VS of saliva could therefore be an alternative to VS of blood or its derived products, without being invasive or stressful for the patient and without requiring a more complex storage mode. In this review, only three reports compared saliva with blood as sampling media. In the case of oral cancers, Jaychandran et al. obtained accuracy values of 91.3% and 78% for saliva and blood, respectively, using Raman spectroscopy [14]. In the case of esophageal cancers, Maitra et al. obtained accuracy values of 95.6%, 82.6%, and 91.3% for saliva, plasma, and serum, respectively [21].
The use of saliva-based VS is promising in comparison with blood-based VS and may also appear more appropriate in the case of oral cancers, which are in direct contact with this biofluid.

Material and Methods
This study was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [67].

Research Question/Focused Question
This systematic review aimed to assess whether vibrational spectroscopy could be applied to saliva samples as a disease diagnostic tool.

Search Strategy
Two electronic databases were searched: MEDLINE (via PubMed, the public access to MEDLINE) and Scopus. The last search was conducted on 30 May 2020, and any publication up to this date was evaluated for inclusion. The keyword strings used for both databases are described in Table 3. Studies obtained for screening were downloaded into the Zotero research tool; duplicates were then identified and excluded from the total list of articles.

Inclusion and Exclusion Criteria/Eligibility Criteria/Study Selection Criteria
The PICOS framework (Population, Intervention, Comparison, Outcome, Study design, with added qualitative search terms) was used to set inclusion and exclusion criteria. Details of the eligibility and exclusion criteria of studies are shown in Table 4. Only publications with both a clinically and/or histopathologically confirmed disease group and a control group, each containing more than 20 participants, were included [40,68]. Studies involving non-saliva samples, non-human subjects (animals), tissue samples, or pooled cells were excluded. Studies had to report characteristic parameters of the diagnostic tool, such as sensitivity, specificity, accuracy, or AUC of the ROC curve.
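The eligibility rules stated above can be expressed as a simple filter. The sketch below is only an illustration: the record fields, the names `StudyRecord` and `is_eligible`, and the example records are hypothetical and do not correspond to any tool used in this review. The strict threshold follows the wording "more than 20 participants"; it would need adjusting if n ≥ 20 were intended.

```python
from dataclasses import dataclass

MIN_GROUP_SIZE = 20  # assumption: "more than 20" is read as a strict inequality
ACCEPTED_METRICS = {"sensitivity", "specificity", "accuracy", "auc"}


@dataclass
class StudyRecord:
    # All field names are hypothetical, chosen for this illustration only.
    sample_type: str              # e.g. "saliva", "serum", "tissue"
    human_subjects: bool          # False for animal studies
    disease_confirmed: bool       # clinically and/or histopathologically
    n_disease: int                # size of the disease group
    n_control: int                # size of the control group (0 if none)
    reported_metrics: frozenset   # e.g. frozenset({"sensitivity", "auc"})


def is_eligible(study):
    """Return True when a record satisfies every inclusion criterion listed above."""
    return (
        study.sample_type == "saliva"
        and study.human_subjects
        and study.disease_confirmed
        and study.n_disease > MIN_GROUP_SIZE
        and study.n_control > MIN_GROUP_SIZE
        and bool(ACCEPTED_METRICS & set(study.reported_metrics))
    )


# Hypothetical records, not taken from the included studies.
candidates = [
    StudyRecord("saliva", True, True, 40, 35, frozenset({"accuracy"})),
    StudyRecord("saliva", False, True, 25, 25, frozenset({"accuracy"})),  # animal model
    StudyRecord("serum", True, True, 60, 60, frozenset({"auc"})),         # not saliva
    StudyRecord("saliva", True, True, 30, 12, frozenset({"sensitivity"})),  # small control group
]
for record in candidates:
    print(record.sample_type, record.n_disease, record.n_control, is_eligible(record))
```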

Screening for Eligibility/Inclusion
Articles identified with keywords from the MEDLINE-PubMed and Scopus searches were screened by two members of the review team, based on title and abstract, according to the inclusion and exclusion criteria. Disagreements were resolved by discussion. A full-text review of the selected articles was then performed by the same two members. The references of these articles were also checked to include any relevant papers that were not picked up during the direct MEDLINE-PubMed and Scopus searches. If relevant, these new publications were also downloaded and added to the list of full-text articles for assessment. Eligible articles were finally included in the systematic review.

Outcomes and Data Extraction
The outcomes related to the ability of VS to diagnose diseases by analysis of human saliva included: sensitivity, specificity, accuracy, and AUC values.
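For reference, these four outcomes can be computed as in the following minimal sketch. It uses the standard definitions (sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), accuracy = (TP + TN)/total) and computes the AUC as the probability that a randomly chosen diseased sample receives a higher classifier score than a randomly chosen control, which is equivalent to the area under the ROC curve. The function names, counts, and scores are illustrative assumptions, not data from the included studies.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # diseased patients correctly detected
    specificity = tn / (tn + fp)   # controls correctly classified as controls
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def roc_auc(diseased_scores, control_scores):
    """AUC as the fraction of (diseased, control) pairs ranked correctly; ties count half."""
    wins = 0.0
    for d in diseased_scores:
        for c in control_scores:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(diseased_scores) * len(control_scores))


# Hypothetical cohort: 50 diseased patients and 30 controls.
sens, spec, acc = diagnostic_metrics(tp=45, fn=5, tn=27, fp=3)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}, accuracy = {acc:.3f}")

# Hypothetical classifier scores (higher means more likely diseased).
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(f"AUC = {auc:.3f}")
```

Accuracy depends on the ratio of diseased to control participants, whereas sensitivity, specificity, and AUC do not, which is one reason why accuracy values from cohorts of different composition are hard to compare directly.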
Data were extracted from each article and stored in Excel® (Microsoft, Redmond, WA, USA) format. When multiple spectral techniques or data analysis techniques were evaluated within a study, data were described based on the most effective technique used. When data were presented for both a training set and a test/cross-validation set, data from the test set were presented, as these most closely reflect the performance of the test in clinical practice.

Conclusions
Saliva-based VS appears promising, as it can objectively fingerprint the biochemical profile underlying the early onset of disease. However, the studies included in this review lack robustness and are hardly comparable, which may further explain the divergence of the results and why a meta-analysis on this subject is currently not feasible. Furthermore, several parameters, such as the use of different substrates, laser frequencies, detectors, temperatures, sample solvents, and others (pre-analytical conditions, data processing, etc.), affect the performance of VS techniques and could be a hindrance to routine clinical translation in the near future. Moreover, new methodological and technical strategies need to be developed to improve the reproducibility and the "standardization" of VS.
Recent years have seen the emergence of high-throughput screening (HTS) VS techniques capable of providing rapid data collection, quality control, and classification processes. These approaches could indeed be promising for future saliva-based diagnostics. Based on ATR-FTIR platform technology, the ClinSpec Dx™ spectroscopic liquid biopsy (blood), able to identify brain cancer at an early stage, is one of the first portable VS tools applied in clinical practice. Furthermore, fiberoptic probes and the miniaturization of instruments are also of interest for real-time and routine diagnosis. Interestingly, DIAFIR, a medtech company, has developed NASHMIR®, a non-invasive test based on the metabolic signature of NASH, from a simple drop of serum. This technology combines mid-IR technology with an ATR-based optical fiber biosensor. These portable tools could be particularly well adapted to saliva for clinical applications. Concerning SERS technology, there has been ongoing development of SERS substrates, especially those involving gold or silver NPs, in order to increase reproducibility and significantly enhance molecule detection. The approach will require further refinement and a reduction in substrate cost for clinical application.
Taken together, although saliva-based VS diagnostics are promising, further work is required, especially on larger cohorts, before they can be confirmed and translated to routine clinical use. Efforts should be ongoing to standardize saliva-based VS, taking into account pre-analytical and analytical requisites, prior to its development as a diagnostic/screening test for human diseases.