Salivary Microbiome and Cigarette Smoking: A First of Its Kind Investigation in Jordan

There is accumulating evidence in the biomedical literature suggesting the role of smoking in increasing the risk of oral diseases including some oral cancers. Smoking alters microbial attributes of the oral cavity by decreasing the commensal microbial population and increasing the pathogenic microbes. This study aims to investigate the shift in the salivary microbiota between smokers and non-smokers in Jordan. Our methods relied on high-throughput next-generation sequencing (NGS) experiments for V3-V4 hypervariable regions of the 16S rRNA gene, followed by comprehensive bioinformatics analysis including advanced multidimensional data visualization methods and statistical analysis approaches. Six genera—Streptococcus, Prevotella, Vellionella, Rothia, Neisseria, and Haemophilus—predominated the salivary microbiota of all samples with different percentages suggesting the possibility for the salivary microbiome to restored after quitting smoking. Three genera—Streptococcus, Prevotella, and Veillonella—showed significantly elevated levels among smokers at the expense of Neisseria in non-smokers. In conclusion, smoking has a definite impact on shifting the salivary microbiota in smokers. We can suggest that there is microbial signature at the genera level that can be used to classify smokers and non-smokers by Linear Discriminant Analysis Effect Size (LEfSe) based on the salivary abundance of genera. Proteomics and metabolomics studies are highly recommended to fully understand the effect of bacterial endotoxin release and xenobiotic metabolism on the bacterial interrelationships in the salivary microbiome and how they affect the growth of each other in the saliva of smokers.


Introduction
One of the foremost public health problems affecting the world today is smoking, which represents a preventable cause of premature death [1]. Smoking has a crucial implication in causing many common diseases such as cancer, chronic obstructive pulmonary diseases, and periodontitis [2][3][4]. The impact of smoking tobacco on periodontal health is not a novel concept; however, several studies have shown the significant adverse effect of smoking on the microbiome and cytokines expression of the buccal mucosa [5]. Nevertheless, tobacco in addition to other environmental factors may affect the equilibrium of the oral microbiome by inducing a possible alteration in functional pathways and allowing oral pathogens to grow, which ultimately leads to several diseases [6][7][8]. Furthermore, different oral pathogens harm the development and function of innate and adaptive immunity of the host [9].

DNA Sequencing
Illumina sequencing adapters and dual-index barcodes (Nextera XT indices) were added to the amplicon target via polymerase chain reaction (PCR) amplification. Samples were run on Bioanalyzer, spot-checking for amplicon size. The 16S sequencing (2 × 300 bp PE V3-V4) was performed on Illumina's MiSeq platform (Illumina Inc., San Diego, CA, USA). Paired-end reads from each sample were merged, screened for length, and filtered for quality using DNA Genotek's proprietary 16S pre-processing workflow. The sequence data were submitted to NCBI BioProject under accession number PRJNA579773.

Taxonomic Classification
High-quality sequences were aligned to the curated reference database at 97% similarity using the NINJA-OPS algorithm, version 1.5.1 [20]. At 97% sequence identity, each operational taxonomic unit (OTU) represents a genetically unique group of biological organisms. These OTUs were then assigned a curated taxonomic label based on the SILVA taxonomic database, version 123 [21]. The relative abundance of all taxa at the phylum and genus levels were plotted to visualize broad taxonomic differences between individual samples and between sample groups. Genera found at <1% mean abundance across samples were grouped as "other" for visualization purposes. MicrobiomeAnalyst, a web-based data analysis tool, was chosen to perform Univariate statistical analysis for features at the phyla and genera levels; features were considered significant based on their adjusted cut-off ≤ 0.05. This web-based tool has been reported and is currently hosted by the Xia lab at McGill University, QC, Canada [22].

Rarefaction
All samples were rarefied after taxonomic classification. The cutoff for rarefaction was set at 25,000 classified sequences per sample. However, no sample had less than 25,000 classified sequences, thus, all samples were included in the downstream analysis (see Figures A1 and A2).

Library Preparation and Sequence Amplification
Library preparation was performed with a customized dual index version of Illumina's Nextera XT protocol. The V3-V4 region of the 16S ribosomal subunit was amplified with custom polymerase chain reaction (PCR) primers and sequenced on an Illumina MiSeq.

Data Pre-Processing
Trimmomatic was used to remove sequencing adaptors, and low-quality reads [23]. The FLASH algorithm was used to read merging and automated rejection of low-quality sequences [24]. Quality screening for length and ambiguous bases was performed with proprietary scripts [23].

Data Analysis
We applied a comprehensive bioinformatics analysis approach integrating both robust exploratory data analysis and visualization methods focusing on taxonomic profiling, combined with standard statistical differential analysis approaches such as univariate analysis methods to identify statistically significant features in terms of their abundance between different smokers and non-smokers. We also applied the linear discriminant analysis (LDA) effect size (LEfSe) method to support high-dimensional class comparisons. Our data generation, data preprocessing, and data analysis workflow is shown in Figure 1.
differences between individual samples and between sample groups. Genera found at <1% mean abundance across samples were grouped as "other" for visualization purposes. MicrobiomeAnalyst, a web-based data analysis tool, was chosen to perform Univariate statistical analysis for features at the phyla and genera levels; features were considered significant based on their adjusted cut-off ≤ 0.05. This web-based tool has been reported and is currently hosted by the Xia lab at McGill University, QC, Canada [22].

Rarefaction
All samples were rarefied after taxonomic classification. The cutoff for rarefaction was set at 25,000 classified sequences per sample. However, no sample had less than 25,000 classified sequences, thus, all samples were included in the downstream analysis (see Figures A1 and A2).

Library Preparation and Sequence Amplification
Library preparation was performed with a customized dual index version of Illumina's Nextera XT protocol. The V3-V4 region of the 16S ribosomal subunit was amplified with custom polymerase chain reaction (PCR) primers and sequenced on an Illumina MiSeq.

Data Pre-Processing
Trimmomatic was used to remove sequencing adaptors, and low-quality reads [23]. The FLASH algorithm was used to read merging and automated rejection of low-quality sequences [24]. Quality screening for length and ambiguous bases was performed with proprietary scripts [23].

Data Analysis
We applied a comprehensive bioinformatics analysis approach integrating both robust exploratory data analysis and visualization methods focusing on taxonomic profiling, combined with standard statistical differential analysis approaches such as univariate analysis methods to identify statistically significant features in terms of their abundance between different smokers and non-smokers. We also applied the linear discriminant analysis (LDA) effect size (LEfSe) method to support high-dimensional class comparisons. Our data generation, data preprocessing, and data analysis workflow is shown in Figure 1. Workflow for the salivary microbiome data generation, pre-processing, and analysis. Figure 1. Workflow for the salivary microbiome data generation, pre-processing, and analysis.

Alpha (α) Diversity and Beta (β) Diversity
Taxonomic profiling: exploratory data analysis and visualization consisted of two main methods: (a) alpha diversity analysis for assessing diversity within a bacterial community or sample and (b) beta diversity analysis for determining the differences between microbial communities (i.e., between samples). Three different alpha diversity metrics (Shannon Index, Observed OTUs, Chao1 diversity) were calculated on rarefied OTU tables using the alpha_rarefaction.py workflow in QIIME 1.9.1 [25] and the results were determined by using Analysis of Variance for each alpha diversity metric. Tukey's honestly significant difference (HSD) was applied to the AOV for analysis of variance (ANOVA), it is an R function to determine group-to-group comparisons. The R version 3.3.2 (R Core Team, 2015) was used to perform the statistical analyses of alpha and beta diversity. Additionally, three beta metrics were used (Bray-Curtis, Weighted UniFrac, and Unweighted UniFrac) on the rarefied OTU table using the beta_diversity.py workflow in QIIME 1.9.1. Bray-Curtis dissimilarity was calculated on a species-level summarization of the rarefied OTU table [26]. Principal Coordinates Analysis (PCoA) was applied to each beta diversity distance matrix, using the dudi.pco function from the R made4 package (version 1.48.0). The first two principal coordinates explaining the majority of the difference in data were plotted using R's ggplot2 package, version 2.2.1), with the indicated percentage of variance by each axis explained.

Univariate Analysis
Two standard univariate tests, implemented in MicrobiomeAnalyst [22] from the Xia lab at McGill University in Canada, were applied to test for statistically significant abundant taxa between smokers and non-smokers. The tests were: (a) non-parametric Mann-Whitney test and (b) parametric t-test/ANOVA. Our differential analysis helps in identifying biologically or biochemically meaningful relationships or associations between taxa or features. The analyses were conducted at phylum and genus levels.

LDA Effect Size (LEfSe)
This method is specifically designed for biomarker discovery and explanation in high-dimensional metagenomic data [27]. It incorporates statistical significance with biological consistency (effect size) estimation. It performs a non-parametric factorial Kruskal-Wallis (KW) sum-rank test to identify features with significant differential abundance with regard to experimental factor or class of interest, followed by Linear Discriminant Analysis (LDA) to calculate the effect size of each differentially abundant features. The result consists of all the features, the logarithmic value of the maximum mean among all the groups or classes, and if the features are differentially significant, the group with the highest mean and the logarithmic LDA score (Effect Size). Features are considered to be significant based on their adjusted p-values (i.e., false discovery rate (FDR) values), applying an adjusted p-value cutoff = 0.05.

Demographic Data of the Study Subjects
One hundred subjects were recruited in this study to attain salivary specimens (57 males and 43 females; 51 non-smokers and 49 smokers) to perform 16S rRNA gene sequencing. A total of 1308 OTUs were identified within the salivary microbiota of the 100 subjects who participated in this study. A designed questionnaire (Table 1) was developed to collect demographic data in addition to smoking history. As in Table 1, the smoking status of 100 subjects of the study summarized as the following; 23% female non-smokers, 28% male non-smokers, 20% female smokers, and 29% male smokers. Other descriptive characteristics in Table 1 included data with more than 50% between 21-25 years of age, ethnicity (95% white), education (76% bachelor degree), smoking duration (20% from 6-10 years), number of cigarette (20% from 10-20 cigarettes), and teeth brushing (89% brushing teeth). The difference in age was statistically significant at p-value ≤ 0.05. No statistically significant difference in the number of smokers versus non-smokers was observed at p-value = 0.05. Male smokers (29) Female smokers (20) p-value 0.0245 * (age) 0.6693 (smokers versus non-smokers) n a is the number of samples-the asterisks (*) indicate a statistically significant p-value < 0.05. Note: non-smokers are defined as subjects who never smoked before.

Alpha (α) Diversity Metrics
To understand the variations in the salivary microbiota, alpha diversity metrics were calculated on rarefied OTU tables for comparison groups based on gender and smoking status. The α metrics included Chao1 (community richness), Observed OTUs (community uniqueness), and Shannon (community evenness or entropy) within each comparison group (male smokers, female smokers, male non-smokers, and female non-smoker). Statistically significant differences at adjusted p-value < 0.05 were determined using analysis of variance with alpha diversity as the response variable on the y-axis, and smoking and gender status as crossed predictor variables on the x-axis ( Figure 2). The Chao1 metric analysis showed statistically significant higher richness in smokers versus non-smokers, and interestingly, a statistically significant higher richness among female non-smokers versus male non-smokers ( Figure 2A). The metrics of observed OTUs represented the amount of unique OTUs found in each sample to measure the diversity within samples in an ecosystem. Relatively, there was a significant community uniqueness in non-smokers versus non-smokers, in addition to a considerable community uniqueness in females versus males regardless of the smoking status ( Figure 2B). In the last calculated α metric, Shannon metric considers the number and abundance of OTUs found in a sample together; the more evenly abundant the OTUs present in a sample, the higher the Shannon index. Shannon's analysis showed that smokers and non-smokers were evenly abundant ( Figure  observed OTUs (community uniqueness), and (C) Shannon (community evenness or entropy). Red boxplots represent females and green boxplots represent males. Statistically significant differences were determined using analysis of variance with alpha diversity as the response variable, and smoking status and sex as crossed predictor variables, with Tukey's HSD for group-specific differences. * Statistically significant at adjusted (p-value < 0.05), ** Statistically significant at adjusted (p-value < 0.01) and *** Figure 2. Boxplots of the three studied alpha diversity metrics: (A) Chao1 (community richness), (B) observed OTUs (community uniqueness), and (C) Shannon (community evenness or entropy). Red boxplots represent females and green boxplots represent males. Statistically significant differences were determined using analysis of variance with alpha diversity as the response variable, and smoking status and sex as crossed predictor variables, with Tukey's HSD for group-specific differences. * Statistically significant at adjusted (p-value < 0.05), ** Statistically significant at adjusted (p-value < 0.01) and *** Statistically significant at adjusted (p-value < 0.001).

Beta (β) Diversity Metrics
Principal Coordinates Analysis (PCoA) is a tool used to visualize the profiling of sample clustering based on the similarity to each other, and this helps determine whether changes identified during beta diversity analysis are directed changes or random noise. Plots for distance matrices were generated using three beta diversity metrics, Bray-Curtis, Weighted UniFrac, and Unweighted UniFrac, to highlight the separation of samples per group. Bray-Curtis compositional dissimilarity metric compared the abundance of each OTU between two samples to give a parameter between 0 and 1. This absolute metric quantified the difference in abundance between two communities and was useful for identifying how two microbiome samples are similar. Weighted (quantitative) UniFrac distance is a dissimilarity metric that uses the phylogenetic distribution and the abundance of the OTUs in a sample to calculate the distance between two samples. The unweighted (qualitative) UniFrac distance measures the phylogenetic distribution of the OTUs in a sample but relies on the presence or absence of OTUs rather than their abundance. The results of beta diversity are in Table 2. The Bray-Curtis analysis in Figure 3A showed that there are significant compositional dissimilarities among the different comparison groups, except for the male non-smokers versus female non-smokers. When considering the weighted UniFrac, results in Figure 3B showed significant compositional dissimilarities among the different comparison groups except for the male non-smokers versus female non-smokers and male smokers versus female smokers. When considering the Unweighted UniFrac, results in Figure 3C showed significant compositional dissimilarities among the different comparison groups except for male smokers versus female smokers. In general, most of bacterial OTUs were shared all-over the groups. All performed group comparisons of β-diversity were assessed with a pairwise permutational multivariate ANOVA (PerMANOVA) using the 'pairwise.adonis' function from the vegan R package, which can be found here: https://github.com/pmartinezarbizu/pairwiseAdonis. The pairwise function takes each pair and runs the adonis test independently. Then, it applies a Bonferonni correction with the 'p.adjust' function which divides the original α-values by the number of comparisons performed. In this study, we performed the following six comparisons: Nonsmoker_male versus smoker_male, nonsmoker_male versus nonsmoker_female, nonsmoker_male versus smoker_female, smoker_male versus nonsmoker_female, smoker_male versus smoker_female, and nonsmoker_female versus smoker_female. Therefore, the original p-values (p-value) were multiplied by 6 to get the Bonferroni-adjusted p-values (p-adj) reported in Table 2 Each principal coordinate explains a certain fraction of the variability (indicated by the percentage between brackets on each axis) observed in the data set. The principal coordinates PC1 and PC2 are plotted to create a visual two-dimensional (2D) representation of the multidimensional microbial community compositional differences between tested samples. Each sample is represented by a point and colored based on the smoking status and the sex of tested human subjects: Female non-smokers (red), male non-smokers (green), female smokers (teal green), and male smokers (magenta). The distance between the points represents the similarity of those samples (closer together = more similar). Each principal coordinate explains a certain fraction of the variability (indicated by the percentage between brackets on each axis) observed in the data set. The principal coordinates PC1 and PC2 are plotted to create a visual two-dimensional (2D) representation of the multidimensional microbial community compositional differences between tested samples. Each sample is represented by a point and colored based on the smoking status and the sex of tested human subjects: Female non-smokers (red), male non-smokers (green), female smokers (teal green), and male smokers (magenta). The distance between the points represents the similarity of those samples (closer together = more similar).

Visualization of the Taxonomic Profiling Results
Stacked bar charts (precisely, 100% stacked) were used for the visualization of the taxonomic profiling results generated from abundance data in OTU tables. These charts enabled the display of the composition and the distribution of most abundant bacterial taxa at the phyla ( Figure 4) and genera ( Figure 5) levels across 100 studied salivary samples from smokers and non-smokers. Two samples were dropped out of the analysis because of some missing demographic data about them to avoid any type of uncertainty errors. Both Figures 4 and 5 clearly show that the taxonomic identity and distribution of the salivary microbiota were conserved among all tested samples from smoker and non-smoker human subjects at phyla and genera levels. At the phylum level, our results revealed that that the salivary microbiome was largely predominated by Firmicutes, Bacteriodetes, Proteobacteria, Actinobacteria, Fusobacteria, and Saccharibacteria.

Visualization of the Taxonomic Profiling Results
Stacked bar charts (precisely, 100% stacked) were used for the visualization of the taxonomic profiling results generated from abundance data in OTU tables. These charts enabled the display of the composition and the distribution of most abundant bacterial taxa at the phyla ( Figure 4) and genera ( Figure 5) levels across 100 studied salivary samples from smokers and non-smokers. Two samples were dropped out of the analysis because of some missing demographic data about them to avoid any type of uncertainty errors. Both Figures 4 and 5 clearly show that the taxonomic identity and distribution of the salivary microbiota were conserved among all tested samples from smoker and non-smoker human subjects at phyla and genera levels. At the phylum level, our results revealed that that the salivary microbiome was largely predominated by Firmicutes, Bacteriodetes, Proteobacteria, Actinobacteria, Fusobacteria, and Saccharibacteria.

Visualization of the Taxonomic Profiling Results
Stacked bar charts (precisely, 100% stacked) were used for the visualization of the taxonomic profiling results generated from abundance data in OTU tables. These charts enabled the display of the composition and the distribution of most abundant bacterial taxa at the phyla ( Figure 4) and genera ( Figure 5) levels across 100 studied salivary samples from smokers and non-smokers. Two samples were dropped out of the analysis because of some missing demographic data about them to avoid any type of uncertainty errors. Both Figures 4 and 5 clearly show that the taxonomic identity and distribution of the salivary microbiota were conserved among all tested samples from smoker and non-smoker human subjects at phyla and genera levels. At the phylum level, our results revealed that that the salivary microbiome was largely predominated by Firmicutes, Bacteriodetes, Proteobacteria, Actinobacteria, Fusobacteria, and Saccharibacteria.

Univariate Analysis
To test if there are any statistically significant differences in the abundance of abundant taxa between smokers and non-smokers, we performed both parametric and non-parametric univariate analyses. Non-parametric univariate analyses relied on Mann-Whitney and/or Kruskal-Wallis (which is an extension to Mann-Whitney test to test for more than two samples) tests. Our results revealed that three abundant phyla including Proteobacteria, Firmicutes, and Fusobacteria, showed statistically significant differences in abundance between smokers and non-smokers. The false discovery rate (FDR) values reported in Table 3 were used to assess statistical significance of the univariate analysis results setting the cutoff at 0.05. The asterisks (*) and bold names indicate statistically significant p-values with a significance level set at 0.05. Note: when the asterisk is on the right side of the value, it means significantly higher among smokers; when it is on the left side of the value, it means significantly higher among non-smokers. Tables 3 and 4 for phyla and genera levels subsequently. Features were considered statistically significant based on their FDRs with a significance level set at 0.05. Table 3 showed the phylum Firmicutes is significantly more abundant in smokers versus non-smokers, while Proteobacteria and Fusobacteria were less abundant in the non-smokers' group. The asterisks (*) and bold names indicate statistically significant p-values with a significance level set at 0.05. Note: When the asterisk is on the right side of the value, it means significantly higher among smokers; when it is on the left side of the value, it means significantly higher among non-smokers.

Results of the univariate analyses are reported in
At the genera level, six specific genera consisting of Streptococcus, Prevotella, Vellionella, Rothia, Neisseria, and Haemophilus, dominated the salivary microbiota of all examined samples from both smokers and non-smokers ( Figure 5). Univariate analysis results (Table 4) showed that Streptococcus, Prevotella, and Veillonella were significantly more prevalent in smokers than in non-smokers, whereas Neisseria, a bacterial genus that is part of the healthy flora in the human oral cavity, was significantly lower in smokers versus non-smokers (Figure 6).
At the genera level, six specific genera consisting of Streptococcus, Prevotella, Vellionella, Rothia, Neisseria, and Haemophilus, dominated the salivary microbiota of all examined samples from both smokers and non-smokers ( Figure 5). Univariate analysis results (Table 4) showed that Streptococcus, Prevotella, and Veillonella were significantly more prevalent in smokers than in non-smokers, whereas Neisseria, a bacterial genus that is part of the healthy flora in the human oral cavity, was significantly lower in smokers versus non-smokers ( Figure 6).

LDA Effect Size (LEfSe)
Correlation analysis using LDA Effect Size (LEfSe) was performed at the phyla and genera taxonomy levels, and testing for five experimental factors, namely: class (smoker or non-smoker), sex (female or male), the number of smoking years, the number of cigarettes smoked (Figures 7-9) and teeth-brushing habits (brushing or no brushing). To get meaningful insight from the number of years of smoking and the number of cigarettes smoked, we binned all numbers into four and five classes subsequently (Table 5). For the number of smoking years we binned the numbers into four classes: class 0 for zero years of smoking (i.e., son-smoker), class 1 for smoking years from 1 to less than 5, class 2 for smoking years from 5 to less than 10, and class 3 for more than 10 years. For the number of cigarettes smoked per day, we binned the numbers into five classes: Class 0 for zero number of cigarettes, class 1 for a number of cigarettes from 1 to less than 10, class 2 from 10 to less than 20, class 3 from 20 to less than 30, class 4 more than 30 cigarettes per day. We also applied a t-test/ANOVA to study the effect of teeth brushing on the salivary microbiome (not tabulated). We studied the effects at the phyla and genera levels and identified the phylum Synergistetes (p-value = 0.0015, and FDR = 0.0133) as a statistically significant phylum distinguishing the human subjects that brush and those that do not brush their teeth. At the genus level however, we found there are five genera having to show statistically significant differences in abundance between smokers and non-smokers. The five genera were: (1)       showing statistically significant differences between smokers and non-smokers. LDA scores on the x-axis and genera on the y-axis. The color-coding in the squares of the right side of the plot refers to the cumulative abundance of each genus in each binned group, where red means high cumulative abundance and blue means low cumulative abundance.

Multivariate Analysis by Linear Models (MaAsLin)
In order to identify (or rule out the presence of) any confounding covariates that may contribute to the observed changes in microbiome composition with smoking status, we applied a multivariate statistical approach from MaAsLin [28] to relate smoking status to microbiome structure and function while accounting for potential correlates and confounding factors such as tooth-brushing and gender. Clades were tested for statistically significant associations with demographic metadata of interest by using a novel multivariate algorithm. Each clade was normalized with a variance-stabilizing arcsine square-root transformation and evaluated with a general linear model using the glm package in R. Model selection for sparse data was performed per clade using boosting from gbm package. A multivariate linear model associating all available metadata with each clade independently was boosted, and any metadata selected in at least 1% of these iterations was finally tested for significance in a standard generalized linear model. Within each metadatum/clade association independently, multiple

Multivariate Analysis by Linear Models (MaAsLin)
In order to identify (or rule out the presence of) any confounding covariates that may contribute to the observed changes in microbiome composition with smoking status, we applied a multivariate statistical approach from MaAsLin [28] to relate smoking status to microbiome structure and function while accounting for potential correlates and confounding factors such as tooth-brushing and gender. Clades were tested for statistically significant associations with demographic metadata of interest by using a novel multivariate algorithm. Each clade was normalized with a variance-stabilizing arcsine square-root transformation and evaluated with a general linear model using the glm package in R. Model selection for sparse data was performed per clade using boosting from gbm package. A multivariate linear model associating all available metadata with each clade independently was boosted, and any metadata selected in at least 1% of these iterations was finally tested for significance in a standard generalized linear model. Within each metadatum/clade association independently, multiple comparisons over factor levels were adjusted using a Bonferonni correction; multiple hypothesis tests over all clades and metadata were adjusted to produce a final Benjamini and Hochberg false discovery rate (i.e., Q-value). Significant association was considered below a q-value threshold of 0.25. MaAsLin Analysis did not result in any significant confounding variables that could explain the differences in microbiome composition, on the phyla and genera levels, between smokers and non-smokers. These results confirmed the that microbiome compositional differences between human subjects are attributed to the smoking status identified by univariate analyses.

Discussion
To the best of our knowledge, this paper represents a first of its kind report in Jordan, documenting statistically significant changes in the salivary human microbiome composition between smoker and non-smoker human subjects. Our methods relied on high throughput next-generation sequencing of the 16S rRNA marker gene determined in unstimulated salivary samples. Based on the outcomes of previous studies addressing the adverse effects of smoking on health in general, it was anticipated that pathogenic bacteria might be present at the expense of the commensal flora in smokers. This study showed that alpha and beta diversity displayed intra and inter-individual variations. However, the profile clustering direction for each study group (male smokers, female smokers, male non-smokers, and female non-smoker) was apparent with interesting overlapping Venn diagrams for male and female non-smokers versus male and female smokers. This implies a significant response to smoking regardless of gender, even with slight significant statistical variation between males and females in general. Firmicutes, Proteobacteria, and Bacteroidetes were found to have the highest relative abundance percentage of the community at phylum level in all samples. However, smoking had affected the Firmicutes, Proteobacteria, and Fusobacteria, as Firmicutes was statistically elevated in smokers at the expense of Proteobacteria and Fusobacteria in non-smokers. This implies that smoking has a critical impact on the homeostasis of human salivary microbiome. The biological meaning of these findings was not evident until we performed the analysis at genera level.
At the genus level, Streptococcus, Prevotella, Vellionella, Rothia, Neisseria, and Haemophilus predominated the salivary microbiota of all examined samples. Streptococcus, Prevotella, and Veillonella were the most significantly predominant genera among smokers at the expense of Neisseria that are healthy flora in the human oral cavity, which has been significantly decreased among smokers. The increased levels of Streptococcus and Veillonella and the reduced level of Neisseria were consistent with an extensive study of in a thorough survey of cigarette smoking and oral microbiome among American adults [8]. The reduced level of anaerobic Neisseria in this study is consistent with a human oral microbiota study [29], which might be related to the effect of oxygen deprivation in the oral cavity caused by smoking. The predominance of the anaerobic Veillonella and the facultative anaerobic Streptococcus may explain their success in tolerating the lack of oxygen in the smoking microenvironment. Elevated Prevotella was correlated to oral malodor (halitosis) [30], which can be caused by smoking in this study, which is consistent with a clinical review by Porter and Scully [31].
Since statistically significant taxa do not always convey the biological messages, we want to arrive in to make important discoveries later on. We performed additional statistical tests to build confidence in the prioritized taxa and make sure that these taxa are able to explain the differences between the studies classes of smokers and non-smokers. This, in addition to a subsequent related classification based on the number of years of smoking, the number of cigarettes smoked, and whether the human subjects brush or did not brush their teeth, in terms of teeth brushing the phylum Synergistetes were identified as a statistically significant phylum distinguishing the human subjects that brush and those that do not brush their teeth. Synergistetes has been reported in both periodontal health and disease [32]; thus, a further investigation at the species level of Synergistetes is needed. Our tests relied on LEfSe, which combines standard tests for statistical significance with additional tests encoding biological consistency and effect relevance. Based upon our LEfSe results analysis in Figures 7-9, we can see that there is a microbial signature distinguishing smokers from non-smokers, which is consistent with our univariate analysis except that the abundance of candidate division SR1 was statistically significant ( Table 5). The candidate division SR1 usually described in the literature as unknown or unaffiliated [33]. We did not see such a clear signature distinguishing the different classes of human subjects resulting from the binned numbers of years of smoking and the binned numbers of the number of cigarettes smoked.
We can suggest that there is microbial signature at the genera level can be used to classify smokers and non-smokers by LEfSe based on the salivary abundance of the 15 genera including, but not limited to, Streptococcus, Prevotella, and Veillonella, which are all more abundance in smokers relative to non-smokers, and Neisseria, which is more abundant in non-smokers relative to smokers.
It is worthy to note that infections are believed to be a cause of carcinogenesis, alongside other known risk factors such as smoking tobacco and consuming alcohol. The case for role infections in carcinogenesis is increasingly solidified with evidence that the inflammation bacteria can secrete endotoxins, which in turn might induce DNA damage in mouth epithelial tissue [34,35]. A positive correlation between proinflammatory cytokine levels and commensal bacteria was observed in smokers, but that correlation was not present for non-smokers. A previous study suggested that smoking affects both the composition of the nascent biofilm and the host reaction to this colonization [3]. The elevated abundance of Streptococcus, Prevotella, and Veillonella in this study should be considered in future research to explore the feasibility of being a salivary diagnostic predictor in Jordanian smokers for oral squamous cell carcinomas. For example, a previous report concluded that some specific taxa have a significant correlation with epithelial precursor regions and oral cancer, taxa such as Streptococcus spp., Veillonella, Porphyromonas, Fusobacterium, Prevotella, Actinomyces, Clostridium, Haemophilus, and Enterobacteriaceae [36].
Although we were successful in generating high-quality sequencing data that enabled the subsequent bioinformatics analysis to identify significant compositional differences in the salivary microbiome between smokers and non-smokers, this study has some limitations that we should disclose here. First, targeting only two hypervariable regions, V3-V4 on the 16S rRNA gene, might group closely related taxa into a single taxonomic unit. However, there is sufficient evidence in the biomedical literature indicating that the V3-V4 exploration is adequate to produce a reliable phylogenetic ranking at phyla and genus levels, but usually not at species level. Second, even though 16S rRNA remains the most efficient available approach to study microbial communities, it suffers mosaicism, intra-genomic heterogeneity, and lacks a universal threshold of what is known as sequence identity value [37]. Third, we could not control for other confounders for example lifestyle, exact and complete oral health beyond yes or no teeth brushing, alcohol consumption, drug abuse, and chemical and physical properties of saliva. Last, it is challenging to determine whether the existing microbial colonization is a long-term one or not.

Conclusions
The present investigation systematically combined NGS and bioinformatics to examine the effect of tobacco smoking on the unstimulated salivary microbiome in human subjects by targeting the 16S rRNA gene. The results of this investigation confirmed a proven impact of smoking on the core salivary microbiome between smokers and non-smokers in terms of relative abundance percentages. Six genera-Streptococcus, Prevotella, Vellionella, Rothia, Neisseria, and Haemophilus-predominated the salivary microbiota of all samples with different proportions. Three genera-Streptococcus, Prevotella, and Veillonella-showed significant differences between the comparison groups at the expense of Neisseria. Smoking has a definite impact on shifting the salivary microbiota in smokers. Further studies are needed to explore if dominant genera can be utilized as diagnostic biomarkers to predict or early detect the periodontitis and oral epithelial precursor lesions among smokers in Jordan. Herein, proteomics and metabolomics studies are recommended to fully understand the xenobiotic metabolism and its effect on the interrelationships among bacteria of the salivary microbiome and how they affect the growth of each other.  All the classified sequences that were used here passed the threshold that is represented by the dashed line in Figure A1.