Oral Microbiota Profile Associates with Sugar Intake and Taste Preference Genes

Oral microbiota ecology is influenced by environmental and host conditions, but few studies have evaluated associations between untargeted measures of the entire oral microbiome and potentially relevant environmental and host factors. This study aimed to identify salivary microbiota cluster groups using hierarchical cluster analyses (Wards method) based on 16S rRNA gene amplicon sequencing, and identify lifestyle and host factors which were associated with these groups. Group members (n = 175) were distinctly separated by microbiota profiles and differed in reported sucrose intake and allelic variation in the taste-preference-associated genes TAS1R1 (rs731024) and GNAT3 (rs2074673). Groups with higher sucrose intake were either characterized by a wide panel of species or phylotypes with fewer aciduric species, or by a narrower profile that included documented aciduric- and caries-associated species. The inferred functional profiles of the latter type were dominated by metabolic pathways associated with the carbohydrate metabolism with enrichment of glycosidase functions. In conclusion, this study supported in vivo associations between sugar intake and oral microbiota ecology, but it also found evidence for a variable microbiota response to sugar, highlighting the importance of modifying host factors and microbes beyond the commonly targeted acidogenic and acid-tolerant species. The results should be confirmed under controlled settings with comprehensive phenotypic and genotypic data.


Introduction
The gastro-intestinal canal is heavily colonized, with over 700 bacterial species or unnamed phylotypes in the oral cavity alone [1]. The bacteria form niche-specific ecosystems with patterns of bacterial cohabitation due to influences from factors like transmission, host receptor availability, pH, access of nutrients, oxygen and symbiosis or communication with nearby species [2,3]. Generally, these ecosystems are in equilibrium but shifts may occur due to medical or lifestyle changes [4][5][6].
Many studies report associations between host factors, including saliva quality, pH regulation, and genetics, and single or panels of bacteria [7][8][9], but few studies have evaluated the associations between host traits and the oral microbiota in an untargeted manner [10][11][12]. Carbohydrate, especially high sugar, intake is reported to correlate with the enrichment of acidogenic

16S rRNA Gene Amplicon Generation and Sequencing
Bacterial 16S rRNA gene amplicons were generated from the variable regions v1-v3 (27F-forward AGAGTTTGATCATGGCTCAG and 530R-reverse GTATTACCGCGGCTGCTG primers), and v3-v4 (357F-forward TACGGGAGGCAGCAG and 800R-reverse CCAGGGTATCTAATCC primers) from saliva and a mock community DNA. Library preparation, sequencing on the Illumina Miseq platform, and sequence demultiplexing were done at Eurofins Genomics (Ebersberg, Germany) according to their standard protocols. The company provided demultiplexed FASTQ-files, which were imported to QIIME2 [26], and DADA2 was used for denoising, pair end read fusion, chimeric sequence removal [27], and the identification of 100% identical amplicon sequence variants (ASVs) per sample [28]. Default parameters in DADA2 were used with left trimming of 13 bp for both forward and reversed reads, right trimming at 230 bp for the reversed sequences and 268 bp for the forward sequences. ASVs with >1 read were blasted against the extended Human Oral Microbiome Database (eHOMD, http://www.homd.org) for taxonomic annotation [29]. In the eHOMD blast, only ASVs with at least 98.5% identity with a named species or unnamed phylotype in eHOMD were retained, and those with the same HMT number were aggregated. The negative control contained <50 sequences and the positive control mock species were correct for representative sequences with 25 or more reads. Therefore, all comparisons were based on taxa with at least 25 reads. For simplicity, all taxa are referred to as species in the text. Sequencing failed for one saliva sample, leaving 175 samples in the final analyses.

Diet Recording
The participants reported their diet intake in a food frequency questionnaire (FFQ, http://www.matval.se). The FFQ is a semi-quantitative questionnaire, with questions on 93 food items/food aggregates selected to represent the habitual intake in Sweden and includes questions on alcoholic beverages. Participants were asked to report their typical intake in the last year. Intakes were reported on an increasing, nine-level scale, from never to four or more times a day. Portion sizes were estimated from photographs showing four portion sizes of staple foods (potatoes, rice, and pasta), meat/fish and vegetables, or standard food weights, such as for an egg or apple. Sucrose intake was estimated from nine questions, i.e., (i) fruit soup with or without a thickening agent, (ii) buns and biscuits, (iii) cookies and cakes, (iv) marmalade and jam, (v) ice cream, (vi) sodas, (vii) syrups, (viii) sugar, (ix) sweets including candies and chocolate. For sugar, the sum of mono and disaccharides (excluding lactose) from fruits, berries, vegetables, juices and honey was added. The FFQ included one question on how often the respondent ate or drank sweet products without sugar, i.e., with a sugar substitute.
Intake of energy and energy-providing nutrients was calculated by multiplying intake frequencies by portion sizes and weighting by the energy/nutrient contents in the food composition database at the National Food Administration (https://www.livsmedelsverket.se/en/food-and-content/naringsamnen).
A Healthy Diet Score that reflects healthy eating habits was calculated as previously described [30]. Briefly, frequency of intake per day was calculated for eight food/beverage groups. Favourable food groups included fish, fruits (except juices), vegetables (except potatoes) and whole grains. Unfavourable food/beverage groups included red or processed meats, desserts and sweets, sugar-sweetened beverages and fried potatoes. Intake frequencies were ranked within each sex in ascending quartile ranks for favourable foods/beverage groups, and in descending quartile ranks for unfavourable foods/beverage groups. The sum of all quartile ranks represents the Healthy Diet Score, with a minimum of zero and a maximum of 24, and with higher ranks indicating healthier food and beverage choices.
The relative validity of FFQ-derived intakes has been estimated against 24 h dietary records and/or biological markers [31][32][33][34]. For sucrose reliability, measured as correlations between registrations done one year apart, results were 0.80 for men and 0.75 for women, and the relative validity against 10 repeated 24 h recalls were 0.69 and 0.62 for sweet foods for men and women, respectively [31].
To reduce potential recording bias, energy-providing nutrients were expressed as energy standardized values (E%) and 11 participants with unrealistic reported energy intakes were excluded from analyses involving diet assessments. This was based on food intake level (FIL) scores calculated to estimate energy intake relative to minimal energy needs [35].

Recording of Medical and Other Lifestyle Conditions
Information on health status, oral hygiene, tobacco use, alcohol intake and most recent antibiotic exposure was obtained from a questionnaire. Dental caries was scored from visual and radiographic examinations in the dentist's office with optimal lightning. Tooth surfaces that were sound according to ICDAS [36], score 0, or had caries in the enamel (e) according to ICDAS scores =1 and 2, or had caries in the enamel with a localized breakdown with or without dentine involvement (D) according to ICDAS score ≥3, or with a filling (F), were recorded. The total numbers of decayed and filled tooth surfaces (DeFS) were calculated. The M component was not considered because tooth loss occurred for orthodontic reasons or severe hypomineralization in this study group.

Prediction of Functional Potential from the 16S rRNA Gene Information
Obtained representative ASVs were used to search for the potential molecular functions of the saliva microbiome using the 16S rRNA gene as marker gene, Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2) [37] and the Molecular Functions by orthology annotation (Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology database, KO, https://www.genome.jp/kegg/kegg1.html) [38,39]. The steps included (i) creating a closed reference feature table in QIIME2 using the trained Greengenes dataset gg-13-8-99-nb-classifier.qza (Greengenes http://greengenes.lbl.gov, [40], (ii) qiime diversity core-metrics analysis in QIIME2, and (iii) export of pathway abundances and the feature table for down-stream analyses in KO and multivariate modelling. Group separation was tested by Euclidean distances in permutational multivariate analysis of variance (PERMANOVA), Bonferroni-corrected p-values and 9999 permutations. Follow-up functional enrichment analyses were done using the STRING database (version 10.5, https://string-db. org/) [41]. The same procedure was also done for eHOMD-defined species.

Data Handling and Statistical Analyses
Unsupervised hierarchical clustering (Ward's method) was used to classify the 175 participants by the presence (or not) of ASVs and presence (or not) of species from eHOMD identification. The number of ASVs were standardized to the level of the sample with the fewest reads after DADA2 filtering (38,293 reads), and lowest per-sample abundance of eHOMD taxa, and transformed by inverse hyperbolic sine transformation, which defines log values, including for zero-values, which are prevalent for many ASVs and some species.
Continuous phenotypic variables were presented as means with 95% confidence interval limits (CI), and when adjusted for sex, age and body mass index (BMI) using generalized linear modelling. Differences were tested with non-parametric tests. For discrete measures, the percentages in groups were estimated and proportion differences tested with Chi 2 test. SPSS version 25 (IBM Corporation, Armonk, NY, USA) was used for these analyses. All tests were controlled by the Benjamini and Hochberg procedure, and those with p-values <0.05 yielding an FDR of 5% are presented.
Alpha-and beta-diversities with associated PERMANOVA tests and ASV proportions for bar charts were calculated in QIIME 2.
Multivariate modelling was performed by partial least-square regressions (PLS) (SIMCA P+ version 15.0, Sartrius Stedim Data Analytics AB, Malmö, Sweden). PLS identifies directions in an X-swarm that characterize X well and are related to Y. The software scales all variables to unit variance, and performs a K-fold cross-validation where 1/7 th of the data are systematically kept out to fit a model and predict it from the remaining data (Q 2 -values). The results are displayed in scatter plots illustrating the separation of observations, and loading column plots displaying the mean correlation coefficient, with 95% CI between each predictor and the outcome variable. CIs that do not include zero are considered statistically significant.
The linear discriminant analysis effect size (LEfSe) method [42] was used to identify taxa effect size. Species that were shared between groups were identified in a Venn diagram [43].

Study Group
The study included a similar number of males and females. The mean BMI was in the normal range but with variation, and approximately one in five participants had a BMI >25. Sex-adjusted mean energy intake (after exclusion of the 11 participants with unrealistic energy intake) was 1856 kcal/day, with 40.3% coming from carbohydrates (E% COH) ( Table 1). Starch was the dominant type of dietary carbohydrate (117 g/day) followed by sucrose (27 g/day), monosaccharides (19 g/day), and other sugars (23 g/day) ( Figure 1A). Reported median intake of sucrose corresponded to 5.8% of the total energy intake ( Figure 1B).

Overall Microbiota Assessment
For the 175 saliva samples, 14,317,039 sequences passed denoising and the removal of potential chimera with 1,685,662 reads referring to the v1-v3 and 12,631,377 sequences to the v3-v4 section. Under the present conditions, the v3-v4 sequencing yielded significantly more sequences per sample, better recognition of the mock species and higher diversity (Table S1). Further analyses were based on v3-v4 sequences.

Cluster Classifications Based on ASV Pattern
Four groups were classified from the hierarchical clustering of dichotomous ASVs (clusters ASV1-ASV4 (Figure 2A). The relative proportions of the 13 identified phyla and top 40 genera in the respective cluster group are presented in Figure 2B,C, respectively.
The ASV cluster groups differed significantly in reported sugar (p = 0.001) and sucrose (p = 0.008) intake and saliva flow rate (p = 0.031) ( Table 2). Further, allelic variation in the TAS1R1 (rs731024, p = 0.003), and the GNAT3 (rs2074673, p = 0.007 and rs11760281, p = 0.010) genes differed significantly between the cluster groups ( Table 2). None of the other gene variants or BMI or caries scores differed between cluster groups. Alpha diversity was lowest for the group with the highest reported sugar intake (ASV1) when estimated as ASVs in rarefaction curves ( Figure 3A), Shannon index ( Figure 3B), and evenness diversities ( Figure 3C), with statistically significant differences among (p-values from 1.5 × 10 -3 to 1.2x10 -13 ) and between groups (q-values from 1.3 × 10 -2 to 1.7 × 10 -10 ). Faith phylogenetic diversity index was less divergent between groups but still differed significantly (7.8 × 10 -5 , Figure 3D). Beta diversity differed significantly between all cluster groups (PERMANOVA p-value among groups and q-values between groups 0.001) with associated separation displayed in the Jaccard principal coordinates analysis (PCoA) plots ( Figure 3E). Table 2. Pheno-and genotypical characteristics for hierarchical cluster groups based on amplicon sequence variants (ASVs), and taxa aggregates based on identification in the eHOMD database. Underlying dendrograms are shown in Figure 2 and Figure 4, respectively. For eHOMD, only variables that differed significantly between the clusters are shown.

Predicted Functions in ASV Cluster Groups
Linkage of the 6171 retained 16S rRNA ASVs to pathways by KEGG functional orthology annotation predicted 10,543 KEGG orthologies. Multivariate analysis suggested a separation of the predicted function of cluster ASV1 (highest sucrose intake) from cluster ASV2 (lowest sucrose intake) and ASV3 (both p < 0.0006). The microbiota of cluster ASV1 (compared to both cluster ASV2 and ASV3) displayed several KEGG orthology related to starch and sucrose metabolism, and a functional enrichment of glycosidase capacity according to the STRING processing, indicating the ability to hydrolase glyosidic bonds in carbohydrates (Table S2).

Factors Associated with Belonging to the Species Level (eHOMD) Cluster Groups
PLS identified eHOMD species, host-related traits and lifestyle factors, and mutans streptococci and lactobacilli by influential culture, and classified them into the four cluster groups. Data for H1, H2 and H4 compared to H3 are shown in Table 3 and H3 from the PLS comparison including all four cluster groups simultaneously in Table S4. Table 3. Species significantly influential for being classified into cluster H1, H2 or H4, respectively, when compared with cluster H3. Species in bold are known to be acidogenic and acid-tolerant species. Data for cluster H3 (lowest sucrose intake) against the three other clusters simultaneously are found in Table S4.
Cluster H1 (n = 70) Cluster H2 (n = 33) Cluster H4 (n = 24) model R 2 = 50%, Q 2 = 50% model R 2 = 84%, Q 2 = 63% model R 2 = 80, Q 2 = 53 sucrose intake = 6.5 E% sucrose intake 6.4 E% sucrose intake 6.0 E% DeFS = 5.  Taxa in Cluster H1 (high sucrose intake) compared with H3 (lowest sugar intake) from LEfSe analysis is further illustrated in a clade diagram ( Figure 5A) with effect sizes in an LDA histogram ( Figure 5B). Species in the phyla Streptococcus and Actinomyces were associated with H1. At the species level, largely the same species as identified by PLS appeared. The strongest effect sizes (LDA scores of about 4) were seen for species in the genera Streptococcus and Prevotella ( Figure 5B). Sugar and sucrose intakes were influential for being in clusters H1, H2 and H4 with comparably higher sucrose intake. Intake of milk was influential for clusters H1 and H4 (Table 2). A wide panel of species, including species in Actinomyces and Selenomonas, Bifidobacterium dentium and Leptotrichia sp. HMT498, were influential for being in cluster H2, whereas being in cluster H1 (with slightly higher mean sucrose intake) was determined by significantly fewer species, which included two species in Actinomyces, Bifidobacterium longum, Scardovia wiggsiae, Streptococcus mutans, two species in Veillonella and lactobacilli and mutans streptococci by culture. Cluster H4 also had fewer influential species than cluster H2 and included the cariogenic Streptococcus sobrinus.
High scores in the Healthy Diet index and for protein-related food intake were associated with membership in Cluster H3, which was the cluster with the lowest reported sucrose intake (Table S4).

Predicted Functions in Species Level (eHOMD) Cluster Groups
Predicted functions in the microbiota associated with the four HOMD-based cluster groups separated them (p < 0.0001), with cluster H3 (lowest sucrose intake) separated from the two groups with the highest sucrose intakes, i.e., H1 (p = 0.0012) and H2 (p = 0.027). Similar to the sequence variant-based analyses, cluster H1 displayed several KEGG functional orthology related to starch and sucrose metabolism ( Figure 5C, Table S5) and a functional enrichment of the capacity to hydrolase glyosidic bonds ( Figure 5D).

Discussion
This study identified groups of people defined by similar oral microbiota profiles, targeted as dichotomized amplicon species variants (ASVs) or named species and unnamed phylotypes, and then explored lifestyle and host factors which were associated with these groups. The most striking difference between groups classified by unsupervised cluster analysis was for reported sucrose intake but associations were also found for total sugar intake, saliva flow rate and allelic variations in two taste-perception-associated genes. The group who reported the highest sucrose intake had the lowest species richness, but the microbiota in clusters with a higher sucrose intake were either defined by a pamphlet of species with no clear association with carbohydrate metabolic pathways or by a microbiota enriched for acidogenic and acid-tolerant species and carbohydrate degradation metabolic pathways. The latter species included several species that were suggested to be associated with the caries disease.
The prevailing ecological plaque hypothesis [44,45] describes an ecological shift (collapse) towards the enrichment of acidogenic and acid-tolerant species in low pH environments, such as after sugar consumption [13,16,46]. The present findings are consistent with this hypothesis, as several such species were found to be enriched in two of the three cluster groups with the highest sucrose intake. Among these were species in Actinomyces, Bifidobacterium and Veillonella, and S. wiggsiae, S. mutans, and S. sobrinus with documented relevance for dental caries in small children or adults [14,44,47]. In fact, although not statistically assured, these two cluster groups tended to have had the highest mean caries experience. Conversely, the study identified a third cluster group with similar sugar intake but not characterized by enrichment of the most acidogenic and acid-tolerant species, but rather a large number of taxa including several species in Capnocytophaga, Leptotrichia, Prevotella and Streptococcus. Thus, the effects of sugar on the salivary microbiota are potentially modified by host-related factors such as innate immunity peptides and buffer functions, but there are other possible explanations, for example, if the microbiota response to sugar is regulated by inter-bacteria communication or if there was differential measurement error in the reporting of sucrose intake in the different cluster groups. Overall, the present findings are well in line with our previous finding in the same population, where S. wiggsiae, S. mutans, B. longum, Leptotrichia sp. HMT498, and Selenomonas spp. in tooth biofilm samples discriminated caries-affected from caries-free adolescents [48], and support previous experimental findings in vitro [13] and in animals [49]. A direct comparison with the results from Anderson et al. 2018 [4] may not be appropriate as they used a different 16S rRNA segment and lower similarity requirement for taxa determination.
The strength of the present study is the comparably large sample and that the study group likely represents the underlying population with minimal selection bias since the attendance rate to the public dental clinics is very high, as care is free in the targeted age group and that participants who were enrolled consecutively agreed to participate. The major weakness relates to the inherent difficulties in measuring diet using questionnaires [50]. Monitoring sugar intake may be even more challenging in the dentist´s office, since patients are aware that sugar is bad for the teeth. This may be a source of systematic measurement error which may be manifested in the low self-reported sugar intake and the limited variation compared to other studies in the country [51]. However, energy-adjustment, excluding participants who reported the most implausible energy intakes and use of robust statistical methods based on ranking rather than reported consumption, was performed to reduce the impact of this error, but it cannot completely exclude a nullifying bias from the under-reporting of sucrose intake.
Factors influential for being classified in a cluster group were assessed using multivariate PLS regression, which has the advantage of accepting correlated variables. These models were strong, but it should be noted that this is partly an effect of the fact that the cluster groups were formed by cluster analysis of bacterial taxa. However, since clustering was unsupervised, the association with sucrose intake and which specific bacteria were present in each cluster is not a consequence of clustering per se and does not prevent the comparison of phenotypic characteristics of people in the cluster groups or species that were influential for being in the groups.
An alternative model to the idea that sugar intake and, indirectly, gene variations, drives oral microbiota transformation would be that the oral microbiota per se influences the taste phenotype of the host and, accordingly, the individual's food selection, as previously suggested [20][21][22][23]. In the present study, sucrose intake was the strongest explanatory diet factor for microbiota clustering, and supported by several in vitro and a few in vivo studies [4,7,52]. We suggest that the variant oral microbiota ecologies (here, the clusters) are a consequence of low pH due to sugar intake [42,45,52]. Thus, we suggest that the clinical implication of the present findings is that even a moderate difference in sugar intake may affect the oral microbiota into a more cariogenic composition, which may be reverted by sugar restriction. However, we cannot fully exclude that the alternative hypothesis is of relevance too.
We, and others, have reported that genetic variants located in or near genes related to taste perception and preference were associated with diet preference and caries [33], but studies have not evaluated variations in such genes in relation to the concerted oral microbiome. In this study, allelic variations in the TAS1R1 and GNAT3 genes were significantly associated with the pattern of oral bacteria at the group level. The TAS1R1 gene is involved in bitter taste sensations with known associations with sweet food preferences [53], whereas the GNAT3 gene, encoding the gustducin alpha-3 chain protein, can be involved in sweet, bitter or umami taste sensation depending on heterodimeric formations with various TAS proteins [54]. Besides taste perception, the gustducin protein functions as a sugar sensor in the gut with suggested effects on sugar absorption and metabolic syndrome [55]. Thus, hypothetically, the GNAT3 gene may affect the oral microbiota via dietary habits or secondary effects from diet-related metabolic disorders [56].
The present study also found that under the present conditions sequencing of the v1-v3 amplicons of the 16S rDNA gene yielded significantly fewer sequences and poorer recognition of the mock species than the v3-v4 amplicons. This is in line with previous reports [57,58] and stresses the importance of taking the sequencing protocol into consideration when data are compared from different studies.

Conclusions
In conclusion, this study supported in vivo associations between sugar intake and oral microbiota ecology, but it also confirmed variations in the apparent microbiota response to sugar, highlighting the importance of considering microbes beyond the commonly studied acidogenic and acid-tolerant species and modifying host factors. The result, need to be confirmed, preferentially under controlled forms, such as in a randomized clinical trial design and with the monitoring of host pheno-and genotypical characteristics and a potential biomarker for diet, such as untargeted metabolomics.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6643/12/3/681/s1. Table S1: Taxa detected by Illumina MISeq sequencing of the v1-v3 versus the v3-v4 variable regions of 16S rDNA. Identities are from eHOMD blasting at 98.5% identity and cut-off at 25 reads per species. Table S2: Partial least square regression analysis (PLS-DA) was used to identify bacterial functions differentiating between ASV cluster 2, with the lowest sucrose intake (sugar 13.6 E%, sucrose 5.3E%), to that of cluster ASV 1, 3, and 4 with higher sucrose intakes. Bacterial functions were predicted by Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt 2) in QIIME 2 and Kyoto Encyclopedia of Genes and Genomes (KEGG orthology, KO). Bold labelled functions are linked to Carbohydrate metabolism (e.g., starch, sucrose, and glycolysis). Table S3: Prevalence of 372 species identified from the eHOMD database at 98.5% identity and represented by at least 25 reads. Data are for cluster groups from hierarchical clustering of dichotomous variables. p-values for the prevalences are from Chi-square analyses. Species and p-values in bold are statistically significant at FDR 0.05. Table S4: Bacteria and other host-associated variables with a statistically significant association (PLS correlation coefficients with 95% CI that does not include 0) in PLS modelling. Cluster H3 (the cluster with the lowest reported sucrose intake) was modelled against cluster H1, H2, and H4 individually or together. Sucrose intake (E%) and caries (DeFS) are group mean values standardized for sex, age and BMI and sex and age, respectively. Variables are ordered by taxa names and other host factors at the end. Species in bold are species that commonly are referred to as acidophilic species. Table S5: Multivariate analysis (PLS-DA) was used to identify bacterial functions differentiating between cluster H3, with the lowest sucrose intake) to that of clusters H1, H2, and H4 with higher sucrose intake. Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt 2) in QIIME 2 and Kyoto Encyclopedia of Genes and Genomes (KEGG orthology, KO) predicted bacterial functions. Functions in bold are linked to carbohydrate metabolism (e.g., starch, sucrose, and glycolysis).