1. Introduction
The upper gastrointestinal (GI) tract is the initial portion of the digestive system, encompassing the mouth (oral cavity), esophagus, stomach, and the first part of the small intestine [1]. Beyond serving as a conduit for food passage, the upper GI tract plays a pivotal role in digestion, nutrient absorption, mucosal immune regulation, and host–microbe interactions [2]. The GI axis is characterized by marked physicochemical gradients, including variations in oxygen tension, pH, mucus composition, nutrient availability, and peristaltic flow, which may support distinct microbial communities along its length [2]. The human microbiome exerts diverse physiological functions essential for host health, including nutrient metabolism, vitamin synthesis, maintenance of epithelial barrier integrity, immune maturation, and modulation of inflammatory responses [3]. Through dynamic host–microbe interactions, microbial communities contribute to metabolic homeostasis and immune balance across mucosal surfaces [4]. Recent studies indicate that oral microbial dysbiosis is associated with both oral diseases (e.g., periodontitis and dental caries) and upper gastrointestinal disorders, including gastroesophageal reflux disease, gastritis, and upper GI malignancies [5]. The oral cavity continuously releases microorganisms through saliva and dental plaque, and these microbes are repeatedly delivered to the esophageal and gastric mucosa via swallowing.
Microbial crosstalk among organs is increasingly recognized as an important indicator of human health and systemic homeostasis [6]. Emerging evidence suggests that the upper GI microbiome plays a significant role in the pathophysiology of esophageal and gastric diseases, including reflux esophagitis, Barrett’s esophagus, chronic gastritis, and gastric cancer [7]. In particular, the gastric lumen represents an extreme and highly selective niche due to acid exposure [8], but its microbial architecture relative to gastric mucosal communities has not been comprehensively evaluated. Recent meta-analyses demonstrate that periodontal disease is associated with an approximately 17% increased risk of gastric adenocarcinoma, suggesting that chronic oral inflammation and sustained microbial exposure may contribute to gastric carcinogenesis [9]. Nevertheless, despite growing recognition of inter-organ microbial interactions, it remains unclear how the oral microbiome correlates with microbial communities in other organs. Multiple studies have detected oral-associated taxa in the esophagus and stomach, yet whether these organisms undergo functional restructuring remains incompletely understood.
In the present study, we comprehensively characterized microbial communities across the oral cavity, esophagus, gastric mucosa, and gastric juice using publicly available 16S rRNA sequencing data reported by She et al. [10]. While studies of regional microbial variation have revealed important physiological roles of the microbiome, only a limited number of investigations have addressed the heterogeneity of microbial communities across different anatomical sites within the same individual [10]. She et al. advanced this field by profiling lumen mucosa, gastric juice, and surface samples collected from 53 sites across seven surface organs (oral cavity, stomach, esophagus, small intestine, appendix, large intestine, and skin) of 33 subjects, yielding a total of 1608 samples [10]. Although She et al. provided a comprehensive body-wide microbiome atlas across multiple surface organs, their analysis was designed primarily to describe broad inter-organ and intra-organ biogeographical patterns [10]. Here, we re-analyzed the same public dataset with a more focused emphasis on the oral–upper gastrointestinal axis, including the oral cavity, esophagus, gastric mucosa, and gastric juice. This focused framework allowed us to examine taxonomic continuity, site-specific divergence, and niche-associated microbial enrichment across anatomically connected upper GI compartments. In addition, rather than using PacBio full-length 16S rRNA data as the main quantitative abundance dataset, we used PacBio-derived full-length ASVs to construct an optimized reference database for taxonomic assignment of Illumina V3–V4 amplicon data. Thus, the novelty of this study lies in its focused oral–upper GI ecological framework and its PacBio-informed taxonomic classification strategy for V3–V4-based microbiome profiling.
By integrating alpha- and beta-diversity analyses with taxonomic profiling at the phylum, genus, and species levels, we aimed to characterize microbial community variation along the oral–upper gastrointestinal axis. Specifically, this study sought to (1) assess taxonomic overlap and ecological continuity between oral and upper gastrointestinal communities, (2) identify site-specific taxonomic signatures and abundance patterns across anatomical regions, and (3) determine the discriminative taxa associated with organ-specific community structures. Through this ecological framework, we aimed to provide a comprehensive view of the continuity and divergence of microbial communities across the upper gastrointestinal tract.
2. Materials and Methods
2.1. Data Retrieval
The raw sequencing data were retrieved from NCBI under BioProject accession PRJNA1049979. Samples had been collected from 33 subjects at multiple anatomical sites along the oral–upper gastrointestinal axis. The analyzed dataset consisted of 198 oral samples from six oral sites, 110 esophageal samples from four esophageal sites, 117 gastric mucosal samples from four stomach sites, and 33 gastric juice samples, resulting in a total of 458 samples. The oral sites included the left buccal mucosa (LC, n = 33), right buccal mucosa (RC, n = 33), upper hard palate (UM, n = 33), lower hard palate (LM, n = 33), upper lip (UL, n = 33), and lower lip (LL, n = 33). The esophageal sites included the thoracic esophagus (ESOM, n = 32), abdominal esophagus (ZA1, n = 25), zigzag line (Z, n = 29), and cardiac orifice (ZB1, n = 24). The gastric mucosal sites included the fundus (SF, n = 33), body (SB, n = 29), antrum (SA, n = 29), and pylorus (PY, n = 26), and gastric juice (GJ, n = 33) was analyzed as a separate anatomical group. The Illumina V3–V4 16S rRNA sequencing dataset was used as the primary dataset for diversity analysis, taxonomic profiling, and differential abundance analysis. PacBio full-length 16S rRNA sequencing data were used to construct an optimized, study-specific reference database.
2.2. Reference Database Construction and Taxonomic Assignment
PacBio full-length 16S rRNA data served as a reference-enhancement resource rather than as the primary dataset for community abundance comparisons. For PacBio full-length 16S rRNA data, primers were removed and reads were oriented using the Divisive Amplicon Denoising Algorithm 2 (DADA2) (v1.26.0). Reads were filtered according to quality and length criteria, denoised using the DADA2 PacBio error model, and chimeric sequences were removed to generate high-resolution full-length amplicon sequence variants (ASVs). The PacBio-derived ASVs were used to construct an optimized, study-specific reference database [11]. To obtain representative taxa and reduce database size, phylogenetic trees were constructed using the QIIME2 align-to-tree-mafft-fasttree pipeline, and terminal branches were trimmed at a threshold of 0.0005 using the drop.tip function in the ape package. Taxonomic assignment of these ASVs was conducted through a hierarchical BLAST (v2.16.0) search against the NCBI reference database. ASVs with a best hit of >97% sequence identity were assigned at the Species level, and ASVs failing this threshold were further searched with BLAST against the SILVA database. Taxonomic ranks were assigned based on identity thresholds: >97% for Species, >95% for Genus, and >90% for Family. The optimized PacBio-derived ASV sequences and their taxonomy were then used to train a naive Bayes classifier for taxonomic assignment of Illumina V3–V4 ASVs. It should be noted that 16S rRNA gene sequencing primarily characterizes bacterial communities; therefore, the ecological patterns described in this study mainly reflect bacterial community structure and composition rather than the full diversity of microorganisms.
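The hierarchical BLAST-based rank assignment described above can be sketched as a small decision function. This is a minimal illustration only, assuming the best-hit percent identity has already been parsed from BLAST output for each ASV; the function names (`deepest_rank`, `assign_rank`) are hypothetical, and the fallback logic (NCBI accepted only at the Species cutoff, then SILVA evaluated against all three cutoffs) is our reading of the text rather than the authors' code.

```python
from typing import Optional, Tuple

# Identity cutoffs stated in the text, deepest rank first; ">" is strict.
RANK_THRESHOLDS = [("Species", 97.0), ("Genus", 95.0), ("Family", 90.0)]


def deepest_rank(identity: float) -> Optional[str]:
    """Deepest rank supported by a percent identity, or None if no cutoff is exceeded."""
    for rank, threshold in RANK_THRESHOLDS:
        if identity > threshold:
            return rank
    return None


def assign_rank(ncbi_identity: Optional[float],
                silva_identity: Optional[float]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical hierarchical assignment: NCBI first at the Species cutoff, then SILVA.

    Returns (database, rank); (None, None) means no cutoff was met.
    """
    # Step 1: a >97% NCBI hit is accepted at the Species level.
    if ncbi_identity is not None and ncbi_identity > 97.0:
        return "NCBI", "Species"
    # Step 2: otherwise fall back to the SILVA hit and apply all cutoffs.
    if silva_identity is not None:
        rank = deepest_rank(silva_identity)
        if rank is not None:
            return "SILVA", rank
    return None, None


# An ASV at 96.2% identity to NCBI but 96.8% to SILVA is called at Genus level.
print(assign_rank(96.2, 96.8))  # ('SILVA', 'Genus')
```

Because the cutoffs are strict inequalities, an identity of exactly 97.0% falls to Genus; how boundary values were treated in the original pipeline is not stated.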
2.3. 16S rRNA Sequence Processing and QIIME2 Workflow
For Illumina V3–V4 16S rRNA data, paired-end reads were processed in QIIME2 (v2023.9.0) using DADA2 to generate ASVs. The Illumina ASVs were classified using the PacBio-informed optimized reference classifier. The resulting Illumina feature table, taxonomic classification table, rooted phylogenetic tree, and metadata were imported into phyloseq for downstream analyses. Therefore, all diversity analyses, taxonomic abundance comparisons, Permutational Multivariate Analysis of Variance (PERMANOVA), LEfSe, and ALDEx2 analyses were performed using the Illumina V3–V4 abundance table with taxonomy assigned using the optimized PacBio-informed reference database.
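Conceptually, the PacBio-informed classifier scores each Illumina ASV against the reference taxa using k-mer frequencies. In QIIME2 this is typically done with `qiime feature-classifier fit-classifier-naive-bayes`, which trains a scikit-learn multinomial naive Bayes model on k-mer features. The pure-Python sketch below reimplements that idea for illustration only; the class name `KmerNaiveBayes`, the k-mer length, the smoothing constant, the empirical class prior, and the toy sequences are all assumptions, not the settings used in this study.

```python
import math
from collections import Counter, defaultdict


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


class KmerNaiveBayes:
    """Hypothetical multinomial naive Bayes over k-mer counts with additive smoothing."""

    def __init__(self, k: int = 7, alpha: float = 0.001):
        self.k = k          # k-mer length (assumed value)
        self.alpha = alpha  # additive smoothing constant (assumed value)

    def fit(self, seqs, labels):
        self.class_sizes = Counter(labels)
        self.n_train = len(labels)
        self.counts = defaultdict(Counter)  # taxon label -> k-mer -> total count
        vocab = set()
        for seq, label in zip(seqs, labels):
            kc = kmer_counts(seq, self.k)
            self.counts[label].update(kc)
            vocab.update(kc)
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.counts[c].values()) for c in self.class_sizes}
        return self

    def log_scores(self, seq):
        """Unnormalized log posterior of each taxon label for one query sequence."""
        query = kmer_counts(seq, self.k)
        scores = {}
        for label, size in self.class_sizes.items():
            denom = self.totals[label] + self.alpha * self.vocab_size
            score = math.log(size / self.n_train)  # empirical class prior
            for kmer, n in query.items():
                score += n * math.log((self.counts[label][kmer] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy references standing in for full-length ASVs (invented sequences).
refs = ["ACGTACGTACGTAAAA", "ACGTACGTACGTAAAT", "TTTTGGGGCCCCTTTT", "TTTTGGGGCCCCTTTA"]
taxa = ["taxon_A", "taxon_A", "taxon_B", "taxon_B"]
model = KmerNaiveBayes(k=4, alpha=0.01).fit(refs, taxa)
print(model.predict("ACGTACGTACGT"))  # taxon_A
```

The practical point of training on full-length references is that a short V3–V4 query only needs to share k-mers with the correct reference; the production QIIME2 classifier additionally reports a confidence score and truncates low-confidence ranks, which this sketch omits.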
2.4. Bioinformatics Processing and Data Integration
Sequence data were processed and integrated using the phyloseq (v1.46.0) package in R. Feature tables, taxonomic classifications, and rooted phylogenetic trees were imported from QIIME2 artifacts using the qiime2R package. To ensure data quality, samples with fewer than 1000 total reads were excluded from downstream analysis. Five samples were removed during quality control, leaving 453 samples for analysis. Detailed preprocessing statistics, including input, filtered, denoised, merged, and non-chimeric read counts for each anatomical group and sampling site, are provided in Supplementary Table S1. Rarefaction curves were generated to assess whether sequencing depth was sufficient to capture ASV richness across anatomical sources. ASVs unclassified at the Phylum level were removed. For phylogenetic visualization, a circular tree was constructed using the top 2000 taxa (ranked by relative abundance) and annotated with Phylum-level classifications and a heatmap of mean relative abundances across anatomical sources.
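The quality-control and filtering steps above can be sketched on a plain nested-dictionary feature table. In phyloseq the same steps are typically written with `prune_samples(sample_sums(ps) >= 1000, ps)` and `subset_taxa()`; the Python below is only an illustrative sketch, and the function names, the table layout, and the choice of mean relative abundance for ranking the top taxa are assumptions.

```python
from collections import defaultdict

MIN_READS = 1000  # samples with fewer total reads are excluded (from the text)


def drop_low_depth(table, min_reads=MIN_READS):
    """Keep samples whose total read count is at least min_reads.

    table: dict mapping sample ID -> dict mapping ASV ID -> read count.
    """
    return {s: row for s, row in table.items() if sum(row.values()) >= min_reads}


def drop_unclassified_phylum(table, phylum_of):
    """Remove ASVs whose phylum assignment is missing (None or empty string)."""
    keep = {asv for asv, phylum in phylum_of.items() if phylum}
    return {s: {a: c for a, c in row.items() if a in keep} for s, row in table.items()}


def top_taxa(table, n):
    """Return the n ASVs with the highest mean relative abundance across samples."""
    mean_rel = defaultdict(float)
    for row in table.values():
        depth = sum(row.values())
        if depth == 0:
            continue  # an empty sample contributes nothing
        for asv, count in row.items():
            mean_rel[asv] += count / depth / len(table)
    return sorted(mean_rel, key=mean_rel.get, reverse=True)[:n]


# Toy table: S1 (950 reads) fails the depth filter; ASV "b" has no phylum.
table = {"S1": {"a": 900, "b": 50}, "S2": {"a": 600, "b": 500, "c": 10}}
phylum_of = {"a": "Bacillota", "b": None, "c": "Bacteroidota"}
kept = drop_unclassified_phylum(drop_low_depth(table), phylum_of)
print(top_taxa(kept, 2000))  # ['a', 'c']
```

Sample depth is checked before the phylum filter, matching the order in which the steps are described; reversing the order could drop additional borderline samples.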
2.5. Alpha and Beta Diversity Analysis
Microbial community richness and evenness were assessed using multiple alpha-diversity indices, including Chao1 and the Shannon index, and beta diversity was evaluated using Bray–Curtis distance matrices. Because multiple samples were obtained from the same subjects, repeated sampling was accounted for in the statistical analyses. For alpha-diversity analysis, Chao1 and Shannon indices were additionally evaluated using linear mixed-effects models, with anatomical source or site as a fixed effect and subject ID as a random intercept. These analyses were performed to confirm that the observed site-associated differences were not driven by treating repeated samples from the same subject as independent observations. For beta-diversity analysis, Bray–Curtis distance matrices were analyzed using PERMANOVA with 999 permutations, with permutations restricted within subjects to account for non-independence among samples from the same individual. Pairwise PERMANOVA comparisons were also performed using subject-restricted permutations, and p-values were adjusted using the Benjamini–Hochberg method. Homogeneity of multivariate dispersion was assessed using PERMDISP based on Bray–Curtis distances. Distances to group centroids were calculated using the betadisper function in the vegan package, and significance was tested using permutation tests with 999 permutations. PERMDISP was performed both among anatomical sources and among sampling sites within each anatomical source. Principal Coordinates Analysis (PCoA) was used to visualize community clustering, with 95% confidence ellipses calculated for each primary source group.
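The core computations named above (Shannon, Chao1, Bray–Curtis, and a PERMANOVA pseudo-F tested with permutations restricted within subjects) can be sketched as follows. This is a from-scratch illustration, not vegan's `adonis2`; in R the same restriction is commonly expressed through a `permute::how()` design with `blocks` set to subject. The function names and the toy data are hypothetical, and Chao1 is shown in its bias-corrected form.

```python
import math
import random


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    den = sum(a + b for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / den if den else 0.0


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix (needs >= 2 groups, n > groups)."""
    n, levels = len(groups), sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in levels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    if ss_within == 0:
        return math.inf  # perfectly tight groups
    a = len(levels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def restricted_permanova(dist, groups, subjects, n_perm=999, seed=1):
    """Shuffle group labels only among samples of the same subject; return (F, p)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    blocks = {}
    for i, s in enumerate(subjects):
        blocks.setdefault(s, []).append(i)
    hits = 0
    for _ in range(n_perm):
        perm = list(groups)
        for idx in blocks.values():
            labels = [groups[i] for i in idx]
            rng.shuffle(labels)
            for i, lab in zip(idx, labels):
                perm[i] = lab
        if pseudo_f(dist, perm) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


# Toy data: three subjects, each contributing one "oral" and one "gastric" sample.
samples = [[10, 5, 0, 0], [9, 6, 0, 1], [11, 4, 1, 0],
           [0, 1, 8, 7], [1, 0, 9, 6], [0, 0, 7, 9]]
groups = ["oral"] * 3 + ["gastric"] * 3
subjects = ["s1", "s2", "s3"] * 2
dist = [[bray_curtis(x, y) for y in samples] for x in samples]
print(restricted_permanova(dist, groups, subjects, n_perm=199, seed=7))
```

With only three subjects, within-subject shuffling allows just a handful of distinct labelings, so the smallest attainable p-value stays large even when groups separate clearly; this is the intended conservatism of restricting permutations to subjects.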
2.6. Taxonomic Composition and Commonly Observed Microbes
Relative abundances were calculated by normalizing read counts at the Phylum, Genus, and Species levels. Community composition was visualized using stacked bar plots for the top 10 Phyla and the top 20 Genera and Species. Microbes commonly observed across the upper GI tract were identified using Venn diagrams generated with the ggVennDiagram package. Minimum abundance thresholds for including taxa were based on mean relative abundance: >0.1% for Phyla and >0.01% for Genera and Species.
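The threshold-based selection of commonly observed taxa can be sketched as per-group filtering followed by a set intersection, which corresponds to the central region of a Venn diagram. This sketch assumes the mean relative abundance is computed separately within each anatomical group and that a taxon counts as present in a group when that mean exceeds the cutoff; the function names and toy genera are hypothetical.

```python
from collections import defaultdict

# Mean relative abundance cutoffs from the text, expressed as fractions.
PHYLUM_CUTOFF = 0.001          # >0.1%
GENUS_SPECIES_CUTOFF = 0.0001  # >0.01%


def relative_abundance(row):
    """Convert one sample's counts (dict taxon -> reads) to proportions."""
    total = sum(row.values())
    return {t: c / total for t, c in row.items()} if total else {}


def taxa_above_cutoff(samples, cutoff):
    """Taxa whose mean relative abundance over the given samples exceeds cutoff."""
    sums = defaultdict(float)
    for row in samples:
        for taxon, prop in relative_abundance(row).items():
            sums[taxon] += prop
    return {t for t, s in sums.items() if s / len(samples) > cutoff}


def shared_taxa(groups, cutoff):
    """Taxa passing the cutoff in every group (the Venn diagram's central region)."""
    sets = [taxa_above_cutoff(rows, cutoff) for rows in groups.values()]
    return set.intersection(*sets) if sets else set()


groups = {
    "oral": [{"Genus_A": 90, "Genus_B": 10, "Genus_C": 0}],
    "gastric_juice": [{"Genus_A": 50, "Genus_C": 50}],
}
print(shared_taxa(groups, GENUS_SPECIES_CUTOFF))  # {'Genus_A'}
```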
2.7. Differential Abundance and Statistical Modeling
Biomarkers associated with specific GI sources were identified by Linear Discriminant Analysis Effect Size (LEfSe) analysis of counts-per-million (CPM)-normalized data, with an LDA score threshold of >3.0 and a significance level of α = 0.05. The robustness of the differential abundance results was further assessed using ALDEx2 (v1.38.0), which employs a Dirichlet-multinomial model to account for the compositional nature of the data. For targeted analysis of specific taxa (e.g., Helicobacter pylori, Akkermansia muciniphila), differences in relative abundance were assessed using the Kruskal–Wallis test followed by Dunn’s post hoc test with Benjamini–Hochberg correction.
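Two of the simpler transformations used here, CPM normalization and Benjamini–Hochberg adjustment, can be sketched in a few lines. The function names are hypothetical; the BH step follows the standard step-up procedure (scale each sorted p-value by m/rank, then enforce monotonicity from the largest p-value downward), which should agree with R's `p.adjust(method = "BH")`.

```python
def cpm(row):
    """Counts per million for one sample (dict taxon -> read count)."""
    total = sum(row.values())
    return {taxon: count / total * 1e6 for taxon, count in row.items()}


def benjamini_hochberg(pvals):
    """BH-adjusted p-values, returned in the original input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(cpm({"taxon_A": 1, "taxon_B": 3}))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```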
4. Discussion
In this study, we comprehensively characterized the microbial landscape along the oral–upper gastrointestinal (GI) axis by integrating diversity and compositional analyses. Although the oral cavity, esophagus, and stomach are anatomically continuous, our results demonstrate that their microbial communities exhibit both continuity and site-specific divergence.
The esophageal microbiome exhibited significantly higher alpha diversity than the oral and gastric microbiomes, suggesting that the esophagus may not merely serve as a passive transit pathway but may instead harbor a distinct microbial ecological niche (Figure 1D). In the oral cavity, despite its anatomical complexity, the continuous circulation of saliva and constant inter-surface contact contribute to the formation of an integrated microbial ecosystem [12]. The clear separation of oral samples from esophageal and gastric samples in the beta-diversity analysis further supports the presence of distinct ecological constraints along the oral–upper GI axis. The gastric microbiome showed greater variability, particularly in gastric juice samples (Figure 1C). This heterogeneity likely reflects the strong and fluctuating influence of acid exposure, oxygen tension, nutrient availability from dietary intake, and host-derived factors within the gastric lumen [13]. Collectively, these findings support the notion that each region of the upper gastrointestinal tract possesses distinct physiological conditions and colonization environments, leading to the establishment of site-specific microbial community structures.
At the taxonomic level, site specificity became more pronounced. Although five dominant phyla were shared across all sites, their relative proportions differed markedly (Figure 2A). This pattern is consistent with previous reports suggesting that the upper GI tract shares a common microbial composition derived largely from the oral microbiome, yet undergoes site-dependent restructuring driven by local physicochemical conditions [14,15]. The gradual decline of Alloprevotella along the oral–esophageal–gastric axis, accompanied by a progressive increase in Sarcina ventriculi, suggests a shift from biofilm-associated oral anaerobes to acid-tolerant fermentative organisms (Figure 2C). Sarcina ventriculi is known to survive under acidic conditions and has been associated with delayed gastric emptying and mucosal injury [16]. The increasing abundance of Sarcina in the stomach and gastric juice may therefore reflect ecological adaptation rather than passive translocation. Phylogenetic tree analysis showed that closely related bacteria had different abundance patterns across anatomical sites, suggesting that evolutionary relatedness does not necessarily translate into similar ecological roles in the upper GI tract (Figure 3B).
LEfSe analysis revealed strong enrichment of classical periodontal-associated taxa (e.g., Porphyromonas gingivalis, Prevotella spp., Alloprevotella spp., Parvimonas micra) in oral samples (Figure 4). These taxa are well-recognized anaerobic biofilm-associated organisms and key contributors to periodontal dysbiosis [17,18]. Their marked reduction in esophageal and gastric samples suggests that, despite continuous swallowing, many strict anaerobes may not efficiently colonize downstream environments. Nevertheless, the detection of oral-derived species in the esophagus supports the concept that the oral microbiota contributes to the upper GI microbial pool. Emerging evidence links oral pathogens such as P. gingivalis to esophageal and gastric pathologies, potentially through inflammatory or carcinogenic pathways [19]. Although abundance decreased downstream, even low-level persistence may be biologically relevant.
When several species of interest were further evaluated, the selected species formed coherent site-dependent gradients: oral enrichment of periodontal pathogens and related oral taxa (Alloprevotella tannerae, Alloprevotella sp. HMT 473, Campylobacter concisus, Porphyromonas gingivalis, Prevotella denticola), esophageal enrichment of gut-associated taxa (Bacteroides fragilis, Bifidobacterium longum, Faecalibacterium butyricigenerans), gastric mucosal enrichment of mucosa-associated organisms (Akkermansia muciniphila, Helicobacter pylori), and gastric luminal enrichment of acid-tolerant, fermentative, or opportunistic organisms (Sarcina ventriculi, Fusobacterium periodonticum, Prevotella melaninogenica, Clostridium perfringens). These gradients are consistent with a model in which disease-relevant ecological filters (inflammation, acidity, and motility) shape the oral–esophageal–gastric microbial axis.
The classical periodontal-associated genera and species that showed the highest relative abundance in oral samples (e.g., Porphyromonas/Porphyromonas gingivalis, Prevotella/Prevotella denticola, Alloprevotella spp.) have also been implicated in diseases beyond the oral cavity. For example, P. gingivalis has been detected in esophageal squamous cell carcinoma and has been experimentally linked to tumor progression through host signaling and inflammatory pathways, suggesting that oral pathobionts may contribute to esophageal carcinogenic processes when ecological or host barriers are compromised [19]. In addition, Campylobacter concisus has been frequently associated with Barrett’s esophagus and reflux-related esophageal diseases. This bacterium is microaerophilic and produces lipopolysaccharide (LPS), which may promote inflammatory responses [20]. These characteristics suggest that such bacteria may be favored in inflamed esophageal environments. Importantly, the clear decrease of these oral anaerobes at downstream sites suggests that most periodontal bacteria remain mainly in the oral cavity and are less able to stably colonize the esophagus and stomach, although even small amounts may be clinically important given the growing evidence linking oral microbes to esophageal diseases. The gastric mucosa was enriched with Akkermansia muciniphila and Helicobacter pylori, organisms known to colonize the mucosal layer and interact closely with the host epithelium and immune system [20,21,22]. Toxigenic strains of B. fragilis carry a virulence factor capable of driving mucosal inflammation and tumor-promoting signaling in the intestine [23]; therefore, the enrichment of this species in the upper GI tract warrants cautious interpretation and further functional analysis. H. pylori is a well-established driver of chronic gastritis and gastric carcinogenesis and is also known to remodel the gastric microbial community, potentially enabling or suppressing co-colonizing taxa [24]. In contrast, gastric juice was enriched with Sarcina ventriculi and Fusobacterium periodonticum, taxa that can be detected in gastric environments and may be associated with acidic or fermentative conditions [25,26]. These results suggest that the stomach should not be regarded as a single homogeneous environment but rather as comprising at least two ecologically distinct niches: the mucosa-associated microbiome and the luminal microbiome. The strong inter-sample variability observed in gastric juice further supports the dynamic and highly selective nature of the gastric lumen.
This study has several limitations. First, it relied on publicly available 16S rRNA sequencing data, which limited control over host-related factors such as diet, medication use, oral health status, and underlying diseases. Second, 16S rRNA analysis provides limited functional and strain-level resolution, preventing detailed characterization of microbial activities. Third, although the use of a PacBio-informed full-length 16S rRNA reference database improved taxonomic assignment and selected species-level ASVs were concordant with BLAST results against both NCBI 16S and PacBio-derived reference databases, short-read V3–V4 sequencing has inherent limitations in resolving closely related bacterial species. Therefore, species-level findings, including assignments to H. pylori, A. muciniphila, S. ventriculi, and C. perfringens, should be interpreted cautiously as 16S-based taxonomic inferences. Confirmation by full-length 16S sequencing, shotgun metagenomics, targeted qPCR, or culture-based methods would be required for definitive species- or strain-level validation. Future studies integrating metagenomic or multi-omics approaches will be necessary to better understand functional interactions within the upper GI microbiome.
In conclusion, our findings demonstrate that although the oral cavity, esophagus, and stomach are anatomically continuous, microbial communities exhibit both shared taxa and clear site-specific divergence. These patterns suggest that physicochemical gradients and host–microbe interactions shape distinct ecological niches along the upper GI tract. Collectively, this study provides new insights into the ecological organization of the oral–upper GI microbiome and highlights the importance of considering inter-organ microbial interactions in understanding upper GI health and disease.