The Gut Microbiome of an Indigenous Agropastoralist Population in a Remote Area of Colombia with High Rates of Gastrointestinal Infections and Dysbiosis

An Indigenous agropastoralist population called the Wiwa from the Sierra Nevada de Santa Marta, in North-East Colombia, shows high rates of gastrointestinal infections. Chronic gut inflammatory processes and dysbiosis could be a reason, suggesting an influence or predisposing potential of the gut microbiome composition. The latter was analyzed by 16S rRNA gene amplicon next generation sequencing from stool samples. Results of the Wiwa population microbiomes were associated with available epidemiological and morphometric data and compared to control samples from a local urban population. Indeed, locational-, age-, and gender-specific differences in the Firmicutes/Bacteriodetes ratio, core microbiome, and overall genera-level microbiome composition were shown. Alpha- and ß-diversity separated the urban site from the Indigenous locations. Urban microbiomes were dominated by Bacteriodetes, whereas Indigenous samples revealed a four times higher abundance of Proteobacteria. Even differences among the two Indigenous villages were noted. PICRUSt analysis identified several enriched location-specific bacterial pathways. Moreover, on a general comparative scale and with a high predictive accuracy, we found Sutterella associated with the abundance of enterohemorrhagic Escherichia coli (EHEC), Faecalibacteria associated with enteropathogenic Escherichia coli (EPEC) and helminth species Hymenolepsis nana and Enterobius vermicularis. Parabacteroides, Prevotella, and Butyrivibrio are enriched in cases of salmonellosis, EPEC, and helminth infections. Presence of Dialister was associated with gastrointestinal symptoms, whereas Clostridia were exclusively found in children under the age of 5 years. Odoribacter and Parabacteroides were exclusively identified in the microbiomes of the urban population of Valledupar. In summary, dysbiotic alterations in the gut microbiome in the Indigenous population with frequent episodes of self-reported gastrointestinal infections were confirmed with epidemiological and pathogen-specific associations. Our data provide strong hints of microbiome alterations associated with the clinical conditions of the Indigenous population.

About a dozen samples were included as control, originating from Colombians living in Valledupar, a larger city in the Department Cesar.
The previous study by our group [22] documented the high burden of gastrointestinal pathogens in the tested Indigenous individuals. Over 93% of all stool samples contained at least one of the tested pathogens. Overall, 79% of all stool samples contained protozoa, 69% helminths, and 41% bacteria. G. intestinalis (48%), Necator/hookworm (27%), and enteroaggregative E. coli (EAEC, 68%) were found to be the most dominant pathogens [22].
As known from previous assessments [23][24][25], however, molecular proof of pathogens, in particular, is not necessarily associated with clinical disease manifestation in highendemicity settings [26], making adaptation processes likely. The composition of the gut microbiome might be an influencing factor for adaptation; however, it could also be directly associated with the presence of eukaryotic and prokaryotic enteric pathogens [13]. Geographic variation in gut microbiome compositions is a well-described phenomenon [27], most interestingly with widely dispersed resistance genes even in highly diverse gut microbiomes of uncontacted Indigenous populations [28]. Therefore, apart from the detection of single or multiple intestinal pathogen species, it remains unknown if the overall gut microbiomes of Colombian Indigenous individuals differ from those of larger city populations with higher socio-economic standards and access to healthcare facilities. It is further unclear if the presence of pathogens shapes the gut microbiomes of pathogen-colonized and/or -infected individuals. It could also be envisioned that dietary and environmental factors mainly shape the gut microbiomes of Indigenous populations and, thereby, the specific individual gut microbiome composition could be a risk factor for a higher pathogen burden.
To approach these questions epidemiologically, an analysis of the specific gut microbiome of Wiwa was conducted and associated with both self-reported clinical features and detected pathogenic microorganisms, as reported recently [22]. The aim was to characterize specific features of the gut microbiome of an isolated Indigenous Colombian population in comparison with non-Indigenous individuals, and to identify potential associations with and differences between village locations, gender, and age, in addition to clinical features such as the reporting of gastrointestinal symptoms. Dissimilarities in the bacterial distribution and the pathogen specific functional profiles of the microbiomes were also the focus of this study.

Study Population
The study population has been described previously [22]. In short, stool samples were collected from Indigenous volunteers from the remote Indigenous villages Tezhumake (n = 81, estimated population of 200, Department César) and Siminke (n = 45, estimated population of 60, Department La Guajira) between July and November 2014. In addition, 11 stool samples were provided by the Laboratory of Christian Gram (n = 7), the Hospital Rosario Pumarejo de Lopéz (n = 2), and voluntary Colombians (n = 2, non-Indigenous people) living in Valledupar. Two samples came from a health care center in Valledupar, serving the needs of the Wiwa, and were provided by Wiwa from Tezhumake.
Complete anthropometric, clinical, and physical examinations were performed by a physician in 102 cases and incomplete datasets were provided in an additional 34 cases. Specific assessed data of patients comprised date of birth, age, gender, height (cm), weight (kg), BMI/z-scores, location, stool pathogens (number of different species and identification on genus or species level), and specific symptoms. Official tables by the Organización Mundial de Salud 2006-2007 for the Colombian population were applied to evaluate average size, weight, and growth. Corrected by gender, individuals aged 0-19 years were scored by weight-size, weight-age, and size-age tables; further, body mass index (BMI) was calculated in line with WHO recommendations.

Library Preparation and Next Generation Sequencing (NGS) Analysis
The DNA extraction from all stool samples was previously described [22]. Prior to 16S rRNA gene amplicon library preparation, DNA concentration was again determined using Qubit and Nanodrop protocols. The amplicon PCR was started with 5 ng/µL of template DNA in 10 mM Tris buffer at pH 8.5. The V3/V4 region primers were used for the amplification of the 16S rRNA encoding gene [32], resulting in roughly 450 bp fragments. All other library preparation steps were performed according to the Illumina "16S Metagenomic Sequencing Library Preparation protocol".
Further steps including PCR clean-up, Index PCR, PCR clean-up 2, library quantification, normalization, and pooling followed the above-mentioned protocols. Bioanalyzer DNA 1000 chips (Agilent Technologies, Santa Clara, CA, USA) and Qubit kits (Thermo Fischer Scientific, Waltham, MA, USA) allowed quantity and quality control of the final individual stool sample libraries and the final library pool. Two libraries were prepared from all samples. Ten percent PhiX control was spiked into each final pool and 5 pM of the final libraries were initially loaded on two 50 cycle V2 chemistry kits. An optimal cluster density and an even-read distribution among all samples was verified after sequencing both pools. Finally, each pool was subjected to full length sequencing on a 600 cycle V3 chemistry kit using an Illumina MiSeq machine. During each run, roughly 600 (k/mm 2 ) clusters were sequenced. A total of 95% of all clusters passed the filter specs and sequencing lead to a mean of 16 Mio reads per run, of which again 95% passed the quality control. Amplicon sequencing of the 135 samples from the three locations generated over 22.31 million reads (6.39 × 10 9 bp total), with 165,313 on average per sample. These reads were assembled, producing more than 9.84 million amplicons for the whole dataset (average 72,940 amplicons per sample).

Bioinformation Assessment of the NGS Reads
16S amplicons were assembled from the Illumina sequencing runs using the pandaseq program (version 2.11) [33]. Operational taxonomic units (OTUs) were then identified from these amplicons with USEARCH (version 6.1.544) [34], called from within the QI-IME 1.9.1 pipeline [35]. To this end, the similarity of the amplicons to fragments in the 16S rRNA Greengenes (version 13.8) [36] database was used. In all cases, a cutoff of 97% identity was applied. Afterwards, low confidence OTUs were excluded via the re-move_low_confidence_otus.py script [37]. Furthermore, contaminant OTUs, such as those classified as chloroplasts, mitochondria, or nonbacterial, were removed from the data. The obtained dataset was then processed with phyloseq (version 1.22.3) [38], a bioconductor package used to evaluate the ordination, abundance, and composition of the bacterial communities in the samples, within the R environment (version 3.4.3). Statistical significance Microorganisms 2023, 11, 625 5 of 28 of these analyses was assessed through the permanova (adonis) and Kruskal-Wallis tests, as implemented in the vegan package (version 2.4.6). Moreover, statistically significant differences between the communities from the different sample locations at the genus level were analyzed with the DESeq2 R package [39], by applying a Wald test and the Benjamin-Hochberg correction for the p-values (false discovery rate cutoff: FDR < 0.01). Moreover, to observe the genera shared between the different communities, we followed the protocol developed in the microbiome R package (version 1.9.96), in order to calculate the core microbiomes for each sample group [40,41]. Finally, a PICRUSt analysis was carried out to predict the relative abundance of functional genes and pathways in the microbial communities (version 2.11) [42]. The obtained OTU (operational taxonomic unit) tables were then normalized by 16S rRNA copy number, and functional genes were predicted from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [43]. Finally, the results from the PICRUSt analysis and KEGG predictions were processed with STAMP (version 2.1.3) [44].

Data Deposit
The raw sequencing files have been deposited at the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena, accessed on 24 January 2023) under accession number PRJEB43871.

Ethical Clearance
The study was performed in line with the declaration of Helsinki. It was approved by the Ethics Committee of Valledupar, César, Colombia on 11 August 2014 (Acta No 0022013). Written informed consent was obtained from each participant or from the parent or legal guardian of a child before participation. All participants were informed about their results and received treatment, if appropriate.

Brief Demographic Sample Background
Demographic details of the assessed populations have previously been reported and are freely accessible [22]. Briefly, there was a 1:1 female-to-male ratio without associations with symptoms or pathogen detections in the previous study. A mean age (±standard deviation (SD)) of 24.6 (±18.2) years was recorded for the study population. Although only 12% of the tested individuals reported symptoms such as diarrhea (10%) or diarrhea combined with abdominal pain (2%), 82% of collected stool samples were fluid, pappy, or had a mucous texture, and 4 samples with PCR-confirmed diagnosis of Entamoeba histolytica were even bloody. Detections of bacterial, helminthic, and protozoan pathogens ranged between 39% and 60% [22] without association of reported symptoms and pathogen detections, while multiple infections were frequent in symptomatic individuals. At least 1 pathogen was detected in 93% of the assessed individuals, and a maximum of 9 were detected in 1 assessed volunteer, who was a 7-year-old boy. This corresponded well to the findings of higher proportions of both protozoan infections as well as multiple infections in children.

General Features of the Microbiome Dataset
The assembled amplicons were assigned to OTUs upon comparison with data from the Greengenes database (version 13.8), and filtered to keep only the high confidence assignments. From these, 43.92% of the total assembled fragments were attributed to Greengenes' accessions with high-confidence assignments when using the open reference assignment method. Therefore, and to avoid spurious amplicons that might add bias to the results, low-quality samples (below 20,000 amplicons, with low read counts and a small number of assigned amplicons) were identified and excluded from further analyses. All follow-up analyses were carried out with data from the closed reference assignments, to maintain an agreement between the community-related results and those from the pathway analyses. The most predominant phyla in the individual samples that remained for further analyses were the Proteobacteria, with an average relative abundance of 23.85% (range from 0 to 87.89%), Firmicutes (range 6.46-70.36%; average 40.45%), and Bacteroidetes (average: 28.09%; range 0 to 55.88%). The whole microbiome of the selected samples also entailed 112 genera. Ten out of these 112 genera are shared by 80% of the subjects for a given group at the minimum detection threshold of 0.1% relative abundance, and therefore constitute the core microbiome: Bacteroides, Prevotella, Clostridium, Coprococcus, Faecalibacterium, Oscillospira, Ruminococcus, Phascolarctobacterium, Succinivibrio, and Treponema [41]. Specifically, the top five most abundant genera in the remaining samples were Prevotella (average between samples: 15.41%; range from 0.0 to 34.42%), Faecalibacterium (average 9.32%, range 0.0-46.07%), Succinivibrio (average 9.17%, range 0.0-35.16%), Bacteroides (average 3.31%, range 0.0 to 50.09%), and Treponema (average 2.92%, range 0.0 to 14.61%).

Composition of the Bacterial Communities by Location
First, we analyzed the relative abundance at the phylum level for the whole group. Bacteroidetes, Firmicutes, Proteobacteria, and Spirochaetes were among the most commonly represented phyla in all locations ( Figure 1A). As shown in Figure 1A and Table A1, analysis of the Indigenous locations revealed that the top five most abundant phyla for the Siminke population were the Bacteroidetes (38.42%), Proteobacteria (28.25%), Firmicutes (25.51%), Spirochaetes (7.44%), and Euryarchaeota (0.22%). The most common phyla for the Indigenous population from Tezhumake were Firmicutes (35.04%), Bacteroidetes (31.18%), Proteobacteria (25.20%), Spirochaetes (5.84%), and Actinobacteria (1.33%; see Table A1 and Figure 1A). This result already highlights differential abundances, particularly of the Firmicutes, between Indigenous villages with the same agropastoralist subsistence. The urban population of Valledupar had the same top five most prominent phyla as Siminke, although with different relative abundances: Bacteroidetes (50.48%), Firmicutes (33.31%), Proteobacteria (7.92%), Fusobacteria (2.51%), and Euryarchaeota (2.52%; see Table 1 and Figure 1A). Of note, the urban microbiomes are remarkably dominated by Bacteriodetes (50.48%), whereas overall the abundance of Proteobacteria is 3-4 times higher in the Indigenous samples in relation to the urban village.
Furthermore, when considering the F/B ratio (Table A1), the urban samples are identical to the Siminke samples (0.66 in both populations). In the Tezhumake population the F/B ratio is twice as high compared to Siminke and Valledupar.
Extending the overview to all genera with relative abundance above 1% (Table A2), it is apparent that Dialister and CF231 (a genus within the Paraprevotellaceae family recently associated with fatty liver and higher BMI [45,46]) are missing in this category in the Tezhumake population stool samples, whereas Klebsiella, Bifidobacteria, Clostridia, Brachyspira, and Blautia can be found exclusively in the Tezhumake sample collection when considering relative abundance above 1%. Ruminobacter and Acinetobacter are not found among the genera with relative abundance above 1% in Valledupar individuals, whereas Extending the overview to all genera with relative abundance above 1% (Table A2), it is apparent that Dialister and CF231 (a genus within the Paraprevotellaceae family recently associated with fatty liver and higher BMI [45,46]) are missing in this category in the Tezhumake population stool samples, whereas Klebsiella, Bifidobacteria, Clostridia, Brachyspira, and Blautia can be found exclusively in the Tezhumake sample collection when considering relative abundance above 1%. Ruminobacter and Acinetobacter are not found among the genera with relative abundance above 1% in Valledupar individuals, whereas Fusobacteria, Odoribacteria, Parabacterioides, and Sutterella can only be found in the urban stool sample collection with the 1% cut-off used for this particular analysis. Consistent with these results, none of the exclusively found genera from the Tezhumake and Valledupar stool samples could be identified in the Siminke samples in the category with relative abundance above 1%.
Overall, the location-specific associations were calculated with a 75.76% accuracy employing a Random Forest Classifier (RFC) model.
Next, the core microbiome (basically the number of taxa shared by a minimum number of individuals within one group) was analyzed. At the genus level, the Indigenous samples entailed the following nine genera: Prevotella, Faecalibacterium, Succinivibrio, Phascolarctobacterium, Ruminococcus, Treponema, Bacteroides, Oscillospira, and Clostridium. The core microbiome from the Siminke samples contained twelve genera without counting contested taxa ( Figure 2A): Prevotella, Faecalibacterium, Succinivibrio, Phascolarctobacterium, Treponema, Ruminococcus, Roseburia, CF231, Campylobacter, Bacteroides, Parabacteroides, and Oscillospira. Similarly, the core microbiome in the Tezhumake Indigenous population consisted of ten genera ( Figure 2B): Prevotella, Faecalibacterium, Succinivibrio, Phascolarctobacterium, Ruminococcus, Treponema, Bacteroides, Clostridium, Oscillospira, and Coprococcus. In contrast, the core microbiome in the urban population (Valledupar, Figure 2C) entailed the genera Faecalibacterium, Bacteroides, Prevotella, Ruminococcus, Parabacteroides, Oscillospira, Lachnospira, Coprococcus, and Blautia. Three genera were found distinctively in the core microbiomes of the Indigenous populations: Phascolarctobacterium, Succinivibrio, and Treponema, while Blautia and Lachnospira were predominant in the core microbiome of the urban individuals. Table 1. Differentially abundant taxa from the comparison between the different population groups. Comparison specifies the pair analyzed: Valledupar vs. Indigenous (VI), Valledupar vs. Tezhumake (VT), Valledupar vs. Siminke (VS), and Tezhumake vs. Siminke (TS). LFC indicates the Log2-fold changes between both sample groups, and the adjusted p-value was calculated with the Benjamin-Hochberg method to control the False Discovery Rate (FDR < 0.1), as implemented in the DESeq2 R library [39]. The OTU identifiers correspond to those pertaining to the Greengenes database, version 13.8 [47].  Furthermore, to extend our knowledge of the significant differences between these populations at the genus level, we carried out differential abundance tests over the OTU data. Upon applying a false discovery rate correction (FDR < 0.1), significant differences were found for all four pairs of comparisons (Table 1). Four genera were identified as Furthermore, to extend our knowledge of the significant differences between these populations at the genus level, we carried out differential abundance tests over the OTU data. Upon applying a false discovery rate correction (FDR < 0.1), significant differences were found for all four pairs of comparisons (Table 1). Four genera were identified as significantly different between the urban (Valledupar) versus the Indigenous groups, three for the Valledupar and Tezhumake pair, five between Valledupar and Siminke, and finally four for Tezhumake versus Siminke (Table 1). In the first comparison, we observed that the genera Dialister and Odoribacter are significantly more abundant in the urban stool microbiomes from Valledupar with respect to the Indigenous samples. However, the genera Parabacteroides and Suterella where significantly less abundant in the Valledupar stool samples. Similarly, the genera Dialister and Odoribacter are more commonly found in the Valledupar samples (24.59 and 5.51 LFC, respectively), and Suterella is significantly more abundant in the Tezhumake Indigenous people. Differential comparisons between samples from Valledupar and Siminke revealed Parabacteroides, Butyrivibrio, and three OTUs belonging to the Prevotella genus were all significantly less abundant in the Valledupar stool microbiomes compared to those of the Siminke population. In addition, Suterella, Faecalibacterium, and Clostridium were significantly more abundant in the Tezhumake group, whereas the Prevotella genus was found to be less abundant when compared to the Siminke stool microbiome samples (Table 1).

Microbial Gut Diversity of the Indigenous and Urban Populations
Next, we calculated and compared the Shannon diversity to identify possible significant differences in phylogenetic alpha diversities (within sample variation) between these different population groups. We found the non-Indigenous group (Valledupar) has less alpha diversity compared to the Indigenous individuals, indicating a rather uniform microbiome composition among all individual Valledupar samples ( Figure A1A). The mean Shannon diversity was also lower when comparing Valledupar against the two individual Indigenous groups separately (Siminke and Tezhumake), while the overall diversity was larger in the Tezhumake versus the Valledupar. Conversely, the overall diversity was larger in the Valledupar than in the Siminke individuals ( Figure A1B). Of note, the Kruskal-Wallis tests deemed these differences not significant in both cases (p-value < 0.7623 and p-value < 0.6399 for Figure A1A and B, respectively). Therefore, we extended our analyses to determine the community ß-diversity with unweighted UniFrac distances (qualitative approach based on presence/absence of taxa). Results shown in Figure 3A,B revealed the microbiome of the Indigenous population was significantly distant from that of the Valledupar group (permanova p-value < 1 × 10 −4 ). These observations were confirmed when considering taxa abundances using weighted UniFrac metrics (based on presence/absence of taxa and relative abundance of taxa, permanova p-value < 5 × 10 −4 ). In fact, the Siminke and Tezhumake microbiomes are similarly distant to those from Valledupar (unweighted UniFrac, permanova p-value 5 × 10 −4 ; Figure 3A,B). However, when taking into account taxa abundances, the Tezhumake samples were more similar to those from the non-Indigenous Valledupar population than to the Siminke samples (weighted UniFrac, permanova p-value 5 × 10 −4 ; Figure 3C,D).

Functional Composition of the Microbial Communities by Location
As a next step, we performed a PICRUSt analysis [42] to identify genera with significant differences between the three locations at the functional level. This allows inferring potential genetic capabilities and specific contributions of bacterial taxa to the imputed metagenome of all samples. Multiple group comparisons were carried out from the OTU genera data and their corresponding pathway-encoding genes for each genus. Employing a DESeq2 test with Benjamin-Hochberg FDR correction, we found five differentially abundant KEGG metabolic pathways (Table A3): Arginine and proline metabolism (KEGG id KO00330; adjusted p-value < 0.0266), Lipopolysaccharide biosynthesis (KO00540; adjusted p-value < 0.0379), Glycosphingolipid biosynthesis-ganglio series (KO00604; adjusted p-value < 0.0247), Porphyrin and chlorophyll metabolism (KO00860; adjusted p-value < 0.0102), and Glycosyltransferases (KO01003; adjusted p-value < 0.0297). Two significant pathways related to information processes were also identified: KO02042 Bacterial toxins and KO03110 Chaperones and folding catalysts. Genes related to the arginine and proline metabolism, as well as those from glycosphingolipid biosynthesis, porphyrin, and chlorophyll metabolism and bacterial toxins, were all enriched in the Valledupar samples ( Figure A2A, C, D, and G, respectively). Those from the lipopolysaccharide biosynthesis, glycosyltransferases, and chaperones and folding catalysts were predominantly found in the Siminke population ( Figure A2B, E, and F, respectively). The four significant pathways that were more represented in the Valledupar samples were linked to 143, 56, 52, and 38 unique genera for the glycosphingolipid biosynthesis, bacterial toxins, arginine and proline metabolism, and porphyrin and chlorophyll metabolism, respectively. Common to all these pathways were the following ten genera: Bacillus, Citrobacter, Comamonas, Edwardsiella, Erwinia, Klebsiella, Paenibacillus, Pseudomonas, Serratia, and Trabulsiella. In a similar manner, in the enriched pathways in the Siminke samples we identified 145, 134, and 12 unique genera for the chaperones and folding catalysts, lipopolysaccharide biosynthesis, and glycosyltransferases pathways, respectively. Eleven genera were common to these three pathways: Achromobacter, Alloscardovia, Bacillus, Comamonas, Giesbergeria, Hydrogenophaga, Lysinibacillus, Paracoccus, Pigmentiphaga, Pseudomonas, and Sphingomonas, while Elusimicrobium was exclusive to the lipopolysaccharide biosynthesis pathway, and another eleven genera were unique to the chaperones pathway (Alkaliphilus, Butyrivibrio, Candidatus, Arthromitus, Fibrobacter, Finegoldia, Methanobrevibacter, Mogibacterium, Moryella, Proteiniclasticum, and Synergistes). Next, when comparing the genera present in the significant pathways from Siminke against the genera from the pathways more abundant in the Valledupar group, we encountered 142 genera in common, four unique to the Siminke enriched pathways (Actinobacillus, Methanobrevibacter, Methanosphaera, and the Archaea genus vadinCA11), and a single genus exclusive to the Valledupar enriched pathways (Peptococcus). Finally, we filtered these genera for the corresponding sample groups (instead of via enriched pathways), and found that Methanobrevibacter, Actinobacillus, and vad-inCA11 can be all found in individual samples of both Siminke and Valledupar, although Methanobrevibacter and Actinobacillus are present in more samples in the Siminke group (20 versus 4 samples for Methanobrevibacter; 3 versus 1 for Actinobacillus), while vadinCA11 can be identified in one sample for both groups, and Methanosphaera and Peptococcus appear only in Tezhumake individuals.

Microbial Communities in the Different Gender Groups
As a next step we analyzed the dataset for gender-specific differences in the microbiomes. The top four phyla in the female Indigenous samples were Bacteriodetes, Proteobacteria, Firmicutes, and Spirochetes ( Figure 4A, Table A4). Actinobacteria and Euryarcheota have a higher abundance in females from Tezhumake compared to Siminke. The most abundant phyla from the female population of urban Valledupar are Bacteriodetes, Firmicutes, Euryarcheota, and Verucomicrobiota. The latter is found only below 1% abundance in Indigenous females, whereas Proteobacteria, Spirochaetes, and Actinobacteria are below 1% abundance in female urban stool microbiomes ( Figure 4A, Table A4). No obvious differences among females from Siminke and Tezhumake were noted.
Male microbiomes from Tezhumake have an abundance of Actinobacteria and Euryarcheota below 1% compared with the females from the same location. No such differences were found among males and females from Siminke ( Figure 4A, Table A4). Males from the urban site Valledupar differ in terms of higher abundance of Proteobacteria, Spirochaetes, and Fusobacteria, as well as lower abundances of Euryarcheota from female subjects.
These observations were confirmed when considering taxa abundances using weighted UniFrac metrics (based on presence/absence of taxa and relative abundance of taxa, permanova p-value < 5 × 10 −4 ). In fact, the Siminke and Tezhumake microbiomes are similarly distant to those from Valledupar (unweighted UniFrac, permanova p-value 5 × 10 −4 ; Figure  3A,B). However, when taking into account taxa abundances, the Tezhumake samples were more similar to those from the non-Indigenous Valledupar population than to the Siminke samples (weighted UniFrac, permanova p-value 5 × 10 −4 ; Figure 3C,D).  From these results we calculated the Firmicutes-Bacteroidetes (FB) ratio for each gender in each population group, and found that the average FB ratios for females were 0.7061, 4.2882, and 0.9591 for the Siminke, Tezhumake, and Valledupar individuals, respectively, while in males these values were 0.7339, 1.3882, and 0.6299, respectively, for these same population groups. Kruskal-Wallis tests carried out from this data deemed differences in FB ratios between the three population groups as significant between genders (p-values: 0.0131 for the FB ratios in females; 1.3971 × 10 −5 in males).

Functional Composition of the Microbial Communities by Location
At the genera level, no difference in the abundance of the 10 most prevalent genera for the females in the Indigenous populations was noted ( Figure 4B, Table A5). However, Dialister, CF231, Bacteroides, Parabacteroides, and Streptococcus are genera with an abundance above 1% in Siminke females compared to Tezhumake, where these genera did not appear in this category. Similarly, Bifidobacterium, Brachyspira, Methanobrevibacter, Blautia, Clostridium, and Klebsiella are more dominant in Tezhumake females compared to the Siminke female population ( Figure 4B, Table A5). Odoribacter and Akkermansia are the only two genera which have an abundance above 1% exclusively in the urban female samples from Valledupar, whereas Succinivibrio, Treponema, Ruminobacter, Acinetobacter, Phascolarctobacterium, Campylobacter, Streptococcus, Bifidobacterium, Brachyspira, Clostridium, and Klebsiella are not found with an abundance above 1% in this urban female population. Genera abundance in males from all three locations is more diverse compared to the female microbiomes. Male Siminke microbiomes contain 12 genera with an abundance above 1%, whereas Tezhu-make and Valledupar male microbiomes comprise 16 genera in that category ( Figure 4B, Table A5). Ruminobacter, Acinetobacter, Clostridium, Klebsiella, and Rummeliibacillus are not found in Valledupar male samples in the abundance category above 1%. Similarly, Dialister, Odoribacter, Coprococcus, Fusobacterium, and Sutterella are not found in Tezhumake male samples, whereas Acinetobacter, Parabacteroides, Clostridium, Klebsiella, Odoribacter, Coprococcus, Fusobacterium, Rummeliibacillus, and Suterella are not found with an abundance above 1% in Siminke male microbiomes ( Figure 4B, Table A5). Of note, Clostridium and Klebsiella are exclusively found above 1% abundance in male and female Tezhumake samples, whereas Odoribacter is exclusively categorized in the above 1% abundance category in female and male microbiomes from Valledupar. Some differences were noted by comparing females and males from the same location. In Siminke, Acinetbacter, CF231, Parabacteroides, and Streptococcus are found in female samples with an abundance above 1% if compared to the male microbiomes. In Tezhumake, Bifidobacterium, Brachyspira, Methanobrevibacter, and Blautia are more abundant in female microbiomes compared to male samples. More differences were noted among the samples from Valledupar, where the female microbiomes have a higher abundance of CF231, Methanobrevibacter, Blautia, and Akkermansia compared to male individuals. Males from Valledupar have a higher abundance in the category above 1% of Succinivibrio, Treponema, Phascolarctobacterium, Campylobacter, Coprococcus, Fusobacterium, and Sutterella ( Figure 4B, Table A5).   Male microbiomes from Tezhumake have an abundance of Actinobacteria and Euryarcheota below 1% compared with the females from the same location. No such differences were found among males and females from Siminke ( Figure 4A, Table A4). Males from the urban site Valledupar differ in terms of higher abundance of Proteobacteria, Spirochaetes, and Fusobacteria, as well as lower abundances of Euryarcheota from female subjects.
From these results we calculated the Firmicutes-Bacteroidetes (FB) ratio for each gender in each population group, and found that the average FB ratios for females were

Microbial Communities Associated with Different Age Groups
For simplicity, age groups were classified into children under 5 years of age (C1 age group), children between 5 and 17 years of age (C2 age group), and adults 18 years and older (C3 age group). When comparing the Indigenous populations versus Valledupar, we found that Proteobacteria, Bacteroidetes, and Firmicutes were the prevailing phyla in both groups (Table A6, Figure 5A). Bacteroidetes were present in larger proportions in the C2 and C3 age groups from the urban population of Valledupar, while the Firmicutes were most prevalent in the C1 age group for the same population (Table A6, Figure 5A). Further, we noticed the percentage of Firmicutes increased with age in the Indigenous groups, while it decreased from C1 to C2, and then remained more or less stable in the urban samples. The mean FB ratios increased with age in the Indigenous samples and decreased considerably from the C1 to C2 and C3 children and adults in Valledupar. Spirochaetes were proportionally more dominant in all Indigenous groups. Of note, no Spirochaetes were found in the urban children when filtering low abundant genera (below 1%). Actinobacteria were exclusively found in the adult C3 group from Indigenous locations, whereas Fusobacteria were exclusively found in the urban microbiome from Valledupar, if abundances lower than 1% were filtered out (Table A6, Figure 5A).
At the genera level, Prevotella, Faecalibacterium, and Bacteroides were found in all age groups in both Indigenous and urban populations (Table A7, Figure 5B). C1 children from Valledupar revealed a remarkably higher proportion of Faecalibacterium, which is equal in all other age groups from both locations. The proportion of Bacteroides is constantly larger in all age groups from the urban Valledupar microbiomes compared to the Indigenous samples. Ruminobacter was exclusively found in all Indigenous age groups within the above 1% abundance category, while Campylobacter is exclusive for the C1 and C3 Indigenous age groups and only appears in the microbiomes of both locations in the adult age group C3. For Dialister it was the opposite, as it is exclusive for the C1 and C2 age groups of the urban site and appears in both sites among the adult C3 microbiomes. Bifidobacterium, Veillonella, Acinetobacter, Acrobacter, and Klebsiella are exclusive in the Indigenous and C3 and C1 groups, respectively. Brachyspira and Clostridium were identified only in the Indigenous C2 and C3 age groups if the above 1% abundance level is considered. Two genera, Odoribacter and Suterella, are present only in urban samples in the above 1% abundance category. Further genera found exclusively in urban samples were Lachnospira (only observed in C1 children), Coprococcus and Oscillospira solely in C2 children, and Fusobacterium and CF23 in adults.

Microorganisms 2023, 11, x FOR PEER REVIEW 15 of 30
Fusobacteria were exclusively found in the urban microbiome from Valledupar, if abundances lower than 1% were filtered out (Table A6, Figure 5A).  When analyzing the data individually for the two Indigenous populations, Siminke and Tezhumake displayed different abundances of Bacteroidetes, Proteobacteria, and Firmicutes, with Bacteroidetes always the most prevalent group in all ages for Siminke, and different phyla being the most common in the different age groups for Tezhumake: Proteobacteria in C1 children, Bacteroidetes in C2 children, and Firmicutes in C3 adults (Table A8, Figure 5A). Consequently, the mean FB ratio is almost twice as high in Tezhumake C1 children compared to C1 children from Siminke, while increasing from C1 to C2 children and then slightly increasing into adulthood for both Siminke and Valledupar. The differences in FB ratios between C2 children and C3 adults from the three populations are significant, with Kruskal-Wallis p-values of 0.0328 and 1.0214 × 10 −5, respectively.
As for genera exclusively present in Indigenous groups, Streptococcus appears in Siminke C2 children and Tezhumake C3 adults; Klebsiella in both children groups for Tezhumake; Clostridium in Tezhumake C2 children and C3 adults; Brachyspira also in Tezhumake C2 children and C3 adults, and also in Siminke adults; Acinetobacter in Siminke C1 and C2 children, and in Tezhumake C1 children and C3 adults; and, finally, Ruminobacter appears in all ages for both Indigenous groups (Table A9, Figure 5B).
There were significant differences when comparing the age-associated relative abundances of genera for the three populations (data not filtered for low abundant genera): Kruskal-Wallis p-values 3 × 10 −4 for C1 children, 1 × 10 −4 for the C2 children, and 0.0166 for C3 adults.
Overall, the age-specific associations were calculated with a 50% accuracy employing a Random Forest Classifier (RFC) model. Furthermore, we refrained from a comparative analysis of microbial communities from the different locations in association with BMI, as our study did not provide enough individual samples to statistically and evenly cover all possible BMI categories (underweight, normal, pre-obesity, obesity, and overweight). Details are provided in Table A10.

Association between Specific Genera and Demographic and Clinical Metadata
Finally, and as data about co-occurring infections were available [22], we combined the microbiome OTU genera data with the infection data (symptoms, presence of pathogens, and parasites) to identify associations between dysbiosis and co-occurring infections. We found that changes in relative abundances of genera for the three populations were related to ETEC-positive (Kruskal-Wallis p-value < 9 × 10 −4 ), EAEC-negative (p-value < 0.0058), and Giardia-positive (p-value < 0) individuals. Later, we combined the microbiome OTU genera data with all demographic (gender, age, location) and clinical data (BMI, symptoms, presence of pathogens and parasites) to assess the association of the microbiome data with these factors through generalized mixed models [48]. For simplicity, in these tests we utilized only the differentially abundant genera: Sutterella, Faecalibacterium, Clostridium, Prevotella, Parabacteroides, Butyrivibrio, Dialister, and Odoribacter, and the demographic and clinical data as response variables. As a result, we found that Sutterella was associated with individuals who tested positive for EHEC (p-value < 4.31 × 10 −2 ) and EPEC (p-value < 4.64 × 10 −2 ). Faecalibacterium was linked to Salmonellosis (p-value < 9.28 × 10 −4 ), EPEC (p-value < 1.34 × 10 −2 ) and Hymenolepis (p-value < 2.61 × 10 −2 ). Parabacteroides, Prevotella, and Butyrivibrio were significant in those individuals with Enterobius, EPEC, and Salmonellosis, respectively ( Table 2). In addition, we found that Butyrivibrio, Sutterella, Dialister, and Faecalibacterium were enriched in both Tezhumake and Valledupar populations, whereas Clostridium and Prevotella were Tezhumake-specific. Odoribacter and Parabacteroides were significantly present in the Valledupar individuals (Table 2). Finally, we identified genera that were enriched in the whole cohort for specific demographics, i.e., Dialister for those with symptoms and Clostridium in children under 5 years old (Table 2). Table 2. Genera significantly associated with demographic and clinical data. OTU genera abundances (log10-corrected) were combined with demographic and clinical data ("category") and evaluated through generalized mixed linear models. Results are shown for genera under the significance cutoff (p-value < 0.05).

Genus
Category p-Value

Discussion
Several studies investigated age, gender, and nutritional associations with gut microbiomes of non-Western, urban populations from larger Colombian cities [49]. However, the Indigenous tribe called Wiwa, a restricted living and agropastoralist population in Colombia who lives in small villages under extremely poor nutritional, health, and hygiene conditions, have not been studied to date. The microbiomes of these Indigenous people potentially contain important bacterial species that have been lost during domestication and urbanization. However, this particular population in Colombia suffers from gastrointestinal infections that are rather common and widespread among these people. Consequently, in this research we investigated for the first time the stool microbiomes of the Wiwa population in two separate villages (Tezhumake and Siminke) to study the bacterial and functional diversity of their gut microbiomes. Furthermore, possible associations of morphometric data, epidemiological data, and gastrointestinal diseases with microbiome composition, always compared to a local urban population (Valledupar), were investigated.
The investigation provided a number of insights. First, locational-, age-, and gender-specific differences in the Firmicutes/Bacteriodetes ratio, core microbiome, and overall genera-level microbiome composition were shown, confirming previous reports on lifestyle-dependent enteric microbiomes in populations with a traditional lifestyle [10,[12][13][14][15][17][18][19][20]27,28,[50][51][52] or individuals exposed to nutrition associated with such a lifestyle [53]. Similar to the observations of earlier assessments of other Indigenous populations, alpha-and ß-diversity separated the urban site from the Indigenous locations in the assessment presented here. In part, poor sig-nificance was associated with the low number of samples taken from Colombians with an urban lifestyle. Urban microbiomes were dominated by Bacteriodetes, whereas Indigenous samples revealed a four times higher abundance of Proteobacteria. Even differences among the two compared Indigenous villages were noted, suggesting a considerable influence of minor peculiarities in community-specific lifestyle or prevalence of medical concomitant conditions, as suggested elsewhere [54].
When looking at the overall alpha-diversity (within sample variation), the urban individuals from Valledupar have a rather uniform microbiome composition across all individual samples. Odoribacter and Parabacteroides were exclusively identified in the microbiomes of the urban population of Valledupar. Individuals from both Indigenous Wiwa villages, in contrast, have a higher microbiome diversity. This is a consistent finding and emerging fact from many other global studies observing stool microbiomes in uncontacted Amerindian Yanomami [28], Papua New Guineans [55], and Tanzanian Hadza huntergatherers [56]. Such high diversity in Indigenous samples is clearly linked to traditional lifestyle, nutritional uptake, and lack of sanitation and personal hygiene [52]. In our collective, these alpha-diversity differences did not reach statistical significance, most likely due to uneven group sample sizes (126 Indigenous versus 9 urban). Thus, unweighted UniFrac distance analysis was conducted and showed with significance that the gut microbiome of both Wiwa villages was equally different from that of the urban samples. Including taxa abundance in a weighted UniFrac analysis confirmed these results, which also reached significance; however, the Tezhumake samples were more similar to those of Valledupar. These results matched well with observations from studying BaAka rainforest hunter-gatherers with their agriculturalist Bantu neighbors in the Central African Republic, which were compared to US American microbiome datasets from the Human Microbiome Project [19]. However, in our study, ß-diversity levels allowed a clear separation of urban from Indigenous populations.
Focusing on age and gender effects, it is noteworthy that Clostridia were exclusively found in children under the age of 5 years. This finding is in line with the previous observation that Clostridioides difficile is virtually absent in stool samples of Wiwa, irrespective of the abundance or absence of complaints of diarrhea [57][58][59].
One clear limitation of our and many other studies is the focus on pure taxonomy, as this does not immediately allow identification of causal members of the microbiota, and thereby does not fulfil the classical and molecular postulates of Koch [60]. However, even "omics" based microbiome studies, which at least allow a functional investigation of diverse microbiota, have a limitation as they rely on correctly and constantly curated gene annotation databases. PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) [42] is a computational method allowing inference of potential genetic functions and capabilities using our dataset. We found four pathways enriched in the urban microbiomes from Valledupar. Among them, Arginine-Proline metabolism was found to be enriched in microbiomes of animal-protein-rich food consumers (Western diet) [16], which fits with the urban lifestyle in Valledupar. Peptococcus (now renamed as Peptostreptococcus) was the single genus exclusive to the enriched pathways from Valledupar microbiomes. These bacteria are normal inhabitants of the gut; however, in humans with underlying disease they can turn into pathogenes causing infections. The genera Actinobacillus, Methanobrevibacter, and Methanosphaera, and the Archaea genus vadinCA11, were unique to the Siminke enriched pathways. Most likely these pathway-specific and location-predicted genera are associated with lifestyle and nutrition. Metagenomic shotgun sequencing would be a better method to identify such functional associations in microbiomes of diverse populations.
Focusing on associations of microbiome peculiarities and detections of enteric pathogens, a number of associations were observed with high predictive accuracy. In summary, Sutterella was associated with the abundance of enterohemorrhagic Escherichia coli (EHEC), Faecalibacteria was associated with enteropathogenic Escherichia coli (EPEC) and helminth species Hymenolepsis nana and Enterobius vermicularis. Finally, Parabacteroides, Prevotella, and Butyrivibrio were associated with salmonellosis, EPEC, and helminth infections. Of note, similar observations have been reported previously, both from epidemiological assessments and experimental veterinary medical assessments [61][62][63][64][65]. Such coincidences in the literature confirm the findings of the assessments presented here, although potential causal relations are still largely unknown. Other microbiome associations, e.g., those previously reported for Entamoeba infections [66], were, however, not observed in our study. Further, associations of the abundance of Dialister with gastrointestinal symptoms were recorded. The potential protective effects of Dialister against enteropathogens as observed in a veterinary challenge study might explain this observation [67].
The study has a number of limitations. First, the number of available samples was low, particularly regarding the Colombians with an urban lifestyle. This limitation weakened several significance levels. Second, limited available funding affected the sequence depth. Third, complaints of diarrhea are socially discouraged in the Wiwa community, so any associations with reported gastrointestinal disease have to be interpreted with care. In line with this, the study aim was focused on epidemiological associations with gastrointestinal disease of such high severity that it was considered as worth reporting by the Wiwa. Thereby, mild forms of gastroenteritis, which were considered as mere annoyances rather than real diseases by the Indigenous population, will most certainly have gone undetected by this approach. As, however, such mild disease variants are in turn a less severe threat to the Wiwa, and considering the nevertheless considerable proportion even of self-reported gastroenteritis cases, we feel confident that the chosen case definition meets the needs of the assessed study population quite well.

Conclusions
Dysbiotic alterations of the gut microbiome in the Indigenous population of the Colombian Wiwa with frequent episodes of self-reported gastrointestinal disorders were confirmed. Thereby, epidemiological and pathogen-specific associations were shown. The information provided by this explorative assessment suggests microbiome alterations associated with the clinical conditions of the Indigenous population of the Wiwa. Confirmatory studies are needed to assess potential causal relationships. Future applicability of artificial intelligence applications will most likely be helpful for the assessment of complex associations at the microbiome level. In particular, regarding the fact that the samples were collected 10 years ago, a follow-up assessment might provide further insights into the temporary stability of the local microbiome compositions, especially due to the fact that the living conditions of the Wiwa have not principally changed in the meantime. In the case of a stable situation, the results could be used as baseline values in order to monitor potential effects and side effects induced by preventive medical interventions against gastrointestinal infection risks to the microbiomes of the Wiwa. Apart from their general microbiome dysbiosis, all volunteers were treated according to their results. In addition, a mass deworming took place, which is supposed to be repeated on a regular basis. The implementation of water filters was processed and evaluated, but had only a small impact [57]. Informed Consent Statement: Written informed consent was obtained from each participant or from the parent or legal guardian of a child before participation. All participants were informed about their results and received treatment, if appropriate. Data Availability Statement: All relevant data are provided within the manuscript and its appendices. Raw data can be made available on reasonable request. The raw sequencing files have been deposited at the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena, accessed on 24 January 2023) under the accession number: PRJEB43871.
Acknowledgments: Jana Bull, Jessica Hansen, Annett Michel, and Simone Priesnitz are gratefully acknowledged for excellent technical assistance.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A Table A1. Phylum distribution in the three locations. Data indicate relative abundances (percentages of OTU counts). Phyla with relative abundances lower than 1% were filtered out.