Gut Microbiota Profiles Differ among Individuals Depending on Their Region of Origin: An Italian Pilot Study

Background and aims: Microbiota heterogeneity among humans is mainly due to genetic background, age, dietary habits, lifestyle and local environment. In this study we investigated whether the gut microbiota profile of healthy Italian volunteers differs according to their geographical origin. Materials and Methods: 16S rRNA gene sequencing was employed to analyze the gut microbiota of 31 healthy volunteers from three different Italian regions: Apulia (South), Lazio (Center) and Lombardy (North). Results: Differences in microbiota composition were detected when the study participants were grouped by their region of origin and when they were classified by age class (p-values < 0.05). Species richness also differed significantly both among Italian regions (median richness: 177.8 vs. 140.7 vs. 168.0 in Apulia, Lazio and Lombardy; p < 0.001) and among age classes (median richness: 140.1 vs. 177.8 vs. 160.0 in subjects < 32, 32–41 and > 41 years; p < 0.001), whereas the Shannon index and beta diversity did not change. Conclusions: This study identified differences in gut microbiota composition and richness among individuals of the same ethnicity coming from three different Italian regions. Our results underline the importance of studies on population-specific variations in human microbiota composition, which could lead to geographically tailored approaches to microbiota engineering.


Introduction
The human microbiota, with its 10^14 symbiotic and pathogenic microorganisms living within the host's body, mostly (99%) in the gut [1], and weighing almost 1.8 kg, has been considered the "forgotten" or "hidden" organ [2] due to its involvement in several physiological and pathological processes [3]. Although one third of our gut microbiota is in common with most of the people, the remaining two thirds is

Bioinformatic Analysis
Sequence data were generated as FASTQ files and deposited in the ArrayExpress repository under accession code E-MTAB-8136. They were analyzed using the 16S Metagenomics GAIA 2.0 software (http://www.metagenomics.cloud, Sequentia Biotech, Barcelona, Spain, 2017; Benchmark of Gaia 2.0 using published datasets available online at: http://gaia.sequentiabiotech.com/benchmark), which performs the quality control of the reads/pairs (i.e., trimming, clipping and adapter removal steps) through FastQC and BBDuk. The reads/pairs are then mapped with BWA-MEM against custom databases (based on NCBI). The average number of reads per sample was 203,516.6 (SD ± 81,922). Rarefaction curves indicated that an adequate sequencing depth was achieved (Supplementary Figure S1). For each sample, the software provided the relative abundance of bacterial taxa, as well as the Shannon alpha-diversity index, the Chao1 richness estimator and the Bray-Curtis beta-diversity index at the species level. A Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) analysis was performed to predict the functional profiles of the three microbial communities (grouped by region) from the 16S rRNA sequencing data [28]. Predicted functions were categorized at Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology level 2. A one-way ANOVA test was performed to assess significant differences in functional categories among the three regions.

Statistical Analysis
Participants' characteristics were reported as mean ± standard deviation for continuous variables and as absolute frequencies and percentages for categorical variables. For each continuous variable, the assumption of normality was checked by means of Q-Q plots and the Shapiro-Wilk test. For skewed continuous variables, medians along with the interquartile range (i.e., IQR, first-third quartiles) were reported instead of means, and non-parametric tests were performed. Overall comparisons between groups were assessed using ANOVA models (or the Kruskal-Wallis test, as appropriate) for continuous variables and the Fisher exact test for categorical variables. The participants of the study were divided into three age classes of equal size (i.e., tertiles: <32 years, between 32 and 41 years, and >41 years), and the relative abundances of microbial taxa were compared among regions of origin, age classes and other lifestyle factors. The presence of a monotonic trend between the subject's age (as a continuous variable) and relative abundance was estimated by the Spearman correlation coefficient. p-values from all statistical tests were adjusted for multiple comparisons, within each taxonomic level, controlling the False Discovery Rate (FDR) at level 0.05 using the Benjamini-Hochberg step-up procedure. Since participants' age distributions were not similar among regions (i.e., subjects in Lazio were significantly younger than the others), statistically significant differences among regions could be wrongly inferred. To guard against this, all abundances of microbial taxa that were significantly associated with the region of origin were considered "spurious results" if they were also significantly associated with the subject's age class. If found, all spurious results were discarded.
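The multiple-testing adjustment and the screening for spurious results described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names (`benjamini_hochberg`, `region_specific_taxa`), the taxon labels and the p-values are hypothetical, and the raw p-values are assumed to come from the group comparisons within a single taxonomic level.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg (step-up) adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def region_specific_taxa(region_p, age_p, alpha=0.05):
    """Keep taxa significant by region but not by age class (illustrative rule).

    region_p and age_p map taxon name -> raw p-value for one taxonomic level.
    Taxa significant for both criteria are treated as possibly spurious.
    """
    taxa = list(region_p)
    region_adj = dict(zip(taxa, benjamini_hochberg([region_p[t] for t in taxa])))
    age_adj = dict(zip(taxa, benjamini_hochberg([age_p[t] for t in taxa])))
    return [t for t in taxa if region_adj[t] < alpha and age_adj[t] >= alpha]


# Hypothetical raw p-values for three made-up taxa.
region_p = {"taxon_a": 0.001, "taxon_b": 0.002, "taxon_c": 0.5}
age_p = {"taxon_a": 0.9, "taxon_b": 0.001, "taxon_c": 0.8}
print(region_specific_taxa(region_p, age_p))
```

In this toy input, `taxon_b` is significant for both region and age class and is therefore dropped, while `taxon_a` is retained as a region-specific association.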
In order to evaluate the species richness and diversity of the microbial communities in the fecal samples of the study participants, the Chao1 and the Shannon indices, respectively, were calculated for each sample. Moreover, to measure the inter-individual dissimilarity of the gut microbiota, beta-diversity was calculated through the Bray-Curtis metric, which quantifies how dissimilar two samples are in terms of the species they share and their abundances. All Bray-Curtis dissimilarities were summarized by means of Kruskal's non-metric MultiDimensional Scaling (MDS) method and were graphically represented in a two-dimensional plot, where each point (defined by X and Y coordinates) represents an individual. Subjects who belonged to different groups (e.g., regions) were marked by different colors. To measure how similar an individual is to its own group (cohesion) compared to other groups (separation), the individual silhouette was estimated and the mean of all individual silhouettes within each group was computed. The individual silhouette ranges from −1 to +1, where a high value indicates that the individual is well matched to its own group and poorly matched to neighboring groups. If most individuals have a high silhouette value, then the clustering configuration is appropriate. On the contrary, if many individuals have a low or negative value, then the clustering configuration may have too many or too few groups.
A p-value < 0.05 was considered statistically significant. Statistical analyses and plots were performed using the computing environment R (R Development Core Team, Vienna, Austria, 2008; version 3.5.1).
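As an illustration of the quantities described above, the following Python sketch computes the Shannon index, the Chao1 estimator, the Bray-Curtis dissimilarity and the mean silhouette per group. It is a sketch under stated assumptions, not the GAIA or R implementation used in the study: the bias-corrected form of Chao1 and the natural logarithm in the Shannon index are assumptions, the function names are hypothetical, and the silhouette function expects at least two groups.

```python
import math


def shannon(counts):
    """Shannon index (natural log) from per-species counts of one sample."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: observed richness plus a term from rare species."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


def mean_silhouette_by_group(dist, labels):
    """Mean individual silhouette per group from a square dissimilarity matrix."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    means = {}
    for label, members in groups.items():
        scores = []
        for i in members:
            own = [dist[i][j] for j in members if j != i]
            if not own:
                scores.append(0.0)  # singleton group: silhouette set to 0
                continue
            cohesion = sum(own) / len(own)
            # Separation: mean dissimilarity to the nearest other group.
            separation = min(
                sum(dist[i][j] for j in other) / len(other)
                for other_label, other in groups.items()
                if other_label != label
            )
            scale = max(cohesion, separation)
            scores.append((separation - cohesion) / scale if scale > 0 else 0.0)
        means[label] = sum(scores) / len(scores)
    return means


# Toy example: four samples, two well-separated hypothetical groups.
samples = [[10, 5, 1, 0], [9, 6, 1, 0], [0, 1, 5, 10], [0, 1, 6, 9]]
dist = [[bray_curtis(a, b) for b in samples] for a in samples]
print(mean_silhouette_by_group(dist, ["A", "A", "B", "B"]))
```

Mean silhouette values close to zero, as reported for this study, would indicate that individuals are about as similar to other groups as to their own.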

Microbiota Profile in Italian Subjects from Different Regions
Demographic and behavioral characteristics of the study participants, grouped by region, are reported in Table 1. No significant differences were found among the three groups in gender distribution, body mass index (BMI), type of diet, smoking habits, alcohol consumption or physical activity, whereas a statistically significant difference was found in participants' age (p = 0.005): subjects in Lazio were significantly younger than the subjects who belonged to the other regions (median: 37 vs. 26 vs. 42 years in Apulia, Lazio and Lombardy, respectively).
The microbiota profile at all taxonomic levels was characterized for each of the 31 healthy individuals, and the mean relative abundances of bacterial taxa in Apulia, Lazio and Lombardy are shown in Figure 1 ((a) phylum level, (b) class level, (c) order level) and in Figure 2 ((a) family level, (b) genus level; species level not shown). When the participants were divided according to the three age classes, significant associations between relative abundances of microbial taxa and age classes were found for the families of Acholeplasmataceae, Bacillaceae, Peptostreptococcaceae, Pseudomonadaceae, for the genera of Acetivibrio, Bacillus, Defluviitalea, Eggerthella, Fenollaria, Hydrogenoanaerobacterium, Lachnotalea, Lutispora, Natranaerovirga, Paludibacter, Porphyromonas, Pseudomonas, Raoultibacter and for the species of Alistipes finegoldii, Anaerotruncus rubiinfantis, Bacteroides acidifaciens, Bacteroides clarus, Bacteroides sp. ANH 2438, Bifidobacterium sp. 113, Blautia luti, Butyricimonas sp. 180-3, Butyrivibrio crossotus, Clostridium sp. Culture Jar-19, Dialister sp. GBA27, Eubacterium coprostanoligenes, Faecalibacterium prausnitzii, Fenollaria timonensis, Flintibacter butyricus, Prevotella sp. 109, Robinsoniella sp. MCWD5, Ruminococcus sp. DJF_VR70k1 and Ruminococcus sp. ID1 (Table 2).
Furthermore, Spearman's correlation coefficients were estimated to investigate the presence of a monotonic association between subjects' age (considered as a continuous variable) and the relative abundances of microorganisms. The correlation analysis complements the group comparison analysis (with respect to age classes): while correlation achieves the highest statistical power in the detection of a monotonic trend, the group comparison analysis allows the detection of potential non-monotonic associations. Statistically significant positive correlations with age were found for Acholeplasmataceae, Bacillaceae, Peptostreptococcaceae, Acetivibrio, Bacillus, Defluviitalea, Eggerthella, Lachnotalea, Natranaerovirga, Paludibacter, Raoultibacter, Bacteroides clarus, Bacteroides sp. ANH 2438, Eubacterium coprostanoligenes, Flintibacter butyricus, Robinsoniella sp. MCWD5 and Ruminococcus sp. ID1, while a negative correlation was found for Alistipes finegoldii, Bifidobacterium sp. 113, Blautia luti, Butyricimonas sp. 180-3, Butyrivibrio crossotus, Dialister sp. GBA27 and Ruminococcus sp. DJF_VR70k1 (Table 2). These results support the view that age influences gut microbiota composition. Conversely, no statistically significant difference in the microbial pattern was detected according to gender, alcohol consumption, BMI, type of diet (Mediterranean or other), smoking habits or physical activity (data not shown). Having established that, among the recorded demographic and behavioral characteristics of the subjects, only age affected the fecal microbiota, we next analyzed its composition in relation to the geographical origin of the participants.
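For readers who want to see what the correlation step computes, here is a minimal Python sketch of Spearman's coefficient, obtained as Pearson's correlation applied to average ranks (ties share the mean of their ranks). The function names and the example values are hypothetical and are not taken from the study data; the original analysis was run in R.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ages and relative abundances of one taxon: the relation is
# strongly non-linear but perfectly increasing, so rho is (numerically) 1.
ages = [24, 30, 41, 65]
abundance = [0.1, 0.4, 0.9, 8.0]
print(spearman(ages, abundance))
```

Because the coefficient depends only on ranks, it is insensitive to the skewed distribution typical of relative abundances, which is one reason a rank-based measure suits this kind of data.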
The phyla of Cyanobacteria and Nitrospirae, the classes of Epsilonproteobacteria, Nitrospira, Oligosphaeria and Sphingobacteriia and the orders of Alteromonadales, Anaeroplasmatales, Bacillales, Corynebacteriales, Desulfobacterales, Desulfurellales, Micrococcales, Myxococcales, Nautiliales, Nitrospirales, Sphingobacteriales, Streptomycetales, Synergistales, Syntrophobacterales, Thermoanaerobacterales, Tissierellales, Veillonellales and Vibrionales were differently represented among the Apulia, Lazio and Lombardy groups (see Table 3). Many differences also emerged at lower taxonomic levels, some of which completely overlapped with the ones found in the classification by age groups (Figure 3).
Table 2. Gut microbiota in subjects grouped by age classes (i.e., tertiles) and Spearman correlation coefficients. Data were reported as median along with interquartile range (first-third quartiles). Only significant results (i.e., p-values < 0.05 from any statistical test) were reported. All p-values were adjusted controlling for the False Discovery Rate at 0.05 level within each taxonomic level.

Bacterial Diversity
The species richness (i.e., Chao1 index) differed significantly when subjects were classified by their region of origin (median richness: 177.8 vs. 140.7 vs. 168.0 in Apulia, Lazio and Lombardy; p = 5.7 × 10^−5), age classes (median richness: 140.1 vs. 177.8 vs. 160.0 in subjects < 32, 32-41 and > 41 years; p = 1.6 × 10^−4) and physical activity (median richness: 179.5 vs. 162.5 vs. 149.2 in subjects who performed "none", "little" and "moderate" physical activity; p = 6.8 × 10^−4), whereas only a non-significant trend was observed when individuals were grouped according to their adherence to the Mediterranean diet rather than to other non-Mediterranean styles (p = 0.07) (Figure 4). After excluding possible spurious results (i.e., all associations that were also significant with respect to age classes), a total of 117 bacterial species were significantly different among the three regions (see Table 3; possible spurious results at the species level are underlined).
Conversely, regardless of the classification criterion, no statistically significant difference was found for the Shannon index, which takes into account the number and the relative abundance of the species within each sample (data not shown). Furthermore, as for beta-diversity, Kruskal's non-metric MultiDimensional Scaling (MDS) plots of the Bray-Curtis dissimilarities did not reveal any significant clustering by region (Figure 5a), by age class (Figure 5b) or by any other classification criterion, as indicated by mean silhouette values around zero. The lack of association for the Shannon index and beta diversity may be due to the sampling of multiple village sites and multiple cities within the study.
In order to understand the functional meaning of the microbiota differences observed among the three regions, a PICRUSt prediction of the metagenomes was performed. A total of five pathways, namely infectious diseases, membrane transport, metabolism, replication and repair, and signaling molecules and interactions, emerged as significantly changed (Figure 6).

Discussion
It is well recognized that ethnicity and geographical location are key factors influencing the composition and diversity of the gut microbiota [5]. However, besides the understandable divergences across wide international geographical areas characterized by different socio-economic settings [16][17][18][19], important differences are emerging even among people with similar genetic and cultural backgrounds [21]. In this regard, the aim of the present study was to characterize the gut microbiota composition of healthy people belonging to three different Italian regions, namely Apulia, Lazio and Lombardy, from the South, Center and North of the peninsula, respectively. All the study participants had the same ethnicity and were quite homogeneously distributed across the three regions as regards gender, BMI, physical activity, and dietary, smoking and drinking habits. Only age was not distributed similarly among the three regions and, since age is an established factor influencing the composition of the microbiota [12,19,29], it was taken into account as a potential confounding factor. However, it should be considered that, although gut microbiota composition changes throughout life, major shifts are described to occur in the transition from infancy to adulthood and then to old age, while it is documented to be quite stable through the different stages of adult life [30][31][32]. Nevertheless, when we grouped our study population by age classes, differences in composition within the bacterial families, genera and species emerged. Furthermore, differences emerged in the species richness of gut microbial communities according to region, age class and physical activity.
Although divided into three age classes, our study population is mainly composed of young adults and adults (between 24 and 47 years of age with the exception of two individuals of 64 and 65 years old respectively), and includes only 1 subject older than 70 (which is considered the threshold age for defining an individual as elderly). We speculate that the significant differences observed in the Chao1 index according to age are likely due to the acquisition of a mature (not aged) microbiota (Figure 4B).
Physical exercise is known to be associated with a healthier and more diverse microbiota, which seems to be in contrast with the results of our study. It should be considered, however, that in our study population little and moderate physical activity overlapped with younger age, which would explain the reduced richness observed. Many other differences, including at higher taxonomic levels, were observed when the 31 subjects were grouped by region. Notably, many taxa that were present in almost all the members of one or two regions were completely absent in the other(s), supporting the view that gut microbiota and geographic location are closely interlinked; this is, for example, the case of Cyanobacteria, which were unique to Lombards, Nitrospirae, which were only found in Apulians, and many other lower taxa. A predictive analysis of the functional pathways affected by the microbiota diversity among the three regions revealed five significantly changed categories. It should be kept in mind, however, that PICRUSt only performs predictions and that more accurate functional profiles require metagenomic approaches, which will be the subject of future studies.
Our data emphasize the importance of the selective pressure exerted by numerous common environmental exposures in shaping gut microbial ecology. This could reflect the different exposure of these individuals to industrial presence [33] (the north of Italy being more industrialized than the south), to agricultural chemicals such as fertilizers (the south of Italy having an agriculture-based economy) or to the natural source of the water people drink. As specified above, our study population was homogeneous from the ethnic point of view, so we can isolate the geographical effect from the ethnic factor. A very interesting perspective for understanding the basis of such spatial microbiota variability comes from ecological theory, according to which local diversification of the microbiota could be introduced by processes of community ecology, such as dispersal, diversification, environmental selection and ecological drift, as exhaustively discussed by Costello et al. [34]. Further investigation is required to shed light on the factors underlying such differences in microbiota composition among the Italian regions.

Conclusions
Overall, our results point out the existence of variability in the microbiota composition of populations that are closely related from a geographical point of view. This interesting link between small-scale geography and gut microbiota deserves further investigation and has important implications for the development of microbiota-based clinical approaches. Indeed, the microbiota is increasingly recognized for its role in the onset, progression and response to therapy of a large number of diseases, both as a diagnostic marker [22,24,26] and as a manipulable target for improving the clinical course of pathologies [23,35]. Therefore, taking into account the variability within a limited geographic area could be particularly important for setting up tailored therapeutic approaches. Moreover, attention should be paid when setting a specific microbial pattern as a reference for health or disease, since it may be strongly influenced by the population used to generate the data [21]. Further efforts should be devoted to identifying the factors underlying the association between microbiota and small-scale geography [36].
Funding: This research was supported by the "Ricerca Corrente RC2018" and the "Ricerca Corrente RC2019" funding granted by the Italian Ministry of Health to VP.