Physical Activity and Dietary Composition Relate to Differences in Gut Microbial Patterns in a Multi-Ethnic Cohort—The HELIUS Study

Physical activity (PA) at recommended levels contributes to the prevention of non-communicable diseases, such as atherosclerotic cardiovascular disease (asCVD) and type 2 diabetes mellitus (T2DM). Since the composition of the gut microbiota is strongly intertwined with dietary intake, the specific effect of exercise on the gut microbiota is not known. Moreover, multiple other factors, such as ethnicity, influence the composition of the gut microbiota, and this may derive from distinct diet as well as PA patterns. Here we aim to untangle the associations between PA and the gut microbiota in a sample (n = 1334) from the Healthy Life In an Urban Setting (HELIUS) multi-ethnic cohort. The associations between different food groups and the gut microbiota were also analyzed. PA was monitored using subjective (n = 1309) and objective (n = 162) methods, and dietary intake was assessed with ethnic-specific food frequency questionnaires (FFQ). The gut microbiota was profiled using 16S rRNA gene amplicon sequencing, and the functional composition was generated with the Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2). Associations were assessed using multivariable and machine learning models. In this cohort, a distinct gut microbiota composition was associated with meeting the Dutch PA norm as well as with dietary intake, e.g., grains. PA-related parameters such as muscle strength and calf circumference correlated with gut microbiota diversity. Furthermore, gut microbial functionality differed between active and sedentary groups. Differential representation of ethnicities in the active and sedentary groups under both monitoring methods hampered the detection of ethnic-specific effects. In conclusion, both PA and dietary intake were associated with gut microbiota composition in our multi-ethnic cohort. Future studies should further elucidate the role of ethnicity and diet in this association.


Introduction
Urbanization and associated behavioral changes have led to humans being physically less active [1,2]. Insufficient physical activity (PA), defined as less than 150 min of moderate aerobic activity or 75 min of vigorous aerobic activity throughout the week, characterizes

Characterization of the Study Population by Physical Activity Level
Characteristics of the complete study population (n = 1334) and of the groups stratified by the subjective monitor (Short Questionnaire to Assess Health-enhancing physical activity (SQUASH), n = 1309, sedentary n = 441, active n = 868) and by the objective monitor (ActiHeart, n = 162, sedentary n = 100, active n = 62) are presented in Table 1. The subjective monitor data show that 66% of the study population met the Dutch PA guideline targets. The active participants were older than the sedentary participants (53.4 ± 10.5 vs. 49.0 ± 10.5 years of age, t(1307) = −7.138, p < 0.001), had a lower body mass index (BMI) (26.7 ± 4.5 vs. 28.0 ± 5.2 kg/m², t(772.552) = 4.605, p < 0.001) and relative fat mass (30.0 ± 9.1 vs. 31.7 ± 9.5%, t(1307) = 3.171, p = 0.002), as well as a smaller waist circumference (93.3 ± 11.7 vs. 95.3 ± 13.6 cm, t(779.811) = 2.591, p = 0.007). The active participants had a higher maximum muscle strength than the sedentary participants (212.3 ± 76.6 vs. 201.0 ± 77.8 N, t(1281) = −2.493, p = 0.001), as well as higher creatinine levels (73 µmol/L [95% CI 72.7-76.5] vs. 75 µmol/L [95% CI 75.1-77.2], Mann-Whitney U = 208,719, p = 0.007). With regard to ethnicity, individuals of European descent were relatively overrepresented in the active group, whereas the other ethnicities, except Moroccans, were more frequently represented in the sedentary group. When stratified by objectively monitored physical activity level (PAL), only the waist-to-hip ratio (WHR) differed significantly, being lower in the active group than in the sedentary group (0.91 ± 0.1 vs. 0.93 ± 0.1, t(159) = 2.133, p = 0.034).
Table 1. Characterization of the study population by adherence to the Dutch physical activity (PA) guideline based on the Short Questionnaire to Assess Health-enhancing physical activity (SQUASH), and by PA level based on ActiHeart.
Chi-Square for categorical variables, t-test for parametric variables and non-parametric Mann-Whitney test were used for statistical testing to compare the active vs. sedentary group. Data are expressed as means ± SD, absolute numbers with percentages or medians with 95% CI.
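The non-integer degrees of freedom reported for BMI and waist circumference are what an unequal-variance (Welch) t-test produces. The source does not state which t-test variant was used, so the following is only a sketch under that assumption. It recomputes the statistic from the rounded BMI summaries, assuming the full group sizes (441 and 868) had BMI data; the function name is illustrative.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from group summaries."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# BMI summaries as reported above: sedentary (n = 441) vs. active (n = 868).
# Assumption: no missing BMI values, so the full group sizes apply.
t_stat, df = welch_t_from_summary(28.0, 5.2, 441, 26.7, 4.5, 868)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Because the inputs are rounded to one decimal, the result is expected to land near, but not exactly on, the reported t and df.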

Dietary Intake in Relation to Physical Activity
The summarized data on dietary intake retrieved from the ethnic-specific semi-quantitative food frequency questionnaires (FFQ) did not show significant differences between participants in the active group and those in the sedentary group, apart from a slightly higher daily intake of dietary fiber and alcohol in the active group (Table 2). When total daily energy intake was taken into consideration, the difference in fiber intake was no longer significant (2.7 ± 0.7 vs. 2.6 ± 0.7 g/MJ, t(1307) = −0.884, p = 0.38). When stratifying alcohol consumption by gender, we saw a significantly higher consumption in men than in women (2.9 ± 6.2 vs. 1.7 ± 3.5 g/d, t(1051.566) = 6.644, p < 0.001). On the other hand, when analyzing the dietary data in more detail by food groups, we saw a significantly higher consumption of fruits, mixed foods, dairy and non-alcoholic beverages in the active group compared to the sedentary group after correction for multiple testing (Figure 1A,B, Supplementary Figure S2, Wilcoxon test with Benjamini-Hochberg p-value adjustment).
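The Benjamini-Hochberg adjustment applied to the food-group comparisons can be sketched in a few lines. This is a minimal illustration of the standard step-up procedure, not the authors' code; the p-values are hypothetical and the function name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four food groups (not taken from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

A food group would then be called significant when its adjusted value falls below the chosen false discovery rate, for example 0.05.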

Physical Activity Associates with Gut Microbiota Composition
To study the interaction between PA, dietary intake and gut microbiota composition, we analyzed β-diversity with a permutational analysis of variance (PERMANOVA), which tests for differences in microbial communities between individuals, using both non-phylogenetic (Bray-Curtis) and phylogenetic (unweighted UniFrac and weighted UniFrac) dissimilarities. In a model adjusting for age, sex, BMI and ethnicity, the dissimilarity metrics that take microbial abundances into account, Bray-Curtis (p = 0.012, R² = 0.14%) and weighted UniFrac (p = 0.047, R² = 0.17%), were significantly different between the sedentary and active groups, indicating a PA-driven association. The PA groups also differed in the unweighted phylogenetic dissimilarity metric (unweighted UniFrac, p = 0.027, R² = 0.12%) in a model without any covariates. However, after adding the above-mentioned covariates, this difference did not remain significant (p = 0.136, R² = 0.09%) (Figure 2). In a more extensive model, adjusting for age, sex, BMI, ethnicity and diet covariates (fiber, energy and macronutrient intake), the Bray-Curtis dissimilarity index (p = 0.022, R² = 0.13%) and weighted UniFrac (p = 0.039, R² = 0.16%) remained significant (details in Supplementary
These measures lost significance when adjusting for the following covariates: age, sex, BMI, and ethnicity (Figure 3A,B). The Simpson index showed the opposite pattern: it was significant only after adjusting for covariates (unadjusted p = 0.096, W statistic = 180,626; covariate-adjusted p = 0.002, W statistic = 171,050). Detailed analyses revealed that the relative abundance of 173 gut microbial taxa differed significantly (adjusted for the covariates age, sex, BMI, ethnicity and diet) between the participants stratified by PA, indicating a PA-driven association (Supplementary Table S2).
Members of Firmicutes, including Lachnospiraceae and Veillonella, were significantly more abundant in the active group, whereas Enterobacteriales Enterobacteriaceae, Escherichia/Shigella and Klebsiella, belonging to the phylum Proteobacteria, were more abundant in the sedentary group. Members of the phylum Bacteroidetes, such as Prevotella_2, and members of the phylum Firmicutes, such as Roseburia hominis, Erysipelatoclostridium and Lachnoclostridium, were also more abundant in the sedentary group.
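The β-diversity comparison described in this section can be illustrated with a simplified one-way PERMANOVA on Bray-Curtis dissimilarities. This sketch deliberately omits the covariate adjustment used in the study and runs on hypothetical toy abundances; the function names are illustrative, and an established implementation would normally be preferred for real analyses.

```python
import random

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0

def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F and R2 from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for k, i in enumerate(members) for j in members[k + 1:]
        ) / len(members)
    ss_between = ss_total - ss_within
    f = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f, ss_between / ss_total

def permanova(samples, labels, n_perm=999, seed=0):
    """Permutation test of group labels; no covariates (simplification)."""
    rng = random.Random(seed)
    n = len(samples)
    dist = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    f_obs, r2 = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)

# Hypothetical abundances of three taxa in six participants.
samples = [[10, 5, 1], [9, 6, 1], [11, 4, 2], [2, 5, 10], [1, 6, 9], [3, 4, 11]]
labels = ["active", "active", "active", "sedentary", "sedentary", "sedentary"]
f_obs, r2, p_value = permanova(samples, labels)
print(f"pseudo-F = {f_obs:.2f}, R2 = {r2:.3f}, p = {p_value:.3f}")
```

The R2 returned here is the share of total dissimilarity attributed to the grouping, which is the quantity reported as a percentage in the text.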

The Gut Microbiome Predicts Subjectively and Objectively Monitored Physical Activity-A Machine Learning Model
To investigate whether the gut microbiota can predict PA monitored by either subjective or objective methods, we employed a machine learning model. The model built on gut microbiota abundance was able to predict the objective monitoring (ActiHeart, n = 162) and the subjective monitoring (SQUASH, n = 1309), with an area under the curve (AUC) of 0.81 ± 0.08 and 0.69 ± 0.02, respectively (Figure 4A,B). Gut microbiota abundance was also able to predict objective monitoring when participants with antibiotic use were excluded (AUC 0.74 ± 0.08, n = 151) (Supplementary Figure S3). Irrespective of the type of monitoring, the most predictive bacterial taxa belonged to the phyla Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria and Lentisphaerae. Specifically, Blautia and Lachnospiraceae, both from the family Lachnospiraceae within the phylum Firmicutes, were predictive for both monitoring methods.
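For readers less familiar with the AUC values quoted above, the metric can be computed directly as the probability that a randomly chosen active participant receives a higher model score than a randomly chosen sedentary one. The sketch below uses hypothetical labels and scores and says nothing about the model or the cross-validation scheme used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = active, 0 = sedentary; scores are predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.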
Metabolites 2021, 11, x FOR PEER REVIEW 7 of 22 p = 0.9). Wilcoxon test was used as a statistical test to compare active group vs. sedentary group. Covariates used: ethnicity, age, sex and body mass index (BMI).

Parameters Related to Physical Fitness Associate with Variance of the Gut Microbiome
Since the active and the sedentary participants differed with respect to physical parameters such as calf circumference, muscle strength and creatinine (see Table 1), we next investigated whether these PA-related parameters were associated with gut microbiota composition (Table 3). As estimated by linear regression, approximately 18% of the variance in richness (p = 0.038) and in the inverse Simpson index (p = 0.030) was explained by muscle strength in a model adjusted for ethnicity, age, sex, BMI and diet (energy, macronutrients and fiber). Calf circumference explained approximately 18% of the variance in richness (p = 0.005), 20% of the variance in the Shannon index (p = 0.021) and 25% of the variance in Faith's phylogenetic diversity (p = 0.004) in the model with ethnicity, age, BMI, sex and diet, but also in the model without dietary adjustment, indicating an association independent of diet. Finally, approximately 18% of the variance in richness, 25% of the variance in Faith's phylogenetic diversity, 11% of the variance in the Simpson index and 18% of the variance in the inverse Simpson index were explained by creatinine in the covariate-adjusted model (including the covariates ethnicity, age, sex, BMI and diet). Moreover, PA-related parameters correlated (Spearman) significantly with multiple different taxa (Supplementary Table S3).
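Several of the α-diversity indices named above can be computed directly from per-taxon counts. The sketch below covers richness, Shannon, Simpson and inverse Simpson. It assumes the Simpson index is reported in its 1 − D (Gini-Simpson) form, which the text does not specify, and it omits Faith's phylogenetic diversity because that requires a phylogenetic tree.

```python
import math

def alpha_diversity(counts):
    """Richness, Shannon, Simpson (1 - D) and inverse Simpson from taxon counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    dominance = sum(p * p for p in props)  # Simpson's D
    return {
        "richness": len(props),
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - dominance,        # assumed Gini-Simpson form
        "inverse_simpson": 1.0 / dominance,
    }

# Hypothetical counts: a perfectly even community of four taxa.
even = alpha_diversity([10, 10, 10, 10])
# Hypothetical counts: the same four taxa with one dominant member.
skewed = alpha_diversity([37, 1, 1, 1])
print(even)
print(skewed)
```

Both toy communities have the same richness, while the even one scores higher on every evenness-sensitive index, which is why the indices can relate differently to the same host parameter.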

Functionality of the Gut Microbiome in Relation to Physical Activity and Related Parameters
Given the differences in gut microbiota composition between the PA levels, and the PA-related parameters associated with gut microbiota composition (see Table 3, Supplementary Table S3), we analyzed taxonomically linked metabolic pathways from the gut microbiome data by using the Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2) [33]. Stratified by PA levels based on SQUASH, after adjusting for ethnicity, BMI, age and sex and correcting for multiple testing, 31 microbial metabolic pathways differed between the groups; when diet was added as a covariate, 23 were significant (FDR-p < 0.05) (Figure 5; Wilcoxon test of residuals after adjusting for covariates with linear regression; details in Supplementary Table S4). The active group was inferred to have a lower abundance of pathways related to microbial arginine metabolism (L-arginine degradation II [AST pathway], superpathway of L-arginine, putrescine, and 4-aminobutanoate degradation [ARGDEG pathway], superpathway of L-arginine and L-ornithine degradation [ORNARGDEG pathway], superpathway of L-ornithine degradation [ORNDEG pathway]). Specific taxa, such as Enterobacteriales and Enterobacteriaceae, correlated significantly with these arginine pathways (r = 0.99, FDR-p < 0.001), as did Escherichia/Shigella (r = 0.86, FDR-p < 0.001), Klebsiella (r = 0.50, FDR-p < 0.001), Proteobacteria (r = 0.34, FDR-p < 0.001) and Veillonella (r = 0.18, FDR-p < 0.001). These taxa, apart from Veillonella, were more abundant in the sedentary group than in the active group (see Supplementary Table S1). Furthermore, the AST pathway (r = −0.10, FDR-p < 0.001), ARGDEG pathway (r = −0.13, FDR-p < 0.001), ORNARGDEG pathway (r = −0.13, FDR-p < 0.001), and ORNDEG pathway (r = −0.13, FDR-p < 0.001) correlated negatively with creatine kinase (CK), which in turn correlated negatively with taxa such as Enterobacteriales, Enterobacteriaceae, Escherichia/Shigella and Klebsiella (Supplementary Table S5).
Pathways related to carboxylate degradation (D-galactarate degradation I (GALACTARDEG pathway), and the superpathway of D-glucarate and D-galactarate degradation (GLUCARGALACTSUPER pathway)) were lower in the active group compared to the sedentary group. These pathways correlated positively with members of the Bacteroides genus (r = 0.51, FDR-p < 0.001). Muscle strength, another PA-related parameter, correlated negatively with carboxylate degradation pathways (r = −0.091, FDR-p = 0.015).
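The pathway, taxon and muscle-strength correlations in this section are rank correlations (Spearman is named elsewhere in the text; assuming it applies here as well). A minimal sketch computes it as the Pearson correlation of average ranks, using hypothetical values; the function names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: inferred pathway abundance vs. relative taxon abundance.
pathway = [0.12, 0.30, 0.25, 0.80, 0.45]
taxon = [0.01, 0.04, 0.05, 0.20, 0.07]
rho = spearman(pathway, taxon)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed abundance distributions typical of microbiome data.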

Diet and Specific Food Groups Characterize the Composition of the Gut Microbiome
Dietary intake is one of the main drivers of gut microbiome composition [17]. Therefore, we asked how much of the difference in gut microbiome composition was attributable to differences in dietary intake. We first explored the linear regression of dietary intake parameters on microbial α-diversity as measured by the Shannon index. After adjusting for ethnicity, age, sex and BMI, the average intake of fat (E-%), carbohydrates (E-%) and grains had the strongest association with the Shannon index (Figure 6A). Specifically, foods within these major food groups, including low-fiber (LF) rice and pasta, olive oil, vegetables, other oils and salad dressings, explained a large part of the variance. Interestingly, when ethnicity itself was excluded as a covariate, ethnic-specific foods such as roti and wine leaves explained a large part of the variance, potentially indicating that ethnic-specific foods act as a proxy for the effect of ethnicity on the Shannon index (Figure 6B). Of note, the largest percentage of variance in the Shannon index explained by any food, without adjusting for ethnicity, was approximately 10% (df = 4). When ethnicity was added as a covariate, the explained variance increased to approximately 20% (df = 8), suggesting that ethnicity had a rather strong effect on the analysis of food groups. Next, we investigated whether specific taxa correlated with the 13 major food groups and found that different food groups showed distinct correlations with different taxa (Supplementary Table S6). Vegetarian products correlated significantly with 256 taxa, including species such as Akkermansia muciniphila (r = 0.31, FDR-p < 0.001). Grains correlated with 22 different taxa, while vegetables correlated (Spearman) significantly with five bacterial taxa, including Clostridiales Lachnospiraceae (r = 0.10, FDR-p = 0.01) and Butyricicoccus (r = 0.09, FDR-p = 0.03).
Food groups that also showed multiple significant correlations were fruits (29 taxa); eggs (32 taxa); alcoholic beverages (89 taxa); meat, poultry and fish (30 taxa); mixed foods (28 taxa); nuts and seeds (27 taxa); sweet and savory foods (16 taxa); and salad dressings (3 taxa). The food group of non-alcoholic beverages correlated with only one taxon, and dairy products did not correlate with any taxa.
Figure 6. Explained variance of Shannon α-diversity index (n = 1308) by the major 13 food groups (including macronutrients, carbohydrate E-%, protein E-% and fat E-%) and 52 foods in multivariable linear regression: (a) model adjusted for ethnicity, sex, age, and body mass index (BMI), 8 df; (b) model adjusted for sex, age, and BMI, 4 df. Significance level colored with blue when FDR-p < 0.05, grey when FDR-p > 0.05. E-%, energy intake expressed as energy percentage of total energy; HF cheese, high-fat cheese; HF dairy products, high-fat dairy products; HF grains, high-fiber grains; HF rice pasta, high-fiber rice pasta; LF cheese, low-fat cheese; LF grains, low-fiber grains; LF dairy products, low-fat dairy products; LF rice and pasta, low-fiber rice and pasta.

Discussion
In the present study, we performed a detailed exploration on the association between fecal microbiota composition and PA, as well as dietary intake in the multi-ethnic HELIUS cohort. We found that gut microbiota composition differs between participants adhering to Dutch PA guidelines (active group) compared to the non-adhering (sedentary group). In addition to this link between PA, PA-related parameters and gut microbiota composition, the intake of specific dietary components, such as grains, was a strong factor explaining the variance in the gut microbiota composition.
We have shown that PA is associated with differences in gut microbial diversity, and that the gut microbiota composition accurately predicts PA when measured objectively or subjectively. Yet, the direction of the associations and the most important taxa were similar in both methods. Other studies have found that participants with an objectively monitored high PA level have a different gut microbiota diversity compared to sedentary participants [34]. The same has been described for college students whose activity was monitored by questionnaires [35], in whom several microbial taxa differed between active and sedentary students. In our cohort, Lachnospiraceae was associated with PA, as was Blautia, belonging to the phylum Firmicutes, with relative abundances being higher in active participants [36]. Lachnospiraceae has previously been associated with vigorous PA in the cohort of college students [35] and, together with higher richness, with increased cardiorespiratory fitness (oxygen consumption [VO2]), which is likely to increase with vigorous physical activity [37]. Lachnospiraceae has also been linked to the production of short-chain fatty acids (SCFA) upon exercise; the concentrations of fecal SCFA correlated positively with Lachnospiraceae in lean women following an aerobic exercise program [38]. Moreover, our pairwise comparison detected Veillonella to be more abundant in participants adhering to the PA guideline. This is in line with recent findings that Veillonella converts intestinal lactate into propionate (and acetate), thereby improving the performance of athletes [21], and that it associates with vigorous physical activity in another cohort [13].
In general, higher α-diversity and richness have been associated with greater metabolic health and insulin sensitivity, which is in line with the results from our study; calf circumference associated with α-diversity and richness, and muscle strength associated with richness [39]. Improved glucose homeostasis or increased muscle endurance induced by exercise might be mediated via the gut microbiota, as seen in exercised conventional versus gut microbiota-depleted mice [40]. Muscle strength has been associated with lower risk for T2DM in the HELIUS cohort [41], and in other cohorts it was a positive predictor of survival together with calf circumference [42,43], and associated with gut microbiota composition in the elderly [44]. Beneficial metabolic effects of muscle strength can be partly due to the gut microbial changes upon exercise, but well-controlled human studies with sufficient sample size and metabolic outcomes (e.g., insulin sensitivity) are lacking. Some evidence comes from translational in vivo mice work where a fecal microbial transplant (FMT) with feces from high-functioning elderly individuals improved murine muscle strength while feces from low-functioning elderly individuals did not improve body mass and exercise endurance [45].
Consistent with previous studies, we found a positive association between plasma creatinine and microbial richness as well as α-diversity [46]. Although CK was previously associated with increased microbial diversity in rugby players, we did not find an association between gut microbiota and CK [18,22]. Interestingly, we did find an association between CK and arginine and ornithine microbial pathways. These pathways had a strong association with specific taxa such as Enterobacteriales Enterobacteriaceae, Escherichia/Shigella and Klebsiella, which were more abundant in the sedentary participants compared to the active participants. Hypothetically, PA might shape microbial metabolism to the needs of the host, e.g., production of energy precursors while exercising [18,21]. Yet, more well-controlled intervention studies in different populations are needed in order to establish the independent effect of PA on the modulation of the gut microbiota, and its subsequent influence on metabolism.
Regarding the diet, we have shown that the intake of macronutrients was similar between sedentary and active participants, but the consumption of various food groups differed between the two groups. It has been reported that lifestyle behavior tends to cluster; health-conscious people tend to eat a healthier diet and exercise more than people with poorer lifestyle behavior, which further associates with disease prevalence [47]. Additionally, physically active individuals tend to have different dietary habits based on the type of sports, i.e., bodybuilders have higher intake of protein and lower intake of carbohydrate whereas runners show the opposite [48].
In our cohort, approximately 5% of the variance in the α-diversity Shannon index could be explained by the intake of carbohydrates, especially grains. Often, the modulatory impact of grains and carbohydrates on the gut microbiota is explained by fiber [49,50]. Fiber may influence the production of SCFA [51,52], which in turn has been associated with beneficial effects on a variety of metabolic and cardiovascular parameters, such as insulin and blood pressure [53,54]. However, in our cohort, the intake of fiber did not explain a significant part of the variance in Shannon index. Thus, the impact of foods on the Shannon diversity may not solely be driven by the fiber from grains. In contrast to the effects of fibers on the Shannon index, the intake of fat, especially oils rich in PUFA and MUFA, largely explained α-diversity as shown in another cohort [55]. Previous studies already showed that the total fat intake correlates with the Shannon index and richness where saturated fatty acids associate negatively with richness, phylogenetic diversity and number of observed taxa, while MUFA correlate negatively with number of taxa and phylogenetic diversity, and PUFA associate negatively only with phylogenetic diversity [49]. It is likely that food groups have the potential to explain the variation of the gut microbiota by capturing the synergy of different components in foods [17], such as biologically active compounds, e.g., phytochemicals and flavonoids in cereals [56,57] or chemical structures of fats [49,58]. Yet, the composition of the food groups cannot be overlooked. It is likely that some important associations were not identified since some specific foods were classified to larger groups, e.g., the food group 'meat, poultry and fish' combines both processed and unprocessed proteins of animal origin, though they may have distinct impacts on the gut microbiota and health [59,60]. 
However, the categories were kept rather wide because of the primary aim to identify associations between PA and gut microbiota. All in all, ours as well as previous studies show that dietary intake greatly explains the variation in the gut microbiota composition [14]. However, a healthy lifestyle includes both PA and healthy foods, and both are likely to influence the gut microbiota composition.
The main strength of this study is its unique multi-ethnic population, which includes participants from five different ethnic backgrounds living in the same geographical location. Another strength is the sample size, which is potentially one of the largest among studies investigating PA and its relationship to the gut microbiome. A large sample size is preferable when investigating gut microbiota, diseases and lifestyle [61]. However, in our cohort, the distribution of ethnicities between the two PA extremes was uneven; nearly 40% of participants meeting the recommended amount of PA were of Dutch origin, which influences our analysis of the gut microbiome. We have previously shown that ethnic background influences the composition of the gut microbiota [30]. This may derive from distinct eating [29] as well as PA patterns [31]. However, detailed analyses comparing different ethnicities in relation to physical activity or diet were not conducted in this study because of the small sample size when stratifying per ethnicity. We tried to overcome ethnic differences in dietary habits by using ethnic-specific FFQs, and by adjusting the analyses for ethnicity, but the modulatory impact of exercise on the gut microbiota is likely to depend on the population studied, their traditional diet and the type of exercise they perform [13].
This study has major limitations. Firstly, the cross-sectional design precludes causal inference. Secondly, PA self-reported with the SQUASH is susceptible to reporting bias, and the correlation between objective and subjective PA in this study was not as strong as previously shown [62]. Furthermore, most of the participants who wore an ActiHeart were of Dutch origin, potentially influencing our analysis of the gut microbiota. Additionally, cardiorespiratory fitness is not assessed in the HELIUS cohort, and thus we were not able to analyze it. Thirdly, 16S rRNA sequencing of the gut microbiome profiles the taxonomic composition but does not allow for a direct assessment of the functional microbial profiles. This type of sequencing is less powerful in detecting biologically relevant taxa when compared to shotgun metagenomics [63,64]. Moreover, it was not recorded whether the stool samples were frozen or fresh when received from the study participants. This could potentially influence the results because the storage conditions of stool samples influence their microbiota profile [65].
In conclusion, PA was associated with a distinct composition of the gut microbiome in a multi-ethnic population. The gut microbiota was also associated with the intake of specific dietary elements, most notably grains, independent of ethnicity. PA-related parameters such as muscle strength, calf circumference and creatinine correlated with gut microbiota diversity. Furthermore, specific microbial pathways may be enriched in the gut microbiota of participants with different levels of PA. Together, this calls for further investigation of the influence of PA on gut microbial composition and gut microbial metabolism, in relation to diet.

Study Population
The HELIUS study comprises 24,789 adult individuals (18-70 years of age) randomly sampled by ethnic origin from the Amsterdam area in The Netherlands from 2011 to 2015 [32]. Participants were of Dutch, South-Asian Surinamese, African Surinamese, Ghanaian, Turkish or Moroccan origin. After a positive response, participants received a letter confirming the appointment for the physical examination, together with a digital or paper version of the questionnaire (depending on the preference of the participant). Participants who were unable to complete the questionnaire themselves were offered assistance from a trained ethnically-matched interviewer.
This cross-sectional study was conducted in a subsample of the HELIUS cohort, comprising those participants for whom data on physical activity, dietary intake and gut microbiota composition were available. The subsample was divided into two sets. The set with subjective monitoring data included 1309 participants (sedentary n = 441, active n = 868) who had completed the SQUASH and who had not used antibiotics in the three months before the fecal sample (participants for whom this information was missing were also excluded from this set). The set with objective monitoring data included 162 participants (sedentary n = 100, active n = 62) who had worn an accelerometer; participants with antibiotic use (n = 11) or unknown antibiotic use (n = 14) were retained in this set.

Body Composition, Function, and Biochemistry
Anthropometrics, including weight, height, BMI, waist and hip circumference and WHR, as well as calf and thigh circumferences, were assessed at the study visit. Body composition was assessed by arm-to-leg bioelectrical impedance analysis (BIA), which measures impedance, resistance, and reactance in Ohm at 50 Hz (Bodystat 1500 analyzer, Bodystat Ltd., Isle of Man, Cronkbourne, Douglas, UK). Plasma creatinine and CK levels were determined from fasting venous blood samples using standard laboratory techniques. Muscle strength was measured as handgrip strength with the Citec handheld dynamometer (CIT Technics, Haren, The Netherlands). The average of the two highest measurements in Newton (N) from both hands within one minute intervals was used for the final value as previously described [41].

Subjective Physical Activity Monitor
PA was assessed with the SQUASH, which was developed by the Dutch National Institute of Public Health and the Environment (RIVM) [66]. It records self-reported daily activities, including commuting and activities during leisure, household and occupational time (e.g., walking and cycling), as well as other exercise habits (e.g., gardening and swimming); it also indicates whether PA was in accordance with the Dutch PA guideline. Self-reported activities per day were converted to minutes per week (min/week). PA was considered to meet the Dutch PA guideline if it lasted at least 30 min per session and was carried out on at least five days per week (a total of at least 150 min/week), which is in accordance with the international PA guideline of the World Health Organization (WHO) for the general population [3,67].
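The guideline rule can be illustrated with a minimal Python sketch. It assumes one activity total per day of the week and reads the rule as at least 30 min on at least five days; the function names and the input layout are hypothetical and do not reproduce the official SQUASH scoring algorithm.

```python
from typing import Sequence

MIN_MINUTES_PER_DAY = 30  # per-day threshold as read from the text (assumption: inclusive)
MIN_ACTIVE_DAYS = 5       # number of days per week that must reach the threshold


def weekly_minutes(daily_minutes: Sequence[float]) -> float:
    """Convert seven daily totals into minutes per week."""
    return sum(daily_minutes)


def meets_pa_guideline(daily_minutes: Sequence[float]) -> bool:
    """Return True if at least five of the seven days reach 30 min of PA."""
    if len(daily_minutes) != 7:
        raise ValueError("expected one value per day of the week")
    active_days = sum(1 for m in daily_minutes if m >= MIN_MINUTES_PER_DAY)
    return active_days >= MIN_ACTIVE_DAYS
```

Note that under this reading a single 150-min session does not satisfy the rule, because the activity has to be spread over at least five days.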

Objective Physical Activity Monitor
Participants in a subsample of the population (n = 162) wore a validated accelerometer with electrocardiography (ECG) electrodes (ActiHeart, CamNtech Ltd., Papworth, UK) [68] to objectively monitor PAL for four consecutive days. In this study, PA was considered sedentary when PAL was 1.69 or lower and active when PAL was 1.70 or higher.
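The PAL cut-off can be written as a small Python sketch. It assumes that PAL values are reported to two decimals, so that no value falls between 1.69 and 1.70; the function name is hypothetical.

```python
def classify_pal(pal: float) -> str:
    """Classify a physical activity level (PAL) as 'sedentary' or 'active'.

    Assumption: PAL is reported to two decimals, so the value is rounded
    before applying the cut-off (<= 1.69 sedentary, >= 1.70 active).
    """
    if pal <= 0:
        raise ValueError("PAL must be positive")
    return "sedentary" if round(pal, 2) < 1.70 else "active"
```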

Dietary Intake and Food Groups
Information on dietary intake was derived from FFQs [69]. Specifically, the daily average intake of energy (kilocalories per day (kcal/d)), macronutrients and fatty acids (energy percentages (E-%)) and dietary fiber (grams per day (g/d)) were retrieved. Fiber per energy intake (g/MJ) was calculated by converting kcal to MJ and dividing fiber intake by energy intake in MJ. Additionally, the FFQ included approximately 200 food items classified into 52 food groups based on similarity in nutrient profile or culinary use according to the Dutch food composition database (NEVO) constructed by the RIVM. These 52 food groups were further classified into 13 broader food groups (Supplementary Figure S1).
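The fiber density calculation can be sketched in Python as follows. The sketch assumes the thermochemical conversion 1 kcal = 4.184 kJ; the function names are hypothetical.

```python
KCAL_TO_MJ = 0.004184  # 1 kcal = 4.184 kJ = 0.004184 MJ


def energy_mj(energy_kcal: float) -> float:
    """Convert daily energy intake from kcal/d to MJ/d."""
    return energy_kcal * KCAL_TO_MJ


def fiber_density(fiber_g: float, energy_kcal: float) -> float:
    """Return fiber intake per unit of energy intake (g/MJ)."""
    if energy_kcal <= 0:
        raise ValueError("energy intake must be positive")
    return fiber_g / energy_mj(energy_kcal)
```

For example, 25 g/d of fiber at 2000 kcal/d (8.368 MJ/d) corresponds to roughly 3 g/MJ.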

Fecal Gut Microbiome Composition and Functionality
Stool samples were received by members of the study staff on the morning of the physical examination, within six hours after collection, or on the morning after the physical examination [30]. In the latter case, participants were asked to store the stool sample in their freezer until bringing it to the research location. No information is available on whether samples were received fresh or frozen. Stool samples were transported daily from temporary storage at −20 °C at the research location to −80 °C freezers at the Academic Medical Center (AMC) for storage.

Profiling of Fecal Microbiota Composition
Library preparation and sequencing of the gut microbiota were performed at the Wallenberg Laboratory (Sahlgrenska Academy, the University of Gothenburg, Sweden). Total genomic DNA was extracted from a 150 mg fecal sample aliquot using a repeated bead beating method as previously described [30]. To profile the composition of the fecal microbiota, the V4 region of the 16S rRNA gene was sequenced on a MiSeq system (RTA v. 1.17.28, bundled with MCS v. 2.5; Illumina, San Diego, CA, USA) with 515F and 806R primers (2 × 250 bp paired-end reads). Amplification of 16S rRNA genes was done in duplicate reactions with a reaction mixture containing 1 × Five Prime Hot Master Mix (5PRIME GmbH), a total of 400 nM of reverse and forward primers, 0.4 mg/mL bovine serum albumin (BSA), 5% dimethylsulfoxide, and 20 ng of genomic DNA (total volume of 25 µL). PCR steps were run as presented in Table 4. Combined duplicates were purified (NucleoSpin Gel, PCR Clean-Up kit, Macherey-Nagel, Düren, Germany), and quantified (Quant-iT PicoGreen dsDNA kit, Invitrogen, Waltham, MA, USA). Purified PCR products were diluted to 10 ng/µL and pooled in equal amounts, and the pool was purified again to remove short amplicons (Ampure magnetic purification beads, Agencourt, Beverly, MA, USA). The absence of detectable PCR products in negative controls was confirmed with gel electrophoresis. The protocol used to analyze the samples was optimized using mock samples; thus, there were no positive controls. Libraries for sequencing were prepared by mixing the pooled amplicons with PhiX control DNA (Illumina), resulting in a concentration of 3 pM input DNA (15% PhiX). Sequencing generated around 700 K clusters/mm². The quality score was over 30 in 70% of the bases. Analytical procedures were blinded (non-randomized) for ethnicity. For paired-end merging and quality filtering [70], the following parameters were used: fastq_mergepairs with maxdiffs = 30, and fastq_filter with fastq_maxee = 1.
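The maximum expected error setting (fastq_maxee = 1) can be illustrated with a short Python sketch. The expected number of errors in a read is the sum of the per-base error probabilities implied by the Phred quality scores, and reads exceeding the threshold are discarded. The sketch assumes Phred+33 encoded quality strings; the function names are hypothetical, and this is not the implementation used by the filtering software.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10).

    Assumption: the quality string uses Phred+33 encoding (offset 33).
    """
    return sum(10 ** (-(ord(char) - offset) / 10) for char in quality)


def passes_maxee(quality: str, maxee: float = 1.0) -> bool:
    """Keep a read only if its expected number of errors is at most maxee."""
    return expected_errors(quality) <= maxee
```

A 250-base read with a quality of Q40 at every position has 250 × 0.0001 = 0.025 expected errors and passes, whereas a read with many low-quality positions is discarded.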
After merging and quality filtering, contigs were dereplicated and unique sequences were denoised using UNOISE3 to obtain amplicon sequence variants (ASVs). All merged reads were then mapped to the resulting ASVs to generate the ASV table. ASVs that did not match the expected amplicon length (longer than 260 base pairs or shorter than 250 base pairs) were filtered out. The 'assignTaxonomy' function from the dada2 R package (v. 1.12.1) and the SILVA (v. 132) reference database were used to assign taxonomy [71,72]. MAFFT (v. 7.427) with default settings was used to align ASVs [73,74]. The 'double precision' build of FastTree (v. 2.1.11) was used to build a phylogenetic tree based on the multiple sequence alignment, with a generalized time-reversible model ('-gtr') [75]. These components (phylogenetic tree, taxonomy and ASV table) were integrated with the 'phyloseq' R package (v. 1.28.0) [76]. The ASV table was rarefied to 14,932 counts per sample using the 'vegan' R package (v. 2.5-6). Of the 6056 sequenced samples, 24 had <5000 counts per sample and were excluded at the rarefaction stage. The final dataset contained 6032 samples and 22,532 ASVs. The functional composition was generated from the 16S sequencing data using Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt2, v. 2.3.0-b) [33], and specific pathways and their classification were identified using the MetaCyc database.
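The length filter and the rarefaction step can be sketched in Python as follows. Rarefying is taken here to mean random subsampling of reads without replacement to a fixed depth. The sketch is not the R code used in the study; the function names and data layout are hypothetical, and returning None for samples below the target depth is an assumption of the sketch.

```python
import random
from collections import Counter
from typing import Dict, Optional


def has_expected_length(sequence: str, shortest: int = 250, longest: int = 260) -> bool:
    """Keep ASVs whose length lies within the expected amplicon range."""
    return shortest <= len(sequence) <= longest


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Optional[Dict[str, int]]:
    """Subsample one sample's ASV counts to `depth` reads without replacement.

    Returns None when the sample has fewer reads than `depth`.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the counts into one entry per read, then draw without replacement.
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(reads, depth)
    return dict(Counter(drawn))
```

After rarefaction, every retained sample has the same total count, which makes richness and other diversity indices comparable between samples.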

Characteristics of Gut Microbiota Composition
The dissimilarities in gut microbiota composition between individuals (β-diversity) were assessed with the Bray-Curtis dissimilarity index, as well as weighted and unweighted UniFrac distances calculated at the ASV level (function 'vegdist', vegan v. 2.5-6 R package, for Bray-Curtis and function 'UniFrac', phyloseq v. 1.30.0 R package, for weighted and unweighted UniFrac) [77]. Unconstrained Principal Coordinate Analysis was used to plot the Bray-Curtis dissimilarities (function 'pcoa', ape v. 5.4-1 R package). The α-diversity of gut microbiota for each individual was assessed with several indices calculated at the ASV level: richness (number of unique ASVs; function 'estimate', vegan R package), the Shannon index, the Simpson index, the inverse Simpson index (function 'diversity', vegan R package) and Faith's phylogenetic diversity (function 'pd', picante v. 1.8.2 R package).
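For reference, the non-phylogenetic indices can be written out in a few lines of Python. This is a sketch of the standard formulas rather than the R code used in the study. It assumes the conventions of the vegan package, in which the Simpson index is reported as 1 − Σp² and the inverse Simpson index as 1/Σp²; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _proportions(counts: Sequence[float]) -> List[float]:
    """Relative abundances of the taxa that are present in a sample."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def richness(counts: Sequence[float]) -> int:
    """Number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[float]) -> float:
    """Shannon index, H = -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson(counts: Sequence[float]) -> float:
    """Simpson index reported as 1 - sum(p ** 2)."""
    return 1.0 - sum(p * p for p in _proportions(counts))


def inverse_simpson(counts: Sequence[float]) -> float:
    """Inverse Simpson index, 1 / sum(p ** 2)."""
    return 1.0 / sum(p * p for p in _proportions(counts))


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity, sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same taxa")
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))
```

Two samples with identical counts have a Bray-Curtis dissimilarity of 0, and two samples that share no taxa have a dissimilarity of 1.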

Microbial Data Preparation
The microbial data were summarized at the phylum, family, genus, species and ASV level and then filtered to keep only taxa with >20 counts in >5% of participants. For the machine learning analyses, an unfiltered ASV table for either 1309 participants or 162 participants was used as input data. Between-group comparisons of participant characteristics used Fisher T-tests, and categorical variables were analyzed using the chi-square test. Variables with a skewed distribution were tested with a Mann-Whitney test. Pearson correlation was used to correlate PA levels from the two PA monitoring methods (objective and subjective), and the Pearson chi-square test was used to assess the association between the categories (active and sedentary). The statistical significance level was set to p < 0.05. An unconstrained principal component analysis of the food groups in the set of 1309 participants was performed after removing missing values (function 'prcomp', 'stats' v. 3.6.3 R package, with scale = TRUE).
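The abundance and prevalence filter applied to the summarized microbial data (taxa with >20 counts in >5% of participants) can be sketched in Python as follows. The table layout, a mapping from taxon name to per-participant counts, and the function name are hypothetical; both thresholds are treated as strict inequalities, as in the text.

```python
from typing import Dict, List


def filter_taxa(
    table: Dict[str, List[int]],
    min_count: int = 20,
    min_prevalence: float = 0.05,
) -> Dict[str, List[int]]:
    """Keep taxa with more than `min_count` counts in more than
    `min_prevalence` of participants (both thresholds are strict)."""
    kept = {}
    for taxon, counts in table.items():
        n_above = sum(1 for c in counts if c > min_count)
        if n_above / len(counts) > min_prevalence:
            kept[taxon] = counts
    return kept
```

With 100 participants, a taxon must therefore exceed 20 counts in at least six participants to be retained.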

Statistical Analysis of Gut Microbiota Composition
β-diversity was tested with a PERMANOVA ('adonis', permutations = 1000, in R). Analyses were adjusted for the following covariates: age, sex, BMI and ethnicity and, where noted, dietary intake (fat (E-%), protein (E-%), carbohydrate (E-%), fiber (g/d) and total daily energy intake (kcal/d)). For the small set of 162 participants (where PAL outcomes were used; sedentary PAL ≤ 1.69, active PAL ≥ 1.70), antibiotic use in the past three months was also included as a covariate. Residuals of taxa, pathway and gene abundances as well as α-diversity indices were computed using 'lm' in R, adjusting for the above-mentioned covariates, and were compared between binary outcomes with a Wilcoxon test. Spearman correlation tests were performed between the covariate-adjusted residuals of taxa abundances and continuous outcomes. In all analyses, an adjusted p-value (p.adjust with the Benjamini-Hochberg [78] method in R) < 0.05 was considered significant. The variance in the Shannon index (α-diversity) explained by the different outcomes (foods, food groups and macronutrients) was calculated using 'lm' in R, adding covariates for age, sex, BMI and, where noted, ethnicity.
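The Benjamini-Hochberg adjustment can be written out as a short Python sketch. It follows the standard step-up procedure, in which each p-value is multiplied by n/rank and monotonicity is enforced from the largest p-value downwards. It is meant as an illustration of the method, not as the code used in the study, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted
```

For the raw p-values 0.01, 0.04, 0.03 and 0.005, the adjusted values are 0.02, 0.04, 0.04 and 0.02, so all four tests would remain significant at the 0.05 level.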

Machine Learning
To predict the binary outcomes (objective monitoring method with two PAL levels (ActiHeart) and subjective monitoring method (SQUASH) with two activity levels stratified based on adherence to the PA guideline), the input features were the microbial data (see above section on Microbial Data Preparation). The input variables (ASV abundances) were preprocessed by an initial filtering on minimum occurrence ratio (the taxa must be present (>1 count) in >5% of participants) and a univariate feature selection (SelectPercentile with f_classif). Thereafter, the data were split into an 80% training set and a 20% testing set, randomly shuffled using Stratified Shuffle Split over 10 iterations to ensure stable predictions, and the average AUC of all iterations was reported. In each shuffle, XGBoost XGBRegressor (objective = 'reg:squarederror') was used to predict binary outcomes (SQUASH and PAL activity levels) with GridSearch using roc_auc as the scoring function. A random variable was appended in each shuffle to serve as a threshold for relevant features selected by the model. Internal five-fold cross validation was performed in the hyperparameter search. The parameter grid used was: 'max_depth': [3, 4, 5], 'learning_rate':

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/metabo11120858/s1, Figure S1: Food group taxonomy, Figure S2: Comparisons of food groups between the sedentary and active participants, Figure S3: The machine learning of the population without participants with antibiotic use stratified by PA based on the objective method, Table S1: Beta diversity results, Table S2: Comparisons of microbial taxa between the sedentary and active participants, Table S3: Correlations of physical activity related parameters with microbial taxa, Table S4: Comparisons of microbial pathways between the sedentary and active groups, Table S5: Correlations of microbial metabolic pathway with physical activity related parameters, Table S6: Correlations of microbial taxa with food groups.

Informed Consent Statement:
Informed consent was obtained from all participants involved in the study.

Data Availability Statement:
The HELIUS data are owned by the Amsterdam University Medical Centers, located at the AMC in Amsterdam, The Netherlands. Any researcher can request the data by submitting a proposal to the HELIUS Executive Board as outlined at http://www.heliusstudy.nl/en/researchers/collaboration, accessed on 5 December 2021, by email: heliuscoordinator@amsterdamumc.nl. The HELIUS Executive Board will check proposals for compatibility with the general objectives, ethical approval and informed consent forms of the HELIUS study. There are no other restrictions to obtaining the data and all data requests will be processed in the same manner.