Butyrate Levels in the Transition from an Infant- to an Adult-Like Gut Microbiota Correlate with Bacterial Networks Associated with Eubacterium Rectale and Ruminococcus Gnavus

Relatively little is known about the ecological forces shaping the gut microbiota composition during infancy. Therefore, the objective of the present study was to identify the nutrient utilization- and short-chain fatty acid (SCFA) production potential of gut microbes in infants during the first year of life. Stool samples were obtained from mothers at 18 weeks of pregnancy and from infants at birth (first stool) at 3, 6, and 12-months of age from the general population-based PreventADALL cohort. We identified the taxonomic and SCFA composition in 100 mother-child pairs. The SCFA production and substrate utilization potential of gut microbes were observed by multiomics (shotgun sequencing and proteomics) on six infants. We found a four-fold increase in relative butyrate levels from 6 to 12 months of infant age. The increase was correlated to Eubacterium rectale and its bacterial network, and Faecalibacterium prausnitzii relative abundance, while low butyrate at 12 months was correlated to Ruminococcus gnavus and its associated network of bacteria. Both E. rectale and F. prausnitzii expressed enzymes needed for butyrate production and enzymes related to dietary fiber degradation, while R. gnavus expressed mucus-, fucose, and human milk oligosaccharides (HMO)-related degradation enzymes. Therefore, we believe that the presence of E. rectale, its network, and F. prausnitzii are key bacteria in the transition from an infant- to an adult-like gut microbiota with respect to butyrate production. Our results indicate that the transition from an infant- to an adult-like gut microbiota with respect to butyrate producing bacteria, occurs between 6 and 12 months of infant age. The bacteria associated with the increased butyrate ratio/levels were E. rectale and F. prausnitzii, which potentially utilize a variety of dietary fibers based on the glycoside hydrolases (GHs) expressed. R. gnavus with a negative association to butyrate potentially utilizes mucin, fucose, and HMO components. This knowledge could have future importance in understanding how microbial metabolites can impact infant health and development.

minutes. Each reaction consisted of 1× FirePol Master Mix Ready to Load (Solis BioDyne, Germany), 0.2µM forward & reverse primers, nuclease free-water (VWR, USA) and 1µl DNA. The DNA concentration was quantified following Qubit's protocol, normalized and pooled on a Biomek 3000. The pooled sample was split in two for quantification and sequencing. Samples for quantification were first subjected to droplet generation using BioRad QX200 TM -Droplet Generator, before being amplified at 95°C for 5 minutes followed by 40 cycles of 95°C for 30 seconds, 60°C for 30 seconds, and 72°C for 45 seconds before the last two steps at 4°C for 5 minutes and 90°C for 5 minutes before quantification on BioRad QX200 -Droplet Reader. The reactions contained 1× Super mix for EvaGreen (BioRad, USA), 0.2µM Illumina colony forward & reverse primer, 2.4µl DNA template and PCR water. The second part of the sample was diluted to 6 pM DNA with 15% PhiX following Illumina's instructions, with the exception of using nuclease-free water instead of Tris and sequenced on Illumina MiSeq.

Quantitative Insights Into Microbial Ecology (QIIME)
The 16S rRNA data were analyzed with Quantitative Insights Into Microbial Ecology (QIIME) pipeline [3]. QIIME v.1.9.1 was used to assemble forward and reverse reads and split them into their respective samples. Usearch v8 was used to check reads for chimeras, and OTUs with a 97% or higher 16S rRNA identity were created and assigned taxonomy by the SILVA 128 database [4]. Two sequencing runs were performed resulting in 30 878 312 ssDNA fragments. The cut-off was set at 5 000 dsDNA fragments, resulting in 352 samples with sufficient depth and quality. This was distributed as follows: meconium n=10, 3 months n=79, 6 months n=76, 12 months n=94 and mother n=93.

Shotgun metagenome sequencing
Six samples were selected for Shotgun sequencing, based on the presence and absence of E. rectale, R. gnavus and butyrate. The samples were processed using the Nextera XT DNA Library Preparation Kit (Illumina Inc, San Diego, CA, USA), following the manufacturer's instructions and sequenced on the Illumina MiSeq platform twice, resulting in approximately 4.7 Gb.

Protein extraction and quantification
Intracellular proteins were extracted from the gut bacteria derived from the same six children used for shotgun sequencing. All samples were run in technical duplicates, excluding one due to an insufficient amount of fecal material.
An indirect double filtering process was performed on 0.2g fecal material. Fecal samples were suspended in 10mL TBS (Tris-based saline buffer), before being filtered through a 20µm filter (Merck TM Nylon-Net Steriflip TM Vacuum Filter Unit, Fisher Scientific), homogenized at 30 000 rpm for 60s (VDI12, VWR), centrifuged at 4000g for 10 minutes and thereafter resuspended in 10mL TBS and filtered through a 0.22µm nitrocellulose filter [13]. The filter was cut into smaller pieces, mixed with lysis buffer, and bacteria were mechanically lysed as explained in the 16S rRNA sequencing section.
The lysis buffer contained 50mM Tris-HCl, 200mM NaCl, 0.1% Triton-X100, 10mM Dithiothreitol (DTT) and 4% Sodium Dodecyl Sulfate (SDS). Protein quantification was performed using the BCA-DC kit, following the manufacturer's protocol, before applying 40µg protein to an SDS-gel. The gel was run for approximately 2 minutes. The proteins were fixed using 50% methanol and 10% glacial acetic acid for 1 hour with gentle agitation, and thereafter stained using 0.1% Coomassie Brilliant Blue R-250 with 50% methanol and 10% glacial acetic acid, and destained with 40% methanol and 10% glacial acetic acid.

In-gel reduction, alkylation and digestion
The gel band containing proteins was cut into approximately 1×1mm pieces and de-colored with Milli-Q water (MQ) and incubated at 15 minutes in room temperature (RT). MQ was removed and 50% acetonitrile (ACN) with 25 mM ammonium bicarbonate was added and incubated 15 minutes at RT. 100% ACN was added and incubated for 5 minutes at RT, before it was removed and then airdried. Reduction and alkylation were performed by adding 10mM DTT with 100 mM ammonium bicarbonate for 30 minutes at 56ºC. The samples were cooled before the solution was removed and 55mM iodoacetamine (IAA) with 100mM ammonium bicarbonate was added and incubated in the dark for 30 minutes in RT. The IAA solution was removed, and 100% ACN was added and incubated for 5 minutes in RT before the solution was removed and air-dried for 2 minutes. For digestion 40ng of Trypsin solution was added and gel-pieces were incubated overnight at 37ºC, thereafter 1% trifluoroacetic acid (TFA) was added and the gel-pieces were sonicated in a water bath for 15 minutes. The resulting peptides were desalted using C18 solid phase ZipTips according to the manufacturer's instructions.
The MS raw files were analyzed, identified and quantified using MaxQuant version 1.6.6.0, with the MaxLFQ algorithm [14,15] implemented for label-free quantitative detection of proteins. In brief, the raw files were searched against the sample-specific protein sequence database and against the human genome (Homo sapiens, 73952 sequences). The sequences database was complemented with common contaminants, such as human keratin, trypsin and bovine serum albumin, as well as reversed sequences of all protein entries to estimate the false discovery rate. Oxidation of methionine's, protein N-terminal acetylation, deamination of asparagine and glutamine, and conversion of glutamine to pyro-glutamic acid were used as variable modifications, while carbamidomethylating of cysteine residues was used as a fixed modification. Two missed cleavages of trypsin were allowed, and all identifications were filtered in order to achieve a protein false discovery rate of 1% using the target-decoy strategy in Perseus version 1.6.6.0 [16], resulting in 2215 proteins. Functional annotation of the proteins was determined by a combination of the InterProScan database [11], Pfam database [17] and manual search using protein BLAST. Proteins were annotated as GH or CE from the DBCan meta server v8 [18] and all GH and CE annotated equally by at least two of the three databases were included (DIAMOND, HMMER, Hotpep).

Statistical analysis
Statistical analysis was performed in Rstudio [19] and MatLab [20]. Kruskall-Wallis-Dunns test was performed with R version 3.4.3 and R-package PMCMR plus (2018). Correlation of bacterial profiles to SCFAs was performed by using Spearman correlations, with a p-value less than 0.05, FDR corrected using the Benjamini-Hochberg method in the MatLab programming environment [20]. Correlation of the metadata to 16S using ASCA-ANOVA was performed in the MatLab environment, while metadata correlation to children with the bacterial networks was performed using a chi-square test. indexes. The PCoA plot shows the OTU dissimilarities between the different infant age groups and mothers. The age groups are divided by colors, shown at the top right, with the number of samples per age group in parentheses. Figure S4. Total short-chain fatty acids per 16S rRNA gene copy. The bar-chart illustrates the logarithmic (log10) abundance of short-chain fatty acids per 16S rRNA gene copy in the infant for all age groups (A) and by the two dominant bacterial networks with a positive or negative correlation to butyrate (B). The bacterial load for each sample was determined by calculating 16S copy number based on Cq-values resulting from qPCR. Asterisks represent a significant difference (p<0.05, Mann-Whitney-Wilcoxon test). Asterisks are not included in A) as all SCFAs were significant between the age groups. Figure S5. Bacterial correlation to SCFA in all age groups. The figure illustrates the correlation pattern between bacteria and relative amount of SCFAs for 3 months of age (A), 6 months of age (B), and mothers (C). The illustration shows all OTUs from 16S rRNA represented as nodes, with color indicating their correlation to SCFAs; Blue = no correlation, red = negative correlation to butyrate, green = positive correlation to butyrate, black = positive correlation to propionate and purple represents a positive correlation to both propionate and acetate. The three different node sizes represent the general abundance of the respective bacteria. The thickness of the lines between nodes represents a correlation between the bacteria, of which a thick line is a strong correlation. Blue lines indicate a positive correlation between the bacteria, while brown indicate a negative correlation.