Phenotypic and Nodule Microbial Diversity among Crimson Clover ( Trifolium incarnatum L.) Accessions

: Crimson clover ( Trifolium incarnatum L.) is the most common legume cover crop in the United States. Previous research found limited genetic variation for crimson clover within the National Plant Germplasm System (NPGS) collection. The aim of this study was to assess the phenotypic and nodule microbial diversity within the NPGS crimson clover collection, focusing on traits important for cover crop performance. Experiments were conducted at the Beltsville Agricultural Research Center (Maryland, USA) across three growing seasons (2012–2013, 2013–2014, 2014–2015) to evaluate 37 crimson clover accessions for six phenotypic traits: fall emergence, winter survival, ﬂowering time, biomass per plant, nitrogen (N) content in aboveground biomass, and proportion of plant N from biological nitrogen ﬁxation (BNF). Accession e ﬀ ect was signiﬁcant across all six traits. Fall emergence of plant introductions (PIs) ranged from 16.0% to 70.5%, winter survival ranged from 52.8% to 82.0%, and growing degree days (GDD) to 25% maturity ranged from 1470 GDD to 1910 GDD. Biomass per plant ranged from 1.52 to 6.51 g, N content ranged from 1.87% to 2.24%, and proportion of plant N from BNF ranged from 50.2% to 85.6%. Accessions showed particularly clear di ﬀ erences for fall emergence and ﬂowering time, indicating greater diversity and potential for selection in cover crop breeding programs. Fall emergence and winter survival were positively correlated, and both were negatively correlated with biomass per plant and plant N from BNF. A few promising lines performed well across multiple key traits, and are of particular interest as parents in future breeding e ﬀ orts, including PIs 369045, 418900, 561943, 561944, and 655006. In 2014–2015, accessions were also assessed for nodule microbiome diversity, and 11 genera were identiﬁed across the sampled nodules. There was large variation among accessions in terms of species diversity, but this diversity was not associated with observed plant traits, and the functional implications of nodule microbiome diversity remain unclear.

To increase farmer adoption, there is a need for germplasm improvement in cover crops. A lack of adapted cover crop cultivars (e.g., with enhanced biomass and winter hardiness for various climates) presents a significant barrier to cover crop adoption [11,12]. Historically, plant breeding has focused resources and effort on breeding for increased productivity in cash crops, with remarkable success [13][14][15][16][17][18][19]. By comparison, relatively little breeding has focused on ecosystem services (e.g., improving crops for use as cover crops) [20]. Shifting resources towards cover crop breeding could result in significant improvement in cover crop germplasm and ultimately in increased adoption on the landscape. Many species used as cover crops have been bred as cash crops (e.g., for forage production). However, to initiate breeding for use as cover crops, it is necessary to evaluate existing variation for traits of interest specific to cover crop production. Previous studies have assessed the genetic and phenotypic variation available through the National Plant Germplasm System (NPGS) for cover crop species, notably hairy vetch (Vicia villosa L.) [21,22].
Crimson clover (Trifolium incarnatum L.) is the most common legume cover crop in the United States [10]. Valued for its biomass production and nitrogen contribution, crimson clover is used across many regions; it is commonly used as a forage in the Southeast US and can be grown as a winter or summer annually depending on the climate [23]. A previous screening of the genetic variation in the NPGS crimson clover collection found limited genetic diversity and a substantial overlap in pedigrees [24]. Modern US cultivars show especially low genetic variation, with nearly all cultivars derived from 'Dixie.' However, an assessment of phenotypic variation in the NPGS crimson clover collection has not been conducted for cover crop traits of importance to farmers: nitrogen fixation, biomass production, winter hardiness, and flowering time [12]. Further, identifying which microbes colonize nodules may provide insights into variation in biological nitrogen fixation among crimson clover genotypes. In other legume species, nodule occupancy has been observed to differ both diversity and richness in rhizobia species [25,26]. To improve crimson clover germplasm for cover cropping, the phenotypic variations in existing collections needs to be assessed for traits of interest.
In this study, we assessed phenotypic variation within the NPGS crimson clover collection, and also explored the genetic diversity of nodule microbial populations regulating biological nitrogen fixation (BNF). We evaluated the NPGS crimson clover collection for emergence, winter survival, flowering time, biomass per plant, nitrogen (N) content, and proportion of plant N from BNF. We also evaluated the diversity of nodule microbial populations among accessions, with the understanding that ultimately BNF will be a result of interactions between traits of the resident soil/inoculum population and plant traits that affect nodule entry selectivity, nodule number, and activity.

Plant Material
Germplasm consisted of 37 accessions of crimson clover, representing all plant introduction (PI) lines of crimson clover in the NPGS with seed available for distribution. However, seven of these accessions were not available from NPGS after the 2012-2013 season, so the remaining 30 accessions were evaluated in 2013-2014 and 2014-2015 (Table S1).

Field Evaluation
Phenotypic evaluation of crimson clover accessions was conducted over three growing seasons, with crimson clover direct-seeded by hand in late September 2012, 2013, and 2014 in field plots in Beltsville, MD, USA (39 • 02 N, 76 • 56 W) on sandy loam soils (Russett Christiana complex). The field design was a randomized complete block design (RCBD) with four replications in each year, except for five accessions planted in 2015, which only had three replications due to limited seed availability.
Each plot was a single row of 0.6 m in length with 1.5 m between plots. Between 37 and 45 seeds were planted per plot, depending on seed availability in each year.
Fall emergence was evaluated in late October of each year by counting the total number of plants in each plot. Winter survival was determined by counting the total number of plants per plot in late April divided by the total number of plants counted in the fall. Flowering time was evaluated by recording percent flowering on a per-plot basis on a scale from 0% (no flower buds present) to 100% (all flowers dried up entire length of head). Flowering evaluations took place periodically between late April and early June. In 2013, evaluation took place on six dates: 23  Once an accession was rated at 50% or greater for flowering, biomass was collected. All plants in the plot were pulled up with roots attached. Plants were counted and the roots were clipped. All plants within a plot were placed in the same brown paper bag and dried. Dry weight was recorded and plants were ground for laboratory evaluation of nitrogen content, proportion of nitrogen from BNF, and metagenomic analysis.

Laboratory Evaluation
The crimson clover biomass samples were separated into shoots and roots. Shoots were oven-dried (60 • C) for approximately 72 h, weighed, and ground to pass a 1.0-mm screen. Tissue C and N concentrations and 15 N natural abundance were determined for the shoot material of each accession using a Thermo Delta V Isotope Ratio Mass Spectrometer (Thermo Scientific, Waltham, MA, USA) and Carlo Erba NC2500 Elemental Analyzer (Carlo Erba, Milan, Italy). Isotopic abundance data were expressed as δ 15 N in parts per thousand (% ), representing the abundance of plant tissue 15 N relative to that of atmospheric N2.
For 2015 samples, fresh roots were prepared for nodule collection by carefully removing soil from the root system, breaking as few roots as possible. Nodules were washed in a beaker of deionized water, placed onto a 53-µm sieve, and washed with deionized water until clean of soil and organic matter. The nodules were surface-sterilized by transferring to Falcon tubes containing 10% (v/v) bleach, gently inverting the tubes over a five-minute period, and washing again with deionized water until bleach odor was gone.
For each crimson clover accession, 500 mg of surface sterilized nodules were placed in a 1.5 mL Eppendorf tube and submerged in 1× phosphate buffered solution (PBS). Three to four sterilized nodules from each preparation were reserved before mashing and plated on yeast mannitol broth (YMB) to serve as controls for the surface sterilization procedure (no growth observed). The remaining nodules were mashed using an Eppendorf micropestle. The mashed nodules were centrifuged for two minutes at 14,000 rpm to pellet the residual plant biomass. A total of 100 µL of mashed nodules were added to two mL of yeast mannitol broth YMB in sterile, 4.5 mL culture tubes and incubated in the dark on a shaker at 28 • C and 225 rpm for four d. Then, 100 µL of this culture was plated onto YMB plates and incubated at 28 • C for four d. After four days, a bacterial lawn developed on all plates. To collect the cells for analysis, 300 µL of 1× PBS buffer was added to the surface of each plate using a cell scraper, and the bacterial biomass from the surface of the plate was scraped and transferred into a single Eppendorf tube. The bacterial biomass was pelleted and divided into two aliquots; one was immediately prepared for storage as a glycerol stock. The other half was frozen at −20 • C until it was used for genomic DNA extraction and downstream molecular analysis.
The nodule rhizobacteria from 19 accessions were individually processed for genomic DNA extraction using Qiagen power plant genomic DNA (Qiagen Inc., Germantown, MD, USA). Each metagenomic extraction was individually barcoded using the Illumina 30 barcodes of a 96-index bar code kit following manufacturer's instructions (Illumina, Inc., San Diego, CA, USA). The indexed metagenomic libraries were then sequenced on an Illumina nextSeq 500 following manufacturer's instructions using all channels of a flow cell.

Data Analysis
Six phenotypic performance metrics were assessed: fall emergence, winter survival, flowering time, biomass per plant, nitrogen content, and proportion of plant nitrogen from BNF. For each response variable, a linear mixed model was fit, with accession, year, and year × accession as fixed effects and replicate within year as random effects. Where appropriate, log-transformation was applied to the response variable to achieve homogeneity of variances. For each model, means and least-significant difference (LSD) estimates were extracted.
Fall emergence was calculated as the count of living plants as a proportion of seeds planted. Winter survival was calculated as the count of living plants in the spring as a percent of fall-emerged plants. When more plants were present in the spring than in the fall (e.g., due to spring emergence), winter survival was set to 100%.
Flowering time was evaluated as the growing-degree days (GDD) required to reach 25% flowering on a per-plot basis. GDD was calculated using a base temperature of 0 • C [27]. For plots that reached 25% flowering before the first rating (17 of 323 plots), GDD to 25% was set to the GDD of the first observation date. Likewise, for plots that never reached 25% flowering (5 of 323 plots), GDD to 25% was set to that of the last observation date. For plots that reached 25% flowering between observation dates, a linear interpolation function was used to estimate the GDD at which 25% flowering occurred.
Biomass for each plot was calculated on a per-plant basis by dividing the total plot biomass by the number of plants present in the plot at biomass harvest. The biomass and plant-tissue %N content data showed a non-normal distribution, so a log-transformation was applied. The proportion of tissue-N derived from BNF was estimated using the 15 N natural abundance method [28] and statistically analyzed as above. All analyses were conducted in R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) and used package {nlme} (3.1-143) for model fitting [29]. Scripts for data analysis and raw data for the plant phenotypic analysis are available at https://data.nal.usda.gov/dataset/data-phenotypicand-nodule-microbial-diversity-among-crimson-clover-trifolium-incarnatum-l-accessions.
All metagenomic analyses were conducted with the free on-line bioinformatics suite Kbase (https: //kbase.us/). Paired-end reads were uploaded to the Kbase cloud then joined using the join-paired-ends import tool, adapters and indices were trimmed, and joined reads with a Phred quality score over 30 were kept for further analysis and processed separately from this point forward. Paired end libraries were assembled using KBaseGenomeAnnotations.Assembly-5.0, which employs Megahit 1.1 [30] with customized parameter setting. Parameters were set to min-count multiplicity for filtering (k_min+1)-mers, default 2, k-min kmer size (≤127), default 21, k-max kmer size (≤255), default 141, k-step increment of kmer size of each iteration (≤28), default 12, k-list list of kmer size (range 15-255, increment ≤28), min-contig-len minimum length of contigs to output (≥300), default 300.
Assembled contigs were then annotated using Rapid Annotations and the Subsystems Technology toolkit (RASTtk) [31][32][33]. Default search and alignment parameters are defined at http://rast.nmpdr.org/. Annotated metagenomic libraries were compared to each other using GenomeComparisonSDK v.0.0.4 (https://kbase.us/), which uses a k-mer approach to compute a protein pangenome. This approach builds on the sequence-based homologous protein families identified in the input Pangenome object by adding to the accumulated protein family annotation by identifying putatively similar functions within or among family groups. In addition, we identified the genes for tRNAs, rRNAs, and transposons. This database was then used to identify all the co-occurring protein families, tRNAs, rRNAs, and transposons among all nodule metagenomes. Scripts for data analysis and raw data for the metagenomic analysis are available at https://narrative.kbase.us/narrative/26966.

Fall Emergence
The effects of accession, year, and accession x year were all significant to at least the 0.05 level. Replicate explained a small amount of variance in the model compared with accession, year, and accession x year interaction ( Table 2). Fall emergence averaged across years varied from 16.0% (PI 233812) to 70.5% (PI 556990) of seeds planted (Figure 1), with a grand mean of 48.6% of plants emerged across all genotypes. Emergence differences were likely caused by a combination of genetic and seed source effects on seed coat impermeability and seed viability [34,35]. The effect of year on emergence was inconsistent across genotypes (indicated by the significant interaction effect), but there were several genotypes with consistently high or low emergence. The following nine accessions had a mean fall emergence of 60% or greater, and merit consideration as breeding parents: PIs 369045, 556990, 561942, 561943, 561944, 613042, 613043, 613047, and 655006. If low emergence was due to seed coat impermeability rather than non-viable seed, accessions with low fall emergence may also be valuable in a breeding context, as some producers prefer to use hard-seeded crimson clover as a self-seeding forage in pastures. Conversely, other producers prefer lines with low hard-seed for consistent emergence as a cover crop in rotations with annual cash crops [24,34]. Further investigation is needed to determine the effect of genetic and environmental variance on hard seed and seed viability: accessions that exhibited high and low emergence in this study ( Figure 1) could be grown in diverging environments, then harvested seed could be assayed for impermeability. As previous research has found hard seed to be heritable [36], divergent selection for hard seed could be a promising breeding target for crimson clover.

Winter Survival
The effects of accession, year, and replicate on winter survival were significant at the 0.001 level. Replicate explained more variance in the model than accession, year, and accession × year interaction combined ( Table 2). Winter survival averaged across years varied from 52.8% (PI 418900) to 82.0% (PI 613046) of seeds planted (Figure 2), and a grand mean of 69.6% of emerged plants survived the winter. The 2013-2014 season had lower winter survival than other years (few accessions surpassed a mean of 50% winter survival) but also much greater variability in survival. The reduced winter survival in 2013-2014 is likely explained by the greater number of FDD and days below freezing without snow cover ( Table 1). The large contribution of replicate, relative to fixed effects, highlights challenges associated with selection for winter hardiness, especially the complex genetics and physiology of winter hardiness and its high degree of variability even within a single field and growing season [37,38]. However, the significant accession effect and the lack of significant accession × year interaction suggests that agronomically important differences in winter hardiness in crimson clover may be observed even in years with relatively limited winter kill, and that selection can take place even in milder winters.

Winter Survival
The effects of accession, year, and replicate on winter survival were significant at the 0.001 level. Replicate explained more variance in the model than accession, year, and accession × year interaction combined ( Table 2). Winter survival averaged across years varied from 52.8% (PI 418900) to 82.0% (PI 613046) of seeds planted (Figure 2), and a grand mean of 69.6% of emerged plants survived the winter. The 2013-2014 season had lower winter survival than other years (few accessions surpassed a mean of 50% winter survival) but also much greater variability in survival. The reduced winter survival in 2013-2014 is likely explained by the greater number of FDD and days below freezing without snow cover ( Table 1). The large contribution of replicate, relative to fixed effects, highlights challenges associated with selection for winter hardiness, especially the complex genetics and physiology of winter hardiness and its high degree of variability even within a single field and growing season [37,38]. However, the significant accession effect and the lack of significant accession × year interaction suggests that agronomically important differences in winter hardiness in crimson clover may be observed even in years with relatively limited winter kill, and that selection can take place even in milder winters.

Flowering Time
Effect of accession, year, and accession × year interaction on crimson clover flowering time were significant to at least the 0.01 level. As with fall emergence, the replicate explained a small amount of variance in the model compared with accession, year, and accession × year interaction ( Table 2). GDD to reach 25% maturity varied from 1470 (PIs 561943 and 561944) to 1910 (PI 418901) (Figure 3), and accessions reached 25% maturity at a grand mean of 1581 GDD. Although the accession × year interaction was significant, flowering time was relatively consistent across years except for the accessions with the most extreme values. This consistent performance aligns with previous studies in other clover species, which found high heritability for flowering time [39,40]. While the majority of accessions fell within a narrow band of flowering time (around 1600 GDD), three accessions (PIs 561942, 561943, and 561944) originating in the southeast US matured somewhat earlier, with all three reaching 25% maturity prior to 1500 GDD. These accessions represent potential parents for development of an early-maturing population. One accession, PI 418901, was significantly latermaturing than all other accessions; it reached 25% maturity at 1910 GDD, and the next-latest accession (PI 418900) reached 25% at 1690 GDD. This accession, therefore, is an excellent candidate for development of a late-maturing population. Both early-and late-maturing crimson clover may be of interest to farmers, depending on region and cropping system.

Flowering Time
Effect of accession, year, and accession × year interaction on crimson clover flowering time were significant to at least the 0.01 level. As with fall emergence, the replicate explained a small amount of variance in the model compared with accession, year, and accession × year interaction ( Table 2). GDD to reach 25% maturity varied from 1470 (PIs 561943 and 561944) to 1910 (PI 418901) (Figure 3), and accessions reached 25% maturity at a grand mean of 1581 GDD. Although the accession × year interaction was significant, flowering time was relatively consistent across years except for the accessions with the most extreme values. This consistent performance aligns with previous studies in other clover species, which found high heritability for flowering time [39,40]. While the majority of accessions fell within a narrow band of flowering time (around 1600 GDD), three accessions (PIs 561942, 561943, and 561944) originating in the southeast US matured somewhat earlier, with all three reaching 25% maturity prior to 1500 GDD. These accessions represent potential parents for development of an early-maturing population. One accession, PI 418901, was significantly later-maturing than all other accessions; it reached 25% maturity at 1910 GDD, and the next-latest accession (PI 418900) reached 25% at 1690 GDD. This accession, therefore, is an excellent candidate for development of a late-maturing population. Both early-and late-maturing crimson clover may be of interest to farmers, depending on region and cropping system.

Biomass per Plant
Effects of accession, year, and replicate on biomass per plant were significant to at least the 0.01 level. Replicate explained more variance in the model than accession, year, and accession × year interaction combined ( Table 2). Biomass per plant varied from 1.52 g plant −1 (PI 233279) to 6.51 g plant −1 (PI 442556) (Figure 4), and accessions produced a grand mean of 2.97 g plant −1 . Across accessions, plants produced less biomass in 2013-2014 than the other years, likely due to the colder winter and/or wetter year (Table 1). Again, the large contribution of replicates within the year highlights the impact of field spatial variability on plant biomass and underscores the challenge of phenotyping and selecting for increased biomass production. While some crimson clover cultivars have been developed with improved plant vigor and forage yields [34], given the limited variance explained by accession within the crimson clover collection, progress in biomass production is likely to be slow in a crimson clover cover crop breeding program as well. However, as for winter survival, accession x year interaction is not significant, suggesting that agronomically important differences in biomass in crimson clover may be observed even in years with low biomass per plant overall.

Biomass per Plant
Effects of accession, year, and replicate on biomass per plant were significant to at least the 0.01 level. Replicate explained more variance in the model than accession, year, and accession × year interaction combined ( Table 2). Biomass per plant varied from 1.52 g plant −1 (PI 233279) to 6.51 g plant −1 (PI 442556) (Figure 4), and accessions produced a grand mean of 2.97 g plant −1 . Across accessions, plants produced less biomass in 2013-2014 than the other years, likely due to the colder winter and/or wetter year (Table 1). Again, the large contribution of replicates within the year highlights the impact of field spatial variability on plant biomass and underscores the challenge of phenotyping and selecting for increased biomass production. While some crimson clover cultivars have been developed with improved plant vigor and forage yields [34], given the limited variance explained by accession within the crimson clover collection, progress in biomass production is likely to be slow in a crimson clover cover crop breeding program as well. However, as for winter survival, accession x year interaction is not significant, suggesting that agronomically important differences in biomass in crimson clover may be observed even in years with low biomass per plant overall.

Nitrogen Content
The effects of accession, year, accession × year interaction, and replicate on nitrogen content were all significant at the 0.001 level. Replicates explained comparable variance as accession, year, and accession × year interaction combined ( Table 2). Plant nitrogen content varied from 1.87% (PI 418901) to 2.24% (PI 233812) ( Figure 5), and accessions contained a grand mean of 2.07% nitrogen. In 2013-2014, in which most accessions produced less biomass, many accessions also contained higher nitrogen content. The relationship between yield and crude protein content within cultivars due to seasonal variation has not been sufficiently addressed in the literature and merits further research. However, the range of nitrogen content present in the accessions is relatively narrow, and so in terms of total N contributed by the cover crop, an accession producing more biomass will likely produce more N even if its percent N is on the lower end of the observed spectrum. As with winter survival and biomass, replicate contributed a large portion of the model variance, again indicating that nitrogen content is highly impacted by spatial variability within the field.

Nitrogen Content
The effects of accession, year, accession × year interaction, and replicate on nitrogen content were all significant at the 0.001 level. Replicates explained comparable variance as accession, year, and accession × year interaction combined ( Table 2). Plant nitrogen content varied from 1.87% (PI 418901) to 2.24% (PI 233812) ( Figure 5), and accessions contained a grand mean of 2.07% nitrogen. In 2013-2014, in which most accessions produced less biomass, many accessions also contained higher nitrogen content. The relationship between yield and crude protein content within cultivars due to seasonal variation has not been sufficiently addressed in the literature and merits further research. However, the range of nitrogen content present in the accessions is relatively narrow, and so in terms of total N contributed by the cover crop, an accession producing more biomass will likely produce more N even if its percent N is on the lower end of the observed spectrum. As with winter survival and biomass, replicate contributed a large portion of the model variance, again indicating that nitrogen content is highly impacted by spatial variability within the field.

Proportion of Plant Nitrogen as BNF
Again, the effects of accession, year, accession × year interaction, and replicate on BNF were all significant at the 0.001 level. Again, the replicate explained more variance in the model than accession, year, and accession x year interaction combined ( Table 2). The proportion of plant nitrogen as BNF varied from 50.2% (PI 527691) to 85.6% (PI 655006) (Figure 6), and accessions contained a grand mean of 73.1% BNF as a proportion of plant nitrogen. However, the accessions with the highest mean proportion of N as BNF were accessions not included in 2013-2014, a year with lower BNF across all included accessions. Few breeding efforts have focused on nitrogen fixation ability in part due to challenges in implementing efficient screening systems, although differences between clover cultivars have been noted [41]. Again, although significant differences between accessions were observed, the replicate explained a larger portion of the variance than fixed effects, indicating that biological nitrogen fixation is highly impacted by field spatial variability and selection for this trait in a crimson clover cover crop breeding program is likely to be slow.

Proportion of Plant Nitrogen as BNF
Again, the effects of accession, year, accession × year interaction, and replicate on BNF were all significant at the 0.001 level. Again, the replicate explained more variance in the model than accession, year, and accession x year interaction combined ( Table 2). The proportion of plant nitrogen as BNF varied from 50.2% (PI 527691) to 85.6% (PI 655006) (Figure 6), and accessions contained a grand mean of 73.1% BNF as a proportion of plant nitrogen. However, the accessions with the highest mean proportion of N as BNF were accessions not included in 2013-2014, a year with lower BNF across all included accessions. Few breeding efforts have focused on nitrogen fixation ability in part due to challenges in implementing efficient screening systems, although differences between clover cultivars have been noted [41]. Again, although significant differences between accessions were observed, the replicate explained a larger portion of the variance than fixed effects, indicating that biological nitrogen fixation is highly impacted by field spatial variability and selection for this trait in a crimson clover cover crop breeding program is likely to be slow.

Nodule Metagenome
In total, we constructed nodule metagenomic libraries from 19 of the crimson clover accessions. Each library contained on average 8-10 million reads, equivalent to about 1.51 × 10 9 bases with an average single read length of 147 bp. Sequence assembly and duplicate sequence removal resulted in metagenomic libraries that ranged in size from 4.6 to 23.7 Mb. The number of contigs generated for each accession ranged in size among the libraries from 300 bp to 190 kb. The total number of contigs ranged from 5693 to 18,765, with a mean of 13,090.
Homology to the rRNA operon was used to determine the diversity of bacterial species within each nodule metagenomic library at >99%, identity. Other parts of the genome were identified to the genera or species level by alignment with documented gene fragments in KBASE with a minimum length > 1 kb, and an insertion or deletion gap size of 25 bp. All species identified had a minimum Evalue of <0.001. These stringent conditions enabled the identification of many taxa at species level or variants contained within species. Among the nodule metagenomic libraries, the 11 genera identified were Rhizobium, Mesorhizobium, Xanthomonas, Stenotrophomonas, Aquiflexum, Sinorhizobium, Bradyrhizobium, Agrobacetrium, Paenibacillus, Sphingomonas, and Pseudomonas. Bacillus dominated the number of reads that could be identified to a species level with reads specific enough to identify nine distinct Bacillus species in contigs of genomic regions greater than 10 kilobases.

Nodule Metagenome
In total, we constructed nodule metagenomic libraries from 19 of the crimson clover accessions. Each library contained on average 8-10 million reads, equivalent to about 1.51 × 10 9 bases with an average single read length of 147 bp. Sequence assembly and duplicate sequence removal resulted in metagenomic libraries that ranged in size from 4.6 to 23.7 Mb. The number of contigs generated for each accession ranged in size among the libraries from 300 bp to 190 kb. The total number of contigs ranged from 5693 to 18,765, with a mean of 13,090.
Homology to the rRNA operon was used to determine the diversity of bacterial species within each nodule metagenomic library at >99%, identity. Other parts of the genome were identified to the genera or species level by alignment with documented gene fragments in KBASE with a minimum length > 1 kb, and an insertion or deletion gap size of 25 bp. All species identified had a minimum E-value of <0.001. These stringent conditions enabled the identification of many taxa at species level or variants contained within species. Among the nodule metagenomic libraries, the 11 genera identified were Rhizobium, Mesorhizobium, Xanthomonas, Stenotrophomonas, Aquiflexum, Sinorhizobium, Bradyrhizobium, Agrobacetrium, Paenibacillus, Sphingomonas, and Pseudomonas. Bacillus dominated the number of reads that could be identified to a species level with reads specific enough to identify nine distinct Bacillus species in contigs of genomic regions greater than 10 kilobases. Rhizobium was clearly represented but is known to be more challenging to grow on media and therefore may have had a disadvantage on media relative to its importance in the nodule.
We identified 160,000 protein families, tRNA, ORFs, rRNA, and transposon genes; of these, 42% of the protein families were of unknown function. In the 11 bacterial genera identified in large reconstructed genomic contigs, all contained whole or partial complements of the nitrogen fixation complexes (nitrogenase genes, nif, and nod) as well as nitrogen responsive gene operons such as the nor and nos genes. Searches within the nodule metagenomes for other potential plant growth promoting rhizobacterial genes yielded numerous antibiotic genes as well as hydrogenases, which have been associated with increases in efficiency in biological nitrogen fixation.
We used three reference Rhizobium legumunosarum strains in our analysis (WSM1689, WSM2304, and Rt24.2 which have 6760, 6700, and 7080 annotated genes, respectively). The reference genomes have about 9500 identified elements that are over 1 kb in size and can be assigned to a specific species. The crimson clover accessions harbored a wide range of diversity within their nodule phytobiomes, from PI 369045, which contained 20,729 orthologous metagenome regions, to PI 561944, which contained 72,930 orthologous metagenome regions. The clover nodule microbiome with the lowest diversity of identifiable genetic regions (PI 369045) has about twice as much genetic diversity as the three reference genomes ( Figure S1). The diversity of taxa identified among the nodules range between 12 and 20 different species of bacteria among the clover accessions. This diversity and the sequencing depth of our study (average library size) suggests that, for some accessions, we may not have data covering the full genomes of all genera within the nodule.
Several genes with roles in nitrogen fixation were found across multiple taxa identified within the crimson clover nodules ( Table 3). All of the major taxa identified contained the nifH gene, which is a primary gene encoding the nitrogenase enzyme that carries out nitrogen fixation [42]. The narG nitrate reductase gene was found in two of the identified taxa, and has been shown to recycle fixed nitrogen through the dissimilatory nitrate reduction to ammonium (DNRA) pathway [43]. Anaerobic ammonium oxidation has been shown to remove significant amounts of nitrite and ammonium from ecosystems fixing nitrogen [44]. The hlyD gene, which is a multi-drug efflux transporter associated with antimicrobial resistance and survival in toxic environment, was found in three of the identified taxa. These types of excretion systems are likely to be involved with infection and nodule morphology [45]. The hesB gene, which is part of the nitrogenase, nitrogen fixation operon, was also found in three of the identified taxa [46]. Interestingly, the majority of research related to the role of hesB in BNF efficiency and regulation has occurred in algal systems and platonic freshwater cyanobacteria systems [47][48][49]. The possible roles of these genes in terrestrial leguminous systems has largely been ignored. There is potential to use metagenomic analysis of nodules to understand the efficiency of nitrogen fixation, recycling, and survival of symbionts in the nodule merit.

Correlation among Traits
Associations among plant traits and nodule metagenome features were assessed using Pearson correlation. Among the plant phenotypic traits measured, only fall emergence and winter survival were positively correlated. Negative correlations were observed between fall emergence and biomass per plant, winter survival and biomass per plant, fall emergence and plant N from BNF, and winter survival and plant N from BNF. No correlations with any other traits were observed for flowering time or N content (Table 4). Several accessions ranked in the top five accessions for two or three key traits. Of particular interest, PI 369045 was among the top accessions for emergence, survival, and N content. PI 418900 ranked second for both biomass and late flowering. PIs 561943 and 561944 were the earliest-flowering accessions and were both also among the top accessions for emergence. PI 613048 had both high winter survival and biomass per plant, and PI 655006 was among the top accessions for emergence, proportion of N as BNF, and late flowering (Figures 1-6).
Several positive correlations were observed among the nodule metagenome features. However, there were no correlations between any plant phenotypes and any nodule metagenome features.

Discussion
Plant breeding is a critical tool to increase cover crop use and efficacy on agricultural landscapes [11,12]. Crimson clover is the most widely used legume cover crop in the United States [10], and there is a need to improve the crimson clover germplasm for important cover cropping traits such as increased biomass and winter hardiness. Modern US cultivars have significant overlap in pedigrees, as most are derived from 'Dixie,' and limited genetic diversity found in a previous study of the NPGS crimson clover collection [24], but this was the first phenotypic assessment of the crimson clover collection for cover crop traits important to farmers.
Despite the limited genetic diversity previously observed, we did find some meaningful phenotypic variation for traits of agronomic importance for cover crop usage. Flowering time showed a large accession effect, and agronomically relevant diversity was observed among accessions. Its lack of correlation with other traits indicates that selection for flowering time may be possible without negative impacts on other agronomically important traits (e.g., due to linkage drag). Both earlyand late-maturing populations have utility in different cropping systems. These results suggest that flowering time would be a relatively straightforward trait for selection in a crimson clover breeding program.
Fall emergence also showed a large accession effect, but further work is needed to determine whether this variation is caused by genetics or seed source effects. Low emergence may have been due to hard seededness, in which case accessions with both low and high emergence will be valuable for crimson clover breeding programs, depending on cropping system context [24,34].
Replicate effects explained the most variance in winter survival, biomass per plant, and proportion of plant N as BNF, highlighting the high degree of spatial variability among these traits and the challenges of phenotyping and selection [37,38,41,50,51]. However, unlike other traits, winter survival and biomass per plant did not exhibit accession x year interaction, indicating that despite large annual variation in these traits, agronomically relevant differences in winter hardiness and biomass per plant are observable among crimson clover accessions even in years with mild winters or low biomass overall.
Additional challenges come with the negative correlations among key traits-both fall emergence and winter survival are negatively correlated with both biomass per plant and proportion of plant N as BNF. The presence of these correlations may limit breeders' ability to rapidly integrate all key agronomic traits in a single population. However, several accessions performed among the top five for multiple traits, and should be considered as parents for future breeding populations. All of the nodules contained more genetic diversity than would be expected if a single isolate occupied the nodules; the reference isolates had, on average, 6900 genetic elements [52], whereas the accessions surveyed ranged from 20,000 to 72,000 common genetic elements. This indicates that, in all of the accessions tested, we observed, from 3 to 10 times, the expected diversity of a single bacterial genome (reference genomes), but the range among accessions was quite broad. In the case of PI561944 (~72,000 elements) the genetic diversity is what might be observed if analyzing the genetic diversity of 9-12 genera of bacteria.
Additional analysis of the metagenome data showed that a large proportion of the identified bacterial taxa contained intact or partial nitrogenase operons [42]. There is evidence that genomes of nitrogen-fixing bacteria may have higher GC content then their non-fixing relatives [53]; the metagenomic GC content of the nodule microbiome may be an indicator of nodule populations that harbor a higher or lower proportion of nitrogen-fixing bacteria. Although we did not observe a significant relationship between GC content and BNF, our assessment was not quantitative in regards to extraction of total microbial biomass from the nodules.
Our data suggest that crimson clover accessions contain a wide range of bacteria in their nodules, but transferring these microbes from the nodule to a petri plate likely gave an advantage to the r-selected linages of bacteria (i.e., those with high growth rates) that had successfully infected the nodule [54]. In addition, nodule metagenomic analysis was conducted in only one of the three growing seasons, so we are unable to draw conclusions about the stability of the nodule microbial populations or their impact on annual variation in agronomic traits. We also cannot yet clearly link the genetic diversity of nodule microbiomes of field grown clover to traits important in cover crop production such as biological nitrogen fixation and biomass production.
Although we cannot relate the metagenomic survey to the diversity of phenotypes analyzed for the field work portion of this study, this nodule microbiome dataset gives us a snapshot of the surveyed diversity of plant bacterial symbiosis in planta. Recent studies have shown that the nodule is much more diverse than previously thought; effectively managing the nodule microbiome will require an understanding of interactions among the diversity within the nodule in the context of the plant genotype [55][56][57][58]. It is not known whether this observed range of nodule metagenomic diversity is typical among closely related plant varieties, and additional research is needed to determine any salient relationships between microbial diversity and agronomic performance. The degree of annual and spatial variation in N content and BNF especially highlight the need to study environmental and management effects and genotype × environment interactions on crimson clover microbial populations.

Conclusions
While the effect of crimson cover accession was significant across all evaluated traits, accessions showed particularly clear differences for fall emergence and flowering time, indicating greater diversity and potential for selection in cover crop breeding programs. By contrast, winter survival, biomass per plant, N content, and plant N from BNF showed greater within-and/or among-year variation. Fall emergence and winter survival were found to be positively correlated, and both of these traits were negatively correlated with both biomass per plant and plant N from BNF. Several accessions were identified as high-performing across several key traits, and are of particular interest as parents in future breeding efforts: PIs 369045, 418900, 561943, 561944, and 655006. We observed larger than expected variation among accessions in terms of the diversity within the nodule microbial population, but did not observe any association between nodule metagenome features and plant phenotypic traits. Therefore, additional research is needed to determine functional implications of nodule microbiome diversity, and thus whether to incorporate such data into crimson clover breeding programs.