Fine-Scale Patterns of Genetic Structure in the Host Plant Chamaecrista fasciculata (Fabaceae) and Its Nodulating Rhizobia Symbionts

In natural plant populations, a fine-scale spatial genetic structure (SGS) can result from limited gene flow, selection pressures or spatial autocorrelation. However, limited gene flow is considered the predominant determinant in the establishment of SGS. With limited dispersal ability of bacterial cells in soil and host influence on their variety and abundance, spatial autocorrelation of bacterial communities associated with plants is expected. For this study, we collected genetic data from legume host plants, Chamaecrista fasciculata, their Bradyrhizobium symbionts and rhizosphere free-living bacteria at a small spatial scale to evaluate the extent to which symbiotic partners will have similar SGS and to understand how plant hosts choose among nodulating symbionts. We found SGS across all sampled plants for both the host plants and nodulating rhizobia, suggesting that both organisms are influenced by similar mechanisms structuring genetic diversity or shared habitat preferences by both plants and microbes. We also found that plant genetic identity and geographic distance might serve as predictors of nodulating rhizobia genetic identity. Bradyrhizobium elkanii was the only type of rhizobia found in nodules, which suggests some level of selection by the host plant.


Introduction
Several studies have recognized plants as one of the most important factors shaping soil microbial community structure [1][2][3][4]. It has been suggested that this may be the result of active selection by plants for certain soil microbes or shared habitat preferences by plants and microbes [5] because spatially close biological communities are more similar than expected by chance and this similarity decays with distance [6]. It has also been demonstrated that differences in microbial community composition are significantly correlated with phylogenetic distance of their plant hosts [7]. Plant species that develop specific associations with soil microbes have important implications for coexistence in natural communities [5], and recent evidence suggests that microbial communities are influenced by both dispersal limitation and host plants [8]. In ref. [9], by studying plant phylogeny and life history of summer annuals in an agricultural field, it was indicated that rhizosphere beta-diversity was positively correlated with phylogenetic distance between plant species, but not genetic distance within a plant species. Plants within a closely related taxonomic group are likely to share traits, including amount and availability of rhizodeposits and root defensive strategies, and these are key factors shaping soil microbial communities [10]. Thus, plant hosts with similar genotypes may have similar soil microbial communities [1,11,12] which may indicate the presence of genotype x genotype interactions in these situations. Among the three plots only plot 1 demonstrated a significant regression slope (b F ) for plant kinship coefficient over distance classes (p = 0.001 for b F = −0.0138) ( Table 2). Since the genetic diversity measures among all three plots were very similar (Table 1) and principle coordinate analysis did not separate individuals by plot (Figure 1), we combined samples from the plots for downstream analyses. When considering the combined plots, we found significant autocorrelation in plant genotypes for all distance classes except 244, 261, 344 and 384 m, which are distributed at the intermediate and largest distances ( Figure 2). The average kinship coefficient decreased linearly with the natural logarithm of spatial distance (r ij ) between individuals. Significant positive values of F ij were found at short distances (<43 m), indicating that neighboring individuals had a higher genetic relatedness than random pairs of individuals. Negative values (i.e., individuals within a distance class were less genetically similar than expected with a random distribution) of F ij occurred at larger distances ( Figure 2). Mean values of the regression slope (b F ), F (1) and Sp statistics across all plots were −0.0121, 0.045 and 0.0127, respectively ( Table 2). The p-value for the regression slope was 0.000.
Due to the lack of mature nodules for some of the collected plants, we failed to culture rhizobia for five plants, and sequencing was unsuccessful for 12 samples of the cultured strains. Thus, we were able to generate 53 truA sequences from nodulating rhizobia. The aligned length of the truA sequence set was 497 nucleotides (~67% coverage of truA gene in Bradyrhizobium). All sequences most closely resembled Bradyrhizobium elkanii truA sequences in BLAST-n searches in GenBank (NCBI 1988). Additionally, the phylogeny of truA sequences indicates that all recovered sequences from C. fasciculata cluster exclusively with B. elkanii and with 1.0 posterior probability support ( Figure 3). The sequence diversity of the nodulating rhizobia was found to be similar in all three plots. A similar number of haplotypes (6-9) was found across the plots, and these haplotypes exhibited low mean genetic difference between pairs of sequences (π = 0.03457-0.04175; Table 3). Across the three plots 23 distinct rhizobia truA haplotypes were found. Only plot 1 demonstrated a significant regression slope (b F ) for rhizobia kinship coefficient over distance classes (b F = −0.0165, p = 0.046) ( Table 2). When plots were combined, we found no significant autocorrelations in rhizobia genotypes for any distance classes except for 286 m  Due to the lack of mature nodules for some of the collected plants, we failed to culture rhizobia for five plants, and sequencing was unsuccessful for 12 samples of the cultured strains. Thus, we were able to generate 53 truA sequences from nodulating rhizobia. The aligned length of the truA sequence set was 497 nucleotides (~67% coverage of truA gene in Bradyrhizobium). All sequences most closely resembled Bradyrhizobium elkanii truA sequences in BLAST-n searches in GenBank (NCBI 1988). Additionally, the phylogeny of truA sequences indicates that all recovered sequences from C. fasciculata cluster exclusively with B. elkanii and with 1.0 posterior probability support ( Figure 3). The sequence diversity of the nodulating rhizobia was found to be similar in all three plots. A similar number of haplotypes (6-9) was found across the plots, and these haplotypes exhibited low mean genetic difference between pairs of sequences (π = 0.03457-0.04175; Table 3). Across the three plots 23 distinct rhizobia truA haplotypes were found. Only plot 1 demonstrated a significant regression slope (bF) for rhizobia kinship coefficient over distance classes (bF = −0.0165, p = 0.046) ( Table 2). When plots were combined, we found no significant autocorrelations in rhizobia genotypes for any distance  Due to the lack of mature nodules for some of the collected plants, we failed to culture rhizobia for five plants, and sequencing was unsuccessful for 12 samples of the cultured strains. Thus, we were able to generate 53 truA sequences from nodulating rhizobia. The aligned length of the truA sequence set was 497 nucleotides (~67% coverage of truA gene in Bradyrhizobium). All sequences most closely resembled Bradyrhizobium elkanii truA sequences in BLAST-n searches in GenBank (NCBI 1988). Additionally, the phylogeny of truA sequences indicates that all recovered sequences from C. fasciculata cluster exclusively with B. elkanii and with 1.0 posterior probability support ( Figure 3). The sequence diversity of the nodulating rhizobia was found to be similar in all three plots. A similar number of haplotypes (6-9) was found across the plots, and these haplotypes exhibited low mean genetic difference between pairs of sequences (π = 0.03457-0.04175; Table 3). Across the three plots 23 distinct rhizobia truA haplotypes were found. Only plot 1 demonstrated a significant regression slope (bF) for rhizobia kinship coefficient over distance classes (bF = −0.0165, p = 0.046) ( Table 2). When plots were combined, we found no significant autocorrelations in rhizobia genotypes for any distance A significant relationship between plant genetic and geographic distances was found (r = 0.243, p = 0.001, Figure 5), but the relationship between nodulating rhizobia genetic and geographic distances was not significant (r = 0, p = 0.99, Figure 6), as was the relationship between plant genetic and nodulating rhizobia genetic distance (r = −0.087, p = 0.26, Figure 7). Stepwise regression analysis showed that plant genetic distance, and plant genetic plus geographic distance, are both able to predict genetic distance of nodulating rhizobia with the adjusted r-square of 0.012 and 0.016, respectively (p < 0.0001 for both, Table 4). Geographic distance alone did not significantly determine nodulating rhizobia genetic distance. Even though they were significant, the combined variables of plant genetic distance and geographic distance only explained 1.6% of the observed variation in nodulating rhizobia genetic distances (Table 4). classes except for 286 m (Figure 4), but we found a significant regression slope for rhizobia kinship coefficients against distance among pairs of individuals (bF = −0.0031, p = 0.041) ( Table 2).       A significant relationship between plant genetic and geographic distances was found (r = 0.243, p = 0.001, Figure 5), but the relationship between nodulating rhizobia genetic and geographic distances was not significant (r = 0, p = 0.99, Figure 6), as was the relationship between plant genetic and nodulating rhizobia genetic distance (r = −0.087, p = 0.26, Figure 7). Stepwise regression analysis showed that plant genetic distance, and plant genetic plus geographic distance, are both able to predict genetic distance of nodulating rhizobia with the adjusted r-square of 0.012 and 0.016, nodulating rhizobia genetic distances ( Table 4).
The total number of unique 16S rRNA sequences of Rhizobiales present in the rhizosphere samples from host plants in all three plots was 81 sequences. The majority of the retrieved sequences are of Bradyrhizobium, including B. elkanii (30%), B. canariense (22%), and unidentified Bradyrhizobium strains (9%) (Figure 8). Rhizobium strains were second in abundance, making up 9% of the rhizosphere sequences.

Discussion
In many plant species, seed dispersal has a peak distribution at or close to the maternal plant and progressively fewer seeds are dispersed at greater distances from the maternal plant [40][41][42]. The extent of SGS within plant populations depends on factors such as seed and pollen dispersal distance and effective plant density [43]. According to the theory of isolation by distance, SGS arises from the interplay of limited gene flow and local genetic drift, and the rate of decrease of genetic similarity with distance is a measure of the strength of SGS [44][45][46]. The morphological characteristics of C.

Discussion
In many plant species, seed dispersal has a peak distribution at or close to the maternal plant and progressively fewer seeds are dispersed at greater distances from the maternal plant [40][41][42]. The extent Plants 2020, 9, 1719 9 of 20 of SGS within plant populations depends on factors such as seed and pollen dispersal distance and effective plant density [43]. According to the theory of isolation by distance, SGS arises from the interplay of limited gene flow and local genetic drift, and the rate of decrease of genetic similarity with distance is a measure of the strength of SGS [44][45][46]. The morphological characteristics of C. fasciculata, including heaviness of the seeds and bee pollination, suggest that plants should exhibit an isolation by distance pattern. The SGS pattern detected for host plants in the studied plots was consistent with IBD, showing an increase in genetic distance between pairs of plants as they increased in physical distance from one another. Additionally, the kinship coefficients gradually decrease from very positive to very negative over the distance of sampling. Discovering statistically significant fine scale genetic structure at most of our studied distances is concordant with the existence of very restricted seed dispersal events among plants and suggests that plants are not randomly distributed even at this fine scale. Values for the Sp statistic in this study, ranging 0.0048-0.0139, are comparable to another study on this species with reported Sp statistic of 0.00746 [47]. The mean Sp statistic in other outcrossing or gravity dispersed plant species is 0.0126 and 0.0281, respectively [43], which are comparatively close to our measured Sp statistic value for C. fasciculata.
Given the presence of fine-scale structure in host plants and widely documented genotype x genotype interactions between legumes and their symbionts, we expected to find significant genetic structure in nodulating rhizobia of C. fasciculata as well. Additionally, limited dispersal ability of bacterial cells in the soil due to passive means is expected to maintain structure if there is not extensive soil disturbance. Although we did not find a significant kinship coefficient for pairs of rhizobia at most geographic distances, the mean sp statistic in rhizobia followed the same pattern observed for plant hosts. That is, we found a significant b F for rhizobia and plant hosts in plot 1 and the combined plots. This finding suggests a common mechanism that influences genetic structure of plants and rhizobia. It may be that host plants directly influence their rhizobia symbionts by selecting for certain strains or that plant hosts and their nodulating rhizobia are influenced by similar environmental factors that structure genetic diversity. In our study, we focused on only one housekeeping gene, truA, to identify the nodulating rhizobia and estimate genetic diversity of this community. Although this does not capture genome-wide diversity, it is a suitable genetic marker that was considered highly informative of phylogenetic relationships in Bradyrhizobium by refs. [36,48]. Our study revealed relatively high haplotype diversity (Hd = 0.82-0.86 across plots) in truA, which indicates an ability to discriminate among most of the rhizobia symbionts we recovered. In another study, the nucleotide diversity of Rhizobium leguminosarum biovar viciae isolates from Vicia cracca L. acquired using two housekeeping genes, glnII and recA, and one nodulation gene, nodC, revealed relatively close values of nucleotide diversity for all of the three studied genes (π = 0.036, 0.021 and 0.025, respectively) [49]. This study suggests that single genes can be informative of bacterial genetic diversity.
While a literature review found no fine-scale studies of genetic structure in nodulating rhizobia or plant endophytes for comparison to our results, we have found that the number of haplotypes recovered for C. fasciculata is similar to studies of other legumes. In ref. [36], which also used truA to study genetic diversity of nodulating rhizobia in C. fasciculata across Mississippi, 9-25 haplotypes and 0.831-0.954 for haplotype diversity was reported at a regional scale. Our finding of six haplotypes and substantial haplotype diversity in a single site indicates that populations can harbor diverse symbionts even at the smallest spatial scales. In a fine scale study (<2 m) of diversity patterns in microbial communities associated with roots of Bistorta vivipara L., a perennial herb, it was revealed that similarities between the structure of bacterial and fungal communities were stronger in soils than in roots [50]. The root-associated fungal and bacterial communities in that study both showed significant spatial autocorrelation at distances below 40-50 cm, while soil-associated communities showed no spatial structure across the 2 m scale investigated. This result, along with ours, suggests that different host filters are responsible for structuring symbiotic endophytes of roots and nodules.
Host promiscuity in symbiotic associations can influence exotic legume establishment and colonization of novel ranges [51], and it has been suggested that enhanced diversity of symbionts may underlie the success of host plants in non-native habitats (e.g., [52]). Hence, it may be expected that due to a broad geographic distribution of a host species it potentially harbors different rhizobia strains that enable tolerance to varying soil types and environmental conditions. Unlike this assumption and the previous study on this system, which covered a vast geographic area with different physiographic regions and found different genetic variants of Bradyrhizobium nodulating C. fasciculata [36], in our 400 m study area, the only sequences retrieved from nodules of the widespread C. fasciculata studied here belonged to Bradyrhizobium elkanii (Figure 3). While our results confirm the exclusive use of this Bradyrhizobium by C. fasciculata also noted in other studies [35,36,[53][54][55], it was surprising to find that all host plants harbored strains of a single species, B. elkanii. Although it is possible that other rhizobia types could have existed in untested nodules, the likelihood of high diversity among these seems rare in light of our results. Given that numerous species of Bradyrhizobium were identified in the rhizosphere, these results suggest the selection of particular symbionts by the host plants at this site, which may indicate that genotype x genotype interactions are important in this system too. The genetic diversity measures of nodulating rhizobia obtained by [36] in their study locations nearby the location of this study (e.g., H = 9 and Hd = 0.83-0.93) are very similar to our findings (H = 6-9 and Hd = 0.83-0.86) but they still found multiple species of nodulating rhizobia. Furthermore, genetic diversity analyses of the host plants in this study revealed less genetic variation among plants (Table 1), which suggests that they are likely to prefer the same type of rhizobia to associate with those already supported by selection of the same type of nodulating rhizobia by all studied plants. This finding also supports our hypothesis that plants establish symbioses with only a subset of available soil bacteria in their nodules, and confirms several other studies showing that some microbial species typically associate with only a few or a single plant species. This fact is well known for pathogens as well as beneficial interactions, e.g., in the legume-rhizobium relationship. For example, Sinorhizobium meliloti effectively colonizes Medicago, Melilotus and Trigonella, whereas Rhizobium leguminosarum induces nodules in Pisum vicea, Lens and Lathyrus plants [56]. However, there are also examples of rhizobia with wide host ranges [57,58].
Selection of only one species by the host plants may reduce the SGS pattern of kinship in nodulating rhizobia. A non-significant Mantel test also supports this idea by showing that genetic distance of isolates is not associated with geographic distance between pairs of nodules (r = 0, p = 0.99, Figure 6). Additionally, plant genetic distances are required to explain a small but significant amount of the observed variation as our stepwise regression, revealing such impacts of the host genotype on the genotype of the nodulating rhizobia (Table 4). Despite the expectation that plant genotype would explain a greater amount of variation in nodulating bacteria, other studies have also found a lack of strong influence by plant hosts on rhizobia diversity. For example, in a study on nodulating rhizobia of Lotus japonicus (Regel) K. Larsen, the plant genotype explained only 4.91% of the variation in nodulating rhizobia [24]. The lack of influence of geographic distance in structuring-nodulating rhizobia is in line with results from ref. [36], which were not able to find a significant correlation between geographic and bacterial genetic distances at larger spatial scales either. Their finding is consistent with other studies demonstrating that genetic structure of soil bacteria is largely independent of geographic distance [59]. Results from my study show that plant genetic distance explains only about 1.2% variation in diversity of nodulating rhizobia, while, by adding geographic distance, the new model explains 1.6% of variation in nodulating rhizobia diversity. It is likely that on small scales, host plant and the geographic distance between pairs of hosts, cumulatively, control take up of specific rhizobia in the soil to associate with the host plant.
Along with geographic distance, soil influence on bacterial communities has been investigated in several studies, and ref. [24] observed that while Bradyrhizobium were the major rhizobia symbionts of cowpea, irrespective of plant genotype and soil type, the presence of other genera including Enterobacter, Chryseobacterium, and Sphingobacterium was significantly correlated with the soil type and, to a lesser extent, plant genotype. However, ref. [37] indicated that for rhizobia-host interaction, the plant genotype has a higher influence on the selection of the bacterial symbiont than soil type, due to the fact that the abundance of major OTUs differed in their study under the influence of plant genotype. In ref. [60], it was proposed that the bulk soil surrounding the rhizosphere might introduce low levels of restrictions and/or promote the recruitment of a subset of bacteria that colonize the rhizosphere. Although it might be helpful to analyze the rhizosphere for chemical properties to gain a comprehensive understanding of the factors influencing the microbial communities in the nodules, we postulate that in our fine scale study the soil matrix is quite homogeneous and the role of environmental influences would have been minimized given the very small study area. Congruently, there is no general decision about the key player, which means that both factors can dominate depending on biotic and abiotic conditions [1], and there are several contrasting reports recognizing plant or soil type as the dominant factor [61][62][63]. Although soil type and pH appear to influence free-living soil bacterial communities [17], ref. [64] found that rhizobia housed in nodules of different species of wild Lotus were a subset of those in the surrounding soil, indicating a strong role for plant host to choose particular rhizobia genotypes.  (Figure 9). Whole plants, including roots, were carefully excavated from the soil. Plants were stored separately in plastic bags on ice in the field. Plants were kept at 4 • C until DNA could be extracted. All nodules were removed from the roots and stored at 4 • C until they were plated on growth medium. m. The two sampled plants at each point were within ca. 12 cm of one another (Figure 9). Whole plants, including roots, were carefully excavated from the soil. Plants were stored separately in plastic bags on ice in the field. Plants were kept at 4 °C until DNA could be extracted. All nodules were removed from the roots and stored at 4 °C until they were plated on growth medium. Leaf samples to be used in DNA extraction were kept on ice in the field and then stored in silica gel or at −80 °C prior to DNA extraction. Genomic DNA was extracted using a CTAB method [65]. Each sampled plant was genotyped at 14 trinucleotide microsatellite loci (Table 5) using a multiplexed genotyping approach to quantify genetic structure. For each sample, three multiplex amplification reactions with five, four, and five loci per reaction respectively, were performed in a final volume of 10 μL in the presence of 10 ng of template DNA, 100 µ mole of each of the reverse and tagged fluorescent label primers and 10 µ mole of tagged forward primer using a KAPA 2G Fast Multiplex PCR kit (Kapa Bio-systems, Wilmington, Massachusetts). Tag sequences were derived from [66], and the tag in the 5′ forward primer matched the sequence of the fluorescent labeled primer ( Table 5). The thermal cycler program used to amplify loci included 3 min at 95 °C, 30 cycles of 15 s at 95 °C, 30 s at 60 °C, and 30 s at 72 °C, and a final extension step of 1 min at 72 °C. Successful amplification of the samples was checked by agarose gel electrophoresis, and amplified products were genotyped at the Arizona State University DNA Lab with LIZ 600 size standard. Individual alleles were sized using GeneMarker software (SoftGenetics, State College, Pennsylvania). Table 5. Characteristics of 14 microsatellite loci used to evaluate genetic structure in Chamaecrista fasciculata.

Multiplex
Fluorescen Allele Repeat Leaf samples to be used in DNA extraction were kept on ice in the field and then stored in silica gel or at −80 • C prior to DNA extraction. Genomic DNA was extracted using a CTAB method [65]. Each sampled plant was genotyped at 14 trinucleotide microsatellite loci (Table 5) using a multiplexed genotyping approach to quantify genetic structure. For each sample, three multiplex amplification reactions with five, four, and five loci per reaction respectively, were performed in a final volume of 10 µL in the presence of 10 ng of template DNA, 100 µmole of each of the reverse and tagged fluorescent label primers and 10 µmole of tagged forward primer using a KAPA 2G Fast Multiplex PCR kit (Kapa Bio-systems, Wilmington, Massachusetts). Tag sequences were derived from [66], and the tag in the 5 forward primer matched the sequence of the fluorescent labeled primer ( Table 5). The thermal cycler program used to amplify loci included 3 min at 95 • C, 30 cycles of 15 s at 95 • C, 30 s at 60 • C, and 30 s at 72 • C, and a final extension step of 1 min at 72 • C. Successful amplification of the samples was checked by agarose gel electrophoresis, and amplified products were genotyped at the Arizona State University DNA Lab with LIZ 600 size standard. Individual alleles were sized using GeneMarker software (SoftGenetics, State College, Pennsylvania).  One to two nodules per plant were surface sterilized with 1% hypochlorite for 5 min then washed in sterile water. Later, they were placed in 70% ethanol for 5 min and then were washed three times with sterile distilled water each for 1 min. After surface sterilization, nodules were ground to release rhizobia, and this mixture was plated on solid agar MAG medium [67] in petri dishes at 30 • C until colonies appeared, ca. 4-10 days after plating. One colony per sample that could be morphologically identified as Bradyrhizobium (i.e., creamy yellow color, smooth margins, medium sized, and round appearance), following [68,69], was randomly picked and suspended in 50 µL 1× TE buffer solution (pH = 8). Even though there are some studies suggesting the possibility of multiple strains per nodule (e.g., [70]), we selected only one colony per sample because only a single strain of rhizobia is typically found in a nodule as it arises from infection by a single bacterial cell (e.g., [71,72]). Prior to their use directly in PCR, the cells were lysed by heating at 65 • C for 5 min. TruA, a housekeeping gene involved in translation and ribosomal biogenesis [73] and that is capable of distinguishing Bradyrhizobium strains [36,48,74] was used to characterize nodulating rhizobia. Bacterial samples were amplified with truAB-F/R primers specific for Bradyrhizobium [48] and then sequenced using Sanger sequencing. PCR was used to amplify the region in 12.5 µL volume, containing 1 µL DNA, 1× LongAmp buffer (New England Biolabs, Ipswich, MA, USA), 1.5 U LongAmp Taq (New England Biolabs), 0.32 mM dNTP's, 0.4 µM forward primer, and 0.4 µM reverse primer (Integrated DNA Technologies, Coralville, IA, USA). The thermal cycler program consisted of heating to 95 • C for 5 min, followed by 11 cycles of 94 • C for 45 s, 60 • C for 1 min. decreased by 1.0 • C per cycle, 72 • C for 1 min., 26 cycles of 94 • C for 45 s, 50 • C for 1 min., 72 • C for 1 min., and an elongation step of 72 • C for 10 min. Amplified products were subjected to agarose gel electrophoresis to check for the presence of a single band of the expected size for truA. Successfully amplified products (one sequence per sample) were cleaned by adding 0.2x Antarctic Phosphatase buffer, 5 units of Exonuclease I, and 1.25 units of Antarctic Phosphatase (New England Biolabs, Ipswich, MA, USA), to 7 µL of PCR product. This mixture was heated to 37 • C for 15 min followed by 80 • C for 15 min. Once the samples were cleaned, cycle sequencing was conducted in 10 µL reactions using either the forward or reverse primer and Big Dye version 3.1 (Life Technologies, Carlsbad, CA, USA). Forward and reverse primer sequences were generated for all individuals using the PCR primers. Sequenced samples were dried and sent to the Arizona State University DNA Lab for capillary electrophoresis. Rhizobia forward and reverse sequences were edited and assembled into a consensus sequence for each sample using Sequencher version 4.7 (Gene Codes Corporation, Ann Arbor, MI, USA). Sequences were aligned using the alignment algorithm in Geneious v.10.2.5 (Biomatters, Inc. Newark, NJ, USA).
For bacterial community sequencing of rhizosphere samples, whole-community DNA was extracted from each rhizosphere sample (n = 70 plants) using 0.5 g of soil and the FastDNA™ SPIN Kit for soil isolation (MP Biomedicals, Solon, OH, USA). We were unsuccessful in getting truA to amplify consistently in these samples. Thus, we used the 16S rRNA gene to characterize rhizosphere rhizobia communities because this gene has been broadly used in prokaryote taxonomy and phylogenetic reconstructions and sequences are widely available across prokaryotic lineages [75]. Besides, previous studies have shown that assessment of rhizobial genotypic diversity relevant to ecologically oriented studies can be achieved by 16S rRNA sequencing [76]. Using primers 319F and 806R designed for the V3 and V4 region of the 16S rRNA gene [77], amplicons were produced using a 2-step PCR method [77] and sequenced on an Illumina (San Diego, CA, USA) instrument. Each step 1 PCR reaction contained 1× Phusion Taq master mix (New England Biolabs, Ipswich, MA, USA), primers 319F and 806R (0.4 µM each), 3% DMSO, and 5 ng genomic DNA. The thermal cycler program included the following cycles: an initial denaturation at 94 • C for 3 min, 20 cycles of denaturation at 94 • C for 30 s, annealing at 58 • C for 30 s, elongation at 72 • C for 1 min, and a final elongation step at 72 • C for 7 min. Successful amplifications were tested by running a small amount using agarose gel electrophoresis. The second PCR, which added barcodes and adapters, and remaining steps in library preparation were conducted according to [77] at the Microbiome Service Lab at the University of Maryland School of Medicine. All samples were sequenced together in a single run on a MiSeq instrument (Illumina, San Diego, CA, USA) using paired end reads and length of 300 bp each.
To identify bacterial species in the rhizosphere samples, bioinformatics analysis and annotation of the output data were carried out following Berlanas et al. (2019) using QIIME2 [78]. This software provides quality filtering, picking operational taxonomic units (OTUs), taxonomic assignment, phylogenetic reconstruction, diversity analysis, and graphical displays [79]. Sequences were demultiplexed by sample at the Microbiome Service Lab at the University of Maryland School of Medicine using a dual-barcode strategy, a mapping file linking barcode to samples, and QIIME-dependent script of split_libraries.py and split_sequence_file_on_sample_ids.py, [77]. Primer sequences were removed from each read using q2-cutadapt plugin, and sequence quality control and feature table construction were conducted using a DADA2 pipeline [80]. Chimeras for combined runs were removed per the DADA2 pipeline. Amplicon sequence variants (ASVs) generated by DADA2 were taxonomically classified using the scikit-learn classifier [81] trained with the SILVA v132 16S rRNA gene sequence database [82]. After taxonomic classification, sequences belonging to Rhizobiales, which includes Bradyrhizobiaceae, were extracted from the dataset.

Data Analysis
The plant dataset was tested for presence of unique multilocus genotypes using the Poppr package [83] in RStudio statistical software v. 1.1.456 [84]. Genetic diversity of plants was assessed as mean number of alleles per locus (A), percent of polymorphic loci (P), observed heterozygosity (H O ), expected heterozygosity (H E ), and inbreeding coefficient (F IS ) using GenAlEx version 6.503 [85]. For each locus, departure from Hardy-Weinberg expectations was tested through permutations of alleles among individuals and statistical significance was assessed using a p-value of 0.001, which is adjusted using the sequential Bonferroni correction for multiple comparisons [86]. A Principal Coordinates Analysis (PCoA) was conducted using GenAlEx version 6.503 [85] to explore dissimilarities among samples in the three studied plots. We performed analysis of spatial genetic structure (SGS) for plant samples using SPAGeDi 1.5 [87]. The plant microsatellite loci were analyzed in each pair of individuals using pairwise relatedness coefficients according to Ritland's estimator (Equation (5) in [88]), which has been shown to be the best estimator, especially with highly polymorphic markers [43]. Seventeen distance intervals, from 1 to 384 m, were plotted to maximize the number of pairwise comparisons. Spatial genetic structure was tested by assessing the significance of the regression slope (b F ) of pairwise statistics (F ij ) on ln (distance) using 9999 randomizations of the individual spatial positions and obtaining 95% confidence intervals (CIs) after bootstrapping (1000 iterations) using SPAGeDi 1.5 [87]. Spatial genetic structure was also quantified by the "Sp" statistic, which estimates SGS intensity and is calculated as − b F /(1 − F (1) ), where F (1) is the mean F (ij) between individuals belonging to the first distance interval and b F is the regression slope of F (ij) on r ij (physical distance between samples i and j) [43]. F (1) can be considered a good estimate of the kinship between pairs of neighbors, on the condition that the first distance interval contains enough pairs of individuals to obtain reasonably precise F (1) values [43]. Standard errors for the estimates of the kinship coefficients per distance class were estimated using a jackknife procedure over the loci. Then, values of the average genetic relatedness statistic between pairs of individuals separated by given distance intervals were plotted using a spatial autocorrelogram.
The sequence of truA from each nodule was checked against GenBank sequences via BLAST-n searches [89] to identify closely matched bacterial strains. Rhizobia truA sequences were aligned in Geneious v.10.2.5 (Biomatters, Inc. Newark, NJ, USA) using the internal alignment algorithm and compared against five reference sequences of taxonomically valid Bradyrhizobium retrieved from GenBank [89] in a phylogenetic tree. The reference sequences included B. canariense BTA1 (Accession: JX064276), B. elkanii USDA 76 (Accession: JX064277), B. japonicum USDA 6 (Accession: JX064272), B. liaoningense USDA 3622 (Accession: JX064278), and B. yuanmingense CCBAU 10,071 (Accession: JX064271). These five strains are informative for identifying Bradyrhzobium species which associate with plants from Fabaceae [36,53]. We used jModeltest2 [90,91] to select HKY+G [92] as the best fitting model of molecular evolution according to the BIC MrBayes v. 3.2.3 [93] was used for phylogenetic reconstruction in the Cipres Science Gateway [94]. We conducted Markov Chain Monte Carlo (MCMC) 5 million generations, sampling every 1000 points. Prior to determining the posterior probability of the trees with the highest likelihood, 1250 trees were discarded as burn-in. A consensus tree is reported with posterior probability indicating support for clades. Sequence diversity analysis including number of haplotypes (H), haplotype diversity (Hd), and π for each plot and the entire dataset was conducted using DnaSP v5 [95].
Analysis of spatial genetic structure (SGS) of nodulating rhizobia was conducted for each plot separately and all plots combined using SPAGeDi 1.5 [87]. Rhizobia truA haplotype sequences were converted to SPAGeDi 1.5 input using haplotype codes generated by SPADS 1.0 [96]. All settings in SPAGeDi 1.5 were the same as those used in analysis of SGS for the plants. Spatial genetic structure was tested by assessing the significance of the regression slope (b F ) of pairwise statistics (Fij) on ln (distance) using 9999 randomizations of the individual spatial positions and obtaining 95% confidence intervals (CIs) after bootstrapping (1000 iterations) [87], and seventeen distance intervals, from 1 to 384 m, were plotted to maximize the number of pairwise comparisons.
Isolation-by-distance for the plant hosts and nodulating rhizobia was investigated separately by comparing pairwise population genetic and geographic distances in a Mantel test [97]. Pairwise genetic distances for plant individuals were calculated using GenAlEx [85], and for rhizobia truA sequences using Mega v.6 [98]. Geographic distances between collected individuals were calculated using a modification of the haversine formula [99] based on the coordinates of the sampled plants in GenAlEx [85]. The Mantel test was performed using PASSaGE 2 [100], and 1000 permutations of the datasets were used to assess significance of the correlation. A stepwise regression analysis using IBM SPSS Statistics v.25 [101] was also performed to test whether plant genetic and physical distance can be considered as explanatory variables for rhizobia genetic distances in the nodules.
Sequences of Rhizobiales 16S rRNA that were generated from rhizosphere samples were compared against sequences in GenBank [89] through BLASTn searches to identify closely matching sequences. The highest scoring, as determined by e-value, sequence identified to the species level was selected as the best matching sequence. The relative proportions of each identified strain across all plots were compared to have an understanding of the diversity and abundance of potential symbionts for C. fasciculata in the rhizospheric soil.

Conclusions
No study to date has been conducted on genetic diversity and structure of legume rhizobia system within Chamaecrista fasciculata and at the fine scale studied here. Here we showed that there is fine-scale genetic structure in the host plant and its nodulating rhizobia, and that plant genotype along with geographic distance contribute minimally to explaining genetic distance among nodulating rhizobia. The data indicate that co-located host plant individuals are likely to be more genetically similar than those more distant probably because large seeds are maintained close to the maternal plant. The low dispersal ability of bacteria in soil and specificity between legume and rhizobia species may explain the common pattern of structure between rhizobia and their host plants.
Nevertheless, the small amount of variation in nodulating rhizobia genetic distances explained by host genetic distance and geographic distance suggests that there may be other environmental factors influencing these interactions. We also found that the only rhizobia strain retrieved from sampled host plant nodules belonged to Bradyrhizobium elkanii, which was also the most common form in the rhizosphere. This finding reveals a relatively high degree of plant selection among available rhizobia in soil. To understand to what degree this strain-specific legume rhizobia symbioses can develop, further studies in particular habitats are needed.