The Conservation Genetics of Iris lacustris (Dwarf Lake Iris), a Great Lakes Endemic

Iris lacustris, a northern Great Lakes endemic, is a rare species known from 165 occurrences across Lakes Michigan and Huron in the United States and Canada. Due to multiple factors, including habitat loss, lack of seed dispersal, patterns of reproduction, and forest succession, the species is threatened. Early population genetic studies using isozymes and allozymes recovered no to limited genetic variation within the species. To better explore genetic variation across the geographic range of I. lacustris and to identify units for conservation, we used tunable Genotyping-by-Sequencing (tGBS) with 171 individuals across 24 populations from Michigan and Wisconsin, and because the species is polyploid, we filtered the single nucleotide polymorphism (SNP) matrices using polyRAD to recognize diploid and tetraploid loci. Based on multiple population genetic approaches, we resolved three to four population clusters that are geographically structured across the range of the species. The species migrated from west to east across its geographic range, and minimal genetic exchange has occurred among populations. Four units for conservation are recognized, but nine adaptive units were identified, providing evidence for local adaptation across the geographic range of the species. Population genetic analyses with all, diploid, and tetraploid loci recovered similar results, which suggests that methods may be robust to variation in ploidy level.


Introduction
In 1818, Thomas Nuttall described a new species of crested Iris L., Iris lacustris Nutt., "on the gravelly shores of calcareous islands of Lake Huron" [1]. Since then, the recognized geographic range of the species has expanded to include the northern regions of Lakes Huron and Michigan in the United States and Canada. Presently, the species is known from 165 occurrences, with more than half in Michigan (89) and the others split between Wisconsin (36) and Ontario (40) [2].
Plants of I. lacustris grow less than 15 cm in height [3], and this feature provides the species with its common name, Dwarf Lake Iris. The species bears self-compatible flowers, with purple sepals and purple petals with yellow and white markings, that are visited by various species of bees [4]. Across its geographic range, I. lacustris frequently inhabits the understory of coniferous forests along the shore, although a small number of inland populations are known ( Figure 1) [2,5,6]. These habitats have thin entisols, and the dominant tree species primarily include Thuja occidentalis L., Abies balsamea (L.) Miller, and Picea glauca (Moench) Voss. The species has become a well-known endemic plant of the Great Lakes and is so characteristic of the region that it was recognized as the state wildflower of Michigan [7]. In 1988, 170 years after I. lacustris was initially described, the species was listed as federally threatened [5]. The small number of populations and individuals is due to multiple factors, including the loss of shoreline habitat, fungal infection of fruits, lack of seed dispersal, and overgrowth of the forest canopy that restricted plant growth, flower production, and sexual reproduction. Plants of the species currently reproduce more by vegetative growth than germination from the myrmecochorous seeds [5]. Despite this low germination rate, seeds can remain viable in the seedbank for at least 15 years [5], a factor that could influence long-term population growth and genetic diversity, although mass germination and recruitment are rare [4].
The ecology of I. lacustris has been examined to a greater extent than the genetic diversity of the species. To date, only three studies have explored this topic: Simonich and Morgan [8] examined nine populations in Wisconsin, using 22 allozyme markers, Orick [9] investigated nine populations in Michigan, using 24 isozymes, and Hannan and Orick [10] examined nine populations in Michigan, using 18 isozymes. In two studies, researchers identified genetic homogeneity across the populations; however, Orick [9] found overall heterozygosity to be 3.7%. Hannan and Orick [10] also note gene silencing may have been possible in four loci. In contrast to the genetic diversity recognized in I. lacustris, Hannan and Orick [10] found that the sister species, I. cristata Aiton [11], which has a wider geographic range across eastern North America, was variable at 11 of 15 loci. In 1988, 170 years after I. lacustris was initially described, the species was listed as federally threatened [5]. The small number of populations and individuals is due to multiple factors, including the loss of shoreline habitat, fungal infection of fruits, lack of seed dispersal, and overgrowth of the forest canopy that restricted plant growth, flower production, and sexual reproduction. Plants of the species currently reproduce more by vegetative growth than germination from the myrmecochorous seeds [5]. Despite this low germination rate, seeds can remain viable in the seedbank for at least 15 years [5], a factor that could influence long-term population growth and genetic diversity, although mass germination and recruitment are rare [4].
The ecology of I. lacustris has been examined to a greater extent than the genetic diversity of the species. To date, only three studies have explored this topic: Simonich and Morgan [8] examined nine populations in Wisconsin, using 22 allozyme markers, Orick [9] investigated nine populations in Michigan, using 24 isozymes, and Hannan and Orick [10] examined nine populations in Michigan, using 18 isozymes. In two studies, researchers identified genetic homogeneity across the populations; however, Orick [9] found overall heterozygosity to be 3.7%. Hannan and Orick [10] also note gene silencing may have been possible in four loci. In contrast to the genetic diversity recognized in I. lacustris, Hannan and Orick [10] found that the sister species, I. cristata Aiton [11], which has a wider geographic range across eastern North America, was variable at 11 of 15 loci. These studies suggest that the genetic diversity of this rare species of Iris is quite limited. This genetic paucity is intriguing because I. lacustris and its sister are both putative tetraploids [10], and polyploid plant species tend to have greater genetic variation than diploid relatives, although selfing tends to be higher in polyploids [12][13][14]. Importantly, the genetic diversity of the I. lacustris may have implications for the ability of the species to respond to the changing environment across its geographic range and for various conservation efforts.
In order to investigate the population and conservation genetics of the species in a comprehensive manner, we examined multiple populations from across Michigan and Wisconsin, and we used tunable Genotyping-by-Sequencing (tGBS [15]), a method of reduced representation sequencing, to identify single nucleotide polymorphisms (SNPs) among the populations. The objectives of the present study are threefold: (1) identify genetic diversity and population structure and substructure across the range of I. lacustris, (2) explore patterns of migration, and (3) recognize population clusters for management of this rare species. Given the paucity of genetic diversity identified in previous studies, we hypothesized that there would be limited genetic variation across the species.

DNA Sequencing and Polyploid Filtering
Among the 171 individuals of 24 populations across the geographic range of I. lacustris in Michigan and Wisconsin ( Figure 1, The numbers of SNPs in the diploid and tetraploid datasets identified through analysis in polyRAD are in Table 2. Table 1. Population and sampling information and assignation of populations to clusters based on results of various population genomic analyses, including recognition of management and adaptive units, based on loci not under and under selection, respectively. Cluster, management unit, and adaptive unit assignation is based on population genetic analyses with fastStructure, discriminant analysis of principal components (DAPC), principal component analyses (PCA), and others described in the text.

Population Genomics
Across all datasets, observed heterozygosity slightly exceeds expected heterozygosity, and F IS values are, in general, negative (Table 3). Pairwise F ST values vary from 0.1-0.45, and results are similar among datasets (Table 4). Based on various AMOVA results, most of the variation is within samples, followed by between the populations, regardless of the datasets and partitioning of the populations (Supplemental Table S1). Mantel tests for isolation-bydistance analyses identify all datasets as having spatial structure (Supplemental Figure S1) with p < 0.001 for analyses of individuals, but only MCR90 datasets had spatial structure for populations (p < 0.05).
Results from analyses in fastSTRUCTURE, STRUCTURE, MavericK, and tess3r are similar. Based on the results from STRUCTURESELECTOR, the optimal K values were greater for all loci analyzed together than for either the diploid or tetraploid loci analyzed independently ( Table 2, Supplemental Table S2). Similar clusters were recovered with the different datasets ( Figure 2, Table 1), with a clear division between three groups-eastern, western, and central populations-and multiple analyses resulted in the central population being divided into two distinct groups at K = 4 and/or 5 ( Figure 1, Supplemental Figures S2-S4), especially for all loci in fastSTRUCTURE and multiple datasets with STRUCTURE, MavericK, and tess3r. At K = 4-5, the two Wisconsin populations were often recovered with unique genomic signatures suggestive of admixture, and this is particularly the case with the MCR90 datasets. While the results of conStruct are similar to others, the three distinct groups identified are more opaque, with boundaries between the eastern and western populations overlapping to a larger extent than with the other analyses (Supplemental Figure S5); although, similar patterns can be recognized at K = 4 and 5 for the MCR90 all and diploid loci datasets. Among all    The results of principal components analysis (PCA) and discriminant analyses of principal components (DAPC) are similar to those that explicitly consider a priori population structure. With PCA, three to four clusters were recovered corresponding to the same ones from the population assignation analyses, and this was more evident with the MCR90 datasets compared to the MCR50 ones. In all analyses, three populations-MI6, MI16, and WI5-were recognized as most distinct from the other populations. Across DAPC analyses, individuals from populations tended to cluster together, and this is similar to results from other methods. In general, DAPC analyses recover MI6, MI16, and WI5 as distinct units or as a cluster together, with the results for MCR50 all loci being the only exception. In analyses with this dataset, WI5 was included in a cluster distinct from the other two populations, but with WI4 and populations from Michigan. In some  The results of principal components analysis (PCA) and discriminant analyses of principal components (DAPC) are similar to those that explicitly consider a priori population structure. With PCA, three to four clusters were recovered corresponding to the same ones from the population assignation analyses, and this was more evident with the MCR90 datasets compared to the MCR50 ones. In all analyses, three populations-MI6, MI16, and WI5-were recognized as most distinct from the other populations. Across DAPC analyses, individuals from populations tended to cluster together, and this is similar to results from other methods. In general, DAPC analyses recover MI6, MI16, and WI5 as distinct units or as a cluster together, with the results for MCR50 all loci being the only exception. In analyses with this dataset, WI5 was included in a cluster distinct from the other two populations, but with WI4 and populations from Michigan. In some analyses, such as MCR50 and MCR90 diploid loci, the divided cluster of central populations was identified. The number of loci under selection in each dataset is in Table 2.
Patterns of migration inferred from BA3-SNPs suggest that migration is minimal, regardless of the dataset analyzed, and that most individuals are from their original population ( Figure 3). While this was certainly the case for all loci for MCR90, analyses with only the diploid loci for three or four population clusters ( Table 1) provide evidence of greater rates of migration between adjacent populations ( Figure 3). Migration directly between the eastern and western populations was negligible. The relationship among the four population clusters that was most supported by the results of DIYABC-RF and abcranger varies depending on the dataset analyzed. For all, diploid, and tetraploid loci, (West (Mid1 (Mid2, East))), (West (East (Mid1, Mid2))), and (West (Mid1 (Mid2, East))) are recovered as optimal, respectively, and (Mid2 (West (Mid2, East))) and (West (East (Mid1, Mid2))) are identified as close second choices for all and tetraploid datasets, respectively. The one constant among the three optimal trees is that the western population is recognized to have diverged prior to the mid and eastern populations, and this also is the case for one of the near-optimal trees (Supplemental Figure S17).

Conservation Units
Based on the method of Funk et al. [16], evolutionarily significant units (ESUs) were identified using all loci, as described below, and the management units (MUs), which are based on fastSTRUCTURE, PCA, and DAPC analyses with loci not under selection, are

Conservation Units
Based on the method of Funk et al. [16], evolutionarily significant units (ESUs) were identified using all loci, as described below, and the management units (MUs), which are based on fastSTRUCTURE, PCA, and DAPC analyses with loci not under selection, are quite similar. The largest difference between ESUs and MUs is that the two populations in Wisconsin may or may not be included with the other two western populations, MI6 and MI16, depending on the use of all loci or only diploid or tetraploid loci (Figure 4). The populations on Bois Blanc Island also have mixed ancestry based on these loci. The adaptive units, which are based on fastSTRUCTURE, PCA, and DAPC analyses with loci under selection, provide quite different results. Generally, among analyses, nine adaptive units are recognized, and these are structured based on geography (Figure 4, Supplemental Figure S18, Table 1).

Conservation Units
Based on the method of Funk et al. [16], evolutionarily significant units (ESUs) were identified using all loci, as described below, and the management units (MUs), which are based on fastSTRUCTURE, PCA, and DAPC analyses with loci not under selection, are quite similar. The largest difference between ESUs and MUs is that the two populations in Wisconsin may or may not be included with the other two western populations, MI6 and MI16, depending on the use of all loci or only diploid or tetraploid loci (Figure 4). The populations on Bois Blanc Island also have mixed ancestry based on these loci. The adaptive units, which are based on fastSTRUCTURE, PCA, and DAPC analyses with loci under selection, provide quite different results. Generally, among analyses, nine adaptive units are recognized, and these are structured based on geography (Figure 4, Supplemental Figure S18, Table 1).  Table 1, and best K values noted in Table 2.
The results of the fastSTRUCTURE, PCA, and DAPC are similar, with one exception. Unlike analyses with fastSTRUCTURE and PCA, where individuals of the same population cluster together, with DAPC, some individuals of the same population are members of different clusters. This is likely due to the large number of clusters identified as optimal, which is particularly the case for MCR90 and MCR50 datasets with all loci.

Population Structure and Genetic Diversity
Based on the multiple datasets explored using various methodological approaches, three or four different population clusters were frequently recognized for I. lacustris across Michigan and Wisconsin. These clusters are structured geographically, with eastern, central, and western groups, and at higher K values, the central group is subdivided into two groups that are also geographically oriented (Figures 1 and 2). In the three prior studies that employed isozymes and allozymes to examine the population genetics of I. lacustris [8][9][10], no to limited genetic diversity was identified in the populations. Each study only investigated the genetic diversity of populations within one state, using markers available at the time, which likely led to the paucity of genetic diversity. In the present study, many more loci were examined, and individuals from across most of the geographic range of the species were analyzed together, which provides a more holistic approach to elucidating the genetic diversity of the species. These results demonstrate that our hypothesis-a lack of genetic diversity among the species-was incorrect.
Across all studied populations, statistically significant isolation-by-distance is noted, and much of the genetic variation occurs within samples and among populations, with little variation within each population. These results are, on some level, unsurprising for a species that is not only clonal but also includes minimal sexual reproduction. Sampling issues, such as small numbers of individuals studied for some populations and potential collection of ramets, could also have contributed to limited within-population genetic diversity. Additionally, almost all populations have negative F IS values, a finding frequently occurring with clonal plants [17]. A similar result was recovered by Edgeloe et al. [18] for another clonal, polyploid species, Posidonia australis Hook.f. Despite the clonal growth in these polyploid species, the multiple gene copies may provide sufficient genetic diversity and potential so that rare species, such as I. lacustris, do not suffer the negative long-term impacts of vegetative reproduction and inbreeding. The changing climate will certainly be a test as to whether the genetic diversity harbored in each population will be appropriate to adapt to new conditions [19].
Among the identified clusters of populations, there are two notable areas: Bois Blanc Island in the eastern part of the sampled range and the four western populations. In Bois Blanc Island, the populations display mixed ancestry between the eastern and central populations, and these were results recovered with multiple datasets and analyses. This mixed ancestry could occur because of hybridization on the island itself with ancestors from both populations colonizing and interbreeding there. Alternatively, hybridization could have taken place on the mainland of the lower peninsula of Michigan, such as at MI7 or MI8, followed by colonization of the island. While the signature of mixed ancestry identified in the present study may suggest that hybridization is recent, given that the species reproduces clonally, the signature of (older) hybridization could remain for an extended period of time. It is useful to keep in mind that the island and nearby areas on the mainland are some of the more heavily sampled geographic regions in the present study. This greater sampling could hint at a similar pattern in other areas if individuals were sampled to a larger extent. It was not possible to include representatives from Ontario, Canada in the study, and future studies that add these will likely have greater context for the relationship of the central and eastern populations to those even farther east.
The four populations in the western cluster (MI6, MI16, WI4, and WI5) are notable. While these populations form a cluster in most analyses (Figure 2), the two Wisconsin populations (WI4 and 5) differ from those in Michigan, and, in some analyses, from each other. While WI4 and WI5 are geographically close together on the Door Peninsula and tend to cluster together in some analyses, WI4 is sometimes resolved as sharing ancestry with the eastern populations, which is not the case for WI5. This could be due to the retention of ancestral polymorphism or the fact that the establishment of each of these populations differs. However, in analyses that account for both genetic and geographic data (i.e., tess3r and conStruct), both Wisconsin populations are distinct clusters and/or are usually allied with the other western populations. This is particularly the case for the diploid dataset. In another, well-known Great Lakes shoreline endemic, Cirsium pitcheri Torr. & A.Gray, a similar pattern was recovered. The populations from the Door Peninsula are also quite distinct from others on Lake Michigan [20], and the northern populations on the peninsula share more alleles with the populations in the Upper Peninsula of Michigan than with some of the populations on the southern part of the peninsula.
MI6 and MI16 are intriguing populations of I. lacustris because they are situated inland, and this is not the case for the other sampled populations. While other populations can be found a short distance from the shoreline, these populations are ca. 30 km from the current boundary of Lake Michigan. These two populations are consistently recognized as genetically distinct from the other sampled populations, and these both likely became established during higher water level periods of Glacial Lake Algonquin ca. 12,500 years ago [21,22]. As water levels decreased during the time of Glacial Lake Chippewa and subsequently rose to current levels, these two populations became isolated in suitable habitat (e.g., conifer wetland) that allowed individuals of I. lacustris to persist, but without the opportunity to interbreed with other, coastal populations, resulting in their distinct genetic signature (Figure 2).

Migration and Demography
After deglaciation, I. lacustris migrated eastward from the western part of its range. This pattern provides evidence that MI6 and MI16 became established early in the colonization of the species during times of higher water levels and, therefore, are relicts rather than the result of inland dispersal. Additionally, the central and then eastern populations developed via migration across northern Lakes Michigan and Huron, and these populations may have retained some of the ancestral polymorphisms in the more western populations, such as WI4 and WI5. This west-to-east pattern suggests that the populations in Ontario are the most recently established, a hypothesis that can be tested during a future study. The pattern noted here for I. lacustris differs from that of C. pitcheri, which is hypothesized to have migrated from east to west [20].
Overall, rates of migration, as inferred with BA3-SNPs, among populations are minimal, a result recovered in other species of Iris on the Korean Peninsula [23] and a pattern that is not uncommon for narrow endemics [20]. This minimal migration is the case for all 24 populations studied as well as with three and four population clusters inferred (Figure 3). Although the species presently reproduces within populations, migration occurred and may have provided an infusion of new alleles, even if this was not a common occurrence.
In C. pitcheri, Fant et al. [20] note that the changes in the water level of the Great Lakes shaped the geographic distribution of this endemic species, with lower water levels allowing for increased connection among populations. Lake level changes could also have impacted the geographic distribution of I. lacustris. This is particularly the case for the more inland populations, which could have become established ca. 4500 years ago during the most recent high water levels for the lake. Lower lake levels may have influenced colonization of the islands as well as migration across the northern regions of Lake Michigan and allowed for the exchange of individuals that currently would be more challenging.
An alternative hypothesis for the present geographic distribution of the species also exists. Van Kley and Wujek [6] and Brotske [4] provide evidence that I. lacustris can inhabit a diversity of ecosystems and that changes in patterns of disturbance and forest succession following European colonization of the area reduced the suitable habitat for the species (e.g., more forests with more closed canopies). This has resulted in populations primarily being restricted to shorelines where habitat was appropriate. If this is the case, the inland populations, such as MI6, would still represent relicts of a prior time, but this would be due to remnant habitat availability based on adequate disturbance regimes and/or seral stages, not prior establishment during higher water levels of the Great Lakes and subsequent serendipitous survival.

Subsetting Diploid and Tetraploid Loci
In the present study, polyRAD [24] was used to create datasets of diploid and tetraploid loci, and these were analyzed alongside a dataset of all loci for the MCR90 and MCR50 datasets. In general, analyses of all six datasets produced fairly similar results (Figure 2, Tables 3 and 4). fastSTRUCTURE analyses of MCR90 and MCR50 datasets of all loci resulted in the identification of a cluster of six populations in the central part of the sampled population of I. lacustris (MI2, MI3, MI4, MI11, MI12, and MI20) that was not recovered with the diploid or tetraploid datasets, although hints of this cluster can be seen in the MCR90 2N dataset at K = 5. This cluster is identified in all of the datasets with loci under selection as either one or two clusters ( Figure 2) and with the MCR90 datasets analyzed with STRUCTURE [25] and MavericK [26].
The similar results among the datasets, regardless of ploidy, may provide some evidence that not disentangling diploid and tetraploid loci from all loci may not lead to spurious results using SNP data for population genomics [27]. This statement should be treated with skepticism because it is based only on one, empirical, study. Others who have used polyRAD to subset their datasets and identify diploid loci to use for population genomics [28,29], which is a practice aligned with assumptions of common methods [28], have not explored the use of all loci and/or tetraploid loci in comparison to only ones that segregate as diploids. It would be useful for additional studies on the population genomics of polyploid species to examine data employing all, diploid, and tetraploid (and higher) loci to determine if similar or divergent results are recovered. At the same time, the results presented herein may provide some level of confidence for researchers investigating the population genomics of species of unknown ploidy that use all loci identified via tGBS, and similar reduced-representation methods may not yield incongruent results.

Conservation Genetics of I. lacustris
The evolutionarily significant units (ESUs) were described above with all loci used for population genomic analyses, and the management units (MUs), which were determined using only loci not under selection, are similar, but not identical to the ESUs; however, the differences are minor (Figure 4). Given the similar ESUs and MUs, the management of the populations of I. lacustris could be geographically clustered into three to four units. However, the results of the use of the loci under selection to resolve adaptive units (AUs) differ from those of ESUs and MUs (Supplemental Figure S18). The AUs provide evidence of local adaptation, so managing only three or four MUs would not necessarily ensure that all of the genetic diversity of the species is appropriately protected. A total of nine AUs are recognized (Table 1), and while these are also geographically clustered, the AUs are much smaller than are the ESUs and MUs (Figure 4).
This local adaptation is, on some level, unsurprising, because even though the species is generally restricted to the same type of habitat presently (i.e., shorelines), climatic, soil, and vegetation differences occur across the geographic range of the species. Indeed, I. lacustris inhabits three of the landscape ecology regions of Michigan and multiple districts and subdistricts within each region [30,31]. Van Kley and Wujek [6] also recognized four soil types, four vegetation types, and pH variation across the species' range. Given that the species primarily reproduces asexually, this can lead to a loss of genetic variation over time as a limited number of successful genotypes dominates each particular climate-soil-vegetation combination. Consequently, the seemingly same type of habitat in a geographically distinct area may result in local adaptations to the specific region and ecosystem and contribute to outbreeding depression, limiting successful offspring from infrequent interpopulation crosses.

Plant Material
During the summers of 2019 and 2020, leaf material of 171 individuals of I. lacustris was collected from 24 locations in Michigan and Wisconsin ( Figure 1) and dried in silica gel. The number of individuals per population ranged from 1 to 12, depending on the suitability of the population for collection. Most individual plants were collected at least 3 m from each other to maximize the possibility of sampling genets, not ramets. Latitude and longitude were recorded for each specimen.

DNA Sequencing
Leaf material was sent to data2bio (www.data2bio.com, accessed on 1 May 2023) for DNA isolation and tunable Genotyping-by-Sequencing (tGBS) to recognize single nucleotide polymorphisms (SNPs) across the populations. Using the restriction enzyme Bsp1286I, paired-end tGBS libraries were created [15] and subsequently sequenced with an Illumina HiSeq X (Illumina Inc., San Diego, CA, USA). Based on all sequence data, consensus reference sequences were generated with CD-HIT-454 [32] after sequencing depth was normalized to 50×, and sequencing errors were corrected using Fiona [33]. Lowquality reads were discarded (PHRED quality < 15 and error rates ≥ 3%) and trimmed, and GSNAP [34] was employed to map reads to the reference sequences based on the following parameters: ≤2 mismatches per 36 bp and less than five total per 75 bp for tails. SNPs were identified based on the following criteria: two most common alleles supported by at least 30% of the aligned bases, at least five unique reads, the sum of the one or two most common alleles covering at least 80% of the aligned reads, and no polymorphisms in the first or last three base pairs of each read. From the SNPs, two datasets were created: MCR90 with up to 10% missing data and MCR50 with up to 50% missing data.

Polyploidy Filtering
Because I. lacustris is a putative polyploid and many population genetic methods assume that species are (at most) diploid, polyRAD [24] was used to identify and filter loci that are diploid and tetraploid. The MCR90 and MCR50 datasets were filtered using the IteratePopStruct command to identify genotypes, and then the H ind /H E statistic [24,35] was employed to recognize diploid loci with H ind /H E < 0.5 and tetraploid loci with H ind /H E > 0.75. Datasets were created for each set of loci ( Table 2). The number of SNPs in the diploid and tetraploid datasets does not equal the value in the initial datasets because of filtering with polyRAD.

Population Genomics
Observed and expected heterozygosity measurements and F-statistics were calculated with hierfstat [36,37], and AMOVA was conducted with poppr [38]. All 24 populations were examined, as were the populations divided into three and four geographic clusters, which are based on the optimal K values from preliminary analyses in fastSTRUCTURE ( Table 2) and patterns of population structure from STRUCTURE and MavericK. fastSTRUC-TURE [39] was employed to identify population structure, including the optimal number of clusters (K), and for these analyses, K = 1-24 were analyzed for the six SNP datasets, using Structure_threader [40], on the Kettering University High-Performance Computing Cluster (KUHPC). Ten replicates were run for each K, with a convergence criterion of 0.000001, a simple prior, and 100 test sets for cross-validation. The CLUMPAK main pipeline, which includes CLUMPP [41] and DISTRUCT [42], was employed to organize, cluster, and visualize the results of independent fastSTRUCTURE analyses, via 10,000 permutations of the LargeKGreedy algorithm [43]. To identify the optimal K value(s), the marginal likelihood that maximizes model complexity from fastSTRUCTURE and the MedMedK, MedMeanK, MaxMedK, and MaxMeanK values determined by STRUCTURESELECTOR [44,45] were examined. These latter four metrics are useful for uneven sampling and are based on recognizing the number of clusters that include, at minimum, one subpopulation. Differences among these metrics are the result of the arithmetic mean or median used and the median or maximum number of clusters identified [45].
For comparison, and given potential variation in ploidy at loci [27], STRUCTURE [25] and MavericK [26] were also used, with Structure_threader, for analyses with the three MCR90 datasets. With STRUCTURE, the following parameters were used with K = 1-24: 1,000,000 steps and 500,000 burnin, with alpha and lambda of 1, and with or without admixture. Ten replicates were run for each K. CLUMPAK and STRUCTURESELECTOR were also used for STRUCTURE analyses, with the best K also determined via the method of Evanno et al. [46] and Ln Pr (X|K). MavericK analyses were run for K = 1-12 with five replicates per K, without admixture, using the following parameters for each replicate: 50,000 steps and 5000 burnin for Markov Chain Monte Carlo (MCMC) sampling and an alpha of 1500 steps and 5000 burnin, with 50 rungs, for thermodynamic integration (TI) sampling, and 100 expectation-maximization repeats. With MavericK, graphs were visualized with R [47], and the optimal K value was determined using TI.
To explicitly include geographical data along with SNPs to investigate patterns of population genetics, tess3r [48] and conStruct [49] were used, and all datasets were analyzed with the former, but only the three MCR90 datasets with the latter. For tess3r, the alternating projected least squares method was undertaken for K = 1-24 for MCR90 and K = 1-12 for MCR50 datasets. Results for each K were visualized with bar graphs and maps in R [47], and the optimal K value was identified using the cross-validation plot for each dataset. For conStruct cross-validation, analyses were conducted with five replicates, for K = 1-8, using 10,000 MCMC iterations sampled every 1000 iterations and a training proportion of 0.5-0.8, depending on the dataset. Subsequently, analyses with K = 3-5 were conducted, with five replicates, using one chain run for 100,000 MCMC iterations sampled every 1000 iterations and with the spatial model.
In addition to analyses for explicit population structure, all datasets were analyzed with principal component analyses (PCA), correspondence analyses (CA), and discriminant analyses of principal components (DAPC) in adegenet [50], principal coordinate analyses (PCoA) in hierfstat [36,37], and isolation-by-distance (IBD) analyses in adegenet using separate Mantel tests for population and individuals, with 999 simulations for the Mantel test. For DAPC for each dataset, the Bayesian Information Criterion (BIC) was used to identify the optimal number of clusters, and cross-validation was employed to explore the most appropriate number of PCs to retain for analysis.
Loci under selection were determined with BayeScan [51] using 100,000 iterations, a burnin of 50,000 iterations, a thinning interval of 10, and a sample size of 5000, and for each analysis, 20 pilot runs were conducted, each with 5000 steps. Loci under selection were visualized in R using F ST values and a false discovery rate of 0.05.
Demographic history and patterns of migration were explored using BA3-SNPs [52,53], DIYABC Random Forest (DIYABC-RF) [54], and abcranger [55], and only the three MCR90 datasets were used for these analyses, with the three and four aforementioned population clusters used (apart from all 24 populations investigated with MCR90 with BA3-SNPs). For BA3-SNPs, the datasets were each run for 50 million Markov Chain Monte Carlo (MCMC) iterations, with 20 million MCMC burnin iterations, and a sampling interval of 2500 iterations, and the initial parameters for allele frequencies, inbreeding coefficient, and migration rates were tuned to vary between 0.2-0.6. For DIYABC-RF, the optimal scenario for patterns of diversification were examined among all 15 arrangements of four bifurcating populations. For each scenario, population size was modelled to vary after populations split and one and two other times for when the second and first populations diverge (Supplemental Figure S17). For analyses, all genetic diversity, F ST distances, Nei's distances, and admixture estimates were selected, and the analyses were run for 15 million simulations with a batch size of 1000. Using the results of the training, a random forest analysis was conducted with abcranger [55] using 1000 trees to identify the number of trees supporting each model and to estimate the parameters of the model, with and without linear discriminant analysis, for partial least squares (PLS) estimation on the optimal model for each dataset.

Conservation Units
Conservation and management units were identified following the three-step method of Funk et al. [16], in which (1) evolutionarily significant units (ESUs) are recognized using all loci, (2) management units (MUs) are delimited with non-outlier loci, and (3) adaptive groups are determined using outlier loci. For the three steps, fastSTRUCTURE [39], PCA, and DAPC were used [50]. The first step was described above for datasets with all loci, and the other two steps were conducted using the same parameters for the three analyses and were based on two datasets (loci under and not under selection as determined via BayeScan [51]) for each MCR50 dataset and the all loci dataset of MCR90 ( Table 2). The optimal K value was identified using STRUCTURESELECTOR [44], the marginal likelihood that maximizes model complexity from fastSTRUCTURE [39], and the BIC for DAPC with adegenet [50]. Based on the results of these analyses, ESUs, MUs, and adaptive groups were identified (Supplemental Figure S18).

Conclusions
The present study provides evidence of genomic variation and local adaptation across the geographic range of the species, which is novel given the negligible genetic diversity previously recovered for I. lacustris [8][9][10]. However, as Van Kley and Wujek [6] stated thirty years ago, "Despite a preference for a somewhat disturbed habitat, Iris lacustris will not grow where the habitat has been destroyed by residential, resort, or industrial development". Therefore, the conservation genetic results are of limited value if management steps are not taken to ensure that individuals of I. lacustris have the opportunity to be successful in situ. This includes not only ensuring intermediate light conditions and limited litter [5,6], but also that as much genetic diversity across the entire geographic range of the species is conserved and managed appropriately. Indeed, given the local genetic diversity recognized among the nine adaptive units, it would be prudent to strive to conserve representatives from these areas. This is particularly important because the populations that are best able to adapt to the changing climate in the Great Lakes region is presently unknown [56]. Therefore, to ensure the longevity of this charismatic species, appropriate long-term management is necessary. Future work that includes the populations of I. lacustris from Ontario can extend the presented results to investigate the ways in which these populations relate to those in the United States. Given the international geographic range of the species, conservation efforts that are binational would be particularly useful.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants12132557/s1, Figure S1. Results for Isolation-by-Distance (IBD) for the six datasets. The x-axis is geographic distance, and the y-axis is genetic distance. Figure S2. Structure bar graphs from STRUCTURE for the six datasets analyzed in the present study for K (clusters) = 3-5. Individual ancestry denoted by color. Populations are denoted below each graph. Figure S3. Structure bar graphs from MavericK, without admixture, for the three MCR90 datasets analyzed in the present study for K (clusters) = 3-5. Individual ancestry denoted by color. Populations are denoted below each graph. Figure S4. tess3r maps of population assignation for the six datasets analyzed in the present study for K (clusters) = 3-5. Individual ancestry denoted by color. Figure S5. Maps and bar graphs of population assignation for the three MCR90 datasets analyzed in the present study for K (clusters) = 3-5. Individual ancestry denoted by color. Figure S6. Results for best K from StructureSelector for analyses with fastStructure for (A) MCR90 all loci, (B) MCR90 diploid loci, and (C) MCR90 tetraploid loci. Figure S7. Results for best K from StructureSelector for analyses with fastStructure for (A) MCR50 all loci, (B) MCR50 diploid loci, and (C) MCR50 tetraploid loci. Figure S8. Results for best K from StructureSelector for analyses with Structure without admixture for (A) MCR90 all loci, (B) MCR90 diploid loci, and (C) MCR90 tetraploid loci. Figure S9. Results for best K from StructureSelector for analyses with Structure with admixture for (A) MCR90 all loci, (B) MCR90 diploid loci, and (C) MCR90 tetraploid loci. Figure S10. Results for best K from MavericK, based on thermodynamic integration (TI), for analyses without admixture (A) MCR90 all loci, (B) MCR90 diploid loci, and (C) MCR90 tetraploid loci. Figure S11. Results for cross-validation scores for tess3r analyses for (A) MCR90 all loci, (B) MCR90 diploid loci, (C) MCR90 tetraploid loci, (D), MCR50 all loci, (E) MCR50 diploid loci, and (F) MCR50 tetraploid loci. Figure S12. Results for cross-validation scores for conStruct validation analyses for (A) MCR90 all loci, (B) MCR90 diploid loci, and (C) MCR90 tetraploid loci to identify best K (clusters). Graphs with blue and green dots are for spatial and non-spatial models, respectively, and graph with only blue dots displays predictive accuracy for spatial model with confidence intervals. Figure Table 1 for population assignation to each population. Change in color represents potential change in population size. Scenario 3 is optimal for all and tetraploid loci, and scenario 7 is optimal for diploid loci. Figure S18. Nine Adaptive Units recognized from population genetic analyses using loci under selection. Map of locations sampled in present study. Dark gray entire lines denote division between East, Mid1, Mid2, and West clusters (also recognized as Management Units). The dashed gray line separates Mid1 and Mid2 populations, and Mid includes both groups of populations together. Light gray lines separate Wisconsin (USA), Michigan (USA), and Ontario (Canada). Scale bar, in red, represents 50 kilometers. Table S1. AMOVA results for all datasets. Table S2. K values for the MCR90 datasets for STRUCTURE and Maverick.