Phylogeography and Antioxidant Activity of Proso Millet (Panicum miliaceum L.)

Proso millet (Panicum miliaceum L.) or broomcorn millet is among the most important food crops to be domesticated by humans; it is widely distributed in America, Europe, and Asia. In this study, we genotyped 578 accessions of P. miliaceum using 37 single-sequence repeat (SSR) markers, to study the genetic diversity and population structure of each accession. We also investigated total phenolic content (TPC) and superoxide dismutase (SOD) activity and performed association analysis using SSR markers. The results showed that genetic diversity and genetic distance were related to geographic location and the fixation index (Fst). Population structure analysis divided the population into three subpopulations. Based on 3 subpopulations, the population is divided into six clusters in consideration of geographical distribution characteristics and agronomic traits. Based on the genetic diversity, population structure, pairwise Fst, and gene flow analyses, we described the topological structure of the six proso millet subpopulations, and the geographic distribution and migration of each cluster. Comparison of the published cluster (cluster 1) with unique germplasms in Japan and South Korea suggested Turkey as a possible secondary center of origin and domestication (cluster 3) for the cluster. We also discovered a cluster domesticated in Nepal (cluster 6) that is adapted to high-latitude and high-altitude cultivation conditions. Differences in phenotypic characteristics, such as TPC, were observed between the clusters. The association analysis showed that TPC was associated with SSR-31, which explained 7.1% of the total variance, respectively. The development of markers associated with TPC and SOD will provide breeders with new tools to improve the quality of proso millet through marker-assisted selection.


Introduction
Proso millet (Panicum miliaceum L.) is an annual monocotyledonous grass crop. Archaeological evidence indicates that this crop was first domesticated in northern China about 10,000 years ago [1]. Today, proso millet is widely distributed in the Americas, Europe, and Asia, and is still among the most important food crops worldwide [2]. Proso millet has a short growth cycle and low water requirement; rotation with proso millet can maintain moisture in deep soil layers, control winter weeds, and reduce the occurrence of pests and diseases, making it an ideal rotation crop for winter wheat [3]. When other crops fail to harvest or planting is delayed due to adverse weather, proso millet can be planted as an intercropping crop to reduce economic losses [4]. Proso millet is also widely used in the bird, pet feed, snack food, and wine-making industries [5]. At present, the demand for proso millet is highest for bird feed production [6].
Many previous studies have examined the genetic diversity of proso millet, including in China (using 88 accession core collections selected from 8,515 materials based on 67 proso millet-specific single-sequence repeat (SSR) markers) [5], and Canada and the USA (using 12 accessions based on amplified fragment length polymorphisms [AFLPs]) [7], as well as in six countries using 50 accessions based on 25 SSR markers [8], and 25 countries using 90 accessions based on 100 SSR markers [6]. However, these studies focused on explaining the genetic diversity and clades of local proso millet populations.
The evolutionary origin of proso millet has always been controversial, with domestication centers being proposed in multiple regions of China and Eastern Europe [9], and in a single center in China [10]. Accumulating evidence from archaeological, diversity and phylogenetic studies, among others, suggests that that proso millet originated from carbonized grains about 10,000 years ago; these grains were unearthed at the Cishan site in China [1]. The Dadiwan site in the Loess Plateau, and the Xinglonggou site in Inner Mongolia, have also yielded carbonized proso millet particles believed to be about 8000 years old [11,12]. Another study suggested that the European proso millet first appeared about 7000 years ago [13]. This was revised to 3600 years ago based on direct measurement of the crop remains [14]. However, it is difficult to determine the place of origin of proso millet based on the distribution of its wild ancestors, because this species is easily back-mutated from domesticated crops to weeds [13]. A study using SSR markers to genotype domesticated proso millet in China concluded that genetic diversity in China was highest on the Loess Plateau [10]. One study used 98 accessions from Eurasia and 16 SSR markers to explore the possibility that Eastern Europe is one of several sites of origin of domesticated proso millet [9]. However, due to the limited availability of accessions, comprehensive analysis of genetic diversity is difficult and observer bias can affect diversity analysis.
Several studies have reported benefits of proso millet polyphenols, such as antiinflammatory effects [15], anti-proliferative effects in colorectal cancer [16], liver protection due to syringic acid [17], free radical scavenging by ferulic acid, and antioxidant activity [18]. Proso millet varieties differ in terms of free syringic acid, ferulic acid, chlorogenic acid, caffeic acid, and p-coumaric acid content [19]. Total phenolic content (TPC) mainly depends on the variety, rather than type or color, of proso millet, although TPC and antioxidant capacity are also significantly affected by climatic and environmental conditions [20].
Breeders use genetic markers to assess the phenotypes of target traits in the early stages of the growth cycle; this approach greatly shortens the research cycle and reduces the workload associated with crop breeding. Molecular markers are essential for improving traits that cannot be directly measured. The development of molecular markers for phenols has allowed rapid estimation of the types and quantity of phenolic compounds in individual plants. however, no studies have developed molecular markers of TPC and antioxidants in proso millet, although such markers have been developed for other plant taxa. For example, molecular markers were developed to identify phenolic compounds in wild and cultivated barley [21], and a genome-wide association study identified 11 quantitative traits related to nucleosides in snap bean [22]. Genome-wide association studies have revealed that apple polyphenols are controlled by 4-O-caffeoylquinic acid and procyanidins B1, B2, and C1, and demonstrated the applicability of these markers to marker-assisted breeding. Although polyphenols are important components of the human diet, breeders have not widely regarded them as breeding targets; however, polyphenolic enhancement of nutritional properties may become a future breeding trend. With the improvement of the cultivation environment and the impact of cash crops, locally endemic varieties of proso millet are rapidly disappearing, which greatly affects the diversity of proso millet populations. Therefore, as genetic important resources, proso millet and other small grain crops have been a focus of research.
The objective of this study was to explore the phylogeography of proso millet, compare the TPC and antioxidant properties of 578 proso millet accessions collected in 17 countries of origin of the world, and identify SSR markers associated with these traits for use in marker-assisted breeding to enhance nutritional properties. This study has developed 37 EST-SSR markers suitable for proso millet, which should improve the accuracy of genetic diversity analyses of this species and improve our knowledge of the migration events and degree of differentiation among its evolutionary origin centers.

Genetic Diversity Analysis
In a preliminary experiment, proso millet samples were amplified; there were 481 expressed sequence tag (EST) SSR markers, 37 EST-SSR markers that can be successfully amplified in the DNA of all individuals and have polymorphisms were screened out (Supplementary Dataset 1). Among 578 proso millet accessions collected in 17 countries of origin, 37 pairs of SSR primers were used to amplify 151 alleles (Supplementary Dataset 2).
Genetic diversity analysis showed that, among the 578 germplasms, the number of alleles (Na) per locus ranged from 2 to 7, with an average of 4.0811. The number of amplified genotypes (Ng) ranged from 2 to 17, with an average of 4.7838. Shannon's information index (I) ranged from 0.0803 to 1.436, with an average of 0.387. Observed heterozygosity (Ho) ranged from 0 to 0.0657, with an average of 0.0032, indicating gene flow among individuals and genotypes. The genetic diversity value (H) ranged from 0.0274 (SSR-143) to 0.7331 (SSR-365), with an average of 0.19. The fixation index (Fst) is used to measure the proso millet population genetic differentiation. Among the 37 SSR markers, each marker provided a different ability to distinguish genetic differentiation, ranging from 0.0452 (SSR-458) to 0.6783 (SSR-128), with an average of 0.4545. The polymorphic information content (PIC) value of the SSRs ranged from 0.0273 (SSR-143) to 0.6884 (SSR-365), with an average of 0.1735. The average major allele frequency (MAF) was 0.8740, with a range of 0.3789 (SSR-365) to 0.9862 (SSR-143). Three SSRs showed high PIC values, SSR-203 (0.5086), SSR-232 (0.5944), and SSR-365 (0.6884), i.e., values exceeding the critical value of 0.5. The detailed parameters are listed in Table 1 and the genetic diversity analysis results for each place of origin are listed in Table 2. Genetic resources from Ukraine showed the highest diversity index (0.247 ± 0.247), followed by Russia (0.215 ± 0.183) and South Korea (0.156 ± 0.192).  Pairwise Fst is used to evaluate the degree of genetic differentiation among 17 countries of origin (Table 3)

Population Structure and Phylogenetic Analysis
We performed population structural analysis using the Structure Harvester program [23], with the number of subpopulations (K) ranging from 2 to 20. The results showed that ∆K reached a maximum (16.65391) at K = 3, indicating that this was the most suitable K (Figure 1a), followed by 10 (∆K = 11.57047), 8 (5.402586), and 6 (3.059784). The 578 accessions from various regions were therefore classified into three subgroups (Figure 1b). Individuals of the three subgroups were widely distributed in Eurasia (Figure 1c), of which subpopulations B (orange) and C (azure) were distributed in Eurasia, and subpopulations A (dark blue) in Asia. Phylogenetic analysis resulted in three clusters ( Figure 2); individuals located within these clusters were consistent with those grouped by population structure analysis. We used XLSTAT software v2019 (Addinsoft, Paris, France) to test the significance of the geographic distribution of 3 clusterings of proso millet through the chi-square test, and verify the results of the chi-square test with Fisher's exact test. The results of the chi-square test showed that the overall p-value was <0.0001, which was lower than the significance level of 0.05. Therefore, we reject the null hypothesis that cluster and geographic distribution are independent, and the risk of error is less than 0.0001. Fisher's exact test results that 17 of the 18 p-values are lower than the significance level 0.05. Fisher's exact test also leads to a rejection of the null hypothesis. The results shows that the geographical distribution of 3 clusters of proso millet is significant.

Geographical Distributions and Agronomic Characteristics of Subclusters
To analyze the phylogenetic relationship of 578 proso millets, we did a phylogenetic analysis of the population and marked the country of origin ( Figure 2). According to the delta K in the population structure analysis, three clusters A, B, and C were marked on the phylogenetic tree. However, in cluster A, it can be observed that the origin of a branch is more diverse than other accessions. In cluster C, the origins of accessions of a branch were located in plateau regions and high latitude regions. This show that the population can be subdivided based on the three subgroups. Determine the optimal number of subpopulations according to the delta K value calculated by the Structural Harvester program. Should be further divided into 10 subgroups. However, when K = 10 in population structure analysis, individuals of the same subgroup cannot be clustered in a cluster on the phylogenetic tree ( Figure S1). Similarly, when K = 8, the same phenomenon occurred. However, when K = 6, individuals in the same subgroup can gather into clusters on the phylogenetic tree. We analyzed the agronomic traits data of 6 clusters and found that some traits have significant differences between clusters.
In cluster A, branch A1 contained 174 germplasms; 98% of individuals were from South Korea. Branch A2 was distributed throughout Asia ( Figure 2). Comparison of the agronomic traits of the two branches showed that the heading and harvest dates of branch A2 were earlier (heading = 27.13 days after sowing on average) than those of branch A1 (4.86 days later than A2) ( Table 4). The number of branches per plant, panicle length, panicle width, stem diameter, and plant height were all lower in branch A2 than branch A1, with the most significant difference observed in plant height. In cluster B, branch B1 showed high distribution frequency in Europe, whereas branch B2 was widely distributed in Eurasia. The heading and harvest dates of branch B1 were earlier than those of branch B2. The number of branches per plant, panicle length, panicle width, stem diameter, and plant height were lower in branch B1 than in branch B2.
In cluster C, branch C2, which included 94 germplasms, was distributed in highlatitude and high-altitude regions, whereas branch C1 was widely distributed in Eurasia. The heading date of branch C2 (27.06 days after sowing) was earlier than that of branch C1 (30.11 days after sowing); however, the harvest time of branch C2 was the latest among all branches (98.8 days after sowing). Thus, the germplasms of branch C2 have longer reproductive growth periods. Panicle length, panicle width, stem diameter, and plant height were lower in branch C2 than in branch C1. The detailed agronomic trait data are provided in Supplementary Dataset 1.
Therefore, we combined SSR marker, geographic distribution, and trait data to divide the 578 genetic resources into six clusters, such that clusters A1, A2, B1, B2, C1 and C2 were renamed as clusters 1-6, respectively.
The population structure analysis showed two distinct subpopulations for K = 2: cluster 5, which was widely distributed but mainly found in East and Southeast Asia, and cluster 4, which was also widely distributed ( Figure 3 and Figure S2). At K = 3, cluster 5 was distinct from cluster 1, which is mainly distributed in Korea. At K = 4, cluster 5 was distinct from cluster 6, which was distributed in high-latitude and high-altitude regions such as Mongolia, Russia, and Ukraine. At K = 5, clusters 4 and 1 became distinct from cluster 2, which was distributed in Asia. At K = 6, cluster 3, which was distributed in Ukraine, Russia, and France, became distinct from cluster 5. We also performed a significant analysis on the geographic distribution of 6 clusters proso millet, and the results showed that the geographic distribution of 6 clusters proso millet was also significant. And because the higher the chi-square statistic, the lower the p-value, the geographic distribution of 6 clusters proso millet is more significant than that of 3 clusters proso millet. As the number of subgroups (K) increases (K = 2 to K = 6), the population structure changes show that: cluster 1 is from cluster 5; cluster 6 is from cluster 5 and cluster 1; cluster 2 is from cluster 1 and cluster 4; cluster 3 is from cluster 5, cluster 1, and cluster 6 ( Figure S2).

Gene Migration Analysis
Through comparing the migration rate (M) analysis results between the 6 clusters, we found the asymmetry of the migration rate between each two clusters. The direction of gene flow is considered to flow from clusters with high migration rates to clusters with low migration rates.  (Supplementary Dataset 3). The population structural analysis showed that as K increased, the changing trend in population structure remained consistent with the direction of gene flow among the six clusters ( Figure S2).
We analyzed the gene flow direction from the origin of each subpopulation according to gene flow asymmetry; the detailed results are shown in Figure 4.
We performed Pearson's correlation analysis of TPC and SOD (Figure 5b). The correlation coefficient was 0.3, indicating a moderately positive correlation. The significance level alpha was set to 0.05, p < 0.0001, and the correlation coefficient is significant. The r 2 value indicated that the TPC of proso millet accounted for only 12% of the total variance in antioxidant activity. Differences in the TPC or content of other components may have had a greater impact on antioxidant activity.
The association analysis results showed that TPC was significantly associated with SSR-31 (p = 1.88E-04). SSR-31 explained 7.1% of the total phenotypic variation, comprising three genotypes. The average TPC values of three genotypes of each SSR are listed in Table 5.

Gene Flow and Geographic Distributions
The gene flow analysis results did not identify a single domestication center for proso millet, as cluster 4 did not clearly originate from cluster 5. Our results could reflect either single or multiple domestication centers, as population structural analysis cannot assume that K = 1 [9]. Our gene flow analysis results were insufficient to demonstrate that cluster 4 originated from cluster 5; they only showed that gene flow from cluster 5 to cluster 4 was greater than that from cluster 4 to cluster 5. According to the local diversity of each cluster (Table S1) and the gene flow analysis results, we determined the primary or secondary center of origin of each clusters. Nepal (and its surrounding areas) was the secondary center of origin for cluster 6. Our phylogenetic, population structure, and gene flow analysis results indicated that cluster 6 originated from cluster 5, and that the secondary center of origin was in Nepal and its surrounding areas, in high-latitude and high-altitude areas. This is similar to the three gene microcenters detected in Turkey, where the cultivation environment was exceptionally well-suited to wheat domestication [24]. In South Korea, cluster 1 was found to have derived from cluster 5, which is a unique germplasm resource. A previous study also identified Japan as an origin point for proso millet cluster 1 [9].
Cluster 1 is considered to be a unique proso millet germplasm resource that was domesticated in South Korea. Although one germplasm resource was collected in Russia, and one in Thailand, while two germplasms were collected in India, gene flow analysis showed that they were all derived from the Korean germplasm. Cluster 2 is distributed only in Asia. Gene flow analysis showed that these two clusters were introduced from China to South Korea, Mongolia, Uzbekistan, and India, and then from India to Russia and Thailand. Cluster 3 is widely distributed in Eurasia, with a diversity index of 0.143174. However, our gene flow results showed that cluster 3 originated in the fertile crescent of Turkey and then spread to France and India, from which it subsequently spread to Central Asia (Uzbekistan and Turkey) and finally to East Asia. In cluster 4, the highest genetic diversity was detected in germplasms native to Turkey; however, gene flow analysis showed that Turkey's seed resources migrated from China to Uzbekistan, and then to Turkey. Cluster 4 originated in China and was introduced via Mongolia to Russia and North Korea. Germplasms of China and Mongolia also spread to Uzbekistan, and then from Uzbekistan to India, India to Thailand, Uzbekistan to Turkey, and Turkey to Ukraine. Cluster 5 occurred frequently in East, Southeast, and South Asia, and evolved according to a network pattern. Cluster 5 had the highest diversity; gene flow analysis pointed to China as the center of domestication, from which it migrated to Europe (Ukraine) via different routes, and then to Mongolia and finally back to China. Cluster 6 showed high abundance in high-latitude and high-altitude regions; it originated in Nepal and was then introduced into Ukraine, Russia, and Mongolia, and then into South Korea from Ukraine.
By combining the phylogeny, diversity, population structure, M, and gene flow analysis results, we obtained expansion paths for each cluster of proso millet that were consistent with a previously established archaeological map of the agricultural origins and migration of Neolithic and formative cultures [25]. In this study, we provided SSR markers combination of each cluster to identify the cluster where an individual is located (Table S2). The highest diversity was found in clusters 4 and 5, and the longest branches (indicating the longest genetic distance from other germplasms) were found in clusters 3 and 5. Regions with high diversity were not identified as regions of origin, likely because gene introgression from other proso millet clusters resulted in high diversity. In a future study, we will amplify and sequence these genotypes to identify the markers with the greatest diversity (SSR-203, SSR-232, or SSR-365) as plant DNA barcodes, to detect other known antioxidants. Then, we will use these markers to accurately determine genotype composition and compare it with a neutral model to shed light on the population expansion associated with bottleneck events.
In this study, the population genetic diversity is low, and the genetic differentiation is high. We believe that the reason is that the proso millet is an inbred plant. Self-crossing reduces the effective population size and effective recombination rate. Compared with outcrossing, it directly leads to a decrease in polymorphism and an increase in linkage disequilibrium [26]. The increase in isolation between populations also directly stems from selfing or indirectly from evolutionary changes, leading to greater differentiation of molecular markers than during outcrossing. The lower effective recombination rate increases the possibility of free-riding and further reduces the internal diversity of inbreds, thereby increasing their genetic differentiation [27].
In addition, in the process of agronomic traits data collation, we observed that the standard deviation within each sub-cluster was very high ( Table 4). The degradation of cultivars of proso millet to wild species may be the main reason for the large standard deviation. In this study, the population of 578 accessions is composed of wild species and landraces. Landraces degenerate into wild species, some genotypes are preserved, and may still be clustered with local landraces, but agronomic traits show differences. Feral derivatives of crop varieties may show a similar phenotype to that of the crop ancestor [28].

Association Analysis
The results showed high-level LD. This may be due to the mating system (selfing) of proso millet affecting the pattern of LD [29]. We arranged the TPC and SOD data corresponding to each accession on the outer circle with a simple bar ( Figure S3). We observed that two red clades had higher TPCs, with average values of 25.4 and 23.2 µg/g. Blue clades had lower TPCs, with average values of 25.4 and 23.2 µg, respectively. Each clade can be clustered together because some SSR markers have the same amplification length. Therefore, we speculate that certain markers may be linked to the quantitative trait loci (QTL) associated with TPC. Comparison of the genotypes in these 4 clades showed high TPC in the 261-bp SSR-195 marker, and low TPC in the 269-bp SSR-195 marker (Supplementary Dataset 4). To further explore the relationships between these phenotypes and markers, we performed an association analysis of the phenotype and genotype data. Because proso millet is a selfed species, in the results of genetic differentiation analysis, a high degree of differentiation was observed in the population. This may lead to false positives in the association results. We choose to use the mixed linear model (MLM) for correlation analysis. In the MLM (Q + K) model, the population structure matrix (Q) and the kinship matrix (K) are used as random effects to control the false positives. SSR-31 associated with TPC explained >7.1% of the variance in TPC, indicating that there may be QTLs on both sides of the marker.
There have been a large number of reports on the antioxidant mechanism of total phenols. The number and position of hydroxyl groups in phenolic compounds, and the nature of the substituents on the aromatic ring, determine the antioxidant capacity of plant extracts [30]. Whole-genome sequencing of the proso millet genome has been completed [31]. In future research, we will map the positions of SSR-31 identified in this study on the proso millet genome and develop flanking markers to gradually narrow the target range and determine the main proso millet genes influencing TPC and oxidation capacity, and clarify the types of phenolic compounds and antioxidant mechanisms in this species. This will provide new opportunities for high-quality proso millet breeding.

Plant Materials and Genotyping
In this study, Among 578 proso millet accessions collected in 17 countries of origin were obtained from the National Agro-biodiversity Center of the Rural Development Administration (RDA), Republic of Korea (http://genebank.rda.go.kr, accessed on 5 October 2021). The leaf tissue was sampled 5 weeks after sowing. Total DNA was extracted using the DNeasy®Plant Mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. The DNA extraction used freeze-dried leaf tissue. The DNA was dissolved into 100 µL of water. Genomic DNA was quantified using a Nanodrop / UVS-99 instrument (ACTGene, Piscataway, NJ, USA) and determined the ratio of A260/A280 nm and dilute with water to 100 ng/µL. DNA quality was verified on 1% agarose gel. Store at −20 • C.
In a preliminary experiment, eight germplasms collected in regions with the largest differences in agronomic traits were selected to test the polymorphism of the marker-amplified products. We trialed 481 EST-SSR markers developed by Ali et al. [32]. These markers were originally used for the genetic diversity analysis of little millet (Panicum sumatrense). Among 481 little millet EST-SSR markers, 37 SSR markers with polymorphic amplification products were screened out ( Table 6). The SSR makers were amplified in a total volume of 20 µL containing 50 ng of genomic DNA, 2 µL of each EST-SSR primer (10 pmol), 4 µL of 5× reaction buffer (Inclone Co.), 1 U of Taq DNA polymerase (Inclone Co.), 1.6 µL of dNTP (2.5 mM), and 11 µL of nuclease-free water. The amplification program was executed on GeneTouch thermal cycler (Bioer, Zhejiang, China) under the following conditions: Initial denaturation at 95 • C for 5 min, followed by 35 cycles at 95 • C for 40 s, 30 s at the annealing temperature, 30 s at 72 • C, and a final 10 min extension at 72 • C. The PCR products were separated using Fragment Analyzer™ 96 (Advanced Analytical Techologies, Ankeny, IA, USA) and the results were digitized using PROSize 3.0 software (Advanced Analytical Technologies, Ankeny, IA, USA).

Genetic Diversity Analysis
We used PowerMarker v3.25 software and PopGene32 v1.31 software to calculate the following genetic diversity indicators: observed Na, Ho, Nei's gene diversity (H), I, and polymorphism information content (PIC). Geographic differences were evaluated using PowerMarker software following the estimation of Fst between geographic regions.

Population Structure Analysis
We used Structure software [33] to calculate the population structure for values of K ranging from 2 to 20, and used SIMCA v14.1 software (Umetrics, Umeå, Sweden) to perform orthogonal partial least-squares discrimination analysis (OPLS-DA) on 7 agronomic traits (number of branches per plant, heading date, harvest time, panicle length, panicle width, stem diameter, plant height (Supplementary Dataset 1)). Structure Harvester software [23] was used to calculate the most likely number of subspecies. Visualization was performed using CLUMPAK software [34].

Phylogenetic Analysis
We used 37 pairs of polymorphic SSR marker-amplified product length data for phylogenetic analysis, performed with PowerMarker software. MEGA X v10.0.4 (Kumar et al. 2018) software was used to draw a phylogenetic tree based on the 37 SSR markers am-plified products size data of 578 individuals by the neighbor-joining method. The tree file was imported in nwk format into iTOL v5 [35], which was used to further visualize the phylogenetic tree. We used chi-square test and Fisher's exact test in XLSTAT software v2019 (Addinsoft, Paris, France) to analyze the significance of clustering results and geographic distribution.

Effective Population Size and Migration
Migrate-n v4.4.3 software [36] was used to estimate gene flow. The data were entered into a microsatellite data model, the sampling increment was set to 1000, and the number of steps in the chain was set to 5000. We used a Brownian motion model to calculate theta values and effective mobility in two directions, and Bayesian analysis to calculate posterior distribution. Run model 2 to analyze the gene flow between six clusters, and gene flow between the origins of each cluster was then analyzed by run model 1.

Extraction and Determination of TPC
Phenolic compounds were extracted using an Ase-200 Accelerated Solvent Extractor (Dionex Corp., Sunnyvale, CA, USA). Briefly, 1.5 g of lyophilized and ground whole seed sample was mixed with 37.5 mL of 70% methanol, and the mixture was subjected to extraction for 40 min. Then, we centrifuged the sample at 4500 rpm for 10 min and the supernatant was collected. The obtained total phenol extract was stored at 4 • C.
TPC was determined using the Ainsworth colorimetric method and Folin-Ciocalteu reagent [37]. Initially, 0.2 mL of the sample extract was mixed with 2 mL of 2% Na 2 CO 3 solution, and the mixture was incubated for 2 min at room temperature. Then, 0.2 mL of Folin-Ciocalteu reagent was added, and the absorbance was recorded at 750 nm after 30 min of incubation at room temperature. Known concentrations of gallic acid (20-100 ppm) were used to establish a standard curve and TPC was determined as µg GAE/g of dried seed weight from triplicate measurements.

SOD Activity Measurement
SOD activity was determined using the EZ-SOD Assay Kit (DoGenBio Co., Ltd., Seoul, Korea) following the improved Marklund method [38]. Initially, 1 mL of a 2-(4-iodophenyl)-3-(4-nitrophenyl)-5-(2,4-disulfophenyl)-2H-tetrazolium monosodium salt (WST) solution was mixed with 19 mL of buffer solution to prepare a WST working solution, and 15 µL of xanthine oxidase was mixed with 2.5 mL of dilution buffer to prepare an enzyme working solution. During analysis, 20 µL of sample solution was mixed with 200 µL of WST and 20 µL of enzyme working solution in 96-well plates. Then, the plates were incubated for 20 min at 37 • C and absorbance was measured at 450 nm using an Eon microplate spectrophotometer (BioTek Instruments, Inc., Winooski, VT, USA). SOD activity (inhibition rate) was calculated using the following equation: where OD is the absorbance, blank 1 contains distilled and deionized water (ddH 2 O) instead of sample solution, blank 2 contains dilution buffer instead of enzyme solution, and blank 3 contains dilution buffer and ddH 2 O instead of enzyme and sample solution.

Association Analysis
Run packages ("corrplot") in R v4.1.0 to analyze the correlation between TPC and SOD. The LD squared allele-frequency correlations (r 2 ) was analyzed based on 1000 permutations using TASSEL v5.2.64 [39]. To identify SSR markers related to the target traits, we performed an association analysis of 37 SSR markers containing 177 proso millet genotypes with two sets of phenotypic data (TPC and SOD). However, detected associations in highly differentiated highly selfed species may be false positives. This may be due to the population structure and the existence of kinship between accessions. In the case of large differences in phenotypic frequency between different clusters, it is possible that the markers are only associated with clusters, and not associated with the quantitative trait loci. This analysis was performed using a mixed linear model (MLM) in TASSEL software [39]. MLM (Q + K) to control both population structure (Q) and kinship (K), to avoid false-positive results. Determine the population structure (Q) based on the population structure analysis final results. Kinship matrix (K) is provided by kinship analysis in Tassel. Markers were considered to be related to a trait if p < 0.01.

Conclusions
In this study, 37 SSR markers selected were used to genotyping 578 accessions of proso millet. The population shows a low level of diversity and high genetic differentiation. In population structure analysis, 578 proso millet accessions were divided into 3 clusters. Combining geographic distribution and trait data to distinguish more significance 6 clusters. Based on gene migration analysis, the formation process of the 6 clusters was estimated. Similarly, based on the asymmetry of the migration volume, the propagation paths of the 6 clusters in Asia and Europe are estimated. The accessions on cluster 1 are unique to Korea. Turkey may be the secondary center of origin and domestication of this genotype (cluster 3). We also found a cluster domesticated in Nepal (cluster 6), adapted to high latitude and high altitude cultivation conditions. We also studied the total phenolic content (TPC) and superoxide dismutase (SOD) activity and used SSR markers for correlation analysis. SSR-31 interpretation of TPC variables was 7.1%.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/plants10102112/s1, Figure S1: Population structure diagrams when the numbers of subpopulations (K) were 8 and 10, Figure S2: Trends in population structure as K increased from 2 to 6, Table S1: Genetic diversity of each cluster and region, Table S2: Genetic diversity and combinations of six clusters, Supplementary dataset 1: Information on the 578 accessions used in this study, Supplementary dataset 2: Allele size and frequency, Supplementary dataset 3: Effective population size and migration rate results, Supplementary dataset 4: Genotype comparison of clusters with high and low total phenolic content (TPC) and superoxide dismutase (SOD) activity.