Genome-Wide Association Study for Plant Architecture and Bioenergy Traits in Diverse Sorghum and Sudangrass Germplasm

: Sorghum is an important grain, forage, and bioenergy crop. The objective of this study was to identify genetic signals associated with plant architecture and bioenergy traits in sorghum and sudangrass germplasm through a genome-wide association study (GWAS). Plant height (HT), tiller number (TN), internode number (IN), stem diameter (SD), panicle length (PL), panicle weight (PW), reducing sugar (RS) content, Brix, and protein (PRO) content were assessed in 300 germplasm consisting of grain sorghum, sweet sorghum, sudangrass, sweet sorghum-sweet sorghum recombinant inbred lines (RILs) and sudangrass-sudangrass RILs grown in three di ﬀ erent environments over two years. Large variations of phenotypic traits were observed in the population panel. The heritability of traits were all higher than 0.5, ranging from 0.52 (PRO) to 0.92 (HT) with an average of 0.76. The population exhibited three population structures (Q) and minor relative kinship (K), assessed by using 7982 single-nucleotide polymorphisms (SNPs). After controlling Q and K, GWAS identiﬁed 24 SNPs that were signiﬁcantly associated with traits, including three SNPs with HT, four with TN, four with PL, three with Brix, and ten with RS. Of them, seven SNPs were novel signals that were not identiﬁed previously, including one for HT, one for TN, one for Brix, and four for RS. The putative candidate genes involved in brassinosteroid regulatory pathway, auxin biosynthesis, carbohydrate metabolism, and sugar transport were identiﬁed underlying the signiﬁcant SNPs. Identiﬁcation of SNP signals and related candidate genes would enrich the current genomic resource for further molecular breeding aimed at improvement of food, feed, and biofuel productions of sorghum. sudangrass, sweet sorghum RILs, and sudangrass RILs and conducted GWAS for plant architecture and bioenergy related traits across three climate regions in China. The research aimed at the identiﬁcation of signals for a better understanding of the genetic control of phenotypic traits in sorghum and sudangrass germplasm. The results would beneﬁt marker-assisted breeding for improved food, feed, and biofuel productions in sorghum plants.


Introduction
Sorghum (Sorghum bicolor) is an important crop worldwide. Grain sorghum is a major world staple crop for human food and livestock feed [1,2], and is ranked as the fifth most important grain crop worldwide. Sweet and biomass sorghums have also been increasingly receiving attention for bioenergy production due to their high sugar content in the stalk for potentially producing more ethanol [3,4]. Sudangrass (Sorghum sudanese) is an annual forage crop with a strong capacity for tillering and regeneration, and has softer stems and leaves than forage maize (Zea mays) and sorghum [5]. Both sorghum and sudangrass have the same diploid chromosomes (2n = 2x = 20), and crosses can be made between these two species. The sorghum-sudangrass hybrids (S. bicolor × S. sudanense) show a wide range of advantages including high yield, good lodging resistance, drought and cold tolerance,

Plant Materials and Growing Conditions
Three hundred diverse germplasm lines were used for the experiment, consisting of 22 grain sorghum, 165 sweet sorghum, 63 sudangrass, 8 sweet sorghum-sweet sorghum RILs, and 42 sudangrass-sudangrass RILs (Supplemental Table S1). Seeds of each line were planted in three locations in China, including Xinjiang province (44.54 • N, 88.17 • E) and Inner Mongolia province (45.54 • N, 108.31 • E) in 2014 and Liaoning province (42.19 • N and 120.80 • E) in 2015. Seeding and harvesting time, and environmental and soil conditions for each location are listed in Supplemental  Table S2.

Phenotypic Traits and Repeatability
Plant height, tiller number, internode number, stem diameter, panicle length, panicle weight, reducing sugar, Brix, and protein content were measured in all germplasm lines at the reproductive stage across all locations in both years. Plant height was determined from the soil surface to the top of the panicle. Tiller number was recorded by manual separation from the main stalk, and internode number was counted on the main stalk. Stem diameter was measured using calipers. Panicle length was measured from the base of the leaf scar to the top of the panicle and dry weight was recorded. The third internode area from the top of the main stalk was collected for measuring Brix using a Brix meter (PAL-1, ATAGO Co., Ltd., Tokyo, Japan) and reducing sugar using Fehling's solution in an SGD-IV automatic reducing sugar analyzer (Shandong Acad. Sci., Jinan, China). Brix (%) measures the sugar content of an aqueous solution and is commonly used as an indicator of sugar content in stalk juice. Crude protein was measured using the main stalk. Tissue was placed in an oven at 105 • C for 30 min and then at 65 • C until reaching constant dry weight. The dry tissue was ground into powder and sieved by 0.4 mm. Protein was determined by the Dumas combustion method using a rapid N/protein analyzer (Elementar Co., Langenselbold, Germany).
The heritability (h 2 ) of phenotypic traits across three locations (environments) was calculated using PROC MIXED (SAS Institute, Version 9.1, Cary, NC, USA). The h 2 was calculated as follows: h 2 = σ 2 g / (σ 2 g + σ 2 ge /l + σ 2 e /rl), where σ 2 e is the variance component for genotypes, σ 2 ge for genotype-by-environment, σ 2 e for error; r is the number of replications, and l is number of environment [42]. Based on the outcome of h 2 , least square means were estimated for each trait per germplasm line across three locations.

Experimental Design and Statistical Analysis
All locations were a randomized complete block design with three replications (blocks). Each block contained 300 germplasm lines, and they were randomly assigned into each block. Twenty-five plants of each line were planted in a single 5-m-long row, with 50-cm row spacing, 20-cm plant spacing in each block. Three representative plants from each row were chosen for sampling in each block as one replication. The analysis of variance was performed at a significance level of 0.05 using PROC MIXED in SAS [43], with accession and environment as fixed factors and replications as a random factor. Least squares means for each trait generated by PROC MIXED were used for testing trait correlation, trait differences among structural groups, and gene-trait association analysis. Pearson correlation among traits was performed using PROC CORR in SAS.

Genotyping
The population was genotyped using restriction site-associated DNA (2b-RAD) with type IIB (BsaXI) restriction enzyme; the detailed procedure was described previously [44,45]. Briefly, the library was established and sequenced at Illumina Hiseq Xten platform to produce nucleotide paired-end reads, with an average of~6.8 M reads per sample generated. The reads were filtered by removal of reads with adaptor, low quality reads, and reads with unidentified nucleotide sequence. The clean reads were aligned to the Sorghum bicolor genome version 1.4 using SOAP [46], and a total of 91,197 SNPs were discovered. The most informative 7928 SNPs were retained across 10 chromosomes for this study. The pipeline parameters included minimum SNP call rate of 80%, maximum observed heterozygosity of 0.5, and minor allele frequency of 1%. Population structure (Q) was determined by using STRUCTURE 2.3.2 software [47] using filtered SNP markers generated for this study. Briefly, the structure was run 10 times by setting pre-defined k (the number of population groups) from 1 to 8 using admixture models, with 10,000 burn-in time and 10,000 iterations of Markov chain convergence for each run. Among the 10 runs, the one with the highest likelihood value was selected to assign the proportion of membership for each accession [48]. Relative pairwise kinship (K) was calculated using TASSEL 5.0 [49]. The PCA was analyzed with SNP markers using the TASSEL 5.0, and the results were visualized using R (https://www.r-project.org). The LD was also calculated using TASSEL 5.0. The generated r 2 values averaged with each 50 kb were plotted against the physical distance among each pair of SNPs.

Genome-Wide Association Study (GWAS) and Candidate Gene Identification
Quantile-quantile (Q-Q) plots for model comparisons of simple linear (S), linear model implemented with population structure (Q), mixed model implemented with relative kinship (K), and Q+K across all traits were generated using 'qqman' package in R. The best fit model that represents the best agreement between the observed and expected −log10 (p) value for a given trait for gene-trait associations was selected for the association analysis of each trait. GWAS analysis was conducted with phenotypic data across three environments as well as with data of each individual environment using TASSEL 5.0. Associations were considered to be significant only at a p-value lower than the p threshold -value, calculated using p threshold = 0.05/N (N = the number of SNPs), based on a Bonferroni correction for multiple comparisons. The positions of all significant SNPs were compared with those previously published signals through GWAS and linkage mapping studies with sorghum QTL atlas [50]. The SNP locations were identified using genome version 1.4, but their positions in comparison with genome version 3.1 were also provided. Using the S. bicolor genome assembly, candidate genes containing SNPs or adjacent to SNPs extended to 100 kb (based on LD) were identified. Candidate genes annotated using genome version 1.4 were listed and their corresponding gene names in genome version 3.1 were also provided. The annotated genes putatively involved in plant growth and development, and hormone and carbohydrate metabolism were given high priority for identifying candidate genes underlying a trait.

Analysis of Variance, Trait Variation and Heritability
Genotypes had significant effects on plant height (HT), tiller number (TN), internode number (IN), stem diameter (SD), panicle length (PL), panicle weight (PW), reducing sugar (RS), Brix and protein (PRO) across three environments (Table 1). Significant environment effects and significant genotype by environment interactions were also noted in all traits ( Table 1). All the trait values varied considerably across genotypes (Table 1). Heritability (h 2 ) was calculated across 300 genotypes and three environments. The h 2 values of all traits were higher than 0.5, ranging from 0.52 (PRO to 0.92 (HT) with an average of 0.76 ( Table 1). The high heritability over testing the environments allowed least square means of individual traits to be calculated and used for association analyses of markers and traits.

Trait Correlations
There were 13 positive and 15 negative correlations among the traits across the population (Table 2). Specifically, HT was positively correlated with TN, IN, RS and Brix but negatively correlated with SD, PW, and PRO. TN was only positively correlated with PL but negatively correlated with IN, SD, PW, Brix and PRO. IN was only negatively correlated with PL but positively correlated with SD, PW, RS, Brix and PRO. Positive correlations were also observed between SD and PW, SD and PEO, PW and PRO, RS and Brix. Negative correlations were also identified between SD and PL, SD and RS, RS and PRO, and Brix and PRO. The PL was also negatively correlated with RS and Brix.

Population Structure and Its Effects on Traits
Population structure was analyzed using 7928 SNPs generated from the population. The combination trends of the likelihood values of [LnP(D)] and the ∆K calculated for each K showed the 300 genotypes could be assigned into three subgroups (G1, G2 and G3) ( Figure 1A). There were 78, 90, and 132 genotypes from G1, G22 and G3, respectively ( Figure 1B). G1 mainly contained sweet sorghum and sudangrass; G2 consisted of sweet sorghum, sudangrass, sudangrass inbred lines and sweet sorghum inbred lines, and G3 mainly included sweet sorghum and grain sorghum.
Three subgroups differed in trait values (Table 3). G2 had significantly higher TN, and PL and lower IN, SD, PW, RS, Brix, and PRO than both G1 and G3. G2 also had higher HT than G3 but not G1. Comparing G1 and G3, G1 had higher TN and PW but lower IN, SD, and Brix values than G3. No differences in HT, PL, RS and PRO were observed between G1 and G3.

PCA, Relative Kinship and LD
PCA revealed that grain sorghum and sudangrass germplasm were well-separated into different clusters, while grouping for sweet sorghum lines was not apparent as they were scattered into different clusters ( Figure 2A). Approximately 56.8% of the pairwise kinship estimates were zero, while 8.6% were between zero and 0.05, 6.8% between 0.05 and 0.1, and 9.3% between 0.1 and 0.2, indicating that approximately 82% of the estimates were < 0.2 ( Figure 2B). Across all 300 samples, a rapid decline in LD was observed within 100 kb with r 2 = 0.18, but LD decay extended to 500 kb with r 2 = 0.10 ( Figure 2C). The LD decay pattern of 250 diverse germplasm was almost identical to that of 300 samples, while LD decay for RILs (50) was r 2 = 0.50 within 100 kb and extended to 500 kb with r 2 = 0.40 ( Figure 2C).   Three subgroups differed in trait values (Table 3). G2 had significantly higher TN, and PL and lower IN, SD, PW, RS, Brix, and PRO than both G1 and G3. G2 also had higher HT than G3 but not G1. Comparing G1 and G3, G1 had higher TN and PW but lower IN, SD, and Brix values than G3. No differences in HT, PL, RS and PRO were observed between G1 and G3. Means followed by the same letter or not followed by any letter within a column for a given treatment were not significantly different at p < 0.05.

PCA, Relative Kinship and LD
PCA revealed that grain sorghum and sudangrass germplasm were well-separated into different clusters, while grouping for sweet sorghum lines was not apparent as they were scattered into different clusters (Figure 2A). Approximately 56.8% of the pairwise kinship estimates were zero, while 8.6% were between zero and 0.05, 6.8% between 0.05 and 0.1, and 9.3% between 0.1 and 0.2, indicating that approximately 82% of the estimates were < 0.2 ( Figure 2B). Across all 300 samples, a rapid decline in LD was observed within 100 kb with r 2 = 0.18, but LD decay extended to 500 kb with  Means followed by the same letter or not followed by any letter within a column for a given treatment were not significantly different at p < 0.05.

GWAS
The quantile-quantile (Q-Q) plots verified the adequate model for controlling false positives for marker and trait association ( Figure 3). The S, Q, K, and Q + K models were implemented and compared by examining the results between the observed and expected −log 10 (p). The results showed that the Q + K model was more suitable for analyzing genome-wide association for all phenotypic traits.

GWAS
The quantile-quantile (Q-Q) plots verified the adequate model for controlling false positives for marker and trait association (Figure 3). The S, Q, K, and Q + K models were implemented and compared by examining the results between the observed and expected −log10 (p). The results showed that the Q + K model was more suitable for analyzing genome-wide association for all phenotypic traits.

Candidate Genes
The putative candidate gene was identified within 100 kb of each significant SNP ( Table 5). The selected genes, homologs to those in Arabidopsis, seemed to be interesting and might play a role in affecting a particular trait. Three candidate genes were identified for HT, including Sb03g026400 encoding an auxin-responsive protein on chromosome 3 within 40 kb of a SNP, Sb08g019600 encoding a protein kinase on chromosome 8 within 41 kb of a SNP, and Sb09g027683 encoding an indole-3-butyric acid response protein on chromosome 9 within 0.26 kb of a SNP. Four candidate genes, Sb02g028870 on chromosome 2, Sb06g017100 on chromosome 6, Sb08g022790 on chromosome 8, and Sb09g022660 on chromosome 9 were detected for TN, which were about 3.7-, 0.47-, 33.4-, and 72.5-kb away from the target SNP, respectively. There were four potential candidate genes for PL, including Sb01g012660 encoding a protein kinase family protein, Sb02g000960 encoding WRKY57 transpiration factor, Sb08g019140 encoding a RNA binding protein, and Sb10g002190 encoding an exordium-like 3 protein involved in the brassinosteroid regulatory pathway at 0.38-, 0.69-, 59.4-, and 52.3-kb from the target SNP, respectively. Three genes were detected for Brix, including Sb05g000330 encoding a mitochondrial substrate carrier protein, Sb06g014370 encoding a hydroxyproline-rich glycoprotein protein and Sb08g015300 encoding a pentatricopeptide protein, which were approximately 76.9-, 64.5-, and 9.6-kb away from the target SNP, respectively. Ten candidate genes could affect RS. Of them, the notable homologs for RS were Sb01g028790 encoding a MYB transcription factor, Sb01g029400 encoding a glucosyltransferase, Sb01g029590 encoding a trehalose-6-phosphate phosphatase, Sb02g030990 encoding glycoside hydrolase 9B7, and Sb02g036310 encoding a sugar transporter. These genes were located approximately 85.1-, 79.7-, 33.8-, and 5.1-kb from the target SNP on chromosomes 1 and 2, respectively.  Across three environments, GWAS detected 24 significant SNPs associated with HT, TN, PL, RS, and Brix after controlling Q and K (Figure 4, Table 4). Specifically, one SNP at S3_53230521 on chromosome 3, one at S8_50424256 on chromosome 8, and one at S9_56656748 on chromosome 9 were associated with HT, explaining HT variations by 9.5%, 9.0% and 11.3%, respectively. Four SNPs (S2_63990205, S6_46267204, S8_54924780 and S9_52354291) located on chromosomes 2, 6, 8, and 9 were associated with TN, explaining TN variations by 9.1%, 10.1%, 9.4%, and 8.7%, respectively. Four     GWAS was also analyzed with phenotypic data from individual environment (Env). There were 19, 21, and 26 significant SNPs associated with various traits for Env 1, Env 2, Env3, respectively (Supplemental Table S4). Of them, 16 SNPs were considered as novel markers for Env 1, 12 for Env 2, and 10 for Env 3 (Supplemental Table S4). Among the total 66 significant SNPs, 14 markers were also detected in the analysis across three environments, including three (HT, TN, and RS) for Env 1, four (two TN and two PL) for Env 2, and seven (HT, Brix, five RS) for Env 3. GWAS produced no associations of IN, PW, and RPO across three environments, but significant SNPs for these traits were shown on at least one of the single environment (Supplemental Table S4).

Trait Variations and Correlations
Large phenotypic variations were observed in the diverse collection panel. The range of HT, TN, IN and PL values were comparable to the previous study identified in 315 sorghum accessions [15]. The range of Brix values observed in this study was also comparable to sorghum RILs derived from a cross between a dwarf grain sorghum and a tall sweet sorghum [65]. All traits showed high heritability except for PW and PRO with moderate heritability. Overall, large variations of phenotypic traits and high heritability of the traits provide an important basis for analyzing marker-trait associations in sorghum panel.
Some of the trait correlations agreed with the results in other studies. A positive correlation between HT and IN, negative correlation between TN and IN, and no correlation between HT and PL found in this study were consistent with a previous report in sorghum accessions [15]. However, a positive correlation between HT and PL was found in biparental quantitative trait loci (QTL) mapping population [31,66] and accessions including sweet sorghum, grain sorghum, forage sorghum, mutant populations, and B-and R-lines [67]. We identified a positive correlation between HT and TN, which was also shown in a segregating population derived from a cross between S. bicolor and S. sudanense [68], but not in a RIL population derived from a cross between grain sorghum and sweet sorghum [63]. HT was negatively correlated with SD, which was consistent with the previous result [63], but did not support the correlations noted in another study [54]. In addition, the positive correlation between HT and Brix observed in this study was consistent with previous results [52,[69][70][71]. Brix and RS were measured using stem tissue, but we found a negative correlation between SD and RS, but no correlation between SD and Brix. No correlation between SD and Brix and a positive correlation between SD and sugar were found previously [63], however, a positive correlation between HT and Brix was observed in RILs of sorghum [71]. Collectively, the results demonstrated interrelationships among morphological, agronomic and physiological traits in the diverse collection. Samples size, nature of the populations, and environmental conditions may all contribute to the variation of relationships among various traits.

LD Decay
The LD decay pattern of all 300 samples and 250 diverse accessions was almost identical, while LD decay for RIL lines was somewhat different ( Figure 2C). Linkage disequilibrium determines the resolution of an association mapping [72]. In this study, LD decayed to r 2 = 0.18 within 100 kb and to r 2 = 0.10 within 500 kb across all 300 samples or across 250 diverse germplasm ( Figure 2C). Previous studies have shown a varied LD decay pattern in sorghum. The LD decay was found to r 2 < 0.2 within 5 kb and r 2 < 0.1 within 20 kb in sweet cultivars and landraces [25], 15-20 kb in accessions [73,74], 30 kb in a mini core collection of landraces [73], 50 kb in the diverse landraces [16], 50-100 kb in landraces [75], and within 150 kb in accessions [12]. A recent study has demonstrated that LD decay to r 2 < 0.1 within 50 kb in the diverse landraces, but decay to r 2 = 0.2 within 90 kb in the recombinant inbred lines (RILs) [16]. In addition, several RILs population had LD decay to r 2 = 0.4 to 0.6 within 100 kb and to r 2 = 0.3 to 0.5 within 500 kb [76], and LD values for RILs in this study were in this range and consistent with their results. LD rate decay was slower in RILs families (∼500 kb) compared to diverse accessions from China and Ethiopia (∼20 kb) [76]. The mapping population in this study consisted of 50 RILs, accounting for 16.7% of total genotypes. The outcome of LD was similar to that observed in the RILs [16,76]. It appeared that genome coverage of markers and number and type of genotypes could contribute to variations of LD. The faster LD decay in more diverse populations may reflect greater recombination and lead to higher mapping resolution than in RIL families [72].

Marker Effects
On average, the significant SNPs caused 9.9% of phenotypic trait variations in this study ( Table 4). The SNP effects on the traits were higher, lower or comparable to other studies in sorghum. For example, a SNP on chromosome 9 explained 11.1% of HT variation, which was in a range of 10% to 12% found in a population derived from a cross between grain sorghum and sweet sorghum [22], but lower than 16% to 29% identified in diverse sorghum accessions [15]. A marker causing 10.1% of TN variation on chromosome 6 was higher than the effects of two QTLs on fertile tiller number (~7.8%) but was lower than the other two QTLs on the same chromosome (~13.7%) in sorghum [58]. A SNP at S8_49724219 on chromosome 8 accounted for 9.6% variation of PL (Table 4), which was higher than the value in long-day trials (6.9%) and in short-day trials (4.0%) on the same chromosome in a RIL population of sorghum [66]. In addition, we found that a marker on chromosome 6 explained 11.5% of Brix variation, which was comparable to the previously found marker effects (~10%) on the same chromosome [22]. In addition, SNP effects accounted for 8.9% to 12.5% of RS variations on chromosomes 1 and 9, similar to those QTL effects for glucose juice (11%-12%) or within the range of QTL effects (7%-21%) on sugar yield in sorghum [22]. The different mapping panels, tested diverse environments, and marker density may lead to variations of marker effects on phenotypic traits.

Significant SNPs across Environments
Genetic signals have been identified for many phenotypic traits in sorghum through linkage mapping and/or GWAS, providing comparisons to our results. In sorghum, several studies identified a signal for HT on chromosome 9 at between 52 to 59 Mbp [19,36,[52][53][54][55][56][57], while a SNP at S9_56656748 found in this study was in this genomic region noted above (Table 4, Supplemental Table S4). It demonstrated that S9_56656748 was a strong signal for controlling HT on chromosome 9. Previously, the Dw1 locus located at~57 Mbp on chromosome 9 was detected, which positively regulates brassinosteroid signaling [32]. The SNP at S9_56656748 was not too far from the Dw1 locus, and it might be the same signal. In addition, the SNP at S9_56656748 in our study was 609 kb away from a previously identified marker on chromosome 9 that affected plant height in sorghum [15]. We also detected a signal for HT on chromosome 8 (S8_ 50424256) that was overlapped with a known QTL [70]. However, a SNP on chromosome 3 (S3_53230521) found in this study was about 1.6Mbp away from a reported QTL [63], so it could be a novel signal for controlling HT in sorghum.
We detected four SNPs (S2_63990205, S6_46267204, S8_54924780, S9_52354291) associated with TN. Of them, three SNPs were overlapped with the previously reported QTLs on chromosome 2 [58], chromosome 6 [36,55,59], and chromosome 9 [55] in sorghum, including one identified through a GWAS [36] (Table 4, Supplemental Table S4). However, a SNP at S8_54924780 on chromosome 8 did not fall into a known QTL region, and was approximately 3.4 Mbp away from a signal detected for TN though a GWAS using diverse sorghum germplasm [15]. The result suggests that S8_54924780 for TN could be a novel signal. We also confirmed that all four SNPs (S1_11646700, S2_794419, S8_ 49724219, S10_1803717) associated with PL were overlapped with regions of signals identified previously, including one on chromosome 1 and 8 by a GWAS [36], one on chromosome 2 through a QTL mapping [51] and GWAS [60], and one on chromosome 10 with a QTL mapping [61] (Table 4,  Supplemental Table S4).
Three SNPs (S5_216438, S6_ 39878130, S8_40217639) were associated with Brix, and two were overlapped with known QTLs on chromosome 6 [74] and on chromosome 8 [63]. A SNP on chromosome 5 was approximately 1.0 Mbp away from a reported QTL [64], suggesting that this could be a novel signal. Among the traits, RS had the highest number of associations. Six out of ten significant SNPs were overlapped with known QTLs for RS, including four on chromosome 1 around 51 Mbp [52,64], one on chromosome 2 [52], and one on chromosome 3 [64]. Additional four SNPs were considered as novel signals, since they were distant to known QTLs, approximately 2.2 Mbp away on chromosome 2 [63], 3.0 Mbp and 8.8 Mbp on chromosome 9 [63], and 55.6 Mbp on chromosome 10 [77], respectively.

Significant SNPs for Individual Environment
In addition to a few common signals identified across environments as well as in a single environment, some unique SNPs were detected in each individual environment (Supplemental Table  S4). Notably, significant associations were found for IN, SD, PW and PRO for Env 1, which were all novel signals. For example, two SNPs for SD on chromosome 4 were 36.4 Mbp and 22.8 Mbp away from a reported QTL [51], while one SNP for PRO on chromosome 2 was 17.5 Mbp away from a known signal SNP [29] and four SNPs for PRO on chromosome 7 were about 10.1 to 54.1 Mbp away from a known QTL [78]. Thirteen significant associations for PW were especially found in Env 2, and ten were considered as novel signals. These SNPs ranged from 1.5 to 53 Mbp away from known QTLs on different chromosomes in sorghum [63]. For Env 3, there were ten significant associations for HT detected on chromosome 2, 4, 5, 6 and 9, respectively. Of them, four SNPs were considered as novel signals, since they were distant to known signals on chromosome 2 found a GWAS study [34] and on chromosome 5 [79] and chromosome 9 [63] found in a QTL mapping. In addition, novel signals were also noted for PW on chromosome 2, 5 and 9 in Env 3, respectively.
Overall, some novel SNPs were identified, either across environments or in a single environment analysis. We also confirmed the previously reported QTL or SNP signals in sorghum germplasm or QTL mapping population. In this study, we used 2b-RAD genotyping method, which is proven as a simple and flexible method for genome-wide genotyping [44,45,80,81]. Given that 2b-RAD is a reduced representation sequencing approach compared to the whole genome sequence (WGS), it is expected that the number of identified SNP markers in a population panel are lower than that using WGS. This can affect mapping resolution since high density genotyping increased the precision of association mapping in the panel. Nevertheless, our results indicate robustness of some candidate SNP signals in sorghum, regardless of variation of size and nature of different populations and genotype by environment interactions.

Potential Candidate Genes
The potential candidate genes in the significant SNP regions extended to 100 kb were searched for each signal associated with the target trait. Plant height is controlled by the phytohormones gibberellins (GA), brassinosteroids (BR), and auxins and by altering the distance between internodes. Dw3, a well-known auxin transporter gene, is located on chromosome 7 with a major effect on sorghum plant height [13]. We did not detect this gene nor any signals for HT on chromosome 7. However, a SNP at S9_56656748 for HT was found approximately 386 kb from the gene underlying the Dw1 locus, a regulator of sorghum stem internode length [82]. Dw1 is a novel component of brassinosteroid signaling and acts as a positive modulator of brassinosteroid signaling [32]. However, interestingly, approximately 260 bp away from SNP S9_56656748, a gene Sb09g026370, a homolog of AT4G14430, encoding indole-3-butyric response 10 (IBA10) was identified. Since IBA10 might be involved in the conversion of indole 3-butyric acid (IBA) to indole 3-acetic acid (IAA) in Arabidopsis [83], this Sb09g026370 could be the candidate gene that regulates plant height in sorghum. Notably, gene Sb03g026400, a homolog of AT1G76190, encoding small auxin up RNAs (SAUR)-like auxin-responsive protein, was also identified corresponding to SNP S3_53230521 on chromosome 3. In Arabidopsis, overexpression of several distinct SAUR genes results in increased growth of cotyledons, hypocotyls, or roots, respectively [84], suggesting a role of this protein in promoting plant growth.
Brassinosteroids (BRs) are a class of steroidal hormones essential for plant growth and development [85]. Gene Sb10g002190, a homolog of AT5G51550 encoding exordium (EXO)-like 3 protein, was found for PL in this study. The EXO gene was identified as a potential mediator of BR-promoted growth. Expressions of EXL3 (At5g51550) and EXL5 (At2g17230) were positively correlated with EXO expression in Arabidopsis, and an exo knock-out mutant showed reduced cell size and number and plant growth [86], suggesting a role of the EXL in promoting plant growth. We also identified another candidate gene Sb02g000960, a homolog of AT1G69310 encoding WRKY57 transcription factor. In Arabidopsis, WRKY57 protein acts as a repressor in jasmonic acid-induced leaf senescence and is a common component of the jasmonic acid-and auxin-mediated signaling pathways [87]. The results indicate a role of WRKY57 in regulating plant growth.
Several genes involved in carbohydrate metabolism were noted for Brix and RS, ranging from 5.1 kb to 88 kb from the associated SNPs. Interestingly, a gene Sb02g036310 was 5.1 kb from the SNP that was associated with RS. This gene is a homolog to AT1G11260, encoding SUGAR TRANSPORTER PROTEIN 1 (STP1), a high-affinity sugar transporter that acts as an H+/monosaccharide cotransporter, capable of transporting a wide range of hexoses [88,89]. STP1 is the member of the STP family with the highest expression level [90]. Moreover, the transcript AT1G11260 was strongly regulated by sugars, depending on phosphorylated hexoses [91]. Another gene Sb09g025790 encoding trehalose-6-phosphate synthase, located 64.2 kb from the significant SNP, is a homolog to AT1G78580 (AtTPS1). In Arabidopsis, AtTPS1 with a terminal TPS domain, is involved in trehalose synthesis [92]. It seemed that Sb02g036310 and Sb09g02579 might be candidate genes mediating sugar metabolism in sorghum.

Conclusions
Plant phenotype is regulated by a complex network of genetic and environmental signals. The work presented here has generated new knowledge of the genetic mechanisms underlying plant architecture and agronomics traits in sorghum, sudangrass and RILs collections. Through genome-wide association analysis across environments, 24 significant SNPs were detected and associated with plant height, tiller number, panicle length, Brix and reducing sugar. A few signals have not been previously detected and were considered as novel regions for controlling various traits. Several putative candidate genes underlying the signals could be potentially the targets for further validation of their roles in affecting plant architecture and growth. The significant markers may be used for marker-assisted breeding programs aimed at improvement of improved food, feed, and biofuel productions in sorghum and related species.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4395/10/10/1602/s1, Table S1: Information about 300 germplasm used in this study; Table S2: Soil and environmental conditions in experimental locations; Table S3: SNP genotype of the population (HapMap format); Table S4: Summary of GWAS signals correspondence with other studies.
Author Contributions: F.L. led collecting phenotypic data and preparing the manuscript; Z.P. performed genotyping and assisted in collecting phenotypic data and preparing the manuscript; X.Z. performed GWAS analysis; H.L. contributed to data interpretation; F.L. and S.S. designed the experiments; Y.J. led data analysis and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.