Genomic Analysis of Selected Maize Landraces from Sahel and Coastal West Africa Reveals Their Variability and Potential for Genetic Enhancement

Genetic adaptation of maize to the increasingly unpredictable climatic conditions is an essential prerequisite for achievement of food security and sustainable development goals in sub-Saharan Africa. The landraces of maize, which have not served as sources of improved germplasm, are invaluable sources of novel genetic variability crucial for achieving this objective. The overall goal of this study was to assess the genetic diversity and population structure of a maize panel of 208 accessions, comprising landrace gene pools from Burkina Faso (58), Ghana (43), and Togo (89), together with reference populations (18) from the maize improvement program of the International Institute of Tropical Agriculture (IITA). Genotyping the maize panel with 5974 DArTseq-SNP markers revealed immense genetic diversity, indicated by average expected heterozygosity (0.36), observed heterozygosity (0.5), and polymorphic information content (0.29). Model-based population structure, neighbor-joining tree, discriminant analysis of principal components, and principal coordinate analyses all separated the maize panel into three major sub-populations, each capable of providing a wide range of allelic variation. Analysis of molecular variance (AMOVA) showed that 86% of the variation was within individuals, while 14% was attributable to differences among gene pools. The Burkinabe gene pool was strongly differentiated from all the others (genetic differentiation values >0.20), with limited gene flow (Nm) to the reference populations (Nm = 0.98). Thus, this gene pool could be a target for novel genetic variation for maize improvement. The results of the present study confirmed the potential of this maize panel as an invaluable genetic resource for future design of association mapping studies to speed up the introgression of this novel variation into the existing breeding pipelines.


Introduction
Maize (Zea mays L.) is one of the most important cereal crops consumed in sub-Saharan Africa (SSA) and an essential component of livestock feed in the developed as well as developing world.

Plant Material
We analyzed 208 maize accessions obtained from international and national gene banks in Africa (Supplementary Table S1). The maize panel comprised 190 landraces representing gene pools from Burkina Faso (58), Ghana (43), and Togo (89) (Supplementary Figure S1). The landraces from Burkina Faso and Togo were sourced from the gene bank at the International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria, whereas those from Ghana were provided by the Plant Genetics Resources Institute (PGRI) at Bunso, Ghana. The majority of these landraces were collected from farmers' fields in the 1970s and 1980s. However, eco-geographical data for the collection sites of the landraces were not available. The study also included a diverse set of 18 drought- and heat-tolerant open-pollinated populations (hereafter referred to as reference populations) developed by the Maize Improvement Program at IITA (MIP-IITA), Ibadan, Nigeria.

DNA Isolation and Genotyping Analysis
For each accession, total genomic DNA was isolated from bulked leaf composites of 15 two-week-old seedlings according to the DArT protocol (https://www.diversityarrays.com/orderinstructions/plant-dna-extraction-protocol-for-dart/). The quality of each DNA sample was checked by electrophoresis on a 0.8% agarose gel, and the purified DNA was further quantified using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington DE, USA). Certified DNA samples were then sent to the Integrated Genomic Service and Support (IGSS) genotyping platform, Nairobi, Kenya, for genotyping. High-throughput genotyping was conducted using the 96-plex DArTseq protocol, and SNPs were called using DArT's proprietary software, DArTSoft, as previously described [22]. Reads and tags found in each sequencing result were aligned to the Zea mays L. genome reference, version AGPV3 (B73 Ref-Gen v4 assembly) [23].

Data Analysis
A total of 47,441 putative DArTseq markers were generated from the panel of 208 maize accessions. Prior to further analysis, the raw data set was filtered to remove markers with call rate <0.8 or minor allele frequency (MAF) <0.05, as well as unmapped SNP markers. Thereafter, only markers with no missing data were retained using the TASSEL software version 5.2.12 [24]. The retained markers were subjected to various genetic diversity analyses, including basic diversity statistics such as polymorphic information content (PIC), MAF, observed heterozygosity (Ho), and expected heterozygosity (He), using PowerMarker v3.25 [25]. The population structure of the maize panel was inferred using the admixture model-based clustering algorithm implemented in STRUCTURE 2.3.4 [26]. The number of clusters (K) was varied from 1 to 12, with 10,000 burn-in steps followed by 10,000 Markov chain Monte Carlo simulations, as previously described [20,27]. For each K, ten independent runs were performed. The most likely value of K was determined by the ad hoc ΔK statistic [28] implemented in Structure Harvester [29]. Accessions with membership proportions (Q-values) ≥80% were assigned to groups, while those with membership proportions below 80% were designated as admixtures [30]. The population structure of each gene pool (Burkinabe, Ghanaian, Togolese, and reference populations) was also estimated as described above. A discriminant analysis of principal components (DAPC) was carried out on the 208 accessions using the first 40 principal components in the adegenet R package [31]. Membership probabilities of the individuals for the different gene pools were estimated using the "find.clusters" function implemented in adegenet. Further, principal coordinate analysis (PCoA) was conducted to reveal the genetic relationships among the maize accessions using GenAlEx v. 6.503 software [32].
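The marker filtering thresholds and per-marker diversity statistics described above can be illustrated with a minimal Python sketch. It assumes biallelic SNP genotypes coded as 0, 1, or 2 (copies of the alternate allele) with None for a missing call; this coding, the function names, and the example data are illustrative assumptions and do not reflect the internals of TASSEL or PowerMarker. PIC is computed with the standard biallelic form, PIC = 1 − (p² + q²) − 2p²q².

```python
from typing import Dict, List, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def observed_calls(calls: Sequence[Genotype]) -> List[int]:
    """Return only the non-missing genotype calls for one marker."""
    return [g for g in calls if g is not None]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions with a non-missing call at this marker."""
    return len(observed_calls(calls)) / len(calls)


def marker_stats(calls: Sequence[Genotype]) -> Dict[str, float]:
    """MAF, He, Ho, and PIC for one biallelic marker."""
    obs = observed_calls(calls)
    p = sum(obs) / (2 * len(obs))      # alternate-allele frequency
    q = 1.0 - p
    he = 2 * p * q                     # expected heterozygosity
    return {
        "maf": min(p, q),
        "He": he,
        "Ho": sum(g == 1 for g in obs) / len(obs),
        "PIC": he - 2 * p**2 * q**2,   # biallelic PIC
    }


def passes_filter(calls: Sequence[Genotype],
                  min_call_rate: float = 0.8,
                  min_maf: float = 0.05) -> bool:
    """Keep a marker unless call rate < 0.8 or MAF < 0.05."""
    if not observed_calls(calls) or call_rate(calls) < min_call_rate:
        return False
    return marker_stats(calls)["maf"] >= min_maf


if __name__ == "__main__":
    example = [0, 1, 2, 1, None]       # hypothetical marker, 5 accessions
    print(passes_filter(example), marker_stats(example))
```

The subsequent step in the text (retaining only markers with no missing data) would correspond to calling passes_filter with min_call_rate=1.0 in this sketch.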
An unrooted neighbor-joining (NJ) tree was constructed by following the procedure of Nei [33] with 1000 bootstrap replicates in PowerMarker v3.25 [25]. The resulting NJ tree was visualized in Molecular Evolutionary Genetics Analysis (MEGA) software version X [34] and edited using Figtree software v.1.4.4 [35]. Genetic relationships within each maize gene pool were elucidated through construction of an unrooted NJ tree, as described above. Analysis of molecular variance (AMOVA) was performed in GenAlEx v. 6.503 [32] to partition components of genetic variance among and within the populations (K). Calculation of pairwise genetic differentiation statistics (FST) and haploid number of migrants (Nm) between gene pools was performed using GenAlEx v. 6.503 [32] with 999 permutations. FST measures the amount of genetic variance that can be explained by population structure based on Wright's F-statistics [36], while Nm = [(1/FST) − 1]/4. An Nm value less than 1 indicates limited gene exchange among subpopulations [36].
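As a worked illustration of the relationship Nm = [(1/FST) − 1]/4 stated above, the following Python sketch evaluates the formula and applies the Nm < 1 rule for limited gene exchange. The function names are hypothetical, and the sketch only illustrates the formula; it is not intended to reproduce GenAlEx output.

```python
def migrants_per_generation(fst: float) -> float:
    """Evaluate Nm = [(1 / FST) - 1] / 4 for a given FST value."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


def limited_gene_flow(fst: float) -> bool:
    """Apply the rule that Nm < 1 indicates limited gene exchange."""
    return migrants_per_generation(fst) < 1.0


if __name__ == "__main__":
    for fst in (0.05, 0.10, 0.20, 0.40):   # illustrative values only
        nm = migrants_per_generation(fst)
        print(f"FST = {fst:.2f} -> Nm = {nm:.2f}")
```

Under this formula, Nm drops below 1 exactly when FST exceeds 0.2, which is why FST values above that level are read as evidence of restricted gene flow.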

Analysis of Genetic Diversity Parameters
Out of the total 47,441 putative DArTseq markers, 5974 were retained after filtering. The GBS-DArTseq markers were unequally distributed across the ten maize chromosomes. Chromosome 1 had the highest number of markers (905), while chromosome 10 had the fewest (422) (Supplementary Figure S2).

Population Structure and Genetic Relationships
The model-based population structure analysis of the maize panel (208 accessions) showed that the ΔK values derived from the mean log-likelihood probabilities peaked at K = 3 (389.43), followed by K = 4 (276.33) and K = 2 (273.07) (Figure 1a). At K = 3, the panel was divided into three sub-populations (Figure 1b). Using an 80% membership probability threshold, 122 accessions (58.65%) were successfully assigned to the three subpopulations. In comparison, 86 accessions with membership probabilities below 80% were designated as an admixed population (Supplementary Table S2). Subpopulation 1 was the most uniform (average membership coefficient, 90%), and it contained 53 landraces (49 from Burkina Faso, 3 from Togo, and 1 from Ghana). Subpopulations 2 and 3, which constituted 12.5% and 20.67% of the panel, respectively, were mixtures of Ghanaian and Togolese landraces together with reference populations. Specifically, subpopulation 2 consisted of 26 accessions: 10 reference populations plus 11 Togolese and 5 Ghanaian landraces. Subpopulation 3 comprised 24 and 15 landrace accessions from Ghana and Togo, respectively, and 4 accessions from the reference populations. The admixed group contained 60, 13, and 9 landraces from Togo, Ghana, and Burkina Faso, respectively, plus 4 reference populations (Supplementary Table S2). The additional smaller peaks observed at K = 4 (276.33) and K = 2 (273.07) implied the presence of subgroups within the three major groups (Figure 1). Therefore, an independent STRUCTURE run was performed for each gene pool. Sub-clustering of the Burkinabe and Ghanaian gene pools both yielded a sharp peak at K = 2 (Figure 2a,b). Sub-clustering the reference populations and Togolese landraces showed the highest peak at K = 3 and K = 9, respectively (Figure 2c,d). A substantial degree of admixture was observed for each gene pool (Supplementary Table S3).
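The 80% membership rule used above to separate assigned from admixed accessions can be sketched in a few lines of Python. The accession names and Q-values below are hypothetical and serve only to show the logic; they are not taken from Supplementary Table S2.

```python
from collections import Counter
from typing import Dict, Sequence


def assign_subpopulation(q_values: Sequence[float],
                         threshold: float = 0.80) -> str:
    """Assign an accession to its best cluster if Q >= threshold, else 'admixed'."""
    best = max(range(len(q_values)), key=lambda k: q_values[k])
    if q_values[best] >= threshold:
        return f"subpopulation {best + 1}"
    return "admixed"


def summarize(q_matrix: Dict[str, Sequence[float]],
              threshold: float = 0.80) -> Counter:
    """Count accessions per assigned group across the whole panel."""
    return Counter(assign_subpopulation(q, threshold) for q in q_matrix.values())


if __name__ == "__main__":
    # Hypothetical Q-matrix for K = 3 (rows sum to 1).
    q_matrix = {
        "accession_A": [0.92, 0.05, 0.03],
        "accession_B": [0.10, 0.85, 0.05],
        "accession_C": [0.45, 0.35, 0.20],
    }
    for name, q in q_matrix.items():
        print(name, assign_subpopulation(q))
    print(summarize(q_matrix))
```

Raising or lowering the threshold argument shows how sensitive the admixed fraction is to the cut-off, which is worth keeping in mind when comparing studies that use different thresholds.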
Using the Bayesian information criterion (BIC) implemented in DAPC, a maximum of K = 3 was obtained, which corresponded to three groups of maize accessions in the panel (Figure 3). Estimation of the cluster membership revealed that cluster three had the highest number of accessions (94), followed by cluster two with 77 accessions and cluster one with the smallest number of accessions (37). Of the 94 accessions in cluster three, 58 (61.70%) and 29 (30.85%) were landraces from Togo and Ghana, respectively; the remainder were six reference populations and one landrace from Burkina Faso (Supplementary Table S4). All the accessions in cluster two were landraces, from Burkina Faso (57), Togo (17), and Ghana (3). Of the 37 accessions in cluster one, 14 (37.84%) were Togolese landraces, 12 (32.43%) were from the reference populations, and 11 (29.73%) were Ghanaian landraces.
Further investigation of the genomic structure of the maize panel using the PCoA indicated three subpopulations, in line with the STRUCTURE simulation and DAPC analyses (Figure 4). The total amount of genetic variation explained by the first two principal coordinates was 57%. The PCoA clearly separated subpopulation 3 (by PCo2), which showed a higher degree of admixture between Ghanaian and Togolese landraces, including six reference populations and a landrace from Burkina Faso. The other two subpopulations appeared to be distributed along PCo1. Although some degree of overlap among landrace gene pools was shown in subpopulation 1, located at the upper extreme of PCo1, ~75% were Burkinabe landraces. Subpopulation 2, distributed along the lower extreme of PCo1, was the most distant of the three, comprising the majority of the reference populations and four Ghanaian landraces.
[Figure caption fragment: accessions are colored according to origin; the orange, ash, and violet circles represent subpopulations 1, 2, and 3, respectively.]
As per the STRUCTURE, DAPC, and PCoA results, the NJ phylogenetic tree also showed three sub-populations with higher degrees of admixture among Ghanaian and Togolese landraces and reference populations (Figure 5). The neighbor-joining tree constructed for each gene pool divided the Burkinabe and Ghanaian gene pools into two main clusters (Figure 6b). The Togolese landraces and the reference populations were grouped into nine and three clusters, respectively (Figure 6b,c).

Analyses of Molecular Variance, Genetic Differentiation, and Gene Flow among Gene Pools
The AMOVA revealed that 14% of the total variation was found among gene pools, while the rest (86%) was within gene pools (Table 2). The overall FST value of the maize panel was 0.21, and the Nm value was 1.58. As shown in Table 3

Discussion
A well-characterized and diverse germplasm is an essential requisite for genetic enhancement of crops. In this study, we applied GBS technology to explore the genetic diversity and population structure of a maize panel comprising landrace gene pools from Burkina Faso, Ghana, and Togo, plus a reference population from IITA-MIP. The estimated diversity indices revealed ample genetic diversity within the maize panel, indicated by average He (0.36) and Ho (0.5). The He obtained in this study was comparable to the 0.36 reported for provitamin A (PVA) quality protein maize (QPM) germplasm from IITA-MIP [21] but was higher than that reported for maize landraces from Eastern Africa (He = 0.25), Western Africa (He = 0.18), and Sahel Africa (He = 0.24) [9], as well as tropical maize breeding populations (He = 0.22) [27], including IITA early-maturing white inbred lines [20]. Characterization of the Burkinabe, Ghanaian, and Togolese maize pools showed different values for the estimated diversity indices. The results indicated that the Togolese gene pool (He = 0.36, Ho = 0.50) contained slightly higher diversity than the Burkinabe (He = 0.30, Ho = 0.41) and Ghanaian (He = 0.32, Ho = 0.34) landrace pools. Further, the low variation in the genetic indices between the landraces as a group and the reference populations showed that the two germplasm sets possessed similar genetic diversity (Table 1). These results agreed with previous findings that tropical maize germplasm is highly diverse, with He > 0.3 [37–39]. The mean PIC obtained in the present study (0.29, using 5974 DArTseq SNPs for the 208 maize accessions) was higher than the 0.19 and 0.26 reported for tropical early-maturing maize inbred lines using 15,047 [30] and 7224 SNPs for sample sizes of 94 and 134, respectively [20,27]. The discrepancies between the results of our study and those of earlier researchers may be due to differences in the genetic materials, the sample sizes, and the number of SNPs used. Nonetheless, the mean PIC value in this study was similar to the 0.29 recently reported for tropical PVA-QPM maize germplasm using 8171 DArTseq SNP markers [21].
The Evanno criterion employed for the model-based simulation of population structure identified the peak level of ΔK at K = 3 (Figure 1a), which indicated the presence of three genetically distinct subpopulations (Figure 1b). The proportion of admixed accessions (41%; 86 of 208) in the maize panel, based on a membership probability threshold of 80%, suggested moderate genetic differentiation and gene flow. The DAPC, PCoA, and NJ phylogenetic analyses all illustrated the existence of three subpopulations in the whole set of 208 maize accessions. Comparison of the results of the four complementary clustering methods (STRUCTURE, DAPC, NJ tree, and PCoA) revealed high consistency in the individuals assigned to each group, which reinforced the finding that the identified groups were indeed genetically distinct. The close proximity between the Togolese and Ghanaian gene pools suggested high genetic relatedness of the two gene pools. This result was expected due to the geographical proximity of the two countries and the similarity of the climatic conditions. The Burkinabe gene pool largely diverged from all others, suggesting its adaptation to Sahel conditions, which is in agreement with its pattern of phenotypic diversity [12]. Multivariate analyses revealed high affinity of Ghanaian and Togolese landraces with the reference populations (Figures 1 and 3–5). It is likely that some of these accessions are not true landraces but, rather, old improved cultivars that were either recollected or wrongly classified, as farmers usually consider improved varieties cultivated over longer periods in a given area as landraces [40]. The grouping together of some landraces with the reference populations also suggested a pedigree relationship. Hence, it is possible that some of the landraces analyzed in this study were local varieties that were selected by earlier maize breeders at IITA, based on high grain yield, earliness, resistance to the maize streak virus (MSV), and adaptation to drought and heat stress, as starting materials for the development of inbred lines that were later involved in cross-breeding (see http://r4dreview.iita.org/index.php/tag/maize-improvement/). The additional smaller peaks observed at K = 4 (276.33) and K = 2 (273.07) implied the presence of subgroups within the three major groups (Figure 1). Therefore, an independent STRUCTURE run was performed for each gene pool (Figure 2). The high degree of genetic admixture within each landrace gene pool, observed as ancestry shares of <80%, probably reflects considerable levels of gene flow or germplasm exchange. Results of previous studies have shown that such admixture is not unusual in landraces from restricted geographical backgrounds [40].
According to Frankham et al. [41], an FST value greater than 0.15 can be considered as significantly differentiating populations. Thus, in the present study, the overall FST value (0.21) supported the presence of significant genetic divergence within the maize panel. Wright [36] reported that an Nm value less than 1 indicated limited gene exchange among populations. In the present study, the overall Nm value of 1.58 (Table 2) indicated that moderate genetic exchange or gene flow may have occurred, leading to the moderate genetic differentiation between gene pools. This observation was consistent with the AMOVA results (Table 2), which indicated that 14% of the total variation was accounted for by variation among gene pools. This result is consistent with the findings of previous studies [42]. According to the FST values, the Burkinabe gene pool was the most differentiated (Table 3), in agreement with its divergence as revealed by the clustering methods (STRUCTURE, PCoA, DAPC, and NJ analyses). The divergence between the reference populations and landraces varied among the different gene pools. In particular, the low affinity of the Burkinabe gene pool with the reference populations (FST = 0.31, Nm = 0.98) suggested little involvement of the original Sahelian gene pool in the development of the modern maize varieties presently grown in the subregion. This observation is biologically and historically meaningful since, in West Africa, the reference maize gene pool called Composite Y [43], which was developed through recombination of 145 flint landraces of the West African savannah zone, contained only 2% each of the genetic materials from Burkina Faso and Niger, as well as 1% of those from Senegal [44]. In their analysis of isozyme variability in West African maize cultivars, Sanou et al. [45] showed that Burkinabe landraces were distinct, even though some level of gene flow was observed between them and an elite open-pollinated variety (SR 22) developed by IITA in 1984 from CIMMYT Pop 22 and widely adopted in Burkina Faso [46]. Therefore, the Burkinabe gene pool, having been grown and selected by farmers over many generations under warmer and drier conditions, could harbor novel and favorable alleles for improving maize for tolerance to drought and heat stresses. It is notable that in our earlier work on this maize panel, the high degree of tolerance of the Burkinabe landraces to drought, heat, and combined heat and drought stresses was unrivalled [47,48]. The high genetic similarity observed between the Ghanaian and Togolese landraces was supported by their low FST (0.14) and high Nm (2.63) values. This result further reflected gene flow via seed exchanges and local preferences towards a given agrotype owing to similar climatic conditions. These may have significantly shaped the distribution of the genetic diversity within Ghanaian and Togolese maize landraces, as was previously suggested [12]. The FST and Nm values (Table 3) suggested that the Ghanaian gene pool was closer to the reference populations, in agreement with the PCoA stratification (Figure 4). Indeed, the reference set analyzed in this study included two popular cultivars that are commonly cultivated in Ghana (Aburoheema and Obatanpa GH, coded IM1 and IM6, respectively). The deep knowledge of the genetic diversity and structure of Sahel and coastal West African maize landraces revealed in the present study provides an essential platform for efficient use of these valuable maize gene pools.

Conclusions
In the present study, we explored the genetic diversity and relationships within and between a maize panel comprising landrace gene pools from Burkina Faso, Ghana, and Togo and compared each to a reference maize population. The analysis of genetic diversity parameters indicated ample genetic diversity in the maize panel. The four multivariate methods were consistent in dividing the maize panel into three distinct genetic groups, each capable of providing different sources of variation for maize genetic enhancement. The genetic divergence of the Burkinabe gene pool was particularly remarkable. It, therefore, clearly represents an invaluable genetic resource that should be exploited to address the overarching goal of improving maize for adaptation to different environments, ecosystems, and stress situations. Overall, the genetic diversity revealed in this study has provided an invaluable resource for future analyses of candidate genes for local adaptations using robust association mapping experiments.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/9/1054/s1, Figure S1: Map of West Africa showing the countries of origin of the maize landraces analyzed in this study. Figure S2: Distribution of the 5974 DArTseq GBS markers across the 10 chromosomes of the 208 maize accessions. Table S1: Excel file with the description of the accessions analyzed in this study.