Genetic Heterogeneity in Cowpea Genotypes (Vigna unguiculata L. Walp) Using DArTseq (GBS)-Derived Single Nucleotide Polymorphisms

Cowpeas (Vigna unguiculata L. Walp) have been credible constituents of nutritious food and forage in human and animal diets since the Neolithic era. The modern technique of Diversity Array Technology (DArTseq) is both cost-effective and rapid in producing thousands of high-throughputs, genotyped, single nucleotide polymorphisms (SNPs) in wide-genomic analyses of genetic diversity. The aim of this study was to assess the heterogeneity in cowpea genotypes using DArTseq-derived SNPs. A total of 92 cowpea genotypes were selected, and their fourteen-day-old leaves were freeze-dried for five days. DNA was extracted using the CTAB protocol, genotyped using DArTseq, and analysed using DArTsoft14. A total of 33,920 DArTseq-derived SNPs were recalled for filtering analysis, with a final total of 16,960 SNPs. The analyses were computed using vcfR, poppr, and ape in R Studio v1.2.5001-3 software. The heatmap revealed that the TVU 9596 (SB26), Orelu (SB72), 90K-284-2 (SB55), RV 403 (SB17), and RV 498 (SB16) genotypes were heterogenous. The mean values for polymorphic information content, observed heterozygosity, expected heterozygosity, major allele frequency, and the inbreeding coefficient were 0.345, 0.386, 0.345, 0.729, and 0.113, respectively. Moreover, they validated the diversity of the evaluated cowpea genotypes, which could be used for potential breeding programmes and management of cowpea germplasm.


Introduction
Cowpea (Vigna unguiculata (L.) Walp.) is a pulse crop grown in the wet and dry agro-ecologies of the tropics, primarily where there is a lack of protein sources for human consumption [1].It is a source of inexpensive protein, vitamins, and minerals in both rural and urban areas and is a meat protein substitute that is rich in stable carbohydrates [2].The bulk producer of cowpea grains in sub-Saharan Africa is Nigeria, with approximately 3.6 million metric tons produced in 2021, while consuming more than 3.6 million metric tons [3].In South Africa, smallholder farmers produce a yield of less than 500 kg per hectare for subsistence demands [2].The FAO [3] reported a production of approximately 4.87 thousand metric tons in South Africa in 2021.
The productivity of cowpeas (mainly cultivated in semi-arid tropics) may be impeded by irregular rainfall patterns, despite being the most drought-tolerant crop [1].Heat can severely affect the yield of sensitive varieties because of poor pollen development, which causes flower abortion when temperatures reach 35 • C at night and results in poor seed and pod set [4].Production is also hampered by soils that lack organic matter and have low phosphorus levels [5].
Cowpeas contain phenolic acids, flavonoids, and tannins, which carry bioactive antioxidant, anti-inflammatory, and anti-carcinogenic compounds that provide health benefits to humans and animals [4].The use of flavonoids and tannin-rich forages has yielded improved ruminant health and productivity through increased resistance to gastrointestinal nematodes, reduced bloating, and improved nutrition via the presence of high-quality proteins in the small intestines [4].Cowpea seeds are also said to have cardio-protective potency effects and prevent cardiovascular diseases [4].Infused seeds treat amenorrhea when taken orally, and, according to the indigenous people of South Africa, crushed roots (when eaten with porridge) are believed to treat painful menstruation, epilepsy, and chest pain [4].
Modern DNA marker technology has been applied to cowpea research programmes for the molecular classification of germplasm, genetic development, and quantitative trait loci (QTL) mapping [6].Forty-four genotypes from four different subgenera of the genus Vigna have been classed using restriction fragment length polymorphism (RFLP) markers [6].Both wild and cultivated relatives have been further genetically classed for diversity studies using the following markers: random amplified polymorphic DNA (RAPD), inter-simple sequence repeats (ISSRs), amplified fragment length polymorphisms (AFLPs), and simple sequence repeats (SSRs) [6].Landraces from across the globe and African wild ancestral cowpeas have been collected and genotyped using 1200 single nucleotide polymorphism (SNP) markers to reveal and class gene pools [6].Recently, SNPs have been discovered with next-generation sequencing (NGS) technologies using genotyping-by-sequencing (GBS) to appropriate genetic diversity, phylogenetic interactions, and population structures [6].
More recently, a high-throughput GBS technology platform known as Diversity Array Technology sequencing (DArTseq) has been used to identify SNPs [7].It enables highthroughput, whole-genome profiling of crops, without requiring previous DNA sequence information, by using microarray hybridisations that detect present and absent individual fragments in the genomic profile [7].DArTseq allows for the selection of appropriate complexity-reduction methods for each organism, application, and selected genome assays and has been used for the molecular analyses of the large garlic germplasm bank [8].Identification of new gene sources for the development of better-performing cowpea breeding lines would require the use of DArTseq molecular screening of existing germplasm and improve breeding lines that are currently available.The main objectives of this study were to assess the genetic diversity of select cowpea genotypes using DArTseq (SNPs) and to identify heterogeneous genotypes and their genomic profiles that could be used for breeding programmes and crop improvements.

Plant Material and Sampling Procedure
Ninety-two genotypes that are commonly used in Nigeria and South Africa for breeding programmes based on their yield potential, seed colour, germination period, and abiotic and biotic stress performance were selected from the germplasm collection at the Agricultural Research Council-Grain Crops (ARC-GC).Four seeds from each cowpea genotype were planted using potting soil and covered by a thin layer of vermiculite to retain moisture in a 200 × 5 cm 2 pot.The pots were labelled with their genotypes and placed in a greenhouse during the autumn season at the ARC-GC, Potchefstroom, South Africa.Minimum temperatures ranged from 8 • C to 14 • C and the maximum temperatures ranged from 20 • C to 28 • C in a glasshouse.After 13 days of propagation, young trifoliate, fresh, and succulent leaves were excised from four plants and bulked into a single sample from each genotype, totalling to 368 plants.The leaf samples were placed in sterile centrifuge tubes and freeze-dried at −21 • C for five days using a freeze-drying machine.Leaf samples were labelled with their corresponding genotype sample code (Table 1), stored at −81 • C for 10 days, and sent to the Integrated Genotyping Service and Support (IGSS), Bioscience Eastern and Central Africa Hub-International Livestock Research Institute (BecA-ILRI), Nairobi, Kenya, for DNA extraction, DArT sequencing, and SNP analysis.

DNA Extraction
DNA extraction and sequencing were performed using the DArTseq™ protocol (Diversity Arrays Technology Pty Ltd., Canberra, Australia).Approximately 1 g of young leaf tissue from each accession was used for genomic DNA extraction.Genomic DNA was isolated from frozen leaves using a modified cetyltrimethyl ammonium bromide (CTAB)/chloroform/isoamylalcohol method [9].Frozen leaf tissue was ground and mixed with 2% pre-warmed (60 • C) CTAB isolation buffer containing 1.4 M NaCl, 100 mM Tris pH 8.0, and 20 mM EDTA (Sigma, Saint Louis, MO, USA).The mixture was then transferred to a 2 mL microcentrifuge tube and incubated at 60 • C for one hour.DNA was extracted once with chloroform-isoamyl alcohol (Chl/IAA, 24:1) (Sigma, Saint Louis, MO, USA) and precipitated using two volumes of isopropanol.The obtained pellet was washed with 70% EtOH, dried and dissolved in 100 µL TE buffer with 50 µg/mL RNAse A (Sigma, Saint Louis, MO, USA).The extracted DNA was quantified using 0.8% agarose gel electrophoresis, which was adjusted to 50 ng/µL for DArT and SNP genotyping.

DArTseq Analysis
DNA was processed in digestion/ligation reactions as reported by Kilian [10] by replacing a single PstI-compatible adaptor with two different adaptors corresponding to two different restriction enzymes (REs) as follows: PstI-and SphI-compatible adaptors.The PstI-compatible adapter was designed to incorporate an Illumina flow-cell attachment sequence with staggered sequences of varying length barcode regions, like the sequence reported by Elshire [11].The reverse adapter contained a flow-cell attachment region with an SphI-compatible overhang sequence.Only mixed fragments (PstI-SphI) were effectively amplified in 30 rounds of PCR using the following reaction conditions: 94 • C for 1 min, then 30 cycles at 94 • C for 20 s, 58 • C for 30 s, 72 • C for 45 s, and 72 • C for 7 min.Following PCR, equimolar amounts of amplification products from each sample of the 96-well microtiter plate were bulked and applied to a c-Bot (Illumina) bridge PCR followed by sequencing on Illumina Hiseq2500 (Illumina, San Diego, CA, USA).The sequencing (single read) was run for 77 cycles.
Sequences generated from every lane were processed using proprietary DArT analytical pipelines (PLs).In the primary pipeline analysis, fragments of poor-quality sequences with reproducibility below 90% and a read depth lower than 3.5 for SNPs or 5 for the presence or absence markers were filtered out.More stringent selection criteria were applied to the barcode region compared with the remainder of the sequences.The assignments of the sequences to specific samples carried within the barcode split step were extremely reliable [12].No samples were dropped because of low coverage across loci; however, individual sequences were removed if they did not meet the above criteria.Approximately 2.5 million sequences per barcode/sample were identified and used in marker calling.The average browse depth across loci was 9.2 reads per individual per locus for reference alleles and 6.5 for SNP alleles.Finally, identical sequences were collapsed into fastqcoll files.The fastqcoll files were groomed using DArT PL's proprietary algorithm, which corrects a low-quality base from a singleton tag into a correct base using collapsed tags with multiple members as a template.The groomed fastqcoll files were used in the secondary pipeline for DArT PL's proprietary SNP and SilicoDArT (presence/absence of restriction fragments in representation; PA markers) calling algorithms using DArTsoft14 (Diversity Arrays Technology Pty Ltd., Canberra, Australia).Two types of DArTseq markers were scored, SilicoDArT markers and SNP markers, which were both scored in a binary fashion for presence/absence (1 and 0, respectively) of the restriction fragment with the marker sequence as the genomic representation of the sample.Both SilicoDArT and SNP markers were aligned to the reference genomes of Vunguiculata_469_v1.0 to identify chromosome positions.

SNP Calling
SNP calling was conducted for all tags from all libraries enclosed within the DArT-soft14 analysis and clustered using DArT PL's C++ algorithm program at a brink distance of 3. Parsing of the clusters into separate SNP loci was completed by employing a technique called the balance of read counts for the allelic pairs [11].Additional choice criteria were further added to the algorithm program and supported by an analysis of roughly 1000 controlled cross-populations.Testing for deviations from the Hardy-Weinberg equilibrium of alleles in these populations was conducted to facilitate the selection of technical parameters to effectively discriminate true allelic variants from paralogous sequences.In addition, multiple samples were processed from the DNA to allelic calls as technical replicates and scoring consistency was used as the main selection criteria for high-quality/low-error rate markers.Calling quality was assured by a high average browse depth per locus (the average across all markers was over 30 reads/locus), and DNA was diluted to 50 ng/µL for the GBS analysis [12].

Results
Ninety-two cowpea genotypes were sequenced using SilicoDArT markers (indicating present/absent fragments in DArT genomic representations) and SNP data.A total of 33,920 DArTseq-derived SNPs were recalled for analysis, with only 16,960 reserved for further computed analyses.

Heterogeneity Analysis
Ninety-two cowpea genotypes were sequenced to attain single-nucleotide polymorphic base pairs that were equi-frequent at a specific genomic locus on a chromosome.Gene diversity among cowpea genotypes was assessed (Table 2) using major allele frequencies (MAFs), which yielded an average of 0.729 (range 0.719-0.739).Observed heterozygosity (Ho) averaged 0.386 with a range of 0.358-0.413.Expected heterozygosity (He) averaged 0.345 (range 0.345-0.346),and polymorphic information content (PIC) averaged 0.345 with a range of 0.333-0.354.The inbreeding coefficient (FIS) averaged −0.113 with a range of −0.167 to −0.052, as indicated in the table below (Table 2).

Discussion
The DArTseq-SNP data were analysed by computing a UPGMA dendrogram, which uncovered three major clusters with clear genetic diversity among the selected genotypes.The analysis also revealed significant genetic distance relations, both individually and in populations, that can be used in germplasm selection for breeding objectives and germplasm maintenance (Figure 1).The PCA confirmed some of the distinct genotypes indicated in the previous analysis (Figure 2).The heatmap analysis illustrated half of the genotypes that possessed the greatest genetic distance to narrow down the selection of divergent parental genotypes (Figure 3).Genotypes were grouped into clusters based on their genetic distances.The most divergent parental genotypes were TVU 9596 (SB26), Orelu (SB72), 90K-284-2 (SB55), RV 403 (SB17), and RV 498 (SB16).These findings agree with Malik [13] and indicate that PCAs can be successfully used to classify and measure patterns of genetic diversity in germplasm.Moreover, PCA has been effective in assessing genetic diversity for agro-morphological traits in chickpea genotypes [14].
A marker possessing two or more alleles with 99% frequency is qualitatively referred to as polymorphic [15].The extent of polymorphism is quantitatively measured by heterozygosity and the polymorphism information content (PIC) value [15].Polymorphisms can be classed into three categories as applied in simple-sequence repeats (SSRs) studies including PIC > 0.5 (highly informative), 0.5 < PIC > 0.25 (moderately informative), and PIC < 0.25 (slightly informative) [16].This study demonstrated the genetic diversity of genomic DArTseq-SNPs distributed among eleven chromosomal loci of ninety-two cowpea genotypes, with a mean PIC value of 0.345 and a range of 0.333-0.354(moderately informative polymorphism).This observation could have been caused by low mutations and the bi-allelic nature of SNPs [17].In population genetic analyses, major and minor allele frequencies are determined at a particular locus in a population to indicate genetic diversity [12].Our study focused on the most common alleles for given DArTseq-SNPs in a population.The major allele frequency had a significant mean value of 0.729 with a range of 0.719-0.739.The result inferences are based on frequencies that are MAF ≥ 0.5% [18].Inferences with values close to 1 predict a better distribution of allele frequencies at a given locus in the observed population [12].The inbreeding coefficient (F IS ) was also used as a measure of the non-random association of alleles in individuals in a population.This study recorded a F IS mean value of −0.113 with a range of −0.167 to −0.052.Negative F IS values indicate the presence of heterozygotes within individuals in a population [19].
Heterozygosity is the most important parameter used to estimate genetic diversity in a population.Gene diversity, also known as expected heterozygosity (He), outlines expected genotypes that are heterozygous under Hardy-Weinberg equilibrium [16].The recorded mean value of He was 0.345 (range 0.358-0.345)and the observed heterozygosity (Ho) mean value was 0.386 with a range of 0.358-0.413.Both mean observed and expected heterozygosity showed moderate genetic variation, as supported by a mean negative inbreeding degree.The mean He was relatively equal to the PIC, and both were less than Ho.This could be a result of the evenness of allele frequencies, indicating individuals that could have identical heterozygote genotypes [17].
Genetic variability in plant material is the basic information needed for breeders to improve crops by adopting an appropriate method of selection [20].This study highlights the DArTseq platform as an approach for genome complexity reduction.The platform provides whole-genome sequencing, which covers adequate genomic information per locus, improved genotype calling, and the ability to sequence more samples for a lower cost [21].The system can be applied without available reference genomes using doubledigest restriction fragment sequencing, which appropriates evolutionary genetic studies in the bio-system [21].DArTseq is a cost-effective genotyping platform used for maintaining, profiling, ascertaining, and managing available biodiversity in germplasm banks.DArTseq sequences can be used to develop DArTseq markers and other molecular markers such as SNPs or SSRs, which can be used in other germplasm banks [8].This information can be further used in association-of-trait-of-interest and marker-assisted breeding.
The DArTseq GBS platform was used by the CIMMYT on their seeds of discovery project, which profiled more than 40,000 wheat gene bank genotypes, as reported by Li [22].The DArTseq platform produces high throughput molecular markers that help to discriminate the gene pools of cowpeas by detecting substantial genetic variation among genotypes [12].DArTseq SNPs are important in the selection of diverse parental genotypes in crosses because of their capability to discriminate gene pools.Breeding programmes can narrow the genetic variation in crops.Thus, cowpea germplasm that is introduced and moved to a new environment could increase genetic variation.While arable land for farming is limited because of globalisation, this propels the genotypic advancements of crops with improved genetic variation to guide breeding programmes [6].

Conclusions
DArTseq provided high throughput SNP markers that illustrate the heterogeneity in the selected cowpea genotypes from the ARC-GC cowpea germplasm collection.DArTseq SNPs provided whole genome information with a good distribution of major allele frequencies on observed genotypes, which is beneficial when selecting parents with novel genes from the current collection with known agronomic traits.This technology has also supplied negative inbreeding coefficients among individual genotypes in a population within a short time.The SNPs could be further used to construct high-resolution novel gene mapping experiments and to identify genes responsible for traits of interest that can be used in genomic-wide association studies.
Genotypes TVU 9596 (SB26), Orelu (SB72), 90K-284-2 (SB55), RV 403 (SB17), and RV 498 (SB16) could be used as divergent parents in breeding programmes.Furthermore, the overall genetic heterogeneity in the selected genotypes indicated moderately informative polymorphisms.These cowpea genotypes are recommended for use in the development of cowpea markers, and they have pre-recorded, agro-morphological attributes that can qualify them for the improvement and characterisation of other genotypes that are not genetically screened in the ARC-GC germplasm collection.

Figure 1 .
Figure 1.Cluster dendrogram (UPGMA) illustrating the genetic distances of 92 cowpea genotypes.Clusters are illustrated with pale orange, red and green colours.

Figure 2 .
Figure 2. Principal component analysis illustrating the genetic populations of 92 cowpea genotypes.The clustering of genotypes is illustrating by pale orange, red and green colours.

Table 1 .
List of cowpea genotypes used for DArTseq genotyping and genetic diversity analysis.