Conversion and Validation of Uniplex SNP Markers for Selection of Resistance to Cassava Mosaic Disease in Cassava Breeding Programs

Cassava mosaic disease (CMD) is a major viral disease adversely affecting cassava production in Africa and Asia. Genomic regions conferring resistance to the disease have been mapped in African cassava germplasm through biparental quantitative trait loci (QTL) mapping and genome-wide association studies. To facilitate the use of these markers for selection in breeding pipelines, a proof-of-concept technical and biological validation study was carried out using independent prebreeding and breeding populations. Kompetitive Allele-Specific Polymerase Chain Reaction (KASP) assays were designed from three single nucleotide polymorphism (SNP) markers linked to a major resistance locus on chromosome 12 (S12_7926132, S12_7926163) and a minor locus on chromosome 14 (S14_4626854). The assays were robust and easy to score, with a genotype call rate >99%. The overall predictive accuracy (proportion of true positives and true negatives) of the markers S12_7926132 and S14_4626854 was 0.80 and 0.78 in the prebreeding and breeding populations, respectively. On average, genotypes carrying at least one copy of the resistance allele at the major CMD2 locus had significantly higher root yield. Nevertheless, prediction accuracy for the major locus (S12_7926132) varied among sub-families in both populations, suggesting the need for context-specific use, for example, by screening the parents used for crosses for co-segregation of the favorable SNP alleles with resistance. The availability of these validated SNP markers on the uniplex KASP genotyping platform is an important step in translational genetics toward marker-assisted selection to accelerate the introgression of favorable resistance alleles into breeding populations.


Introduction
Cassava (Manihot esculenta Crantz) is a vegetatively propagated staple crop of great economic importance. More than 800 million people derive the bulk of their dietary energy requirements from cassava every day, and over 500 million of them live in sub-Saharan Africa [1]. Total cassava production in Africa accounts for more than half of and the functional allele at the underlying trait locus [27] and a consistent allelic effect in different genetic backgrounds.
Cassava breeding is a slow and costly process because of its annual growth cycle and low multiplication rate, which hinder field phenotyping [30]. MAS has been proposed to overcome some of these challenges through: (i) quick elimination of genotypes with unfavorable alleles at early stages of the selection scheme, reducing the number of genotypes that require field testing for more complex traits; (ii) selection of genotypes carrying resistance alleles in the absence of the pathogens or vectors; (iii) rapid introgression of resistance genes into existing cassava clones in places to which the disease has recently spread, for example, Southeast Asian countries [31]; (iv) early selection for traits that are measured at later developmental stages of the crop; and (v) identification of genotypes that are homozygous or heterozygous for the favorable alleles. Despite the promise of MAS, its use in breeding has been limited, partly because of delays in marker conversion and validation [32]. The objectives of the present study were, first, to convert the CMD-resistance-linked SNPs to uniplex allele-specific PCR assays and, second, to validate their trait predictions using newly generated seedlings from breeding and prebreeding populations.

Background to Marker Discovery
The markers validated in the present study were derived from Rabbi et al. [25]. In brief, a genome-wide association study (GWAS) was carried out using a population of 5160 cassava clones from the International Institute of Tropical Agriculture (IITA) breeding program. The population was genotyped by genotyping-by-sequencing (GBS) at 100,000 SNPs and phenotyped from 2013 to 2016 for several traits, including CMD resistance. The GWAS uncovered a major locus for CMD resistance on chromosome 12 and two minor peaks on chromosome 14. The major peak on chromosome 12, which co-locates with the CMD2 locus, is tagged by markers S12_7926132 and S12_7926163. These two SNPs are almost completely linked (linkage disequilibrium, r² > 0.98) owing to their close physical proximity (31 bp apart). Marker S14_4626854 tags one of the minor peaks on chromosome 14. The significant trait-marker associations from the mixed linear model and the genomic region combinations extracted from Rabbi et al. [25] are provided in Table 1.
Table 1. Summary statistics of selected resistance-linked single nucleotide polymorphism (SNP) markers from genome-wide association study (GWAS) analysis.
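The r² statistic above measures linkage disequilibrium as a squared correlation between two SNPs. As an illustrative sketch only, assuming unphased genotypes coded as allele dosages (0, 1, or 2 copies of one allele) with no missing calls, r² can be approximated by the squared Pearson correlation of the two dosage vectors. This is a common approximation for unphased data, not necessarily the estimator used by Rabbi et al. [25], and the dosage vectors below are hypothetical.

```python
from math import fsum


def dosage_r2(snp_a, snp_b):
    """Approximate LD r^2 between two SNPs as the squared Pearson correlation
    of their allele dosages (0, 1, 2).

    Assumption: unphased genotypes, no missing calls."""
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    mean_a = fsum(snp_a) / n
    mean_b = fsum(snp_b) / n
    sxy = fsum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    sxx = fsum((a - mean_a) ** 2 for a in snp_a)
    syy = fsum((b - mean_b) ** 2 for b in snp_b)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical dosages (copies of one allele) for eight clones.
snp_1 = [0, 1, 1, 2, 0, 1, 2, 0]
snp_2 = [0, 1, 1, 2, 0, 1, 2, 0]  # perfectly concordant with snp_1
snp_3 = [0, 1, 1, 2, 0, 1, 2, 1]  # one discordant clone

print(dosage_r2(snp_1, snp_2))            # 1.0
print(round(dosage_r2(snp_1, snp_3), 3))  # 0.821 (= 16 / 19.5)
```

A single discordant clone out of eight already pulls r² down to about 0.82, which illustrates how tightly the two chromosome 12 SNPs must co-occur to reach r² > 0.98.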

Development of Allele-Specific PCR Markers Linked to CMD Resistance
To facilitate MAS for CMD resistance, the markers from the GWAS were converted to uniplex allele-specific PCR assays, which are better suited to MAS applications that require large numbers of accessions to be genotyped with one or a few markers. Sequences of 100 bp flanking each top SNP marker were extracted from the cassava reference genome (v6.1) (Table 2). To ensure the locus specificity of the PCR assays, a nucleotide-nucleotide BLAST (Basic Local Alignment Search Tool) search against the genome was performed [33]. The sequences matched their target regions uniquely, except for SNP S12_7926163, which had an additional, shorter hit on the same chromosome (E-value = 4.00 × 10⁻⁶, bit score = 54.7) (Supplementary Table S1). The Kompetitive allele-specific PCR (KASP) primers were designed with the default parameters of the proprietary Kraken™ software system from LGC Biosearch Technologies (Hoddesdon, UK). The technical performance of the designed assays, including call rate, genotype scoring clarity, and performance at varying DNA concentrations, was assessed using a panel of 188 diverse cassava genotypes. The biological performance of the KASP assays was assessed using breeding and prebreeding populations evaluated at the early stages of selection, namely the seedling nursery (SN) and the first clonal evaluation trial (CET) [30]. These populations are independent of the population used for the GWAS discovery of the markers.
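The flanking-sequence extraction step can be sketched as follows. The sketch assumes, as the marker names suggest, that an ID such as S12_7926132 encodes the chromosome and the 1-based physical position, and it writes the SNP in the bracketed [REF/ALT] notation commonly used when submitting sequences for KASP assay design. The function names, the toy sequence, and the alleles are hypothetical; in practice the sequence would come from the cassava v6.1 reference FASTA, and locus specificity would still need to be checked by BLAST.

```python
import re


def parse_snp_id(snp_id):
    """Split an ID such as 'S12_7926132' into (chromosome, 1-based position).

    Assumption: the S<chromosome>_<position> naming convention."""
    match = re.fullmatch(r"S(\d+)_(\d+)", snp_id)
    if match is None:
        raise ValueError(f"unrecognized SNP ID: {snp_id}")
    return match.group(1), int(match.group(2))


def kasp_submission_seq(chrom_seq, pos, ref, alt, flank=100):
    """Return up to `flank` bases on each side of a 1-based SNP position,
    with the SNP itself written as [REF/ALT]."""
    i = pos - 1
    if not 0 <= i < len(chrom_seq):
        raise ValueError("position outside the sequence")
    if chrom_seq[i].upper() != ref.upper():
        raise ValueError("reference allele does not match the genome sequence")
    left = chrom_seq[max(0, i - flank):i]
    right = chrom_seq[i + 1:i + 1 + flank]
    return f"{left}[{ref}/{alt}]{right}"


print(parse_snp_id("S12_7926132"))  # ('12', 7926132)
# Hypothetical 10-bp sequence; a 2-bp flank keeps the output short.
print(kasp_submission_seq("ACGTACGTAC", 5, "A", "G", flank=2))  # GT[A/G]CG
```

Checking the reference base before extraction catches coordinate mismatches (for example, a different genome version) before any primers are designed.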
The breeding population was part of IITA's regular recurrent selection breeding pipeline and was derived from controlled crosses among elite genotypes made in 2018. The SN trial, consisting of 3531 progenies from 74 families (mean family size 48, range 5 to 243), was established in January 2019 in Ibadan, Nigeria. A CMD-susceptible clone (TMEB117) was planted as spreader rows around and among the seedlings to ensure sufficient exposure to cassava mosaic virus (CMV). The SN trial was planted at a spacing of 1 m × 0.25 m and harvested 10 months after planting (MAP); 350 genotypes (around 10% of the total) were selected on plant vigor and root yield and advanced to the CET at the same location. CMD susceptibility in the SN was deliberately not used as a selection criterion, so that variation for the trait was retained at the CET stage. The CET, carried out between November 2019 and September 2020, was established as an incomplete block design with a spacing of 1 m between rows and 0.5 m within rows. Seven checks (TMEB419, TMEB693, IITA-TMS-30572, IITA-TMS-1KN130010, IITA-TMS-IBA000070, TMS14F1285P0006, and TMS13F2207P0001) were planted in each of the 10 sub-blocks, giving a total of 420 plots.
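The plot total follows from the design: 350 advanced clones plus seven checks in each of 10 sub-blocks gives 350 + 7 × 10 = 420 plots. The sketch below builds such a layout under illustrative assumptions that the text does not state: the 350 entries are split evenly (35 per sub-block), and plot order is randomized independently within each sub-block. The function name, entry IDs, and seed are hypothetical; the check names are those listed above.

```python
import random


def augmented_layout(entries, checks, n_blocks, seed=2019):
    """Assign entries evenly to sub-blocks, add every check to every sub-block,
    and shuffle plot order within each sub-block. Returns {block: [plots]}.

    Assumption: entries divide evenly among sub-blocks."""
    if len(entries) % n_blocks:
        raise ValueError("this sketch assumes entries divide evenly among sub-blocks")
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    per_block = len(shuffled) // n_blocks
    layout = {}
    for b in range(n_blocks):
        plots = shuffled[b * per_block:(b + 1) * per_block] + list(checks)
        rng.shuffle(plots)  # randomize plot order within the sub-block
        layout[b + 1] = plots
    return layout


entries = [f"CLONE_{i:03d}" for i in range(1, 351)]  # hypothetical IDs for the 350 clones
checks = ["TMEB419", "TMEB693", "IITA-TMS-30572", "IITA-TMS-1KN130010",
          "IITA-TMS-IBA000070", "TMS14F1285P0006", "TMS13F2207P0001"]
layout = augmented_layout(entries, checks, n_blocks=10)
print(sum(len(plots) for plots in layout.values()))  # 420
```

Replicating the checks in every sub-block is what allows block effects to be estimated and removed from the unreplicated entries.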
The prebreeding population was derived from open-pollinated crosses between exotic and African cassava germplasm. The exotic progenitors came from the International Center for Tropical Agriculture (CIAT), and the African germplasm from IITA. The crosses aimed to develop germplasm combining resistance to CMD, high provitamin A and starch content, and tolerance to acid soils and drought. An SN trial of 5608 genotypes from 353 full- and half-sib families was established in February 2018 in Ibadan, Nigeria, at a spacing of 1 m × 0.25 m. The mean family size was 16 (range 1 to 165 clones). TMEB117 was again planted as CMD spreader rows in the trial. After harvest at 10 MAP, a subset of 790 accessions was selected on vigor alone, to retain variation for CMD severity. The selected genotypes were advanced to a CET established in Ikenne, Nigeria, as an incomplete block design (18 sub-blocks of 50 plots each) with four checks (TMEB419, IITA-TMS-IBA30572, IITA-TMS-IBA070593, and IITA-TMS-IBA000070). The trial was planted in 2018 and harvested in 2019. The two locations, Ibadan (7°24′ N, 3°54′ E; 200 m above sea level) and Ikenne (6°52′ N, 3°42′ E; 61 m above sea level), were chosen for the trials because of their high CMV pressure. All field management practices followed technical recommendations and standard agricultural practices for cassava [34,35].

Phenotyping
The SN is the first stage at which progenies newly generated from crosses are phenotyped. Seeds are usually pregerminated in pre-nurseries before being transplanted to the field. At this stage, plants are exposed for the first time to the prevailing pests and diseases over their 10 to 12 months of growth. Disease incidence and severity usually peak at around 3 to 6 MAP during the rainy season. Because the foraging of the CMV vector (whitefly), and hence virus transmission, is stochastic, plants can escape infection at the seedling stage. In addition, plants infected late in the season may fail to show symptoms. Most susceptible plants that escaped infection or failed to express symptoms at the SN stage generally show disease symptoms at the CET stage.
CMD severity was scored at monthly intervals from 1 to 6 MAP in the SN of the prebreeding population and at 3 MAP in the SN of the breeding population. In the CETs of both populations, genotypes were scored for CMD severity on a plot basis at 1, 3, and 6 MAP on a scale from 1 (no symptoms) to 5 (severe symptoms), each plot being scored by the maximum severity observed. At harvest, genotypes were evaluated on a plot basis for yield and yield component traits, including the number of marketable storage roots, fresh root weight (kg), and shoot weight (kg).

Genotyping
Leaf samples were collected from vigorously growing plants at the seedling stage in both the breeding and prebreeding seedling trials. From each plant, three leaf discs of 6 mm diameter were freeze-dried for at least 72 h and genotyped with the three markers linked to CMD resistance (S12_7926132, S12_7926163, and S14_4626854; Table 2) using KASP assays at Intertek Laboratory, Australia. Two no-template controls (NTCs) were included in each plate. The protocols for preparing and running KASP reactions are provided in the KASP manual [36]. In brief, genotyping was carried out with the high-throughput PCR SNPline workflow using a 1 µL reaction volume in 1536-well PCR plates. The KASP genotyping reaction mix comprises three components: (i) sample DNA (10 ng); (ii) a marker assay mix containing the target-specific primers; and (iii) KASP-TF™ Master Mix containing two universal FRET (fluorescence resonance energy transfer) cassettes (FAM and HEX), a passive reference dye (ROX™), Taq polymerase, free nucleotides, and MgCl₂ in an optimized buffer solution. The assay mix is specific to each marker and consists of two allele-specific forward primers and one common reverse primer. After PCR, the plates were read fluorescently, and allele calls were made using Kraken™ software.
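Kraken™ performs the actual cluster-based allele calling, and its algorithm is proprietary. The sketch below only illustrates the underlying idea with a simple ratio threshold, assuming FAM and HEX signals already normalized to the ROX passive reference: a well dominated by FAM is called homozygous for allele X, one dominated by HEX homozygous for allele Y, an intermediate ratio heterozygous, and a weak total signal (as expected in NTC wells) a no-call. The thresholds, function names, and signal values are hypothetical. The call rate, the proportion of sample wells receiving a genotype, corresponds to the >99% figure given in the Abstract.

```python
def call_kasp(fam, hex_signal, min_signal=0.2, hom_cutoff=0.7):
    """Assign a genotype from ROX-normalized FAM/HEX signals.

    Returns 'X:X' (FAM homozygote), 'Y:Y' (HEX homozygote),
    'X:Y' (heterozygote) or '?' (no call). Thresholds are illustrative only."""
    total = fam + hex_signal
    if total < min_signal:  # too little signal: NTC or failed reaction
        return "?"
    fam_fraction = fam / total
    if fam_fraction >= hom_cutoff:
        return "X:X"
    if fam_fraction <= 1 - hom_cutoff:
        return "Y:Y"
    return "X:Y"


def call_rate(calls):
    """Proportion of sample wells with a genotype call (pass sample wells only, not NTCs)."""
    return sum(call != "?" for call in calls) / len(calls)


# Hypothetical (FAM, HEX) signals for four sample wells.
wells = [(0.95, 0.05), (0.06, 0.91), (0.48, 0.52), (0.03, 0.04)]
calls = [call_kasp(f, h) for f, h in wells]
print(calls)             # ['X:X', 'Y:Y', 'X:Y', '?']
print(call_rate(calls))  # 0.75
```

Well-separated clusters, as reported for these assays, mean that few samples fall near the thresholds, which is why call rate and scoring clarity are assessed together.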

Data Analysis
Phenotypic Data Analysis
A linear mixed model was used to obtain best linear unbiased predictions (BLUPs) for each genotype in the CETs of the prebreeding and breeding populations. The model was fitted using the lme4 package [37] in R software version 4.0.3 [38]. Checks were treated as fixed effects, and accessions and blocks as random effects. The model for the incomplete block design, following Kling [39], is:
$$Y_{ij} = \mu + \beta_i + c_j + \tau_{k(i)} + \varepsilon_{ij}$$
where $Y_{ij}$ is the vector of phenotype data, $\mu$ is the grand mean, $\beta_i$ is the block effect, $c_j$ is the check effect, $\tau_{k(i)}$ is the accession effect, and $\varepsilon_{ij}$ is the residual term. Broad-sense heritability for CMD severity score at 3 MAP, root number, and root weight in the two populations was calculated as:
$$H^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e}$$
where $H^2$ is the broad-sense heritability, $\sigma^2_g$ is the variance component for the genotype effect, and $\sigma^2_e$ is the variance component for the residual error. Pairwise correlations among traits were computed using the corr.test function in the psych R package [38] to assess the relationships between CMD severity scores at various time points in the SN and CET, and between CMD severity scores and yield-related traits. BLUP estimates were used for the CMD severity score and yield-related traits in the CETs.
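The heritability formula, and the reason BLUPs are preferred to raw plot values, can be illustrated with a simplified case. Assuming a model y = µ + g + e with a single plot per genotype, known variance components, and a known mean (a simplification of the incomplete block model described above, which also includes block and check terms), the BLUP of a genotype effect is the deviation y - µ shrunk toward zero by the factor σ²g/(σ²g + σ²e), i.e., the plot-mean heritability. The function names and variance components are hypothetical, not the study's estimates.

```python
def plot_mean_h2(var_g, var_e):
    """Broad-sense heritability on a plot-mean basis: var_g / (var_g + var_e)."""
    if var_g < 0 or var_e < 0 or var_g + var_e == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return var_g / (var_g + var_e)


def blup_single_plot(y, mu, var_g, var_e):
    """BLUP of a genotype effect from one plot under y = mu + g + e
    (simplifying assumption: mu and both variances known).
    The deviation y - mu is shrunk by the heritability."""
    return plot_mean_h2(var_g, var_e) * (y - mu)


# Hypothetical variance components.
print(plot_mean_h2(0.9, 0.1))                # about 0.9: little shrinkage
print(blup_single_plot(4.0, 2.0, 1.0, 1.0))  # h2 = 0.5, so 0.5 * (4.0 - 2.0) = 1.0
```

The same logic explains why a highly heritable trait such as CMD severity (H² of 0.84 to 0.90 in Table 3) has BLUPs close to the observed plot deviations, whereas root number and root weight are shrunk much more strongly.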

Marker Prediction Analysis Using Logistic Regression
The major CMD resistance locus on chromosome 12 is known to confer dominant resistance [17,18]. Clones carrying at least one copy of the favorable allele ("T") at SNP S12_7926132 are therefore expected to be resistant (score 1 on the disease severity scale). To assess the marker's performance in predicting resistance or susceptibility, the phenotype was converted into a binary variable (affected or unaffected): individuals with a CMD severity score greater than 1 were classified as affected, and all others as unaffected. Prediction analysis was carried out using binary logistic regression as implemented in the R package tidymodels [40]. The data were divided into a training set and a testing set at a 3:1 ratio, stratified by the binary variable. Bootstrap resamples of the training set were created for model validation. The markers and families were included as independent variables. The model is as follows:
$$\ln\left(\frac{p_i}{1 - p_i}\right) = \alpha + f_i + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots + \beta_n x_{n,i}$$
where $p_i$ is the probability that genotype $i$ is resistant or susceptible, $n$ is the number of CMD resistance markers in the model, $\alpha$ is the intercept, $f_i$ is the family effect, $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients for markers $1, 2, \dots, n$, and $x_{1,i}, x_{2,i}, \dots, x_{n,i}$ are the values of markers $1, 2, \dots, n$ for genotype $i$. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used as a single measure summarizing the discriminative ability of the markers. The ROC curve plots sensitivity (true positive rate) against 1 - specificity (false positive rate) and was constructed from the predicted probabilities.
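Three pieces of this analysis can be sketched compactly in Python (the study itself used tidymodels in R): the inverse-logit prediction from the logistic model described above, a 3:1 training/testing split stratified by the binary outcome, and the AUC computed as the probability that a randomly chosen positive genotype receives a higher predicted probability than a randomly chosen negative one (ties counted as one half), which equals the area under the empirical ROC curve. Function names, the seed, and all inputs are hypothetical, and neither model fitting nor bootstrapping is shown.

```python
import math
import random


def predict_prob(alpha, family_effect, betas, markers):
    """Inverse logit of alpha + f_i + sum_j beta_j * x_{j,i}."""
    eta = alpha + family_effect + sum(b * x for b, x in zip(betas, markers))
    return 1.0 / (1.0 + math.exp(-eta))


def stratified_split(labels, train_frac=0.75, seed=1):
    """Return (train_idx, test_idx), splitting each outcome class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def roc_auc(scores, labels):
    """AUC via the Mann-Whitney statistic; labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1] * 8 + [0] * 4  # hypothetical binary outcomes
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))                 # 9 3
print(roc_auc([0.9, 0.8, 0.35, 0.1], [1, 0, 1, 0]))  # 0.75
```

Stratifying the split keeps the proportion of affected genotypes the same in the training and testing sets, so differences in accuracy or AUC between the two sets are not driven by class imbalance.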

Within-Family Prediction Analysis
To understand the performance of the CMD2-linked SNP (S12_7926132) in different genetic backgrounds, a within-family prediction analysis was conducted on the SN data using linear regression, considering only families with more than 20 genotypes. Regressions were fitted with the lm function in R, with the S12_7926132 genotype (TT, TG, or GG) as the independent variable and the observed CMD severity score as the response variable.
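For a single categorical predictor, an ordinary least-squares fit such as R's lm reduces to group means: with GG as the reference level (R's default, since factor levels are sorted and "GG" comes first), the intercept is the mean score of GG genotypes, and the TG and TT coefficients are differences from that mean. The sketch below reproduces those coefficients per family, keeping only families with more than 20 genotypes. It omits the significance test, and the function name and records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def within_family_effects(records, min_family_size=21):
    """records: iterable of (family, genotype, severity), genotype in {'GG', 'TG', 'TT'}.

    Returns {family: {'intercept': mean_GG, 'TG': mean_TG - mean_GG, 'TT': mean_TT - mean_GG}}
    for families with at least min_family_size genotypes."""
    by_family = defaultdict(list)
    for family, genotype, severity in records:
        by_family[family].append((genotype, severity))

    effects = {}
    for family, rows in by_family.items():
        if len(rows) < min_family_size:
            continue  # "more than 20 genotypes" means at least 21
        groups = defaultdict(list)
        for genotype, severity in rows:
            groups[genotype].append(severity)
        if "GG" not in groups:
            continue  # no reference class in this family (a choice made for the sketch)
        ref = mean(groups["GG"])
        coefs = {"intercept": ref}
        for genotype in ("TG", "TT"):
            if genotype in groups:
                coefs[genotype] = mean(groups[genotype]) - ref
        effects[family] = coefs
    return effects


# Hypothetical family F1 (22 seedlings) and a small family F2 that is excluded.
records = ([("F1", "GG", 4)] * 10 + [("F1", "TG", 1)] * 10 + [("F1", "TT", 1)] * 2
           + [("F2", "GG", 3)] * 5)
print(within_family_effects(records))  # F1: intercept 4, TG -3, TT -3; F2 dropped
```

Negative TG and TT coefficients of similar size, as in this toy family, are what a dominant resistance allele would produce; families in which the coefficients are near zero are those in which the marker is uninformative.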

Estimation of Biological Metrics
Using a confusion matrix, several performance statistics were estimated to determine how well the markers predicted the response of genotypes to CMD (resistance or susceptibility), with resistance as the positive class. These were accuracy (ACC, the proportion of genotypes correctly predicted as either resistant or susceptible), false-positive rate (FPR, the proportion of diseased genotypes that were predicted to be resistant), and false-negative rate (FNR, the proportion of resistant genotypes that were predicted to be susceptible), calculated as follows:
$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$$
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$
$$\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}}$$
where FP = false positives, TN = true negatives, FN = false negatives, and TP = true positives.
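These formulas translate directly into code. The sketch below counts the four outcomes from predicted and observed classes, treating resistance ("R") as the positive class as in the definitions above, and then computes ACC, FPR, and FNR. The function names, labels, and counts are hypothetical.

```python
def count_outcomes(predicted, observed, positive="R"):
    """Return (TP, TN, FP, FN) with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for pred, obs in zip(predicted, observed):
        if pred == positive and obs == positive:
            tp += 1
        elif pred != positive and obs != positive:
            tn += 1
        elif pred == positive:  # predicted resistant, observed diseased
            fp += 1
        else:                   # predicted susceptible, observed resistant
            fn += 1
    return tp, tn, fp, fn


def marker_metrics(tp, tn, fp, fn):
    """Accuracy, false-positive rate, and false-negative rate from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


print(count_outcomes(["R", "R", "S", "S"], ["R", "S", "S", "R"]))  # (1, 1, 1, 1)
print(marker_metrics(tp=80, tn=40, fp=10, fn=20))
# {'ACC': 0.8, 'FPR': 0.2, 'FNR': 0.2}
```

For MAS the FPR is the more costly error: a false positive is a susceptible seedling that passes a marker screen and consumes field-testing resources.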

Phenotypic Variation for Resistance to CMD
As expected, the frequency distribution of CMD severity scores in the two populations was bimodal, with one peak at no symptoms and another spanning varying degrees of symptoms (Figure 1). In the breeding population, more than 65% of the genotypes evaluated at the SN and CET stages were resistant to CMD (Supplementary Figure S1), which may reflect the resistance of their progenitors. In the CET of this population, the proportion of resistant genotypes rose from 65% at 1 MAP and 72% at 3 MAP to 77% at 6 MAP, suggesting that some genotypes recovered from CMV infection (Supplementary Figure S1). In the prebreeding population, most genotypes evaluated at the SN stage initially showed no CMD symptoms (severity score 1), but as their exposure to CMV through whitefly vectors increased over time, the frequencies of the other severity classes (2-5) also increased (Supplementary Figure S2). Most susceptible plants showed symptoms by 6 MAP. Disease progression was compared between seedlings derived from African progenitors and those with Latin American progenitors (Supplementary Figure S3): both incidence and severity were much higher in the half-sibs from Latin American progenitors than in those from African progenitors.
Broad-sense heritabilities of root number, root weight, and CMD severity score in the CETs were 0.21, 0.33, and 0.90, respectively, in the breeding population, and 0.44, 0.46, and 0.84, respectively, in the prebreeding population (Table 3).
CMD severity scores recorded at the SN and CET stages were positively correlated in both the breeding and prebreeding populations (Pearson's r > 0.5; Table 4). A significant positive correlation was also observed between CMD severity at 1 and 3 MAP in the CETs of both populations. In the CETs, disease severity was significantly negatively correlated with both root number and root weight, and these correlations were stronger in the prebreeding population.
Table 3. Broad-sense heritability calculated on a mean plot basis for cassava mosaic disease (CMD) severity score, root number, and root weight in the clonal evaluation trial (CET).

The resistance-linked SNP markers in Table 2 were successfully converted to allele-specific PCR assays. These markers showed a high call rate and clear genotype scoring both at the technical validation stage (data not shown) and in the genotyping of samples from the present study (Supplementary Figure S4). For each marker, three distinct clusters were observed: favorable homozygotes, unfavorable homozygotes, and heterozygotes. Because the two SNPs on chromosome 12 are physically close and in strong linkage disequilibrium (r² > 0.98), only S12_7926132 was used for the downstream analysis. This SNP is close to a peroxidase gene (PEX22), the hypothesized resistance gene at the CMD2 locus [25].
At the SN stage, the frequencies of the favorable alleles at S12_7926132 (T) and S14_4626854 (A) were higher in the breeding population than in the prebreeding population (Figure 2). In the SN of the breeding and prebreeding populations, respectively, 9.9% and 5.8% of genotypes were TT, 56.7% and 40.4% were TG, and 32.6% and 53.4% were GG at S12_7926132, while 14.8% and 1.1% were AA, 31.1% and 23.7% were AG, and 53.6% and 74.9% were GG at S14_4626854 (Supplementary Table S2). The frequencies of the favorable alleles also increased from the SN to the CET stage in both populations (Figure 2).
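As a consistency check, the favorable-allele frequency implied by these genotype percentages is (2 × homozygote + heterozygote)/(2 × called). The sketch below rescales to the called total because the percentages sum to slightly less than 100, presumably owing to missing calls (an assumption). The function name is hypothetical, and the derived frequencies are not reported in the study; they only restate the genotype percentages above.

```python
def favorable_allele_freq(pct_fav_hom, pct_het, pct_unfav_hom):
    """Favorable-allele frequency (2*hom + het) / (2*called) from genotype percentages.

    Assumption: percentages not summing to 100 reflect missing calls, so they are
    rescaled to the called total."""
    called = pct_fav_hom + pct_het + pct_unfav_hom
    return (2 * pct_fav_hom + pct_het) / (2 * called)


# SN genotype percentages from the text:
# (favorable homozygote, heterozygote, unfavorable homozygote)
sn = {
    ("S12_7926132", "breeding"): (9.9, 56.7, 32.6),
    ("S12_7926132", "prebreeding"): (5.8, 40.4, 53.4),
    ("S14_4626854", "breeding"): (14.8, 31.1, 53.6),
    ("S14_4626854", "prebreeding"): (1.1, 23.7, 74.9),
}
freqs = {key: favorable_allele_freq(*pcts) for key, pcts in sn.items()}
for key, freq in freqs.items():
    print(key, round(freq, 2))
```

This gives about 0.39 versus 0.26 for the T allele at S12_7926132 and about 0.31 versus 0.13 for the A allele at S14_4626854 in the breeding and prebreeding populations, respectively, in line with the higher favorable-allele frequencies in the breeding population.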

Marker Effects on CMD Resistance
Marked differences were observed in the allele substitution effects of the resistance-linked markers on the degree of resistance and susceptibility, particularly at the CMD2 locus on chromosome 12 (Figure 3). Accessions with genotypes TT and TG had low CMD severity scores in both the breeding and prebreeding populations at the SN and CET stages. The similarly low median scores of accessions carrying one (TG) or two (TT) copies of the resistance allele in the SN trials suggest a dominant mode of action of the CMD2 locus. Nevertheless, between 19% and 34% of genotypes carrying the resistance allele in the SN of the breeding and prebreeding populations showed CMD symptoms, suggesting that in some genetic backgrounds the favorable SNP allele is not linked to the functional resistance allele. In the prebreeding population, genotype GG at the chromosome 14 marker was associated with susceptibility, whereas genotypes AG and AA were associated with resistance. However, there was no difference at this marker between genotypes carrying the resistance and susceptible alleles in the CET of the breeding population.
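Dominance can be made explicit with the classical decomposition of genotype-class means into an additive effect a = (TT - GG)/2 and a dominance deviation d = TG - (TT + GG)/2; a ratio d/a near 1 indicates complete dominance of the T allele. This sketch is illustrative only: the study reports medians and does not give estimates of a or d, and the function name and class means are hypothetical.

```python
def additive_dominance(mean_fav_hom, mean_het, mean_unfav_hom):
    """Return (a, d, d/a) from genotype-class means of a trait.

    a: half the difference between the two homozygotes.
    d: deviation of the heterozygote from the homozygote midpoint.
    d/a = 0 means purely additive; d/a = 1 means complete dominance of the
    allele carried by mean_fav_hom."""
    a = (mean_fav_hom - mean_unfav_hom) / 2
    d = mean_het - (mean_fav_hom + mean_unfav_hom) / 2
    ratio = d / a if a != 0 else float("nan")
    return a, d, ratio


# Hypothetical mean CMD severity scores (1 = no symptoms) for TT, TG, and GG.
print(additive_dominance(1.0, 1.0, 4.0))  # (-1.5, -1.5, 1.0): TG scores like TT
print(additive_dominance(1.0, 2.5, 4.0))  # d = 0: heterozygote at the midpoint
```

Because lower scores mean more resistance, a is negative here; what matters for dominance is that d equals a, i.e., a single T allele lowers severity as much as two.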

Effect of Resistance-Linked Alleles on Yield Traits
Given the negative relationship between CMD severity and yield-related traits (root weight and root number), pairwise t-tests were used to compare the genotypic classes at the chromosome 12 marker. In the CET of the breeding population, the average root yield of clones with genotype TT (1.46 ± 0.85) or TG (1.46 ± 0.82) at S12_7926132 was significantly higher than that of clones with two copies of the susceptible allele, GG (1.00 ± 0.67) (Figure 4). Similarly, in the prebreeding population, clones carrying one or two copies of the resistance allele had a higher average root yield than those homozygous for the susceptible allele (TT, 2.39 ± 1.68; TG, 2.90 ± 2.10; GG, 1.88 ± 1.70). In the prebreeding population, clones heterozygous at S14_4626854 also had a higher average root yield than either homozygous class. There were no marked differences in root weight per plant among the genotypic classes of this marker in the breeding population (Figure 4).
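The relative yield advantages implied by these class means can be computed directly. The smallest of them, TT versus GG in the prebreeding population, appears to be the source of the "at least 27%" figure in the Conclusions. The function name is hypothetical; the means are those reported above, in the units of Figure 4.

```python
def yield_advantage_pct(mean_resistant, mean_susceptible):
    """Percentage root-yield advantage of a resistant class over the GG class."""
    return 100 * (mean_resistant / mean_susceptible - 1)


# Class means at S12_7926132 reported in the text.
class_means = {
    "breeding": {"TT": 1.46, "TG": 1.46, "GG": 1.00},
    "prebreeding": {"TT": 2.39, "TG": 2.90, "GG": 1.88},
}
advantages = {
    (pop, geno): round(yield_advantage_pct(means[geno], means["GG"]), 1)
    for pop, means in class_means.items()
    for geno in ("TT", "TG")
}
print(advantages)
print(min(advantages.values()))  # 27.1
```

The advantages are 46.0% for both TT and TG in the breeding population and 27.1% (TT) and 54.3% (TG) in the prebreeding population. These ratios of means say nothing about variance, and the large standard deviations above are a reminder that individual clones vary widely within each class.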

Population-Level Marker Performance
To assess marker performance at the population level, we carried out binary logistic regression. Mean prediction accuracy across the training-set bootstraps was 76% in the prebreeding population and 80% in the breeding population (Table 5). The model's AUC values for the training sets were 0.80 in the breeding population and 0.82 in the prebreeding population. Mean prediction accuracies and AUC values in the testing sets were similar for both populations. The sensitivities and false-positive rates (1 - specificity) of the marker predictions against the observed CMD scores in the two populations are shown as ROC curves (Supplementary Figure S5). Overall, the markers in the breeding population predicted resistance (84% accuracy) better than susceptibility (67% accuracy) (Table 6). In the prebreeding population, prediction accuracy was 71% for resistance and 85% for susceptibility.

Performance Metrics of Marker S12_7926132 within the Families
In addition to the population-wide metrics, marker performance was assessed at the family level. This analysis considered only the major locus on chromosome 12 and the SN data. Within-family marker-trait regression was significant in 70% of families from the breeding population, compared with 40% of families from the prebreeding population (Figure 5a). The effect size of the resistance allele varied among families. Within-family prediction accuracy and false-positive rates were likewise better in the breeding population (accuracy > 0.75 and false-positive rate < 0.20) than in the prebreeding population (Figure 5b). In the breeding population, the families with the lowest accuracies and high false-positive rates shared common male parents. In the prebreeding population, by contrast, the families with low accuracies did not all share a common parentage. Although the progenitors of these families carried a copy of the favorable SNP allele, this allele did not co-segregate with the resistant phenotype, particularly in families derived from Latin American progenitors. This suggests that the favorable SNP allele may predate the emergence of the causal resistance allele found in African cassava germplasm. The lower accuracy in the prebreeding population could also be explained by its half-sib family structure resulting from random pollination: the progeny within each half-sib family have different male parents, each of which may or may not carry the functional resistance allele linked to the favorable SNP allele.
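A family-level screen of the kind summarized in Figure 5b can be sketched as follows. The sketch assumes the dominant prediction rule implied by the CMD2 model (TT or TG predicted resistant) rather than the fitted logistic model, reuses the more-than-20-genotypes family filter from the within-family regression, and flags families that miss the accuracy > 0.75 and false-positive rate < 0.20 benchmarks quoted above. The function name and records are hypothetical.

```python
from collections import defaultdict


def family_prediction_summary(records, min_family_size=21, min_acc=0.75, max_fpr=0.20):
    """records: iterable of (family, genotype, observed), observed 'R' or 'S'.

    Assumed rule: a seedling with at least one T allele at S12_7926132 is predicted 'R'.
    Returns {family: (accuracy, false_positive_rate, flagged)}."""
    by_family = defaultdict(list)
    for family, genotype, observed in records:
        by_family[family].append((genotype, observed))

    summary = {}
    for family, rows in by_family.items():
        if len(rows) < min_family_size:
            continue
        tp = tn = fp = fn = 0
        for genotype, observed in rows:
            predicted = "R" if "T" in genotype else "S"
            if predicted == "R":
                if observed == "R":
                    tp += 1
                else:
                    fp += 1
            elif observed == "S":
                tn += 1
            else:
                fn += 1
        accuracy = (tp + tn) / len(rows)
        fpr = fp / (fp + tn) if fp + tn else float("nan")
        # NaN comparisons are False, so families with no diseased seedlings get flagged for review
        flagged = not (accuracy > min_acc and fpr < max_fpr)
        summary[family] = (accuracy, fpr, flagged)
    return summary


records = ([("A", "TG", "R")] * 11 + [("A", "GG", "S")] * 11    # marker co-segregates
           + [("B", "TG", "S")] * 11 + [("B", "GG", "S")] * 11  # marker uninformative
           + [("C", "TT", "R")] * 5)                             # too small, excluded
print(family_prediction_summary(records))
# {'A': (1.0, 0.0, False), 'B': (0.5, 0.5, True)}
```

Family B mimics the situation described above, in which a parent carries the favorable SNP allele without the functional resistance allele, so every TG seedling becomes a false positive.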

Discussion
Next-generation sequencing (NGS) has emerged as a powerful tool for detecting DNA sequence polymorphism-based markers and is becoming important for next-generation plant breeding [41]. In cassava, NGS technologies and high-throughput genotyping platforms have enabled the construction of high-density genetic linkage maps [42] and the identification of SNPs associated with CMD resistance [24], quality traits such as provitamin A and dry matter content [43,44], several agronomic traits [45], and stress-related, quality, and agro-morphological traits [25]. Despite the discovery of numerous QTLs linked to key traits, evidence for the usefulness of the markers tagging these loci in cassava breeding programs is scarce.
The use of trait-linked SNP markers in MAS allows breeders to preselect genotypes that combine desired traits for subsequent field evaluation and to screen segregating populations rapidly, thereby reducing the size, and hence the cost, of phenotyping trials. Markers linked to CMD resistance can be useful for preemptive breeding or for rapidly transferring resistance alleles into elite and adapted genetic backgrounds where new disease outbreaks have been reported. In this study, uniplex SNP markers associated with CMD resistance were developed and validated in two independent cassava populations. Compared with a fixed SNP array, uniplex PCR assays offer greater cost-effectiveness and flexibility in genotyping different combinations of sample numbers and markers [46][47][48][49][50]. Overall, the KASP assays performed very well: they were robust, and their marker genotype classes were easy to score.
Technical and biological validation is essential to assess the reliability and accuracy of markers linked to a trait of interest and to establish their utility for practical applications in plant breeding [51][52][53]. To obtain reliable and unbiased estimates of marker performance, validation was carried out in two independent populations: a breeding population from IITA's regular recurrent selection pipeline, and a prebreeding population of progenies from crosses between exotic Latin American progenitors and African varieties.
The area under the curve (AUC) is a useful statistic for measuring how accurately the markers, fitted in a logistic regression model, predict resistance or susceptibility to CMD. AUC values range from 0 to 1, where 0.5 indicates discrimination no better than random and 1.0 indicates perfect discrimination [54]. AUC values in the training and testing sets of both populations exceeded 0.7, indicating that the markers had good discriminatory ability. The similar prediction accuracies and AUC values of the training and testing sets in both populations suggest that the model is stable when predicting independent data and is not overfitted. The performance of marker S12_7926132 across diverse families and genetic backgrounds in the present study indicates that this marker can be deployed for marker-assisted selection in breeding programs targeting CMD resistance. However, the marker did not perform equally well in all families segregating for the favorable SNP allele. For example, in the breeding population, families 54, 182, 193, 253, 394, and 397 had nonsignificant marker effects and low accuracy or high false-positive rates, possibly because they share related parental clones. The ineffectiveness of the marker in these families could be due to recombination between the favorable marker allele and the QTL; a functional resistance allele that arose on a haplotype already common in cultivated cassava germplasm; an inhibitor gene suppressing expression of the CMD2 gene; or clones carrying a different source of the favorable allele, which could itself be exploited by breeding programs [51,55]. The moderate significance of marker S14_4626854 in the two populations could be attributed to its moderate SNP effect, the low frequency of the favorable allele, and epistasis with the major locus [23,25].
Because resistance at the CMD2 locus is dominant, the presence of one or two copies of the resistant allele in any genotype should confer resistance to CMD. In the study populations, CMD severity scores showed a bimodal distribution consisting of a resistant group (score 1) and a susceptible group (scores 2 to 5). Within the susceptible group, scores were approximately normally distributed, with a few individuals scoring 2 or 5 and more scoring 3 or 4. The variation in the degree of susceptibility could be due to differences in viral load within plots, time since initial infection, mixed infections with different cassava mosaic virus strains, or other genetic factors related to fitness and background immunity [56][57][58].
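The dominant model and the binary phenotype described above can be expressed as a simple classification rule. The sketch below assumes hypothetical allele labels "R" (favorable) and "s" (alternative) for S12_7926132, codes a CMD severity score of 1 as resistant and 2 to 5 as susceptible, and uses invented genotypes. Predictive accuracy follows the definition used in this study (proportion of true positives and true negatives); the false-positive rate uses the standard FP/(FP + TN) definition, which may differ from the exact computation in the study.

```python
RESISTANT = "R"  # hypothetical label for the favorable S12_7926132 allele; "s" marks the alternative allele


def observed_resistant(cmd_score: int) -> bool:
    """Binarize a CMD severity score: 1 = resistant, 2-5 = susceptible."""
    if not 1 <= cmd_score <= 5:
        raise ValueError(f"CMD severity score out of range: {cmd_score}")
    return cmd_score == 1


def predicted_resistant(genotype: str) -> bool:
    """Dominant model: one or two copies of the resistant allele predict resistance."""
    return genotype.count(RESISTANT) >= 1


def confusion_counts(genotypes, scores):
    # Tally predictions from marker genotypes against observed phenotypes.
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for genotype, score in zip(genotypes, scores):
        pred, obs = predicted_resistant(genotype), observed_resistant(score)
        if pred and obs:
            counts["tp"] += 1
        elif pred and not obs:
            counts["fp"] += 1
        elif not pred and not obs:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(c):
    """Overall predictive accuracy: proportion of true positives and true negatives."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def false_positive_rate(c):
    """Share of susceptible clones wrongly predicted as resistant."""
    negatives = c["fp"] + c["tn"]
    return c["fp"] / negatives if negatives else 0.0


genotypes = ["RR", "Rs", "ss", "ss", "Rs", "ss"]
scores = [1, 1, 3, 4, 2, 1]
c = confusion_counts(genotypes, scores)
print(c, round(accuracy(c), 3), round(false_positive_rate(c), 3))
# prints {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1} 0.667 0.333
```

Because the phenotype is collapsed to a binary outcome, a clone scoring 2 counts as susceptible exactly like one scoring 5; the graded variation within the susceptible group therefore does not affect these accuracy figures.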
While CMD2-linked markers can be used to increase the frequency of the favorable allele in cassava breeding germplasm, breeders should be cautious about driving the chromosome 12 region to fixation. This can reduce diversity around the CMD2 genomic region, which may affect the fitness of genotypes, particularly if the resistant haplotype is linked to unfavorable alleles at nearby genes [59,60]. Furthermore, planting cassava varieties that depend exclusively on a single dominant gene might lead to a breakdown of resistance [18,61]. This threat makes it necessary to pyramid additional sources of resistance, including polygenic resistance, which is known to be more durable [17,18,61]. The frequencies of favorable alleles at quantitative resistance loci can be increased through genomic selection, which is better suited to highly polygenic traits [24].

Conclusions
The technical and biological performance of two KASP markers linked to major and minor CMD resistance loci was assessed in two independent cassava populations. The KASP marker S12_7926132, linked to the CMD2 locus, predicted the resistance or susceptibility of new seedlings with reasonable accuracy at both the population and family levels. In addition, selection for the resistant allele at this locus would increase yield by an average of at least 27% over genotypes lacking the resistant allele, thereby mitigating the economic impact of CMD. Following successful conversion and validation, the CMD resistance-linked marker S12_7926132 can be integrated into the breeders' marker-assisted selection (MAS) toolbox and used to increase screening capacity at the early stages of selection.
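The yield figure above is a relative difference between the mean yield of genotypes carrying the resistant allele and that of genotypes lacking it. A minimal sketch of that arithmetic follows, assuming a simple comparison of raw means; it ignores any model adjustments the study may have applied, and the yield values (assumed to be fresh root yield in t/ha) are invented for illustration.

```python
from statistics import mean


def relative_yield_advantage(carrier_yields, noncarrier_yields):
    """Percent yield advantage of resistant-allele carriers over non-carriers."""
    base = mean(noncarrier_yields)
    if base <= 0:
        raise ValueError("mean yield of non-carriers must be positive")
    return 100.0 * (mean(carrier_yields) - base) / base


# Hypothetical fresh root yields (t/ha); not data from this study.
carriers = [14.0, 12.5, 13.5]
noncarriers = [10.5, 9.5, 10.0]
print(f"{relative_yield_advantage(carriers, noncarriers):.1f}%")
# prints 33.3%
```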
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4395/11/3/420/s1, Figure S1: Frequency distribution of cassava genotypes for CMD severity score in the SN and CET of the breeding population, Figure S2: Frequency distribution of cassava genotypes for CMD severity score in the SN and CET of the pre-breeding population, Figure S3: Frequency distribution of cassava genotypes for CMD severity score in the SN of the pre-breeding population, Figure S4: Allele discrimination plots from KASP genotyping for SNP markers linked to resistance to CMD (S12_7926132, S12_7926163, and S14_4626854) in the SN of the breeding and the pre-breeding population, Figure S5: Receiver operating characteristic (ROC) curve of the true-positive rate against the false-positive rate for the training and testing sets in the (a) breeding population and (b) pre-breeding population, Table S1: BLAST results of the flanking sequences of the three SNP markers on the cassava reference genome, Table S2: Number of predicted genotypes for markers linked to resistance (S12_7926132 and S14_4626854) in the seedling nursery of breeding and pre-breeding populations.
Data Availability Statement: The phenotypic and SNP data that support this research are openly accessible on the cassava breeding database (www.cassavabase.org (accessed on 21 February 2021)) and can be downloaded from ftp://ftp.cassavabase.org/manuscripts/Ige_et_al_2021/ (accessed on 21 February 2021).