Genomics-Assisted Breeding in the CGIAR Research Program on Roots, Tubers and Bananas (RTB)

Breeding in the CGIAR Research Program on Roots, Tubers and Bananas (RTB) targets highly diverse biotic and abiotic constraints, whilst meeting complex end-user quality preferences to improve livelihoods of beneficiaries in developing countries. Achieving breeding targets and increasing the rate of genetic gains for these vegetatively propagated crops, with long breeding cycles, and genomes with high heterozygosity and different ploidy levels, is challenging. Cheaper sequencing opens possibilities to apply genomics tools for complex traits, such as yield, climate resilience, and quality traits. Therefore, across the RTB program, genomic resources and approaches, including sequenced draft genomes, SNP discovery, quantitative trait loci (QTL) mapping, genome-wide association studies (GWAS), and genomic selection (GS), are at different stages of development and implementation. For some crops, marker-assisted selection (MAS) is being implemented, and GS has passed the proof-of-concept stage. Depending on the traits being selected for using prediction models, breeding schemes will most likely have to incorporate both GS and phenotyping for other traits into the workflows leading to varietal development.


Introduction
Root, tuber, and banana (RTB) crops, namely cassava (Manihot esculenta), potatoes (Solanum spp.), sweet potatoes (Ipomoea batatas), yams (Dioscorea spp.), bananas and plantains (Musa spp.), and tropical and Andean roots and tubers, play a very important role in food security and income, and represent a significant agricultural opportunity in tropical areas of sub-Saharan Africa (SSA), Asia, and Latin America, especially for the more than 300 million growers below the poverty line [1-4]. Major global constraints are endangering the important role these crops play in agri-food systems.
Population growth will more than double food demand in SSA by 2050, where people are highly dependent on RTB crops [1,5]. Climate change studies project a significant continued warming trend in SSA [6,7]. These changes will directly impact the productivity and resilience of RTB crops, and could undo progress in poverty reduction and markedly increase food insecurity, especially in SSA, where RTB crops are the most important [5,8]. Notably, RTB crops are generally more resilient to adverse climate conditions; for instance, cassava is projected to be less negatively affected by increased temperatures under climate change than grain crops [9]. Screening of sweet potato genetic resources has identified a few hundred accessions that can produce storage roots under high temperatures when most other sweet potato accessions cannot [5]. In addition, roots and tubers are not dependent on pollination, which is the most sensitive stage to abiotic stress in grain and many horticultural crops [10], thus underscoring the food security importance of RTB crops, even more so in a climate change scenario. However, a number of studies show that the increased temperatures projected by climate change scenarios will negatively affect RTB crops (except for cassava, which will be more affected by long drought periods), especially during root thickening [6,9,11].
Moreover, the effects of climate change on pests and diseases could be significant, as their geographic distribution and pressure on crops would change, and this would require a different and additional modelling effort [5,12]. In 2015, cassava mosaic disease (CMD), the world's most devastating disease of cassava, was reported in Cambodia, with potential threats to the entire Southeast Asia region [13]. In addition, prolonged dry periods affect the viability and availability of planting material for sweet potato (vines), cassava (stems), and yams (vines and tubers).
The CGIAR Research Program on Roots, Tubers and Bananas (RTB) was established to create synergy among international centers and a wide array of national and international partner institutions working on RTB crops, to increase productivity and nutritional value, enhance utilization and market access, and improve livelihoods of the producers (http://www.rtb.cgiar.org/). The participating centers, led by the International Potato Center (CIP), are Bioversity International, the International Center for Tropical Agriculture (CIAT), the International Institute of Tropical Agriculture (IITA), and the French Agricultural Research Centre for International Development (CIRAD). The program works with more than 360 other partners, to increase the benefits of their research for smallholder farmers, consumers, and other stakeholders. In many cases, especially for resource-poor farmers, providing new varieties that address the constraints mentioned above is the best and sometimes only option. Consequently, RTB breeding programs are targeting major constraints limiting yield potential, adaptation, and end-use quality.
Cheaper sequencing has opened many possibilities to apply genomics tools to advance crop-breeding programs for complex traits, such as climate resilience [14] and quality traits [15]. This has led to the coining of the term "genomics-assisted breeding" [16], or "whole-genome selection" [17]. The use of advanced DNA technologies, high-throughput phenotyping protocols, and knowledge from other -omics approaches (e.g., gene expression via transcriptomics, protein function via proteomics, and metabolic pathways via metabolomics) allows the identification of molecular markers linked to complex traits, the dissection of genetic variability, and the identification of putative candidate genes and their causative alleles for gene expression or gene function.
RTB crops are linked by common characteristics that pose special challenges for their improvement by breeding, further underscoring the potential of genomics-assisted breeding. The commercial cultivated types of potato, sweet potato, yam, and banana are polyploid and, together with diploid cassava, are all highly heterozygous. Potato, sweet potato, and cassava can suffer from inbreeding depression, and self-incompatibility in potato needs to be overcome by introgressing self-compatibility genes [18,19], or by chemical treatments followed by embryo rescue. This not only affects the development of parental lines, but also limits the development of populations created for genetic discovery, such as the introgression lines or recombinant inbred lines used in self-pollinating crops such as rice or tomato [20]. Efforts are underway, nevertheless, to develop inbred lines for hybrid breeding in cassava and potato [18,19,21]. Yam is a multispecies dioecious crop, but suffers from absent, very low, or erratic flower production, and from difficulties with interspecific hybridization, thus affecting the breeding strategies for crosses and the introgression of traits from one species to another. The multiple-species structure of the RTB crops [2], especially in banana [22] and yam [4], and to some extent in potato, makes breeding more complex, and will also require adaptation of genomics analysis software to account for these genomic complexities.
RTB crops are vegetatively propagated, which makes the production and dissemination of a sufficient quantity of quality propagules of breeding material for field trials problematic. This can hinder the ability of breeding programs to maintain breeding lines and to coordinate harvest with plantings in each breeding cycle. Low multiplication ratios slow down the bulking of material for extensive field trials [21]. For sweet potato, cassava, yam, and many banana lines, botanical seed numbers are low, thus requiring larger efforts to produce seed from crosses in a breeding program and to generate large breeding populations. In addition, field-testing approaches, especially for banana and plantain, are highly demanding in space and time, requiring 6 m² per plant and 12 to 18 months for the plants to mature, with evaluation needing to be carried out over several crop cycles [23]. Applying genomic tools to pare down the number of lines to test in trials can help alleviate these challenges.
Although many RTB varieties have been developed and deployed, usually with improved yields and disease and pest resistance, their impact still depends heavily on rates of adoption and on end-user preferences for a wide variety of quality traits, as these staple crops are processed into various products [2]. A Bill and Melinda Gates Foundation (BMGF)-funded project began in 2018 to tackle the issues of prioritizing quality traits for fresh and major processed RTB products in SSA, and developing high-throughput phenotyping tools for breeders to incorporate in their RTB breeding programs (https://www.cirad.fr/en/news/all-news-items/press-releases/2018/rtbfoods). The development of genomic tools to screen for these quality traits should accelerate breeding and enhance the adoption of varieties with the required end-user preferred traits.
In this paper, we review the status of genomic resources being developed by the RTB program, and their potential use for genomics-assisted breeding in order to enhance genetic gains for various important traits to address the Sustainable Development Goals of poverty reduction, decreased hunger, and better nutrition.

Potential of Genomics-Assisted Breeding
RTB breeding program strategies rely mainly on recurrent phenotypic selection as populations are advanced, with individuals selected as superior parents to generate improved populations, from which genotypes are selected for variety testing and multiplied clonally. Given the challenges presented above, developing and applying genomic tools to aid in selection, especially for complex traits, would shorten the time required for advancing material, save on expensive and/or very time-consuming phenotyping screens, and significantly reduce the field resources needed to screen material (or allow larger numbers of segregating genotypes to be evaluated). All of these would result in increased genetic gains in the breeding programs, both by shortening the breeding cycle and by allowing stronger selection intensity and more precise selection [24,25]. Next-generation sequencing (NGS) technologies, and the development of analytical tools for assessing the genetic diversity of large collections of accessions and breeding lines by SNP calling, linkage and QTL mapping, association studies, and genomic selection, together with tools for visualization, will allow for a much finer dissection of genetic variability. They will also lead to the development of effective molecular markers for desired traits, and to the design of parental line selection and variety selection strategies based on genomic information that reduce the need for expensive and/or time-consuming phenotypic selection [24,26-31]. Likewise, large collections of diverse accessions can be sequenced, and regions of introgression and desired haplotypes identified.
In addition, advances in both technologies and analysis tools for transcriptomics, proteomics, and metabolomics enhance gene discovery. This can lead to molecular markers for causal genes, as well as providing knowledge of the processes involved in the target traits of breeding programs [32-35]. The advances also contribute to elucidating mechanisms of action for resistance to pests and diseases, tolerance to abiotic stresses [36], and biochemical pathways for important quality and nutrition traits [4], and provide information on allele-specific expression [36].
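The link between cycle length, selection intensity, selection accuracy, and genetic gain can be made concrete with the standard breeder's equation, in which the expected gain per year is the product of selection intensity, selection accuracy, and additive genetic standard deviation, divided by the cycle length. The following is a minimal sketch only; the function name and all numeric values are hypothetical and chosen purely for illustration, not taken from any RTB breeding program.

```python
def genetic_gain_per_year(selection_intensity, accuracy, additive_sd, cycle_years):
    """Breeder's equation: expected gain per year = i * r * sigma_A / L."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return selection_intensity * accuracy * additive_sd / cycle_years


# Hypothetical inputs (illustrative only): same intensity, accuracy, and
# additive standard deviation, but an 8-year versus a 4-year breeding cycle.
baseline = genetic_gain_per_year(1.4, 0.5, 1.0, cycle_years=8)
shorter = genetic_gain_per_year(1.4, 0.5, 1.0, cycle_years=4)

# Ratio of gain per year between the two schemes.
gain_ratio = shorter / baseline
print(gain_ratio)
```

Under these assumptions, halving the cycle length doubles the expected gain per year, provided that intensity and accuracy are unchanged. In practice, a shorter cycle based on genomic predictions may have a different accuracy than phenotypic selection, so the terms are not independent.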

Challenges of Genomics-Assisted Breeding in RTB Crops
With the advent of ever more affordable and powerful NGS, bioinformatic tools are constantly being developed and improved to carry out genomic analyses that allow for the mapping of genetic variability and molecular marker discovery [37]. Although there are numerous tools for the analysis of genomic data and for applications such as marker development or genomic selection, most of these have been developed for diploid organisms. Consequently, the highly heterozygous nature of the RTB crops, in addition to the polyploid genomes in most cases, presents special challenges at all stages of genomic tool development, from genome assembly and mapping of DNA sequence reads, through genotyping, linkage mapping, and association studies, to genomic selection. This requires new tools to be developed, with special applications of bioinformatics, and limits the use of excellent tools already in existence [26-28,30]. Genome assemblies are much more fragmented due to sequence variability between haplotypes [38]. The many platforms available for genotyping are better suited to distinguishing homozygotes from heterozygotes in diploids and, although it is possible, are less exact in distinguishing the heterozygote classes at higher ploidy, especially in autopolyploids [39]. As the ploidy level increases, so does the number of allelic options. Consequently, it is necessary to have tools to determine the allelic dosage at a given locus. In addition, meiotic pairing does not always lead to random chromosomal segregation, with phenomena such as double reduction taking place [40,41]. This increases the number of genotype classes that can arise. In allopolyploids, as the subgenomes come from different ancestors, the genotyping approach has to be capable of distinguishing between homologues and homeologues (different subgenomes). Therefore, much more needs to be done to meet the needs of applying genomic data in the RTB crops, in the context of polyploidy and high heterozygosity.
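To illustrate how quickly the number of genotype classes grows with ploidy, the sketch below counts the unordered genotypes possible at a single locus. It assumes a simple combinatorial model (multisets of alleles, ignoring phase, double reduction, and preferential pairing); the function names are hypothetical and not drawn from any of the cited tools.

```python
from math import comb


def dosage_classes(ploidy):
    """Possible copy numbers of the alternative allele at a biallelic SNP."""
    return list(range(ploidy + 1))


def genotype_class_count(ploidy, n_alleles):
    """Number of unordered genotypes: multisets of size `ploidy` drawn from
    `n_alleles` alleles, i.e. C(ploidy + n_alleles - 1, ploidy)."""
    return comb(ploidy + n_alleles - 1, ploidy)


# Biallelic SNP in a diploid, a tetraploid, and a hexaploid.
for ploidy in (2, 4, 6):
    print(ploidy, dosage_classes(ploidy), genotype_class_count(ploidy, 2))

# A tetraploid locus with four distinct alleles.
print(genotype_class_count(4, 4))
```

A diploid biallelic SNP has three genotype classes, whereas a tetraploid has five and a hexaploid seven, and a multiallelic tetraploid locus has many more. This is why genotyping platforms must resolve dosage rather than a simple homozygote or heterozygote call.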

Approaches to Tackle Challenges in Polyploid Genomic Analyses
The challenges posed by polyploidy for genome assembly are being tackled by the development and application of third-generation sequencing approaches, such as PacBio and 10x Genomics [29]. These result in much longer reads (>10 kb), which, combined with approaches such as optical mapping and proximity ligation, are allowing the assembly of complex genomic regions. This will lead to the separation of homeologues, thus beginning to properly assemble polyploid genomes [29,42,43]. Consequently, this will necessitate the development of corresponding analytical tools.
Genotyping in polyploid species is going ahead, and new tools and approaches are in development, such as TetraploidMap, GWASpoly, and GBS-SNP-CROP, among others [31,44-47]. SNP arrays that can distinguish allelic dosages have been developed for octaploid strawberry [48], hexaploid wheat [49], and tetraploid potato [39,50], among others. Allele dosage was inferred from Illumina Infinium 8300 Potato SNP array intensity data, and the dosage was incorporated into the linkage analysis to develop high-density linkage maps in potato [41]. Targeted sequencing approaches, such as PCR-GBS and Sequence Capture, can target specific genomic areas at much denser sequence coverage, thus also allowing the determination of allele dosage.
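As a rough illustration of how allele dosage might be derived from two-channel signal or read-count data, the sketch below assigns each sample to the dosage class whose expected allele ratio is closest to the observed one. This is a deliberately naive assumption made for illustration; it is not the method used in the cited studies, which typically fit clusters or mixture models to account for signal bias, and the function name and inputs are hypothetical.

```python
def call_dosage(a_signal, b_signal, ploidy=4):
    """Naive dosage call for a biallelic marker.

    a_signal, b_signal: non-negative intensities or read counts for the
    reference (A) and alternative (B) alleles (hypothetical inputs).
    Returns the number of B copies (0..ploidy) whose expected ratio
    d / ploidy is closest to the observed B fraction, or None if there
    is no signal at all.
    """
    total = a_signal + b_signal
    if total <= 0:
        return None
    ratio = b_signal / total
    return min(range(ploidy + 1), key=lambda d: abs(d / ploidy - ratio))


# Tetraploid examples: AAAA, AAAB, AABB, and BBBB-like signals.
calls = [call_dosage(a, b) for a, b in [(100, 0), (75, 25), (50, 50), (0, 100)]]
print(calls)
```

The same function can be applied to read counts from targeted sequencing, where deeper coverage narrows the uncertainty in the observed ratio. Ratios that fall midway between two classes are resolved arbitrarily here, whereas a production tool would report them as uncertain.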
In addition, approaches to map haplotypes and carry out phasing of chromosomal segments are being developed, as recently shown for hexaploid sweet potato [47]. Due to a high prevalence of polymorphic sites in the sweet potato genome, with an average of two or three sites present in a typical Illumina sequence read of 100-150 bp, it was possible to map haplotypes and phase the sequence fragments [47]. An algorithm was developed to assign reads to six presumptive haplotypes. The haplotypes could be identified from combinations of alleles over several polymorphic sites connected in a read. This approach was used to grow and extend the regions, resulting in about 30% of the genome being phased into six haplotypes. The authors used Illumina paired-end reads, and it will be interesting to see how this approach can be combined with third-generation sequencing to more fully phase polyploid genomes [29,38]. The importance of identifying the different haplotypes was reflected in the finding that they could contain different versions of a gene model that would not be captured by the typical consensus sequence. For instance, in the reference, the gene might show up as non-functional due to a truncation, but some of the haplotypes might contain the intact gene. However, genomes with lower heterozygosity, or those with large structural variations, such as banana [38], might not be amenable to the approach used for sweet potato.
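The idea of extending haplotypes from reads that connect several polymorphic sites can be sketched with a toy greedy procedure. This is not the algorithm of [47]; it is a simplified illustration under strong assumptions (error-free reads, no coverage modelling, order-dependent assignment), and the function name and data layout are hypothetical.

```python
def phase_reads(reads):
    """Greedily merge reads into haplotypes.

    reads: list of dicts mapping a polymorphic site position to the allele
    observed at that site in one read (hypothetical representation).
    A read joins the first haplotype that shares at least one site with it
    and agrees at every shared site; otherwise it seeds a new haplotype.
    """
    haplotypes = []
    for read in reads:
        for hap in haplotypes:
            shared = set(hap) & set(read)
            if shared and all(hap[site] == read[site] for site in shared):
                hap.update(read)  # extend the haplotype with new sites
                break
        else:
            haplotypes.append(dict(read))
    return haplotypes


# Four reads covering three polymorphic sites (positions 1, 2, and 3).
reads = [
    {1: "A", 2: "C"},
    {2: "C", 3: "T"},
    {1: "G", 2: "T"},
    {2: "T", 3: "T"},
]
haplotypes = phase_reads(reads)
print(haplotypes)
```

In this example, the reads resolve into two haplotypes spanning all three sites. In a hexaploid, up to six haplotypes would be expected per region, and reads covering only one site, or sites at which two haplotypes carry the same allele, cannot be assigned unambiguously, which is consistent with only part of the genome being phased.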
Since phasing allows for the separation of subgenomes, a 90K SNP array has been developed for genotyping octaploid strawberry, in which "haplo-SNPs" from subgenomes are identified from haplotypes [48]. Together with pedigree analysis, this array was used to map QTLs to their subgenomic locations for fruit quality traits and flowering habits [51]. The analysis by Bevan et al. [42] explores using haplotypes to integrate genomics and breeding of polyploids. They suggest two approaches, one retrospective and one prospective. In the former, key breeding lines and pedigrees that have been phenotyped in breeding programs are sequenced, the haplotypes are defined, and markers are phased with the chromosome homeologues associated with the haplotypes. Then, the markers are used to combine desired haplotypes, each with well-defined phenotypic effects, by crossing the relevant lines. In cases where the haplotype has been broken, this allows for separating undesirable effects. This approach is, therefore, different from typical genomic selection, allowing better capture of the genetic diversity found in polyploid genomes [42,43].
In the prospective approach suggested by Bevan et al. [42], populations of progenitors and wild relatives are sequenced in order to identify ancestral haplotigs (haplotype-specific contigs), capturing a wider range of genetic variation. This can allow the discovery of new loci associated with desired traits, defining new haplotypes with greater variation, fewer deleterious alleles, and improved phenotypes. The markers that define the relevant haplotype would then be used in the breeding program. This approach requires the screening of large numbers of progeny with several markers spanning the haplotypes, to ensure phasing and selection of the desired variants of each haplotype. Therefore, identifying haplotypes and screening progeny for combinations of these will further improve the predictive performance of marker-based models for selection [42,43].

Role of RTB in Crop Improvement
RTB brings together scientists and breeders to create synergies and enhance breeding efficiency throughout the program. An RTB breeding community of practice (BCoP) has been established to share tools and strategies that are applicable to the different crops, functioning in a decentralized, integrated, and crosscutting manner (https://cgiar.sharepoint.com/sites/IITA/Projects/rtb/SitePages/Home.aspx). The BCoP aims to resolve the major challenges to RTB breeding research, breeding program management, and communication. The BCoP will tackle especially those challenges posed by the RTB crops (heterozygosity, polyploidy, clonal propagation, etc.) by developing cross-cutting methods and transferring knowledge [1]. Activities include implementing strategies to accelerate genetic gains, improve the efficiency of breeding pipelines, and shorten breeding cycles. For instance, in sweet potato, an accelerated breeding scheme (ABS) has been developed and implemented, and has shortened the cycle by about half [52], from 7-9 years to only 4; this scheme can also be applied to the other RTB breeding programs.
Adoption studies of modern varieties in SSA are grounds for optimism that yield gaps can be closed, since the adoption of cassava, potato, yam, sweet potato, and banana modern varieties (many developed or disseminated by CGIAR) was estimated to be 39.7%, 34.4%, 30.2%, 6.9%, and 6.2%, respectively, of the total area cropped [53].
A survey by CIAT of 982 cassava farmers in Vietnam, together with DNA fingerprinting, showed that 85% of the varieties grown are improved, CIAT-related ones. The variety KM419, bred by CIAT, is now the dominant variety, grown on 38% of the area [54]. In China, from an expert elicitation survey by CIP, it was estimated that 25% (1.2 million ha) of the potato-growing area is planted with materials derived from CIP progenitors or actual CIP varieties [54].
The Cassava Monitoring Survey (CMS) documented the extent of adoption of improved cassava varieties in Nigeria, while validating the survey results with DNA fingerprinting [55]. Household surveys suggested that 60% of households in the major cassava-growing regions of Nigeria had adopted improved varieties; concomitant fingerprinting by GBS showed a similar percentage of adoption, but extensive misclassification of variety names by the farmers surveyed. This affects the interpretation of surveys of determinants of adoption, and highlights the difficulties in tracking the dissemination of improved varieties. In any case, the study identified major preferred traits: for production, early maturity, high yield, and storage root size; for processing, the ability of varieties to be processed into garri or fufu/akpu, and ease of peeling (the main gender-related difference, being a women-preferred trait); and for quality, garri and fufu taste, good poundability, and palatability when boiled. An earlier GBS-based variety fingerprinting study in Ghana showed a similar lack of consistency in variety names [56].
In order to address vitamin A deficiency in SSA, CIP has promoted the adoption of improved varieties of biofortified orange-fleshed sweet potatoes (OFSP), by tackling breeding, seed production and dissemination, and nutrition information campaigns through the Sweet Potato for Profit and Health Initiative (SPHI, http://www.sweetpotatoknowledge.org/topics/sweetpotato-forprofit-and-health-initiative-sphi/). This has resulted in the release of over 109 varieties over the last 7 years, 76 of them OFSP. The cumulative number of beneficiaries since the launch of the platform (i.e., 2010 to date) is almost 4.3 million households, with an estimated 20 million individuals having been reached directly or indirectly with improved varieties of sweet potato.
Work by RTB banana and plantain breeders has resulted in the release of high-yielding, pest- and disease-resistant varieties that are now being tested on farmers' fields to enhance adoption [22]. In addition, IITA has released improved plantain hybrids in West and East Africa [22,57]. The banana breeding communities in Uganda and Tanzania have come together under the leadership of IITA to improve the production and productivity of the cooking bananas important in this region (East African Highland Bananas, EAHB, and Mchare) by breeding hybrid varieties that address major constraints in production, with the aim of yield increases of at least 30% (http://breedingbetterbananas.org/).
The CGIAR recently established the Excellence in Breeding Platform (www.excellenceinbreeding.org). The platform will draw from innovations in the public and private sectors, and provide access to cutting-edge tools, services, and best practices, application-oriented training, and practical advice to enhance the uptake of genomic tools. RTB breeding programs (both CGIAR and national programs) will access such innovations and contribute know-how from within the program to the platform via the RTB BCoP. The CGIAR also established the Gender in Breeding Initiative (GBI, http://www.rtb.cgiar.org/gender-breeding-initiative/), with the purpose of ensuring gender-responsive breeding programs, which is especially important in RTB crops, where women are usually the producers, processors, and marketers. Screening complex end-user preferred traits using genomic approaches needs to take into account significant gender differences in trait preferences, or gender equity may be adversely affected by the outcomes of breeding programs.

Genomics-Assisted Breeding
RTB brings together expertise in quantitative genetics and plant breeding, genomics and bioinformatics, statistics, and phenomics, which is contributing to the implementation of genomics-assisted breeding in the RTB program. Consequently, molecular geneticists and bioinformaticians have been developing molecular and analytical tools to support the breeding programs. This includes high-density profiling of genome-wide genetic variation using SNP markers (SNP-Chip, GBS, RADseq, and DArTseq), and the development of molecular markers, thus adding value to germplasm collections, breeders' populations, and in situ diversity. In addition, the research programs are moving forward in the implementation of genomic selection (GS), including strategies for forming training populations for estimating breeding values for GS, developing improved prediction models for parent and variety selection, as well as carrying out genome-wide association studies (GWAS) and quantitative trait loci (QTL) studies, with a view towards accelerated breeding [1,25]. The RTB research program is also helping to implement high-throughput phenotyping tools and strategies. These include remote sensing, including aerial platforms (e.g., drones, satellites), for monitoring and selecting for growth and development traits under environmental variations such as drought or pest and disease pressure; ground-penetrating radar for phenotyping root and tuber initiation and bulking; near-infrared spectroscopy (NIRS) and X-ray fluorescence spectroscopy (XRF) for indirect determination of chemical composition; electronic data collection; and image-based analysis [54,58-60]. Phenotyping is essential for applying genomic tools to breeding, yet the extensive work done in this area is beyond the scope of this review.

Draft Genomes
Various genomic resources have been developed or are in development. This has been enhanced by the publication of the reference genomes for the major RTB crops (or close relatives that are diploid or haploid). In the case of the cassava, banana, yam, and potato genomes, the RTB program is an active participant.

Banana Draft Genome Sequence
The Global Musa Genomics Consortium, coordinated by Bioversity, was very active in securing funding for the sequencing of the banana genome. CIRAD led the sequencing project of the A genome of banana [61], with the support of the consortium members for data analyses. Using a doubled-haploid of M. acuminata "DH-Pahang", of the subspecies malaccensis, a draft sequence of 472 Mb out of the 523 Mb genome was assembled. It was possible to anchor 70% of the assembly along the 11 Musa linkage groups, and 36,542 protein-coding gene models were identified. The genomic analysis detected three rounds of whole-genome duplications in the Musa lineage. The genome was then improved with NGS and new bioinformatic tools, in collaboration with Bioversity [62]. Additional genomes from other subspecies have been sequenced through the RTB program to support the discovery of new alleles and genomics-based crop breeding strategies [63]. In addition, a draft sequence of the B genome of banana has been generated, showing a smaller genome than the A genome, but a similar number of predicted genes (36,638), with a high degree of sequence divergence between the two genomes [64]. When RNA-seq data from both AAA and AAB hybrids were mapped to both genomes, plantains (AAB genomes) showed the expected 2:1 distribution of reads across the A- and B-genomes. For bananas (AAA genomes), the authors found regions of significant homology to the B-genome, suggesting interspecific recombination events between homeologous A and B chromosomes in Musa hybrids.

Cassava Draft Genome Sequence
CIAT participated in the sequencing of the cassava genome, and provided a partially inbred line, AM560-2, to develop the reference genome. The cassava genome is estimated to be 770 Mb, and the sequencing project, using the 454 platform for whole-genome shotgun sequencing, covered 69% of the genome and 96% of the protein-coding space, with 30,666 predicted genes [65]. This was followed by SNP-based genetic linkage maps [66], so that a robust and dense genetic map was generated from 10 mapping populations that had been genotyped by GBS. The individual component maps were then integrated into a single composite framework map with the expected 18 linkage groups (LGs), which was then used to organize the v4.1 draft cassava genome sequence [65] into 18 pseudomolecules, anchoring 90.7% of the predicted genes onto LGs. The map then had an increased number of markers (>22,400), and most genes in the draft genome were now placed on their respective LGs at their approximate chromosomal positions, resulting in the first chromosome-scale assembly of cassava, and a map that will serve as a valuable guide for future genome assembly improvements [67].

Guinea Yam Draft Genome Sequence
IITA led the sequencing of the white Guinea yam (Dioscorea rotundata Poir.), the most popular yam species in West and Central Africa. A single genotype, TDr96_F1, which was determined to be diploid, was selected for sequencing, and this resulted in an assembled genome of 594 Mb, 76.4% of which was distributed among 21 linkage groups [68]. The study predicted 26,198 genes in the genome. Using the genome sequence and bulked-segregant analysis, the authors applied a QTL-seq approach to map a sex-determination locus on linkage group 11, linked to female flowering in a female heterogametic flowering system (ZZ = male; ZW = female). This has been converted into a molecular marker to determine the sex (flowering habit) of putative parental lines, thus allowing more efficient planning of crossing blocks [68]. The marker was validated on 54 lines, of which 93% showed association with the sex phenotype. The marker will be further validated on 190 lines so that it can then be utilized in Guinea yam breeding.

Potato Draft Genome Sequence
CIP participated in the sequencing of the potato genome as part of a large Potato Genome Sequencing Consortium [69]. A doubled monoploid, Solanum tuberosum group Phureja DM1-3 516 R44, derived from tissue culture, was used to generate the reference sequence, as well as a heterozygous diploid breeding line, S. tuberosum group Tuberosum RH89-039-16, for comparison. Using whole-genome shotgun sequencing with three platforms, a final assembly of 727 Mb (out of an estimated genome of 844 Mb) was generated, with 39,031 predicted genes.

Sweet Potato Draft Genome Sequence
To deal with the hexaploid genome of sweet potato, the closely related wild diploid species Ipomoea trifida, a likely ancestor of cultivated sweet potato, was sequenced [70]. Although I. trifida generally exhibits severe self-incompatibility, a self-fertile line was used to develop a selfed line, Mx23Hm, for sequencing using the Illumina HiSeq platform, resulting in an assembly of 513 Mb, while that of a second line sequenced (0431-1 ITRk_r1.0) was 712 Mb. However, this genome assembly was very fragmented, making it difficult to apply to sweet potato [47]. Consequently, using an approach of haplotype mapping, a newly bred, carotenoid-rich hexaploid cultivar, Taizhong6, was used for genome sequencing [47]. Separate assemblies were generated for the six haplotypes. The six haplotypes of the sweet potato genome comprise two of B1 origin, which were shown to closely resemble the diploid I. trifida haplotypes, and another four of B2 origin. The analysis showed extensive genomic shuffling events due to recombination, homologous exchange, gene conversions, and introgressions. Improvements could come from applying third-generation long-read sequencing technologies, increasing the length of the contigs and the phased haplotype blocks [38].

Genomic Characterization of Genetic Resources
An additional advantage conferred by the adoption of high throughput genotyping and phenotyping technologies, is the characterization of genetic resources in genebanks, leading the way to make use of this huge genetic potential much more effectively [15,71].By applying genomic predictive approaches, the accessions can be rapidly screened for new traits, and genetic diversity analysis can aid in developing diverse gene pools for target traits and selection of diverse parental materials to maximize heterosis in crossing designs.
In Phase I of the RTB Program (2011-2016), huge strides were made in the sequencing (usually by GBS or RAD-Seq) of RTB accessions in genebank collections and breeding populations. Over 2500 elite breeding lines and landraces maintained in the cassava collections of CIAT and IITA, including 21 wild species, have been sequenced, and genotyping of an additional 1045 clones from Tanzanian breeding programs is underway [54,67]. The Tanzanian breeding germplasm contains critical sources of resistance to cassava brown streak virus. At the CIP genebank, the entire cultivated potato collection (~4400 accessions) has been genotyped with the potato SNP chip. In addition, the entire sweet potato collection (~6000 accessions) has been genotyped with 20 SSR markers and DArTseq. At IITA, 942 Dioscorea rotundata and 100 D. alata accessions, along with a number of other cultivated and wild yam species, have been sequenced by GBS ([54]; R. Bhattacharjee, personal communication). In addition, in collaboration with IBRC, Japan, a diversity panel of 310 accessions and 156 biparental mapping population progenies of D. rotundata have been genotyped by whole genome resequencing (WGRS). GBS and DArT sequencing have been carried out on different biparental mapping populations (five D. alata and one D. rotundata mapping populations). Additionally, 284 genotypes (223 D. rotundata, 12 D. alata, 22 interspecific, 12 D. dumetorum, five D. cayenensis, and 10 D. praehensilis) have been genotyped with 192 Kompetitive Allele Specific PCR (KASP) SNPs to develop a SNP chip of 48 polymorphic SNPs for routine use in the yam breeding program. The largest international ex situ collection of banana germplasm is located at the International Transit Centre (ITC) in Leuven, Belgium, and is maintained by Bioversity International. There are 1518 accessions in the collection, with good overall coverage of the groups of cultivated bananas. Of these, 630 accessions were genotyped with SSR markers [72], and this information can be accessed through the Musa Germplasm Information System (MGIS) platform [71].
In an effort to enhance utilization of genomic resources for breeding, databases have been developed both to capture genomic and phenotypic information and to provide bioinformatic and display tools for use by breeders. Harmonization of trait descriptions into ontologies, which makes both databasing and electronic data capture more effective, is being carried out by an ontology community of practice (http://www.cropontology.org/), and the ontologies are being incorporated into the databases. Database development, in conjunction with the Boyce Thompson Institute at Cornell University, has resulted in databases for cassava, banana, yam, and sweet potato (the RTB bases, http://www.rtbbase.org). Through these databases, it is possible to access genomic and phenotypic information, link to pedigrees, and manage breeding trials. These complement databases pursuing the same objectives with genetic resources in genebanks, such as MGIS [71], which allows examination of germplasm collections, browsing of accessions, including genotyping studies and associated datasets, and ordering of germplasm. MGIS is also interoperable with complementary databases, such as the Banana Genome Hub and the MusaBase breeding database. The GOBII project (http://gobiiproject.org/) has endeavored to make bioinformatic tools available for marker discovery and development in cereal crops, and now also in cassava. The South Green platform (www.southgreen.fr) has developed databases for carrying out GWAS, and offers a community genomic portal that brings together genomic data in different forms: complete genome sequences, gene structure, gene product information, metabolic pathways, gene families, transcriptomic assays, genetic markers, and genetic and physical maps. This includes banana and cassava (Banana Genome Hub, Cassava Genome Hub) [73]. Other tools, such as CIP's HIDAP and CIPCROSS, and IITA's YAMCROSS, attempt to bring in trial mapping and planning, so that a breeder can coordinate both the breeding activities and the genomic approaches. This has been developed in large BMGF-funded projects for cassava (NextGen cassava), yam (AfricaYam), banana (BBB), and sweet potato (GT4SP). The various bioinformatic and data management platforms now need to come together. A concerted effort is being made to develop an interface that will allow data to flow from breeding and germplasm resources information systems through the Breeding API (BrAPI) project (https://www.brapi.org/).

QTL Mapping Using Next-Generation Sequencing
NGS has been used to carry out more precise QTL mapping. A potato biparental diploid mapping population (DMDD) with 180 progenies was evaluated for drought tolerance in two tropical highland locations in Peru, one under greenhouse conditions and the other under field conditions [74]. The progenies were either subjected to normal irrigation or had water withheld from about 60 days after planting. Several traits were measured, including biomass and tuber weight, and a genetic map was constructed from marker data published earlier [75]. Several QTLs were identified for both morphological and physiological traits, most mapping to chromosomes 5 and 8. The strongest drought-specific QTL was for stem diameter; it mapped to chromosome 5 and explained 16.5% of the variation.
Cassava Brown Streak Disease (CBSD), caused by the viruses CBSV and Ugandan cassava brown streak virus (UCBSV) and transmitted by whiteflies, is most devastating in East Africa [76]. In addition to its effect on foliage, it also affects the storage roots, creating brown corky necrotic patches that make them inedible [77]. Using local East African varieties carrying strong field resistance to the viruses, QTL mapping was carried out in biparental populations. In a population with Namikonga as the CBSD-resistant parent and Albert as the CMD-resistant parent, 240 F1 progeny were used to identify QTLs for resistance to CMD, CBSD foliar symptoms, root symptoms, or both. The analysis was based on GBS data and on phenotyping data collected over two successive years at two virus hotspot locations in Tanzania. One QTL for resistance to CBSD root symptoms and foliar symptoms was identified on chromosome 2 (qCBSDRNFc2Nm), with a second QTL affecting root necrosis only on chromosome 11 (qCBSDRNc11Nm). An additional putative QTL on chromosome 18 (qCBSDRNc18Nm) also affected root necrosis only. The QTL on chromosome 11 encompassed 27 genes, including two LRR proteins that could be involved in defense-related processes. This QTL also had the highest LOD score (7.50), explaining 17.3% of the phenotypic variation. CBSD-induced root necrosis appeared to be under somewhat different genetic control from CBSD-induced foliar symptoms. This was in agreement with field observations of inconsistencies between foliar symptoms and root necrosis. Two QTL were identified for CMD resistance within the previously reported range of the CMD2 locus [77]. A second population, with the Tanzanian variety Kiroba as the CBSD-resistant parent and AR37-80 as the CBSD-susceptible parent, revealed QTL for CBSD foliar symptoms on both arms of chromosome 4 and on chromosomes 6, 17, and 18; for root necrosis only on chromosomes 5 and 12; and for both CBSD foliar and root symptoms on chromosome 11 (qCBSDRNFc11KR) (although on a different arm of the chromosome from that of Namikonga) and on chromosome 15 (qCBSDRNFc15K). QTL appear to be largely different between Kiroba and Namikonga, which is consistent with their genetic relationships, and offers opportunities for pyramiding QTL for enhanced resistance to CBSD [78].
Anthracnose, caused by Colletotrichum gloeosporioides (Penz), is a devastating disease of yam, particularly water yam (Dioscorea alata L.), causing yield losses of up to 90% under severe conditions [79]. A number of different sequencing approaches were applied to two water yam genotypes, one resistant and the other susceptible to anthracnose [80]. This resulted in the identification of both SSRs and high-quality SNPs. The 388 EST-SSRs identified have been used to generate a linkage map and identify QTL for anthracnose disease (R. Bhattacharjee, personal communication), based on whole-plant assays of a mapping population of 94 progenies using the most virulent isolate of C. gloeosporioides. One QTL was consistently observed on LG 14 in the interval 71.12-84.76 cM, explaining 68.94% of the total phenotypic variation, with a LOD score of 3.5. Additionally, PacBio sequencing of one of the parents of this mapping population, TDa 95/00328, is underway, and five additional mapping populations are being characterized for anthracnose disease and genotyped by DArTseq to validate the QTL and identify additional QTL for this disease for use in MAS or GS.
In sweet potato, genotyping of mapping populations is ongoing as part of the Genomic Tools for Sweet Potato project (GT4SP; https://sweetpotatogenomics.cals.ncsu.edu/). A biparental mapping population, BT (315 progeny; BT = Beauregard × Tanzania), was genotyped using GBS. A genetic map with 15 base linkage groups has been developed using new linkage mapping methods developed for polyploids (D. Gemenet, personal communication). In addition, mapping of QTLs for a range of traits of interest in sweet potato (quality-related, yield and yield components, morpho-physiological, and drought tolerance-related) is currently being carried out using new QTL mapping software for polyploids developed in-house (D. Gemenet, personal communication). Likewise, another population (MDP = Mwanga Diversity Panel), with about 2000 genotypes comprising 64 families generated by crossing parents from two pseudo-heterotic groups (8 parents from each group), is currently being phenotyped and will be genotyped using GBS. This population, and an additional one to be assembled, will be used to train GWAS and GS models that take family structure into account. In addition, a diploid mapping population of 212 progeny from a cross of two lines of Ipomoea trifida, a relative of sweet potato, was genotyped using GBS, and genetic maps were developed (D. Gemenet, personal communication).

GWAS Studies across RTB
Scientists across RTB have started to use GWAS to tag complex traits with molecular markers. As a first step, accessions have been characterized and selected to form diversity panels, which are then genotyped (usually by GBS or DArTseq) and phenotyped (or have historical phenotypic data). Sets of diversity panels have thus been established for cassava, banana, potato, and yam. These sets are now being characterized for many important traits, ranging from yield parameters to biotic and abiotic stress tolerances and quality traits.
In banana, parthenocarpy is required in order to set fruit without pollination, and female sterility is also required so that the fruit are seedless (or with seed, if making crosses at the diploid level). A diversity panel of 105 diploid banana genebank accessions (27 seeded and 78 unseeded) belonging to different clonal lineages, with pure Musa acuminata genetic backgrounds, both cultivated and wild types, was genotyped by GBS. The sequence reads were mapped to the M. acuminata reference genome assembly, and 129,658 SNPs were identified using TASSEL-GBS. Extensive filtering yielded 5544 highly reliable markers, which were used for GWAS analysis. Since parthenocarpy and female sterility are very important both for breeding and for generating seedless lines, these traits were analyzed by GWAS by running different models, from a simple general linear model (GLM) to mixed models taking into consideration kinship matrices and population structure. Twenty-one SNPs associated with these traits were identified, corresponding to 13 candidate genomic regions [81]. Based on linkage disequilibrium decay analysis, and with markers found, on average, every 60 kb, a window of 40 kb upstream and downstream of the SNPs was scanned to identify putative candidate genes potentially linked to the SNP. This resulted in the identification of 11 candidate genes after comparative genomic analyses. These included genes putatively involved in growth regulator signaling and gametophyte development, and an ortholog of histidine kinase CKI1, which could play a role in female sterility [81]. However, the long-distance LD observed in the study limited the resolution of the associations. The GWAS panel is now being assayed for drought tolerance over a two-year period at the IITA station in Arusha, Tanzania, in order to develop markers for drought tolerance. Traits being assayed are related to growth parameters (A. Brown, personal communication).
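The candidate-gene scan described above can be pictured with a minimal sketch. It assumes SNPs and gene models are available as simple tuples; the function name, the SNP and gene identifiers, and the coordinates are hypothetical illustrations, not data from the study [81]. Only the 40 kb window size comes from the text.

```python
def genes_near_snps(snps, genes, window=40_000):
    """Return, for each SNP, the genes overlapping +/- `window` bp around it.

    snps  : list of (snp_id, chromosome, position)
    genes : list of (gene_id, chromosome, start, end), with start <= end
    """
    hits = {}
    for snp_id, chrom, pos in snps:
        lo, hi = pos - window, pos + window
        hits[snp_id] = [
            gene_id
            for gene_id, g_chrom, start, end in genes
            # A gene counts if any part of it falls inside the window.
            if g_chrom == chrom and start <= hi and end >= lo
        ]
    return hits


# Hypothetical example data (not from the study).
snps = [("snp_1", "chr03", 1_000_000), ("snp_2", "chr05", 250_000)]
genes = [
    ("gene_a", "chr03", 965_000, 970_000),      # 30 kb upstream of snp_1
    ("gene_b", "chr03", 1_050_000, 1_060_000),  # 50 kb downstream: outside
    ("gene_c", "chr05", 280_000, 295_000),      # overlaps the snp_2 window
    ("gene_d", "chr04", 1_000_000, 1_005_000),  # different chromosome
]
candidates = genes_near_snps(snps, genes)
```

In practice the window would be chosen from the observed LD decay, and the resulting gene lists would still need the comparative genomic filtering mentioned in the text.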
Late blight, caused by the oomycete Phytophthora infestans, is one of the main constraints for potato production worldwide, and resistance to this disease is therefore a must-have in most of the product profiles of the potato breeding program of CIP. Quantitative resistance, which has a wider range of effectiveness against many races of the pathogen, is measured by following the progression of the disease over time and calculating the area under the disease progress curve (AUDPC). This type of resistance does not provide immunity, but results in slower progress of the symptoms, leaving enough green leaf tissue for photosynthesis and, consequently, tuber filling, so that the yield is not compromised. CIP has developed a relative scale to measure quantitative resistance compared to a control cultivar, thus allowing comparisons across sites and seasons [82]. CIP originally developed the breeding population B3 on the basis of this quantitative resistance, by not incorporating R genes that would mask its expression. The B3 population has very high levels of quantitative late blight resistance, with scale values based on AUDPC ranging from approximately 0.5 to 5, whereas a susceptible cultivar, such as Desiree, has a value of 8 or 9 on the same scale. Using historical data on resistance to late blight (LB) in 195 lines derived from this population, collected over 6 years in two tropical highland locations, the genotypes were classified as resistant or susceptible, and according to whether the resistance was stable over years and locations [50]. The Infinium 8303 Potato SNP Array platform [83] was used to genotype 103 of these accessions [50]. Each sample was categorized into one of five possible tetraploid genotypes (AAAA, AAAB, AABB, ABBB, or BBBB). After filtering, 3192 SNPs were used for the GWAS analysis. The markers were coded as diploids, even though the initial marker identification was done at the tetraploid level. A mixed model that accounted for both population structure and kinship was used. Two markers on chromosome 9 and one marker on chromosome 7 were significantly associated with the late blight resistance phenotype. The marker c2_56418 was significantly associated with AUDPC in several locations, and was in either the homozygous (AAAA) or heterozygous (AAAC or AACC) state. The most resistant B3 genotypes carry the C allele at this SNP, but in other potato breeding populations, late blight resistance is not associated with the C allele, making the use of this marker challenging in other genetic backgrounds. Nevertheless, the marker is located toward the end of the long arm of chromosome 9, where a large-effect quantitative trait locus for late blight resistance was mapped [84], and recently, a homolog of the R8 resistance gene was identified in this QTL by R gene enrichment sequencing [85]. A set of KASP [86] markers has been developed for this quantitative resistance, and can be used in selecting for the highest level of resistance in the B3 population.
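One common way to code tetraploid array calls as diploids is to collapse the three heterozygous dosage classes into a single heterozygote class. The sketch below assumes that convention; the text does not state which coding rule was used in [50], and the function name is hypothetical.

```python
def diploidize(call):
    """Collapse a tetraploid call (e.g. 'AABB') to a diploid-style code.

    Returns 0 for AAAA, 2 for BBBB, and 1 for any heterozygote
    (AAAB, AABB, ABBB). This collapsing rule is an assumption, not
    the documented procedure of the study.
    """
    if len(call) != 4 or set(call) - {"A", "B"}:
        raise ValueError(f"not a tetraploid A/B call: {call!r}")
    dosage = call.count("B")  # number of B alleles, 0..4
    if dosage == 0:
        return 0
    if dosage == 4:
        return 2
    return 1


calls = ["AAAA", "AAAB", "AABB", "ABBB", "BBBB"]
codes = [diploidize(c) for c in calls]
```

The cost of this simplification is that dosage information within heterozygotes (simplex, duplex, triplex) is lost, which is one reason dosage-aware GWAS and GS tools for autopolyploids are of interest.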
In addition, KASP markers closely linked to the Potato Virus Y (PVY) resistance gene, Ryadg, have been developed. The Ryadg gene originated from Solanum tuberosum Group Andigena, confers extreme resistance to PVY, and is located on chromosome XI [87]. This gene is an important contributor to PVY resistance in the CIP potato breeding program. A dosage-specific high-resolution melting (HRM) assay for the detection of this gene in multiplex progenitors has been developed based on the M6 marker (M. Ghislain, personal communication). KASP markers were designed based on several SNPs identified between the M6 DNA sequences of a resistant and a susceptible potato genotype, and can be used in marker-assisted selection (H. Lindqvist-Kreuze, personal communication). Normally, PVY resistance screening is done during the first clonal generation by mechanical inoculation in the greenhouse, followed by a DAS-ELISA test. Replacing this step with the KASP marker system at the seedling stage and discarding susceptible progenies will bring significant savings, enabling affordable testing of large numbers of samples. This, in turn, would lead to fewer cycles of recurrent selection to achieve desired combinations of traits. Nevertheless, further markers need to be developed for PVY resistance originating from sources other than Andigena.
GWAS was also performed to analyze the genetic architecture of tuber induction and bulking under warm and long daylength conditions. A diversity panel consisting of 171 CIP advanced breeding lines was genotyped with the Infinium 8303 Potato SNP Array platform ([83]; Bonierbale and Mihovilovich, personal communication). The lines were phenotyped in field plots in the lowland subtropics (Lima, Perú; 12°3′ S), using high-pressure sodium vapor lamps to extend the photoperiod to 16 hours. Tuber induction was evaluated 40 days after plant emergence, using a single-node cutting method and scoring tuber formation in the bud. Tuber bulking, stolon length, and stolon number were determined at harvest. GWAS analysis showed significant SNPs associated with bulking on chromosomes 2 and 4, with tuber induction on chromosomes 2, 3, 4, and 5, and with stolon length on chromosomes 2, 4, 5, and 6 (Bonierbale and Mihovilovich, personal communication).
In addition, a diversity panel of 150 Andean diploid landraces (selected using 33 SSRs from a collection of 425 lines) was genotyped by GBS, and a set of 14,453 high-quality SNP markers was used for GWAS (Bonierbale and Mihovilovich, personal communication). The panel was grown in two highland locations in Peru that differ in soil Zn and Fe content. Under the neutral pH and adequate soil zinc content of La Victoria (Huancayo), average zinc and iron concentrations in tubers were significantly higher than under the acidic pH and low soil zinc content of Huancani (Jauja). A mixed model that accounted for both population structure and kinship was used. GWAS analysis identified SNPs associated with Zn and Fe levels separately, and only in the higher Zn soil location. The SNP with the largest effect was located on chromosome 8. This SNP was associated with both iron and zinc content, suggesting either closely linked QTL or a single pleiotropic QTL controlling both traits. A significant SNP (S11_40562958) associated with increased vitamin C was common to both locations. Therefore, breeding and the use of markers for biofortification in potato need to take into consideration the mineral composition of the trial sites. Indeed, Zn biofortification work in cereals has shown that soil Zn fertilization increases the Zn content of the crops [88].
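The kinship term in such mixed models is typically a marker-based relationship matrix. As a minimal sketch, assuming diploid genotypes coded 0/1/2 and the widely used VanRaden estimator (the text does not say which estimator these studies used), it can be computed as follows. The function name and the toy genotypes are hypothetical.

```python
def vanraden_kinship(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum(p * (1 - p))).

    genotypes : list of rows, one per individual, markers coded 0/1/2.
    Z is the genotype matrix centered by twice the allele frequency.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    Z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(Z[i][k] * Z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


# Toy example: three individuals, three markers (hypothetical).
G = vanraden_kinship([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
```

Individuals with opposite genotypes receive a negative off-diagonal value, while an individual sitting at the population mean for every marker has zero relationship to all others under this estimator.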
A set of 1973 sweet potato accessions from the CIP genebank was grown in two environments in northern Peru: in summer under heat stress, with a mean night-time soil temperature of 30 °C, and in winter, with a mean soil temperature of 24 °C [5]. The accessions were assessed for early bulking and yield-related traits. The lines were genotyped by DArTseq, and the data are being analyzed with the intent of identifying alleles associated with sweet potato root formation under high heat conditions, which generally result in the formation of pencil roots. This can be of very high importance in breeding climate-resilient sweet potatoes for warming climates [5].
In yam (D. rotundata), a diversity panel of 310 lines consisting of landraces and breeding lines has been phenotyped for several morphological and agronomic traits, including flowering type, dry matter content, tuber shape, virus severity, and anthracnose severity, to establish genome-wide associations (https://africayam.org). The lines have also been subjected to whole genome resequencing, with paired-end sequencing on an Illumina platform at >10× (~6 Gb) coverage. The raw sequence data were aligned to pseudomolecules of the Guinea yam reference genome to obtain a total of 14,905,577 SNPs. In addition, metabolite profiling of 49 lines of five different Dioscorea spp. (D. rotundata, D. alata, D. dumetorum, D. cayenensis, and D. esculenta) was carried out [4]. Over 200 compounds were measured in tubers, providing a major advance for the chemotyping of this crop. Combined analysis of leaf and tuber material identified a subset of metabolites that allowed accurate species classification, and highlighted the potential of predicting tuber composition from leaf profiles [4]. Additional carotenoid profiling of these lines across the five species showed variation in β-carotene content between species, identified accessions rich in β-carotene, which can aid provitamin A biofortification, and tentatively identified a carotenoid that might play a role in tuber dormancy, the C25-epoxy-apocarotenoid persicaxanthin [89]. In addition, a total of 2000 D. alata lines from several national and international collections across different geographical areas covering Africa, Asia, Latin America, and the Pacific are being genotyped by DArTseq within an NSF-BREAD-funded project to understand the global genetic diversity of this crop (R. Bhattacharjee, personal communication).
CMD has long been a major disease in cassava in Africa. It is caused by several related species of geminiviruses, and is transmitted both through infected planting material and by its vector, the common whitefly (Bemisia tabaci G.). Major resistance genes have been described: CMD1, which confers moderate, polygenic resistance and was bred into the Tropical Manihot Selections (TMS) series of varieties at IITA, and CMD2, a single locus with high resistance, coming from the Nigerian landrace TMEB3 [90,91]. A GWAS was carried out on 6128 African breeding lines that had been genotyped using GBS [92]. The GBS reads were mapped to version 5 of the cassava reference genome (http://phytozome.jgi.doe.gov), and 42,113 filtered SNPs were used for the GWAS analysis. The best associations were found using a multilocus mixed model (MLMM) for the mean cassava mosaic disease severity (MCMDS), which assays virus resistance over time (resistance was assessed several times during the growing season), with a major peak containing the top SNP and other linked SNPs within 400 kb, followed by a secondary peak. This single region on chromosome 8 (containing both peaks) accounted for 30% to 66% of the genetic resistance and showed partial dominance; 13 additional regions with minor effects were identified, including a region that could cover the CMD1 locus. The major locus for CMD2 covers a large region (~8 Mb) and might contain multiple resistance genes, leading to varying degrees of resistance between genotypes. Further analysis identified a number of putative candidate genes in this region, including two peroxidases (Cassava4.1_029175 and Cassava4.1_011768), which have been shown to be downregulated upon cassava mosaic virus infection in susceptible genotypes [93], and a thioredoxin, which could play a role in activation of plant defenses. These are being converted to KASP markers (I. Rabbi, personal communication). Nevertheless, the genome assembly still contains gaps, and the reference genotype was a South American accession that might not possess the causal genes.
Recently, GWAS was carried out to identify markers associated with resistance to CBSD [94]. Two breeding panels were genotyped for SNP markers using genotyping by sequencing, and phenotyped for CBSD foliar and root symptoms at five locations in Uganda. This GWAS found two regions associated with CBSD: one large region of several SNPs in high linkage disequilibrium on chromosome 4, which co-localized with a Manihot glaziovii introgression segment, and one on chromosome 11, which contained a cluster of nucleotide-binding site-leucine-rich repeat (NBS-LRR) genes. Interestingly, the region on chromosome 4 was the same region identified in the Kiroba biparental mapping population. The QTL on chromosome 11 did not, however, co-localize with the QTL found on the same chromosome in either the Kiroba or Namikonga populations. The GWAS did not reveal any QTL associated with CBSD-induced root necrosis.
Dry matter is a very important quality trait in cassava, and a major breeding target in the RTB breeding programs. Efforts at developing high provitamin A cultivars have been hampered by lower dry matter contents [95]. The crop's gene pool exhibits considerable natural variation for storage root carotenoids that can be tapped for breeding of biofortified varieties, with some breeding populations reported to accumulate as much as 25.8 µg/g fresh root weight [96,97]. Diallel crosses showed that total carotenoid content was mainly due to additive effects [98], and was negatively correlated with dry matter content. However, a rapid recurrent selection scheme applied at CIAT to increase carotene content in cassava roots showed significant gains (from 10 µg/g root FW to 25 µg/g) together with increases in dry matter content, suggesting that there is no pleiotropic effect [96]. In African germplasm, however, there appears to be a negative correlation. Therefore, GWAS was carried out to identify molecular markers associated with dry matter and beta-carotene content for cassava biofortification, and to deconstruct the reasons for this negative correlation [99]. In this case, 672 clones from the IITA TMS collection (termed the Genetic Gain collection), representing landraces and advanced breeding lines selected in the breeding program over the years, were phenotyped for dry matter and yellow root flesh color over three growing seasons, and then genotyped by GBS. The reads were aligned to v. 6.0 of the cassava reference genome, and a panel of 72,279 filtered SNPs was called [99]. A mixed linear model was used to detect associated SNPs, and two major association regions, at 24.1 and 30.5 Mbp on chromosome 1, were identified. The first peak was tagged by two SNPs (at LD of 0.3), and the second peak by a single SNP. The same SNP that tagged the first peak for yellowness also tagged the peak for DM content on chromosome 1, suggesting either strong physical linkage between the genes underlying these important traits, or a pleiotropic effect. Further analysis suggested the former, as the peak region on chromosome 1 was shown to be part of a large LD block containing a large M. glaziovii introgression that commonly occurs in the Genetic Gain collection [67]. In addition, a recombination spot was identified in the yellow-fleshed subpopulation, and GWAS carried out separately in the white- and yellow-fleshed genotypes showed the dry matter association in both subpopulations (a pleiotropic effect would have shown the association in only one of them). However, the association signal in the white subpopulation was much broader and weaker, so this might need further examination. Candidate genes for carotenoid biosynthesis (PSY2, phytoene synthase) and starch biosynthesis (UDP-glucose pyrophosphorylase and SuSy, sucrose synthase) were identified in the vicinity of the major SNP for the first peak. The null allele of PSY2 results in white roots, and a functional allele in yellow roots. The authors suggest that allelic variation resulting in differential enzyme activity could affect the flux into the pathway, thus resulting in higher carotenoid content. More studies are needed to fine-map this region, identify the causal locus, and determine whether allelic variation at these candidate genes (or their promoter sites) leads to the phenotypic effects.

GS in Cassava
Given the complexity of the traits being targeted in the RTB breeding programs, and the length of the breeding cycles, there is great potential for applying genomic selection [25]. However, as mentioned above, the polyploid nature and high heterozygosity of these crops make the application of GS more difficult. GS is currently being implemented to shorten the breeding cycle in Nigeria and Uganda (International Institute of Tropical Agriculture (IITA), Nigeria; the National Root Crops Research Institute (NRCRI), Nigeria; and the National Crops Resources Research Institute (NaCRRI), Uganda) through the Next Generation Cassava Breeding project (http://www.nextgencassava.org) in collaboration with Cornell University. Whereas conventional phenotype-based recurrent selection conservatively takes more than four years, a much shorter breeding cycle of between one and three years has been achieved using GS. The first pilot study to assess the potential accuracy of GS was carried out in an IITA breeding population that had deep historical phenotypic data. This population, referred to as the Genetic Gain collection and consisting mostly of advanced breeding lines selected and maintained since 1970, was genotyped using GBS, and cross-validation analysis was carried out on 19 traits [100]. The correlation between predicted and observed phenotypes ranged from 0.15 to 0.47 across the 19 traits, suggesting that GS has potential to accelerate gains in cassava, and that the existing training population should give a reasonable estimate of future prediction accuracies. Further assessment of the prospects of GS in cassava was reported by Wolfe et al. [101,102], using a larger population consisting of breeding populations from IITA, NRCRI, and NaCRRI. Factors tested included seven prediction models that considered different genetic architectures (additive and total genetic value), cross-validation within populations, cross-population prediction, and cross-generation prediction. The study also evaluated the impact of increasing the training population (TP) size. Cross-population accuracy was generally low (mean r = 0.18), except for cassava mosaic disease (r = 0.57). Likewise, prediction accuracy across generations was poorer than within-generation accuracy, except for dry matter content and mosaic disease severity. Increasing TP size was found to be an important factor in increasing prediction accuracy. Nonadditive dominance variance was found to be key for root yield predictions, but less so for virus resistance and dry matter content. This also confirms the results of diallel studies carried out by the CIAT breeding team on Latin American accessions [21]. The IITA breeding program implemented four cycles of GS from 2012 to 2017, and clones from the first cycle are at the advanced field testing stages of uniform yield trials and on-farm testing (Ismail Rabbi, personal communication). Promising clones will be nominated for variety release.
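The cross-validated prediction accuracies quoted above are correlations between predicted and observed phenotypes in held-out clones. A minimal sketch of that calculation is given below, assuming a plain ridge regression on marker dosages as a stand-in for the prediction models used in the cited studies. All names, the penalty value, and the simulated data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Estimate marker effects by ridge regression on centered data."""
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - mu for v in y]
    # Normal equations: (Xc'Xc + lam * I) beta = Xc'yc
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return mu, means, solve(A, rhs)


def ridge_predict(model, X):
    """Predict phenotypes for new genotype rows from a fitted model."""
    mu, means, beta = model
    return [mu + sum((row[j] - means[j]) * beta[j] for j in range(len(beta)))
            for row in X]


def cv_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = ridge_predict(model, [X[i] for i in fold])
        accs.append(pearson(pred, [y[i] for i in fold]))
    return sum(accs) / k


# Simulated toy data: 100 clones, 20 markers coded 0/1/2 (all hypothetical).
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(100)]
effects = [rng.gauss(0, 1) for _ in range(20)]
y = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 1) for row in X]
accuracy = cv_accuracy(X, y)
```

The toy trait is highly heritable and controlled by few markers, so its accuracy will be far higher than the 0.15 to 0.47 range reported for real cassava traits. Cross-population or cross-generation prediction would replace the random folds with folds defined by population or generation.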

GS in Potato
Recent research has demonstrated that genome-wide prediction using molecular markers is feasible for several traits in tetraploid potato [44,103,104]. GS was more precise when applied to individuals from the same population as the TP, and less so with unrelated individuals [103]. It was therefore suggested that GS in potato is best used by creating a TP of potential parents for the relevant traits in a breeding program, and applying selection within that group to identify parents for crosses. The TPs need to be large and cover a wide range of the trait, including both low- and high-performing lines [103]. Similar results were obtained for starch content and chipping quality [104]. Taking into account non-additive effects within the TP improves the prediction accuracy of the whole genotype [44].
Feasibility for applying GS for traits associated with early tuberization and bulking was tested at CIP in a diversity panel of 177 advanced breeding clones by means of cross validation (CV) prediction (Bonierbale and Mihovilovich, personal communication), by comparing CV predictions with actual phenotypic observations in several testing sets created at random from the TP.Two parametric methods, Bayesian ridge regression (BRR) and Bayesian LASSO (BL), and one semi-parametric Bayesian reproducing kernel Hilbert spaces regression model (RKHS), were used to incorporate markers into the models for GS.Prediction accuracies a bit over 0.3 were obtained for bulking ratio at 75 and 90 days after planting (DAP), stolon length, and a bit under 0.3 for marketable tuber number at 75 DAP.The predictions will be tested on a validation population in the future (Bonierbale and Mihovilovich, personal communication).As GS tools for autopolyploids continue to be improved, these can be applied to the genotyped training populations developed at CIP.

GS in Banana
Cooking banana types are a very important staple in East Africa, with average consumption in Uganda reaching 400 kg per person per year [105]. They are mainly consumed locally, and play a critical role in food security and income for smallholder farmers. Due to its perennial nature, the crop is subject to severe pest and disease constraints, and is vulnerable to drought periods, which are becoming more prevalent with climate change [106]. Because of the long cropping cycle and the large surface area needed for cultivation, any molecular tool that could simplify the screening of genotypes is especially beneficial. Banana breeding is based on crossing tetraploids with diploids to generate seedless triploid progenies as cultivars. Parent selections can be done at the diploid and tetraploid levels for desired traits. As mentioned earlier, the vegetative propagation (of the triploids) leads to decreased genetic variability and increased susceptibility to biotic and abiotic stresses. The low fertility or sterility also creates challenges for breeding programs (hence, the GWAS work to identify markers for parthenocarpy and female sterility).
A TP for genomic selection was assembled [107], comprising 307 genotypes: mostly triploids, some diploids (both wild and improved parthenocarpic genotypes), and a few tetraploids (from EAHB triploids crossed with cv. Calcutta 4). Twelve percent were core breeding parent lines, and the rest were hybrids generated from 77 different cross combinations from the breeding programs of IITA and NARO in Uganda. The lines were established in two fields under low and high management conditions. A number of yield-related traits were measured, including growth and phenology traits, yield component traits, and resistance to major diseases, such as Black Sigatoka, as determined by leaf spotting indices. SNP markers from GBS reads for the TP were called against the latest version of the doubled haploid Musa acuminata cv. Pahang reference genome sequence [62]. The predictive ability of six different genomic prediction models, accounting for additive and non-additive genetic effects, was evaluated. In the first analysis, ploidy was not taken into account, and 10,807 filtered biallelic SNPs (BA-SNP) were called. In a separate analysis, the ploidy level was considered together with allele dosage, generating a set of allele dosage SNPs (AD-SNP), using a workflow developed for this purpose [108], which can be applied to other polyploid species. Accounting for allele dosage within the ploidy groups (diploids, triploids, and tetraploids) reduced the number of SNP markers to 5574. The effect of allele dosage on the predictive ability was examined for each trait, using the best genomic prediction model. The performance of different models, fitted with both sets of SNP markers, was compared for the 15 traits. In addition, the GEBVs obtained using the AD-SNP set were used to rank the top 100 genotypes from the population, and these were compared to the rankings based on phenotypic data as a way to determine the accuracy of genomic prediction.
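The difference between ploidy-unaware and allele dosage marker coding can be made concrete with a small sketch. It estimates dosage by naively rounding the alternative-allele read fraction to the nearest copy number, which is an assumption made for illustration and not the workflow of [108]. The function names, read counts, and sample labels are hypothetical.

```python
import math

def dosage_from_reads(ref_reads, alt_reads, ploidy):
    """Naive dosage estimate: alternative-allele copies out of `ploidy`.

    Rounds ploidy * alt / (ref + alt) half-up; returns None with no reads.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return math.floor(ploidy * alt_reads / total + 0.5)

def diploid_style_call(dosage, ploidy):
    """Collapse a dosage to a ploidy-unaware 0/1/2 call.

    0 = homozygous reference, 2 = homozygous alternative,
    1 = any heterozygous class (the dosage information is lost).
    """
    if dosage is None:
        return None
    if dosage == 0:
        return 0
    if dosage == ploidy:
        return 2
    return 1

# Hypothetical read counts at one SNP: (sample, ploidy, ref reads, alt reads)
samples = [
    ("diploid_A", 2, 15, 15),
    ("triploid_B", 3, 20, 10),
    ("triploid_C", 3, 10, 20),
    ("tetraploid_D", 4, 30, 10),
]

calls = []
for name, ploidy, ref, alt in samples:
    dosage = dosage_from_reads(ref, alt, ploidy)
    calls.append((name, dosage, diploid_style_call(dosage, ploidy)))

for name, dosage, collapsed in calls:
    print(f"{name}: dosage={dosage}, diploid-style call={collapsed}")
```

In this toy example the two triploids carry one and two copies of the alternative allele, yet both collapse to the same heterozygous call once ploidy is ignored. This is the information that a dosage-aware marker set is intended to retain.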
The predictive ability of all models varied across traits, and for 12 traits, including fruit filling and fruit bunch traits, the Bayesian models that account for additive genetic effects gave the highest predictions [108]. For the other three traits, namely days to fruit maturity and height of the tallest sucker at flowering and at harvesting, the RKHS model combining both pedigree and marker information, which accounts for non-additive genetic effects, gave the highest predictions. Predictions were best for fruit filling, fruit bunch, and plant stature, lower for black leaf streak resistance, and lowest for suckering behavior.
To assess the effect of allele dosage on the predictive ability of the models, an equal number of BA-SNP and AD-SNP markers was used, while the phenotypes were combined data from the two fields and two crop cycles (environment-averaged data). In this case, the predictive ability of all models based on AD-SNP was trait dependent, but was generally reduced by 15%, on average, compared to the traditional BA-SNP markers. The authors attributed the loss in predictive ability when taking allele dosage into account to variations in minor allele frequencies across loci, and to the allopolyploid nature of the banana genome [108]. The general trend of predictive ability across traits was the same as for the BA-SNP, with the best prediction for fruit filling (pulp diameter). This important trait could therefore be controlled by large-effect QTLs. When the accuracy of prediction was examined by comparing the top 100 genotypes with the highest GEBV to the top 100 genotypes with the highest environment-averaged phenotypic data, it varied from 76% to 84% across the traits, for prediction within the training population [108].
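The top-ranked comparison described above amounts to measuring the overlap between two selected sets. The following sketch shows one plausible way to compute it, assuming that higher values are better for the trait and that agreement is expressed as the shared fraction of the top k genotypes; the exact procedure in [108] may differ. The genotype labels and values are invented for illustration.

```python
def top_k_overlap(gebv, phenotype, k):
    """Fraction of the top-k genotypes by GEBV that are also top-k by phenotype."""
    top_by_gebv = set(sorted(gebv, key=gebv.get, reverse=True)[:k])
    top_by_pheno = set(sorted(phenotype, key=phenotype.get, reverse=True)[:k])
    return len(top_by_gebv & top_by_pheno) / k

# Hypothetical GEBVs and environment-averaged phenotypes for six genotypes.
gebv = {"G1": 2.1, "G2": 1.8, "G3": 0.4, "G4": 1.5, "G5": -0.3, "G6": 0.9}
phenotype = {"G1": 30.0, "G2": 22.0, "G3": 25.0, "G4": 27.0, "G5": 12.0, "G6": 18.0}

overlap = top_k_overlap(gebv, phenotype, k=3)
print(f"top-3 agreement: {overlap:.0%}")  # G1 and G4 are shared, so 2 of 3
```

With 307 genotypes and k = 100, roughly a third of the population is selected, so a sizeable overlap would be expected even by chance. The statistic is therefore best read alongside the predictive ability rather than on its own.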

Conclusions
Given the complexities of breeding RTB crops, and the potential to accelerate genetic gains, applying genomic tools has even more significance. Indeed, RTB is incorporating genomic approaches at several levels in its various breeding programs. It is contributing to the generation of genomic resources, including sequencing of draft genomes, and applying NGS to sequence both genetic resources for diversity (from genebank accessions) and breeding lines for further exploitation using genomic approaches, such as GWAS and GS. Moreover, some programs are further ahead in using GWAS for QTL mapping and identifying alleles of significance for complex traits, and/or applying GS models to training populations, and in the case of the NextGen Cassava project, using these to advance breeding populations and realize genetic gains for complex traits. It must be borne in mind that polyploidy is a challenge not only for the RTB crops, but also for many tree, horticultural, forage, and ornamental crops, as well as major staples, such as wheat. Indeed, there is progress in the development of tools and approaches to tackle the issues of polyploidy at all levels of genomic tool application, which will serve the RTB community. This will include important contributions from RTB bioinformaticians and scientists.
To date, high-throughput KASP molecular markers have been developed for genotyping breeding lines in potato and cassava, with the former already validated for virus resistance (PVY), and a number of markers for virus resistance, carotenoid content, and dry matter content in cassava now being validated for use in MAS. As more alleles are identified by GWAS and validated, these will make their way into MAS as well, with several complex traits in the pipeline, such as parthenocarpy in banana, CBSD resistance in cassava, anthracnose disease resistance in yam, iron and zinc content in potato tubers, as well as tuber bulking under warm conditions, to name a few. GS will open options to tackle difficult and complex traits that are less amenable to MAS, as they are controlled by many small-effect genes. Successes in GS prediction efficacies have usually been with simpler traits, but there are continuous improvements in the development of the prediction models.
Therefore, within the RTB breeding programs, MAS is being implemented and GS has passed the proof-of-concept stage, so strategies must now be put in place to incorporate and implement it in the breeding programs, learning from the experiences of the NextGen Cassava project. As new approaches develop, such as combining GWAS data to improve GS models [109], the RTB programs already have the capacity to absorb and implement them. Looking forward, each breeding program, depending on the most critical traits that can benefit from genomic tools, will have to develop workflows that incorporate genomics-assisted breeding, from MAS to GS as well as trait discovery further upstream, into the wider breeding schemes. The RTB Breeding Community of Practice, as well as the Excellence in Breeding Platform, will play critical roles in helping to develop best practices for the uptake and use of genomics-assisted breeding, by exchanging experiences and learnings, accessing the latest developments, and exploring and developing solutions [25]. Workflows for DNA isolation and molecular marker screening need to be incorporated in a timely and economical fashion within the breeding schemes, replacing more complex phenotyping screens. The use of GS will most likely be for parent selection within the RTB programs, but as this becomes more efficient for polyploid genotypes, incorporating dominance and epistatic effects, as well as multiple location environmental effects (including across years), progeny selection, especially in clonally-propagated crops, is also a consideration, as individual genotypes are potential varieties. Nevertheless, as diploid lines are also utilized in the cassava, yam, banana, and potato programs, GS prediction models that incorporate non-additive effects can be first applied in this breeding material. Depending on the traits being selected for using prediction models, breeding schemes will most likely have to incorporate both GS and phenotyping for other traits into the workflows, and also develop selection indices that combine both [25]. This will most likely take several iterations, but there is already valuable experience being