Can Parentage Analysis Facilitate Breeding Activities in Root and Tuber Crops?

Controlled pollination in root and tuber crops is challenging. Complex ploidy, cross-incompatibility, erratic flowering patterns, outcrossing, etc., limit the efficiency of breeding progress in these crops. Half-sib breeding that involves random pollination among parents is a viable method to harness genetic gain in outcrossing crops that are problematic for performing planned and controlled pollination. The authenticity of resulting progenies from the half-sib breeding is essential to monitor the selection gain in the breeding program. Parentage analysis facilitated by molecular markers is among the available handy tools for crop breeders to maximize genetic gain in a breeding program. It can help to resolve the identity of half-sib progenies and reconstruct the pedigree in the outcrossing crops. This paper reviews the potential benefits of parentage analysis in breeding selected outcrossing root and tuber crops. It assesses how paternity analysis facilitates breeding activities and the ways it improves genetic gain in the root and tuber breeding programs. Conscious use of complementary techniques in the root and tuber breeding programs can increase the selection gain by reducing the long breeding cycle and cost, as well as reliable exploitation of the heritable variation in the desired direction.


Introduction
Root and tuber crops are important crops with increasing food, feed and industrial applications in Sub-Saharan Africa and many other regions of the world [1][2][3]. Root and tuber crops have tremendous potential to contribute to food, nutrition and income security of many families around the globe, but this has not yet been fully exploited. Variety development through breeding is among the action steps to unlock the potential of these crops for food, feed and industrial applications. Breeding root and tuber crops, however, presents special challenges and relies heavily on the traditional techniques of exploiting the existing variation. Irregularity in flowering time and flowering intensity, cross incompatibility, polyploidy and fertility are among the factors that add challenges to genetic improvement through breeding in these crops [4,5].
Traditional breeding in root and tuber crops often utilizes open pollination by wind or insects to generate sufficient families with unstructured pedigree and subsequent selection of progeny. This method is not efficient for high quality true seed production, identification of parents that contributed to the progenies developed and estimation of reliable genetic parameters to harness diversity in the desired direction. Many root and tuber breeding programs have generated a significant number of varieties, but these varieties lack reliable and complete pedigree information to monitor the progress [5]. For instance, in potatoes (Solanum tuberosum L.), despite the availability of a fairly large amount of pedigree information on the web database, pedigree data of some genotypes are missing [5]. A similar situation also exists for cassava, yam and sweet potatoes. The lack of structured and complete pedigree information in root and tuber crops hinders development and efficient management of crop databases. The lack of structured pedigree data also suggests that parental control is less exploited to harness selection gain in these crops.
The efficiency of breeders, geneticists, botanists and other specialists in plant science is enhanced through their ability to identify, distinguish and estimate the extent of genetic diversity and relatedness in breeding populations and the extent of parental control to guide the breeding progress. In traditional breeding using phenotypic traits alone, efficiency is diminished due to variations in environment and genotype, inadequate data collection and subjective treatment(s). The advent of molecular marker technology has complemented the traditional breeding technique in adequately identifying genotypes and estimating genetic parameters [6].
A record of reliable pedigrees and parental profiles is useful in modern plant breeding. Reliable pedigree information guides breeders in making prudent decisions on existing divergence in progeny, hybrid vigor and effects of inbreeding depression [7,8]. Parental profile information is useful to classify recombinant and parental genotypes for linkage analysis [9]. Such information is widely applicable for various studies including multiple quantitative trait loci (QTL) mapping [10], association mapping [11,12], resistance inheritance [13] and determination of genetic estimates, breeding values and relationships [14][15][16]. Demeke et al. [17] and Isenegger et al. [18] noted that clustering in full-sibs and half-sibs is an indication of a link between estimated relationships and known pedigrees.
Information on the application of parentage analysis in major root and tuber crops such as yam, cassava, sweet potato and potatoes has not been well reported. Moreover, unlike potatoes, which have the highest application of the technique, yam is among the crops with the lowest application (Table 1). In yams, parentage analysis has not been done on progenies derived from open pollination and polycross blocks. It was, therefore, imperative to catalogue existing literature on parentage analysis on the root and tuber crops to serve as a basis to improve the breeding strategy of the crops. Pedigree reconstruction is necessary for enhanced breeding and genetic studies. Such information will help with the estimation of reliable heritability and genetic correlation parameters in open pollinated plants to maximize genetic gain and design efficient breeding programs. It also ascertains the genetic identity of mislabeled genotypes used in breeding programs. Thus, a good understanding of life history traits and pedigree reconstruction is necessary for the formulation of effective strategies for genetic conservation, management and utilization of genotypes in breeding and genetics programs. In this paper, we review the application and potential of parentage analysis to facilitate breeding activities in selected root and tuber crops.

Overview of Parentage Analysis in Breeding Programs
Parentage analysis was historically determined using the traditional morphological analysis technique [19]. This technique is limited and cumbersome, especially when candidate parents exhibit similar phenotypes and progenies are generated by open pollination or polycross [19,20]. Moreover, phenotypic parentage analysis is complicated where species used in crosses possess similar traits within each subgenus. These limitations of the phenotypic parentage analysis led to the advent of improved parentage analysis techniques to augment the traditional one.
Parentage analysis in natural populations started with chromosomal polymorphisms [21], followed by allozyme electrophoresis [22] and DNA molecular technology [20]. The DNA fingerprinting technology was originally developed to identify human remains in forensic research [23], but was later utilized for resolution of immigration [24] and paternity conflicts [25]. The technique has also been useful to accurately resolve genetic relationships in plants [20], as well as other organisms. The efficiency of DNA fingerprinting-led parentage analysis was first discovered in birds in the 1980s, which led to a great paradigm shift in behavioral ecology [26]. Statistical techniques were simultaneously developed for parentage analysis using single-locus polymorphisms such as allozymes [27]. This simple statistical analysis was ineffective at determining parentage using multi-loci polymorphisms [28]. The limitations of the earlier parentage analysis techniques led to the discovery and development of microsatellites and other advanced molecular markers, as well as robust statistical packages [29]. These advances have made determination of genetic polymorphisms in populations and parentage analysis easier in the fields of breeding, genetics, evolution, behavioral ecology and molecular ecology [29,30].

The Concept and Application of Parentage Analysis
Patterns of pollination may influence reproductive capacity of plants, hybridization within and among populations and habitat fragmentation [92]. Pollen dispersal determines the extent of gene flow between cultivated and wild crop species and contaminated seeds. In the past, several indirect techniques were used to trace the physical movement of pollen, including traps [93,94], dyes [95] and the paths of pollinators [92,96,97]. These techniques are limited regarding the actual patterns of fertilization and gene flow. The path of the pollen that forms a seed in an open-pollinated environment may be successfully tracked using a direct technique like parentage analysis. Parentage analysis involves DNA profiling of progeny and potential candidate parents, comparing their alleles for identification and confirmation of existing relationships. In this technique, the DNA profiles of progeny and maternal and paternal parents are verified by assigning each allele of tested progeny to its parents. The maternal parent is known, because seeds are harvested from it. If the assigned alleles of progeny align more with a putative male parent than the remaining candidates, it implies a high probability of being the true pollen parent. Where alleles fail to be assigned to one of the putative parents, the relationship between progeny and parents is excluded [30]. The choice of any parentage analysis technique depends on adequate answers that address their corresponding empirical questions. Some of these questions include: (i) Which parentage analysis technique has a higher probability of success regarding the research objective of the study? (ii) Which sampling design is appropriate for a typical parentage analysis experiment? (iii) How are samples collected to ensure desired results from data subjected to an appropriate statistical method? (iv) Which molecular markers provide high levels of polymorphism per locus?
The general roadmap on the application of parentage analysis in natural and artificial populations of organisms is elaborately illustrated in Jones et al. [29]. According to these authors, the type of parentage analysis to employ depends on the sampling techniques applied. When putative parents are mated to produce progenies in a breeding population, all parentage analysis techniques are possible. However, the absence of one or more putative parents in the mating population limits the number of applicable parentage analysis options. For instance, parental reconstruction becomes the only applicable technique where several half- or full-sib groups are identifiable and sampled. For small family groups, parental reconstruction, sibship reconstruction or assessment for multiple mating within family groups is applicable. Multiple mating (mixed parentage) occurs where more than one putative male parent contributes pollen to fertilize the female stigmatic organ leading to fruit and seed development. Its influence on sample size depends on ploidy complexity, cross compatibility, pollen viability, flowering intensity and synchronization, as well as biotic and abiotic factors. In breeding programs where progenies are not collected in family groups, only the sibship reconstruction technique is applicable. Successful utilization of the parental reconstruction or sibship reconstruction technique allows comparison among reconstructed parental genotypes for their mating patterns.
Spanoghe et al. [8] successfully applied parentage analysis in European potato cultivars for pedigree validation, as well as potential parental assignment. Accordingly, a root or tuber breeding program can make informed decisions with the application of parentage analysis in instances where the pedigree is either in doubt or incomplete for the materials in the breeding program.
Parentage analysis is widely applicable in assessing reproductive success, mating patterns, kinship, fitness in natural populations and in developing highly polymorphic molecular markers of multilocus genotypes [98,99]. Parentage analysis is useful in multiple mating breeding systems where individuals of one sex mate with two or more partners of the opposite sex. It enhances investigation of the genetic mating system of organisms by evaluating the actual reproductive success of paternity and construction of pedigrees [99,100]. Further, it is used to confirm monogamy in some species [101] and also those that exhibit extra pair copulation [102]. Parentage analysis resolves difficulties in direct mating systems [103]. It also enhances heritability and genetic correlation estimates in open pollinated plants needed for evaluation of expected genetic gains and designing of breeding programs. The parentage analysis technique is a useful tool in the protection of Plant Breeders' Rights (PBR). The PBR comprise a globally-accredited system that gives breeders intellectual property rights for unique technology such as a new variety or varieties developed or adapted for commercial or various end uses [104]. The PBR accord breeders with end point royalties (EPR) utilized to support and sustain breeding activities. The use of DNA profiling resolves issues of genetic mixtures, mislabeling, mutations and outcrossing in natural populations that often confound efficient selection in breeding programs. The use of markers that are highly informative, high-throughput and reliably reproducible clearly and robustly determines genetic identity based on morphological attributes, aiding in the protection of PBR [104].
Despite its relevance and wide application, the success of parentage analysis depends on the sampling design used, choice of molecular markers, family structure, species, pollen and seed dispersal [29,30,92,105]. Additionally, various unfavorable biological attributes of organisms and use of inadequate markers may also limit the potential of parentage analysis.

Sampling Techniques
Parentage analysis is often used in systems that allow the collection of candidate parents with the assumption of the presence of a sample of adult genotypes. An ideal parentage study involves large family sets of progeny obtained from the maternal parent and a complete set of potential candidate paternal parents in the breeding population [106]. This permits application of all parentage analysis methods with a high probability of success. However, the ideal situation is very difficult in natural populations. An excessively large number of genotyping tests for assaying many progenies per family may be needed depending on the goals of the study. Nonetheless, parentage inference has good prospects with exclusion and assignment techniques remaining useful options if candidate parents are present. Sampling of specimens is the hardest aspect of parentage analysis; however, it is also the most important criterion required for good analysis. Care must be taken to maintain complete samples from the field. Knowledge of one of the parents of the offspring makes parentage analysis easier and more powerful than when both parents are unknown. Thus, use of adequate sampling design and collection of offspring from a known parent facilitates parentage analysis targeted at answering various questions of interest to the researcher.

Molecular Marker Systems
Generally, molecular markers that identify heritable variations among genotypes are useful in parentage analysis. The identification of appropriate markers determines their successful application in parentage analysis [105]. A reasonable number of highly polymorphic molecular markers or a large number of the low to moderate polymorphic types is needed for successful parentage analysis [29].
Despite the various markers utilized in parentage studies, few useful ones have been earmarked such as microsatellites, single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs). Of these, microsatellites are often utilized in parentage analysis due to their repeatability, high polymorphism, co-dominance and PCR-based attributes [105]. The hyper-variability and codominant inheritance attributes of microsatellite alleles aid molecular identification of organisms utilized in mating system experiments including parentage analysis [92]. Microsatellites are DNA repeats with 2-6 base pair sequences recurring at least 12 times. They are also referred to as short tandem repeats (STRs) [107] or simple sequence repeats (SSRs) [108]. Most microsatellites utilized in parentage analysis are di-, tri- or tetra-nucleotide repeats, with repeated motifs of two, three or four base pairs, respectively. Of the three, tetra-nucleotide microsatellites are preferred since different alleles are easily separated on a gel, compared to di-nucleotides, which exhibit more 'stuttering' with slightly larger or smaller bands than the true allele on the gel [29,109]. Stuttering is a process whereby stutter bands with different sizes from normal PCR products are produced in multiples of the short repeat unit (1-2 bp) [110,111]. Stuttering is caused by strand slippage, replication slippage and the structure of short tandem repeats (STRs) during polymerase chain reactions, producing a highly polymorphic stutter product with the wrong number of repeats. The presence of stutter bands in SSRs makes scoring cumbersome, producing quasi-scoring in ladders lacking prominent bands [111]. Moreover, although multiplexing permits moderate-level throughput in microsatellites, these markers are poorly transferable across species, necessitating their investigation and optimization prior to use for parentage analysis in a new species [112].
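The definition of a microsatellite given above (a motif of 2-6 bp recurring at least 12 times) can be illustrated with a short Python sketch. The function name, the threshold parameter and the example sequence are hypothetical and are not taken from any of the cited studies; dedicated SSR-mining tools apply more refined rules.

```python
import re


def find_microsatellites(sequence, min_repeats=12):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumption: a microsatellite is a motif of 2-6 bases repeated
    at least `min_repeats` times without interruption.
    """
    # Lazy {2,6}? makes the regex try the shortest motif first.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Hypothetical sequence: a short flank, 14 copies of "AC", a short flank.
example = "GGT" + "AC" * 14 + "TTG"
print(find_microsatellites(example))
```

Because the motif pattern does not require two different bases, a long mononucleotide run would be reported as a di-nucleotide repeat; a production tool would filter such cases.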
A typical paternity analysis using microsatellites involves, firstly, sampling and genotyping parents at a set of microsatellite loci. Progeny obtained from known maternal parents are genotyped at these loci. A diploid progeny exhibits two alleles, of which one allele is contributed by the seed parent and the other by the pollen parent. The maternal allele at each locus of the genotyped progeny is identifiable. The progenies are tested against a pool of pollen parents to determine their paternity. An exclusion probability of 97-99% determines paternity with a high degree of confidence. The exclusion probability is the probability of excluding an unrelated pollen parent as the father of a progeny, where knowledge of maternity exists. Allelic mismatch with all candidate pollen parents implies that the pollen that formed the progeny came from outside the studied area. Genotyping of progeny families provides a good understanding of the distribution of pollination distances, directions and the emergence of successful reproductive paternity. In root and tuber crops, microsatellites have been utilized to determine the progeny-paternity relationship (Table 1). Microsatellites have been used for parentage analysis in yams [32][33][34][35][36][37][38][39], potatoes [44][45][46][48][49][50]52,53,55], cassava [63][64][65][66][67][68][69][70][71][72][73] and sweet potato [76][77][78][79][80][81][82]84,85].
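As a rough illustration of how a marker panel might be checked against the 97-99% exclusion threshold, the sketch below uses a commonly used closed-form expression for the per-locus exclusion probability when the mother is known, and then combines loci. It assumes diploid, codominant, unlinked loci in Hardy-Weinberg equilibrium; the function names and allele frequencies are hypothetical.

```python
def exclusion_probability_one_parent_known(allele_freqs):
    """Per-locus probability of excluding an unrelated candidate father
    when the mother's genotype is known (diploid, codominant locus)."""
    a2, a3, a4, a5 = (sum(p ** n for p in allele_freqs) for n in (2, 3, 4, 5))
    return 1 - 2 * a2 + a3 + 2 * a4 - 3 * a5 - 2 * a2 ** 2 + 3 * a2 * a3


def combined_exclusion_probability(per_locus):
    """Combine independent loci: an unrelated male escapes exclusion
    only if he escapes it at every locus."""
    not_excluded = 1.0
    for p in per_locus:
        not_excluded *= 1 - p
    return 1 - not_excluded


# Hypothetical panel: ten loci, each with five equally frequent alleles.
panel = [[0.2] * 5 for _ in range(10)]
per_locus = [exclusion_probability_one_parent_known(f) for f in panel]
print(round(combined_exclusion_probability(per_locus), 4))
```

For a locus with two equally frequent alleles the per-locus value reduces to pq(1 - pq) = 0.1875, which shows why a few low-polymorphism loci alone rarely reach the threshold.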
Besides microsatellites, other useful markers for parentage analysis are SNPs, AFLPs [113,114] and diversity array technology (DArT) markers [6]. The AFLPs and SNPs have many loci, with each locus exhibiting two alleles and low polymorphism. The alleles of SNPs are codominant, whilst those of AFLPs are dominant, implying that AFLPs disallow the separation of the heterozygote from one of the homozygous groups. The application of AFLP markers in parentage analysis has only been reported in plants [6]. One of the merits of AFLP markers is the flexible use of its commercial kit for parentage analysis of many organisms with little need to develop new ones [29]. This contrasts with SNP markers, which are dependent on known DNA sequences with polymorphic nucleotide positions. For model organisms, SNP markers are accessible and hundreds of loci are more easily obtainable compared to other marker systems. Moreover, SNPs are more easily assayed per locus compared to microsatellites, leading to the futuristic prediction of their preferential use for parentage analysis [115]. However, the low per-locus polymorphism of SNPs limits some data analysis techniques such as Sanger's method [116] and in silico SNP discovery using mining of SNPs within expressed sequence tag (EST) databases followed by PCR-based validation [117]. Single nucleotide polymorphism discovery is more difficult in crops with complex genomes than those with simple genomes due to their highly repetitive nature [118]. Despite the gene discovery technique, the low per-locus polymorphism in conserved genic regions, low-copy noncoding regions and intergenic spaces are independent of the detection of gene-based SNPs [119].
As molecular paternity research progressed, it became imperative to develop markers with wider genome coverage and higher level throughput for the increased resolution and speed required for various applications. Jaccoud et al. [120] suggested DArT as a promising alternative that satisfies the throughput, genome coverage and transferability criteria. DArT is a high-throughput complexity reduction technique that involves the hybridization of fluorescent DNA probes to targeted DNAs spotted on a microarray [121]. The DArT assays produce whole genome fingerprints used for scoring the presence or absence of DNA fragments in genomic sequences generated from genomic DNA samples through complexity reduction. The DArT genome profiles enable fast mapping of QTLs, accelerate the introgression of a selected genomic region into an elite genetic background and guide the assembly of many different genomic regions into improved varieties. The number of markers detectable by DArT depends mainly on the level of DNA sequence variability in analyzed samples and the complexity reduction technique used [122]. The merits of DArT include a high multiplexing level and simultaneous genotyping of several thousand polymorphic loci per assay without prior genome sequence information; additionally, DArT provides hundreds to tens of thousands of highly reliable markers [120], as well as good genome coverage information [123]. Moreover, sequences of DArT markers are easily accessible compared to AFLPs, making them a preferred technique for non-model species [124]. However, DArT is limited in some applications due to high technicality in preparing genomic representation for the target species, cloning, data management and analysis. Other demerits include its requirement of robust software (DArTsoft and DArTdb) for analysis and the dominance of DArT markers (present or absent) or differential intensity [125].
Prediction and selection of improved plants on a genome-wide scale requires large populations with dense molecular markers across the genome [126]. This is achieved through a technique known as genotyping-by-sequencing (GBS). GBS is an enzyme-based complexity reduction technique that utilizes restriction endonucleases targeted at a small portion of the genome and DNA barcoded adapters for the production of multiplex libraries of samples used in next-generation sequencing (NGS). The GBS technique is noted for its robustness across many plant species and capability to produce 10,000s-100,000s of molecular markers [126,127]. It is flexible regarding species, populations and research objectives, indicating its suitability for plant genetics studies. Another merit of this technique is that both marker discovery and genotyping are completed at the same time. Moreover, GBS aids the exploration of new germplasm or species without prior discovery and characterization of polymorphisms. Reanalysis of raw sequences from GBS uncovers new polymorphisms, annotated genes, etc. [126].
Based on the above facts, it is clear that DArT and microsatellites are useful and preferred for parentage studies. This partly corroborates the earlier prediction of the continued usefulness and preference of microsatellites for parentage analysis [29,128]. The AFLPs may be less popular and less utilized for parentage determination, except for species lacking justification to develop microsatellites. With increasing availability of genomic data, the potential use of SNPs for exclusion or parentage assignment in model systems is increasing, though they might not be applicable in some molecular work. It is also probable that the use of tightly-linked loci in experiments with the linkage phase will increase the applicability of SNPs in parentage analysis [129]. The linked SNPs are dubbed super-loci, with the potential of many alleles under a sufficiently low rate of recombination that permits stable inheritance of haplotypes [29]. Similarly, the use of microsatellites in species with low polymorphism is also likely to increase [29]. Besides the super-locus scenario, one should avoid loci that are in linkage disequilibrium, so as to satisfy the statistical assumption of independence among loci in parentage analysis [30].

Analysis Method
Various analysis methods are available for implementing parentage analysis in crop breeding, wherever applicable for the program. However, the most important question for crop improvement is which method is most appropriate to maximize the genetic gain in a breeding program. Jones et al. [29] provided the pros and cons of each of the techniques for parentage analysis. We briefly summarize the methods here with respect to their possible application in facilitating root and tuber crop breeding.

Exclusion Technique
This technique involves comparison of the alleles of progenies with those of their putative parents, followed by elimination of candidate parents whose alleles fail to match the tested progeny at one or more loci. The exclusion technique leverages the Mendelian inheritance principle that is typical in sexual diploid plants, as the individual progeny shares at least one allele per locus with each of its parents. This technique is more feasible to dissect the parentage issues in a population with fewer progenies and candidate parents. It requires highly polymorphic genetic markers. Larger population size, genotyping error, mutation in progenies and complex ploidy are among the factors affecting its successful implementation.
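A minimal sketch of the exclusion logic for a diploid progeny with a known seed parent is given below. The locus names, allele sizes and candidate labels are hypothetical, and the `max_mismatches` parameter is an illustrative way to tolerate genotyping error rather than a feature of any specific program.

```python
def is_compatible(offspring, mother, father):
    """True if one offspring allele can come from each parent (diploid locus)."""
    a, b = offspring
    return (a in mother and b in father) or (b in mother and a in father)


def non_excluded_fathers(offspring, mother, candidates, max_mismatches=0):
    """Return candidates that mismatch at no more than `max_mismatches` loci."""
    kept = []
    for name, genotype in candidates.items():
        mismatches = sum(
            not is_compatible(offspring[locus], mother[locus], genotype[locus])
            for locus in offspring
        )
        if mismatches <= max_mismatches:
            kept.append(name)
    return kept


# Hypothetical microsatellite genotypes (allele sizes in bp).
offspring = {"L1": (120, 124), "L2": (200, 204), "L3": (150, 150)}
mother = {"L1": (120, 122), "L2": (200, 200), "L3": (150, 152)}
candidates = {
    "P1": {"L1": (124, 126), "L2": (204, 206), "L3": (150, 154)},
    "P2": {"L1": (124, 124), "L2": (202, 206), "L3": (150, 150)},
    "P3": {"L1": (122, 126), "L2": (204, 204), "L3": (152, 154)},
}
print(non_excluded_fathers(offspring, mother, candidates))
print(non_excluded_fathers(offspring, mother, candidates, max_mismatches=1))
```

With strict exclusion only the fully compatible candidate remains; allowing one mismatch keeps a second candidate, which illustrates the trade-off between tolerating errors and retaining false fathers.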

Categorical Allocation Technique
Categorical allocation involves assigning each progeny to the non-excluded putative parent with the highest likelihood or posterior probability of being the true parent. This technique is applicable for assignment among non-excluded putative parents that cannot be resolved by the complete parental exclusion technique. The technique is also useful in resolving scoring errors or mutations, as well as in estimating confidence.
Categorical allocation is based on the logarithm of the likelihood ratio (LOD score). The likelihood ratio is the quotient of the likelihood that the putative parent(s) are the true parent(s) of a given progeny and the likelihood that they are unrelated individuals. The progenies are unassigned when the LOD score is negative or zero.
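The LOD score can be sketched for the simplest case, in which the mother is known and the paternal allele of the progeny is unambiguous at every locus. Under that assumption the per-locus likelihood ratio is the Mendelian transmission probability from the candidate divided by the population frequency of the allele. The natural logarithm, the function names and all values below are illustrative assumptions; published programs additionally model genotyping error, so a single mismatch does not give an infinite penalty.

```python
import math


def transmission_probability(father_genotype, allele):
    """Mendelian probability that the father transmits `allele` (0, 0.5 or 1)."""
    return father_genotype.count(allele) / len(father_genotype)


def lod_score(paternal_alleles, father, allele_freqs):
    """Sum over loci of ln(P(allele | candidate is father) / P(allele | random male))."""
    lod = 0.0
    for locus, allele in paternal_alleles.items():
        t = transmission_probability(father[locus], allele)
        if t == 0:
            return float("-inf")  # strict model: a mismatch rules the candidate out
        lod += math.log(t / allele_freqs[locus][allele])
    return lod


# Hypothetical data: paternal alleles deduced from a progeny and its mother.
paternal_alleles = {"L1": 124, "L2": 204}
allele_freqs = {"L1": {124: 0.1}, "L2": {204: 0.25}}
candidate_a = {"L1": (124, 126), "L2": (204, 204)}
candidate_b = {"L1": (126, 128), "L2": (204, 206)}

print(lod_score(paternal_alleles, candidate_a, allele_freqs))
print(lod_score(paternal_alleles, candidate_b, allele_freqs))
```

A positive score means the candidate is more likely to be the father than a randomly drawn male; rare shared alleles contribute more to the score than common ones.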

Fractional Allocation Technique
This technique involves assigning a fraction or proportion of progenies to non-excluded putative parents based on their likelihoods of parentage. The fraction of progenies assigned to a putative parent is proportional to the likelihood of parenting them compared with non-excluded putative parents. This works on the assumption that the genotypes of all parents in the population, as well as one parent of the specified progeny are known.
The fractional allocation is estimated as the proportion of progeny (O = k) assigned to putative male j (MP = j) conditional on female i (FP = i). This is summarized in the equation below.
where X_ik is the proportion of non-excluded female parents (i) and progenies (k); P indicates proportion; MP and FP represent male parent and female parent, respectively.
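One likelihood-weighted reading of fractional allocation can be sketched as follows: each progeny is split among the non-excluded candidate males in proportion to their parentage likelihoods, and the shares are summed per male. This is an illustrative interpretation of the description above, not necessarily the authors' exact formulation; the names and likelihood values are hypothetical.

```python
def fractional_allocation(likelihoods):
    """Split one progeny among candidate males in proportion to likelihood."""
    total = sum(likelihoods.values())
    if total == 0:
        return {name: 0.0 for name in likelihoods}
    return {name: value / total for name, value in likelihoods.items()}


def expected_offspring_per_father(progeny_likelihoods):
    """Sum the fractional shares of every progeny for each candidate male."""
    totals = {}
    for likelihoods in progeny_likelihoods.values():
        for father, share in fractional_allocation(likelihoods).items():
            totals[father] = totals.get(father, 0.0) + share
    return totals


# Hypothetical parentage likelihoods for two progeny of the same seed parent.
progeny_likelihoods = {
    "k1": {"P1": 0.5, "P2": 0.5, "P3": 0.0},
    "k2": {"P1": 0.25, "P2": 0.0, "P3": 0.75},
}
print(expected_offspring_per_father(progeny_likelihoods))
```

Unlike categorical allocation, no progeny is forced onto a single father, so ambiguous progeny still contribute to estimates of male reproductive success.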

Full Probability Parentage Analysis Technique
The full probability technique involves simultaneous estimation of patterns of parentage and other desired population level variables using models. The technique offers better data mining through incorporation of available uncertainties in determining desired variables. The full probability models are suggested for the estimation of population-level variables of interest, particularly where parentage analysis methods show <95-100% confidence in assigning progeny to parents [29].

Parental Reconstruction Technique
In this technique, the genotype of each progeny in full- or half-sib families is used for parental reconstruction. The parental reconstruction is applicable where at least one parent is shared among the full- or half-sib progeny arrays. Genotypes of unknown parents are determined using parental reconstruction, by matching alleles of progeny to a set of candidate parents and verifying those using assignment or exclusion techniques [130]. However, the parental reconstruction technique is not applicable in situations lacking large sets of related progeny. This technique often works on the assumption of a minimum number of parents, maximum likelihood or Bayesian approaches.
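The minimum-number-of-parents assumption can be illustrated with a simple allele-counting sketch for a half-sib array sharing a known seed parent: the paternal allele of each progeny is deduced by subtracting the maternal contribution, and because a diploid father carries at most two alleles per locus, the count of distinct paternal alleles gives a lower bound on the number of pollen parents. The function names and genotypes are hypothetical.

```python
def paternal_alleles(offspring, mother):
    """Possible paternal alleles at one diploid locus given a known mother."""
    a, b = offspring
    options = set()
    if a in mother:
        options.add(b)
    if b in mother:
        options.add(a)
    return options


def minimum_fathers_at_locus(progeny, mother):
    """Lower bound on the number of fathers from unambiguous paternal alleles."""
    distinct = set()
    for offspring in progeny:
        options = paternal_alleles(offspring, mother)
        if len(options) == 1:  # skip ambiguous or incompatible genotypes
            distinct |= options
    return -(-len(distinct) // 2)  # ceiling of distinct / 2


# Hypothetical half-sib array from one seed parent at a single locus.
mother = (120, 122)
progeny = [(120, 124), (122, 126), (120, 128), (122, 124), (120, 122)]
print(minimum_fathers_at_locus(progeny, mother))
```

In practice the bound is taken as the maximum across loci, and likelihood or Bayesian methods are used when the array is large enough to reconstruct full multilocus paternal genotypes.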

Sibship Reconstruction Technique
This analysis technique involves the collection of large groups of full- or half-sib progenies that lack putative parents. The algorithms of this technique group progenies into unique clusters of full-sibs, half-sibs and unrelated progenies using patterns of relatedness or maximum likelihood techniques. Successful reconstruction of progenies is contingent upon adequate identification of clusters of half-sib or full-sib progenies.
All techniques of parentage analysis are possible where candidates are identifiable. In the absence of candidate parents, the options are fairly limited. For instance, for a breeding program in root and tuber crops employing a half-sib breeding method with an identifiable and comparatively large family size, the parental reconstruction technique is useful to establish the relationship. Several authors have also reported various techniques used to investigate proof of multiple mating in small family sets [131][132][133][134]. Sibship reconstruction is the only appropriate technique applicable where offspring are not collected per family.

Genotyping Errors, Mutations and Null Alleles
The occurrence of genotyping errors, mutations and null alleles negatively affects parentage analysis even where appropriate sampling design and molecular markers are used.Genotyping errors and mutations are particularly the most critical problems that account for inconsistent results in parentage analysis.Genotyping errors arise from genotypic misreads, amplification failure or spurious production of false results.Mutations are allelic alterations in the progeny compared to those inherited from their parents.The existence of both genotyping errors and mutations produces incompatibilities between true candidate parents and their progeny.This necessitates use of good quality control measures that facilitate successful parentage analysis [135,136].For instance, in the exclusion or parental reconstruction technique, parents are excluded or an additional parent is invoked provided the result is verifiable by one or more additional loci.This technique is probably excessively conservative, consequently producing several incorrect inclusions.It requires care for the assurance of sufficiently adequate power.Unlike the conservative technique, full probability models and sibship reconstruction methods of the categorical or fractional assignment build a model of error into likelihood estimates or posterior probabilities [137,138].
Lack of allelic amplification, or "null alleles," is a problem in parentage analyses [139].Null alleles account for mismatches between parents and progenies by presenting truly heterozygous genotypes for the null allele as homozygous.These alleles are mostly improperly handled by parentage analysis programs.The detection of null alleles is often evident as a deviation from Hardy-Weinberg equilibrium at the null-bearing locus or as a non-Mendelian pattern of segregation in known sets of families [29,140].Since null allelic loci are often detectable, they are apparently not a major threat to parentage analysis.A well-articulated study on the validity of null alleles is reported by Jones and Ardren [30].
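A simple screen for the Hardy-Weinberg deviation mentioned above is to compare observed and expected heterozygosity at each locus; a marked heterozygote deficit flags a locus for closer inspection. The sketch below assumes diploid codominant genotypes and uses hypothetical data. A deficit can also arise from inbreeding or population substructure, so it is a prompt for a formal test rather than proof of a null allele.

```python
from collections import Counter


def heterozygote_deficit(genotypes):
    """Return (observed het., expected het., F = 1 - Ho/He) for one locus."""
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected_het = 1 - sum((c / total) ** 2 for c in counts.values())
    observed_het = sum(1 for a, b in genotypes if a != b) / n
    f_value = 1 - observed_het / expected_het if expected_het > 0 else 0.0
    return observed_het, expected_het, f_value


# Hypothetical locus: many apparent homozygotes, few heterozygotes.
genotypes = [(100, 100)] * 6 + [(100, 104)] * 2 + [(104, 104)] * 2
ho, he, f_value = heterozygote_deficit(genotypes)
print(ho, round(he, 3), round(f_value, 3))
```

A strongly positive F at one locus, with other loci close to zero, is the pattern typically associated with a null allele at that locus.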
This notwithstanding, Wang has developed a model to resolve issues of allelic dropouts, stochastic errors, genotyping error, mutations and null alleles [137]. Wang's model works on a similar principle as the Kalinowski et al. model by fitting systematic allelic dropouts (null alleles) and microsatellite mutations and scoring errors (stochastic error category) [141,142]. The technique allows the error rate to vary among loci. Wang's model has been utilized for sibship reconstruction [137] and full probability parentage analysis [143]. The model is, however, still under examination to determine its incremental accuracy in handling errors and null alleles for parentage analysis.

Family Structure of Candidate Parents
The family structure sometimes comprises candidate parents, their relatives or the desired progeny. Apart from parent-offspring relationships, most inference methods assume that candidate parents and progeny are unrelated, which presents difficulty for breeding populations with an existing family structure [144,145]. Several authors have noted the effects of the family structure of candidate parents in parentage analysis [29,146,147]. Family structure often only has a marked effect on parentage analysis where very close relatives of the progeny are included among the candidate parents. In practice, full siblings of the progeny that are not excluded can show higher likelihoods of parentage than the true parents [148]. The inclusion of many close progeny relatives among candidate parents necessitates the use of complete exclusion or parental reconstruction methods, due to their comparative insensitivity to family structure and reliable diagnosis of true patterns of parentage [30].

Species Effect
Paternity assignment is also influenced by the ploidy level of the studied crop species. Paternity assignment or exclusion is more direct and simpler in diploids than in polyploids, since each progeny inherits one paternal allele per locus from the putative male plant [92]. Genotypic scoring is also simpler in diploids than in polyploids, since each candidate exhibits one allele (homozygote) or two alleles (heterozygote) at each locus. Generally, about 30-50% of plant species are polyploids [143,149]. This presents an acute limitation for paternity assignment, especially in highly polyploidized genomes with complicated inheritance. However, the genomes of some polyploids are highly diploidized due to ancient polyploidy events, resulting in the formation of bivalent chromosomes at meiosis. This is typical of octoploid strawberry [150], as well as some genotypes of polyploid yams, cassava, potato, sweet potato, etc. [4]. In polyploids with polyploidized genomes, parentage assignment is complicated by the cumbersome scoring of alleles and the inference of their polysomic inheritance [151,152]. These complications are nowadays resolved by transforming polyploid codominant genotypes into pseudodiploid-dominant genotypes [153].
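The transformation of codominant polyploid genotypes into pseudodiploid-dominant genotypes can be sketched as follows. This is an illustrative reading of the idea, not the procedure of [153]: each allele observed at a locus is treated as its own dominant marker scored as present (1) or absent (0), so allele dosage no longer needs to be known. The function name, the input layout and the use of `None` for untyped loci are assumptions of this sketch.

```python
def to_dominant_markers(genotypes):
    """Recode codominant allele calls as presence/absence markers.

    genotypes: dict mapping individual -> dict mapping locus -> set of
    alleles observed at that locus (dosage unknown).
    Returns (columns, table), where columns is a sorted list of
    (locus, allele) pairs and table maps each individual to a list of
    1 (allele present), 0 (allele absent) or None (locus not typed).
    """
    # One dominant "pseudo-locus" per distinct allele at each locus.
    columns = sorted(
        {
            (locus, allele)
            for calls in genotypes.values()
            for locus, alleles in calls.items()
            for allele in alleles
        }
    )

    table = {}
    for individual, calls in genotypes.items():
        row = []
        for locus, allele in columns:
            if locus not in calls:
                row.append(None)  # missing data, not absence
            else:
                row.append(1 if allele in calls[locus] else 0)
        table[individual] = row
    return columns, table


if __name__ == "__main__":
    calls = {
        "P1": {"L1": {"a", "b", "c"}},
        "P2": {"L1": {"a"}, "L2": {"x"}},
    }
    columns, table = to_dominant_markers(calls)
    print(columns)
    print(table)
```

The recoding discards dosage information, which is the price paid for avoiding ambiguous allele copy numbers in polyploids.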

Pollen and Seed Dispersal Mechanism
Pollen- and seed-mediated gene flows are the two major determinants of genetic structure and diversity in plants [154]. Pollen-mediated gene flow contributes to plant genetic structure and diversity through immigration, hybridization and introgression, whereas seed-mediated gene flow functions in the colonization of new habitats. Determining the extent of pollen- and seed-mediated gene flow is imperative for predicting its effects on the genetic structure of progenies generated in mating designs [155]. Gene flow by pollen in open pollination and polycross mating schemes has been studied in various plant species using molecular markers and paternity analysis [4,81,156]. These studies describe the pathway of effective gene flow between the pollen donor and the pollen recipient plants, leading to botanic seed development. In root and tuber crops, however, there is a dearth of well-articulated information on the extent of pollen-mediated gene flow, especially for the polycross or open pollination mating design. This information is vital since it represents the realized gene flows by pollen and seed [157]. Studies on open-pollinated or polycross-derived seeds permit determination of the rate of pollen immigration, the pattern and distance of pollen dispersal, paternal and maternal fertility levels and the relationship with the plant phenotype and genotype.
Dispersal of seeds results in seed or seedling mixtures where both parents are unknown. Such a situation makes parentage analysis difficult. Seeds or seedlings may be obtained under different scenarios: one parent known (the pollen or seed parent), two parents known (the pollen and seed parents) or no parent known (immigrant seeds) [92]. Most plant species are cosexual, with either separate male and female flowers on the same plant (monoecious) or hermaphrodite flowers (i.e., possessing both male and female organs). In cosexual plants, nuclear microsatellites cannot distinguish whether an assigned parent is the seed parent or the pollen parent. Dioecious plants lack such a problem since one of the parents is known [158][159][160].
Yam is a typical tuber crop that is mostly dioecious, with the staminate (androecious) and pistillate (gynoecious) organs borne on separate plants [161]. A few monoecious and trimonoecious types also exist [162]. Yams have varying flower, fruit and seed attributes depending on species. Detailed distinctions of these attributes have been reported by various authors [4,[162][163][164]. The sticky nature of the pollen promotes pollination by insects [4]. Yam fruits are dry dehiscent trilocular capsules, each containing 2-6 seeds and measuring about 1-3 cm. Seeds of most species are winged to facilitate wind dispersal [4]. Seeds of some species such as Dioscorea rotundata are small, flat, light and lenticular, and consist of a small embryo surrounded by a relatively large endosperm. At dehiscence, seeds are dispersed from dry locules. The shape and wing patterns of seeds depend on the species [163]. For instance, in D. rotundata, the botanic seed is completely encircled by the wing, whereas in D. bulbifera or other species, the wing is found on one or both sides of the seeds, respectively [163].
Cassava is a protogynous monoecious root crop with male and female flowers borne on the same plant. Cassava pollen is small and sticky, which favors pollination by insects such as wasps (mainly Polistes spp.) and honeybees (Apis mellifera). A comprehensive review of the reproductive biology of cassava has been reported [165].
Sweet potato (Ipomoea batatas (L.) Lam.) is an annual, autogamous and outcrossing root crop with both male and female organs borne in the same flower [85]. The pollen grains are spherical and mainly dispersed by bees for pollination. A comprehensive description of the floral biology of the crop is given in [166].
Potato (Solanum tuberosum) is a monoecious tuber crop with both male and female organs borne in the same flower. Natural pollen-mediated gene flow in potatoes is mainly by insects, including bumblebees (Bombus impatiens Cresson and B. terrestris L.) [167] and the pollen beetle Meligethes aeneus Fabricius [168], with small pollen flow by wind [169]. Seed-mediated gene flow is noted to be facilitated by birds [170] and small mammals [171]. A comprehensive description of the floral biology of the crop is reported by the Canadian Food Inspection Agency (CFIA) [170]. The flowers, fruits and seeds of yam and cassava are shown in Figures 1 and 2 (International Institute of Tropical Agriculture (IITA)/Prince E. Norman).
Mitochondrial or chloroplast organelle DNA has been used to distinguish the maternity and paternity of organisms with uniparental inheritance [172,173]. However, both mtDNA and cpDNA lack sufficient variability to identify individuals. An improved technique for identifying seed parents is to genotype maternally derived seed tissue such as endocarp [174,175], pericarp [176] and seed wing [177] tissue, or the megagametophyte tissue of conifers [178]. However, this method is limited to intact seeds, since seedlings generally lack maternal tissues.
Parentage analysis may yield any of three outcomes: identification of (i) none of the parents, (ii) one of the parents or (iii) both male and female parents of the progenies [179]. The first scenario is typical of seed immigration, whereas the second represents pollen immigration. However, assuming that the closest parent in the second and third scenarios is the maternal parent yields a conservative estimate of the seed dispersal distance [179].
Based on the above information, it can be deduced that a good understanding of the dynamics of natural populations is obtained using seedling plants, since they give additional information such as the rate of seed immigration and the distance and pattern of seed dispersal. The determination of gene flow in seedlings also permits comparisons between the dispersal distances and patterns of pollen and seeds. The effects of biotic and abiotic factors on pollen and seed dispersal, as well as of genetic factors on pollen and seed development, may also limit the extent of variability expected in parentage analysis.

Potential of Parentage Analysis in Root and Tuber Breeding Programs
Population improvement is an important breeding strategy to harness genetic gain in root and tuber crops [4]. As breeding populations increase, issues of genealogy overlaps arise. Sometimes, when breeders release varieties, farmers give the varieties local names. Tracking these varieties over time becomes difficult using phenotypic markers alone. Breeders often use routine molecular parentage analysis to resolve these issues and maintain useful allelic and gene diversity in their breeding populations. Maintenance of high genetic variability from generation to generation facilitates long-term sustainability of root and tuber breeding programs. Moreover, parentage analysis ascertains changes in the allelic frequency of genotypes and how much of the diversity is maintained in progenies.
Tracking pedigrees is crucial in root and tuber breeding programs. It provides the building blocks upon which heterotic groups are formed. Pedigrees are tracked by the name of the original development cross from which selection is made. Parentage analysis enhances root and tuber breeding activities through establishment of these heterotic patterns, which contribute to improving the heterosis in the new hybrids developed. This improvement leads to a gain in the selection of parents with desired complementary alleles for improvement of target traits.
Knowledge of the ploidy level of the parents, the fertility-regulating mechanisms of putative parents, the genes controlling traits of interest, etc., also guides the breeder in planning cross combinations and choosing a breeding strategy to maximize genetic gain. Molecular parentage analysis could help breeders to make informed decisions in a more efficient, accurate, creative and rapid manner [85][86][87].
Parentage analysis has many potential applications in root and tuber breeding programs. Some of the applications include:
(i) Increased parental control in the breeding program: Adequate identification of the parents contributing alleles to the next generation improves breeding efficiency.
(ii) Reduced cost and time of maintaining parents in the breeding program: Adequate genetic profiling of parents and progenies guides the selection of genotypes with desired traits at an early stage of breeding, thereby saving time, cost and labor.
(iii) Reliable estimation of paternal breeding values in the half-sib family: Adequate molecular analysis complements conventional breeding values in the half-sib family, especially where the expression of alleles is masked by epistatic, dominance and other genetic and environmental factors.
(iv) Minimized pollination and labeling errors in the breeding program: Pollination and mislabeling errors are costly in a breeding program because they reduce the validity of the progeny developed. Parentage analysis reduces these effects by adequately tracing the identities of progenies to their parents.
(v) Assessment of pollen donor rate and pollen movement in breeding blocks: Pollen movement and the amount of pollen load on the female stigmatic surface are influenced by several factors, including the spatial arrangement and distance between male and female plants and genetic, biotic and abiotic factors. Successful parentage analysis helps to ascertain pollen donor rate and dispersal.
(vi) Estimation of the level of inbreeding in the breeding program: In monoecious and perfect-flower genotypes of roots and tubers, parentage analysis helps to determine the extent of inbreeding in the progeny developed.
(vii) Assessment of incompatibility in the breeding program: Good knowledge of ploidy status and other fertility-regulating mechanisms in parents guides parental selection. Thus, parentage analysis helps to uncover any genetic aberrations limiting breeding progress.
(viii) Estimation of genetic effects, including general combining ability (GCA) and specific combining ability (SCA), in the breeding program: In half-sib breeding, genetic estimates are made by assumption. Such assumptions are less accurate and therefore misleading. Parentage analysis, however, contributes to generating reliable pedigree and progeny genetic data, which in turn aid reliable estimation of genetic effects.
(ix) Assessment of the rate of pollen contamination in breeding blocks: Pollen contamination may occur in both controlled and artificial mating schemes through insects, wind, humans and other agents of pollination. Routine parentage analysis helps to assess whether the pollen used is pure and the progeny developed is true to type.
(x) Identification of parents that produce superior progenies for selection and high-quality seed production.
Complete parentage assignment has higher prospects where maternity is known, progeny samples are collected in family groups and complete samples of putative male parents are represented. The probability of adequate parentage assignment decreases with decreasing completeness of the dataset. The use of molecular markers with higher resolving power counterbalances the reconstruction of less ideal samples. The number of loci and the locus heterozygosity determine the maximum number of samples needed for parentage assignment. A priori knowledge of kinship relations is, therefore, needed to guide the type of parentage assignment technique to implement in root and tuber population development.
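The dependence of resolving power on the number of loci and their allele frequencies can be made concrete with the classical paternity exclusion probability. The sketch below is not drawn from this review. It assumes diploid (or diploidized) codominant loci in Hardy-Weinberg equilibrium with the maternal genotype known, and it uses the widely cited closed-form expression in terms of the power sums a_k = Σ p_i^k of the allele frequencies. The function names are hypothetical, and the formula does not apply directly to polysomic polyploids.

```python
def exclusion_probability(freqs):
    """Probability of excluding a random non-father at one locus.

    freqs: allele frequencies at the locus (should sum to 1).
    Assumes a diploid codominant locus with the mother's genotype known.
    """
    a2, a3, a4, a5 = (sum(p ** k for p in freqs) for k in (2, 3, 4, 5))
    return 1 - 2 * a2 + a3 + 2 * a4 - 3 * a5 - 2 * a2 ** 2 + 3 * a2 * a3


def combined_exclusion(loci):
    """Combine independent loci: 1 minus the product of non-exclusion."""
    not_excluded = 1.0
    for freqs in loci:
        not_excluded *= 1.0 - exclusion_probability(freqs)
    return 1.0 - not_excluded


if __name__ == "__main__":
    biallelic = [0.5, 0.5]
    four_alleles = [0.25, 0.25, 0.25, 0.25]
    print(exclusion_probability(biallelic))
    print(exclusion_probability(four_alleles))
    print(combined_exclusion([biallelic] * 20))
```

Under these assumptions, a locus with more evenly distributed alleles contributes more exclusion power, and adding unlinked loci increases the combined power multiplicatively.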
Root and tuber improvement for desired traits is crucial to societies in Africa and elsewhere that make their livelihoods directly or indirectly from these crops. Future genetic improvement of long-duration root and tuber crops should target a good understanding of the SWOT analysis of progeny and parent analyses using genomic markers, the development of a robust schematic pathway of parentage analysis in root and tuber crops as typified in aquatic animals [180], the reduction of the long breeding cycle for desired consumer and market traits and improved collaboration with relevant stakeholders in the crops' product value chains. Another prospect of enhancing parentage analysis relates to improvement in translational genetics, routine optimization of DNA sequencing protocols for recalcitrant root and tuber species, effective and efficient quality control and the development of more user-friendly, but highly efficient, parentage analysis software packages. Despite these prospects, more efforts are needed in root and tuber crops regarding the contribution of parentage analysis to improved estimates of genetic parameters, the extent of pollen and seed gene flows, the determination of the paternity of progenies harvested from known parents in random open pollination or polycross blocks, especially in neglected root and tuber crops, and the determination of the gap between the actual and the expected population sizes.
Efficient reconstruction of the genealogies of root and tuber crops also requires more effort. Genealogy reconstruction using the paternity exclusion technique may be limited by incompatibility between parent and offspring marker genotype data. Buteler [83] noted two limitations of paternity assignment in root and tuber crops using Mendelian segregation probabilities: ambiguous progeny genotypic profiles and statistical bias in favor of homozygotes. The ambiguity favors homozygous individuals with a higher likelihood for the particular allele. However, the bias can be reduced by increasing the number of genetic markers.

Current Application of Parentage Analysis in Root and Tuber Crops
Molecular parentage analysis has been extensively utilized for the discovery of pedigree errors in breeding populations developed using natural and artificial mating schemes [30,59,181,182]. In potatoes, many researchers have noted conflicting levels of accuracy in pedigree assignment. For instance, Douches et al. [183] discovered mismatches in parental assignment in their isozyme-based marker pedigree investigation. Similarly, using 17 SSR markers, the study in [8] reported some discrepancies in the pedigree records of 577 potato varieties. The quest to resolve the discrepancies in pedigree data and understand the level of accuracy led to the development and utilization of other marker types. The development of an Infinium SNP array from sequence data of six potato varieties, namely Bintje, Kennebec, Premier Russet, Shepody, Snowden and Atlantic [184,185], has facilitated the genotyping of elite North American potato germplasm. However, pedigree errors or mismatches were also observed with the SNP-based markers [59].
Parent-progeny trio relationships, as well as the paternity of some varieties generated with bulk pollen, were determined in a genome-wide SNP analysis in potato [59]. A trio consists of two parents and one progeny assessed at a time during parentage analysis. A total of 719 tetraploid potatoes were genotyped with the Infinium SNP array to produce 5063 high-quality markers. The curated information was used to verify pedigree records and establish parent-progeny relationships. Findings revealed pedigree errors in some trios. Of the 198 parent-progeny trios studied, 182 trios had an accurate pedigree match (mean 0.02%, range 0-0.73%), while 16 trios exhibited pedigree error or mismatch (mean 16.4%, range 7-30%). Besides the curated dataset of 719 potatoes, 24 genotypes were utilized to establish an alternative parentage hypothesis. For these 24 genotypes, 17 pedigree modifications were suggested. Of the 17 pedigree modifications, four were not due to pedigree error, but to an unknown male, since the female parent was either open-pollinated by All Red or hybridized with bulked pollen of genotypes Allegany, AmaRosa and Purple Pelisse.
The exclusion analytical technique has been used to exclude candidates with a higher fraction of segregation mismatches at homozygous loci [30]. The same technique has been utilized in root and tuber crops. In potato, for instance, the study in [8] identified segregation mismatches in the pedigree information using 17 SSR markers and the analytical exclusion technique. However, the authors noted that the markers used were unreliable. The segregation mismatch identified using at least 5000 SNP markers was very informative, with unambiguous determination of the pedigree and level of accuracy [59]. Furthermore, the authors remarked that as the inbreeding coefficient increases, the use of homozygous loci alone to check segregation mismatch becomes less effective [59]. In yams, little is known with regard to segregation mismatches in the pedigree information of existing germplasm. Currently, studies on parentage analysis in controlled (North Carolina 1) and polycross-derived progeny using DArTseq are being conducted at IITA. The findings of these studies will be published as soon as the analyses and reports are completed.
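The check for segregation mismatches at homozygous loci can be sketched for a single parent-progeny trio. This is a simplified illustration and not necessarily the exact procedure of [59]. It assumes biallelic SNPs coded as allele dosage (0 to the ploidy level), ignores double reduction and genotyping error, and uses only loci where both parents are homozygous, because only there is the progeny dosage fully determined. The function name and the use of `None` for missing calls are hypothetical.

```python
def trio_mismatch_rate(mother, father, progeny, ploidy=4):
    """Percentage of informative loci at which a trio is inconsistent.

    mother, father, progeny: equal-length sequences of allele dosages
    (integers from 0 to ploidy) or None for missing calls.
    Returns the mismatch percentage, or None if no locus is informative.
    """
    informative = 0
    mismatches = 0
    for m, f, o in zip(mother, father, progeny):
        if m is None or f is None or o is None:
            continue  # skip loci with any missing call
        if m not in (0, ploidy) or f not in (0, ploidy):
            continue  # use only loci where both parents are homozygous
        informative += 1
        # Each homozygous parent transmits half of its dosage.
        expected = (m + f) // 2
        if o != expected:
            mismatches += 1
    if informative == 0:
        return None
    return 100.0 * mismatches / informative


if __name__ == "__main__":
    mother = [0, 4, 4, 0, 2, None]
    father = [0, 4, 0, 4, 4, 0]
    progeny = [0, 4, 2, 1, 3, 0]
    print(trio_mismatch_rate(mother, father, progeny))
```

A rate near zero is consistent with the recorded pedigree, whereas a rate of several percent or more suggests that at least one recorded parent is wrong. Any threshold used to separate the two cases would need to be calibrated on the marker platform at hand.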
Based on the above information, it is clear that parentage analysis is possible in root and tuber crops. However, genetic bottlenecks related to variable ploidy may contribute to segregation mismatches in the pedigree records of genotypes of these crops.

Generic Framework of Parentage Analysis in Root and Tuber Crops
The flowchart in Figure 3 shows the experimental procedure of parentage analysis for a typical root or tuber crop. Accordingly, from the many germplasm accessions in a root or tuber breeding program, molecular markers can be used to confirm phenotypic variations observed in the field. The database of the core collections of fingerprinted genotypes is useful in breeding. At this preliminary assessment stage, informative markers that contribute to most of the variability are identified, the genetic relationships or similarities of clones are determined using various coefficients such as Jaccard, Dice, etc., and the clones are ranked based on their genetic distances. Putative parental candidates with desired complementary traits used in population development are subjected to parentage analysis to determine the paternity or maternity, or both, of the tested progenies. The choice of the type of parentage analysis depends on the sampling technique and other factors previously discussed. The pedigree information obtained could be subjected to inferential analysis using threshold levels for the genetic distance ranking (GDR) or logarithm of odds (LOD) score, and simple exclusion for the kinship testing option.
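The similarity coefficients named in the preliminary assessment stage can be computed as follows. This is a minimal sketch assuming binary (presence/absence) marker profiles of equal length. The function names are hypothetical, and a genetic distance for ranking is taken here as one minus the similarity.

```python
def _shared_counts(x, y):
    """Count bands present in both profiles (a) and in only one (b, c)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    return a, b, c


def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c); None if no band is present."""
    a, b, c = _shared_counts(x, y)
    total = a + b + c
    return a / total if total else None


def dice(x, y):
    """Dice similarity: 2a / (2a + b + c); None if no band is present."""
    a, b, c = _shared_counts(x, y)
    total = 2 * a + b + c
    return 2 * a / total if total else None


if __name__ == "__main__":
    clone_1 = [1, 1, 0, 1, 0]
    clone_2 = [1, 0, 0, 1, 1]
    print(jaccard(clone_1, clone_2), dice(clone_1, clone_2))
    print(1 - jaccard(clone_1, clone_2))  # distance used for ranking
```

Both coefficients ignore joint absences. Dice gives double weight to shared bands, so it is never smaller than Jaccard for the same pair of profiles.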

Conclusions
Successful application of DNA-informed breeding techniques in root and tuber crops, in complementarity with conventional population development techniques and other recent advances in genomics and phenomics tools, can accelerate genetic gain compared to using the conventional breeding technique alone. Utilization of molecular techniques in parentage analysis and realization of their merits have contributed to a greater impact on root and tuber breeding programs. The effectiveness of the molecular techniques in parentage assignment contributes to resolving many genetic bottlenecks associated with half-sib breeding using polycross and open pollination mating designs compared to full-sib breeding using controlled pairwise matings. Moreover, successful use of parentage analysis improves half-sib breeding efficiency by increasing selection gain or accuracy, leading to higher selection pressure (<3%), more accurate estimation of genetic parameters, sib screening, the protection of breeders' rights, the utilization of new mating designs, the selection of new traits, the identification of the genetic identity of candidate genotypes and the reduction of time and costs. Efficient parentage assignment also establishes the extent of pollen and seed gene flow necessary for prediction of its effects on the genetic structure of progenies. These merits make population improvement of organisms, including niche and neglected species, simple and flexible.

Figure 1. Floral, fruit and seed photos of yam. Photo source: International Institute of Tropical Agriculture (IITA)/Prince E. Norman.

Figure 3. A flowchart of the experimental procedure of parentage analysis of a typical root or tuber crop. LOD, logarithm of odds.

Table 1. Inventory of databases and studies on parentage analysis in selected root and tuber crops.