The SSR Null Allele Problem, and Its Consequences in Pedigree Reconstruction and Population Genetic Studies in Viticulture

Null alleles are alleles that behave as recessive in otherwise codominant markers and have no effect on the phenotype. In SSR assays, there are several reasons for the lack of amplification at a locus: the primers do not bind well, longer fragments do not amplify because of imperfections in the PCR reaction, or the amount of DNA in the sample is insufficient. In microsatellite studies, null alleles cause problems mainly in pedigree analysis and in population genetic calculations such as diversity estimation. If they are not recognized, null alleles can lead to the rejection of the true parent in pedigree analysis, and in population genetics they distort the results by causing diversity to be underestimated. This review summarizes the effects of null alleles in viticultural research and their possible solutions.


The Null Allele Problem
The grapevine, one of humanity's oldest crops, has significant economic and cultural value. Climate change is the greatest challenge confronting viticulture today, and its long-term impacts are likely to be substantially mitigated by new vine varieties that are better adapted to the environment. This underlines the importance of breeding. However, breeding success depends on an accurate understanding of the genetic diversity in the starting material.
A null allele is an allele that results in the complete absence of a gene product or function. The best-known example is the human AB0 blood group system, in which allele "0" is considered a null allele: it produces no phenotype of its own, and its presence is masked by allele "A" or "B" (people with genotypes "A0" and "AA" are indistinguishable, both having blood group "A") [1].
Molecular genetic markers based on PCR (e.g., SSRs) often show codominant inheritance; even so, certain loci may carry alleles that cannot be detected, and such alleles can be considered null alleles. Null alleles are not, however, restricted to codominant markers. Dominant PCR-based methods (RAPDs, etc.) may also have null alleles. Array-based SNPs have null alleles too, and many of the same considerations apply as for SSRs. Genome-wide sequencing (GWS), on the other hand, does not have this problem unless it is performed at low depth, in which case the missing data are estimated through imputation.
If $A_n$ is a null allele, then the $A_iA_n$ and $A_iA_i$ genotypes are indistinguishable. If an individual is homozygous for such a null allele, no product is formed in the PCR reaction and genotyping fails [2].
The failure to detect an allele in a PCR-based genetic marker may have several causes:
• The primers fail to bind to the DNA because the target sequence differs from the conserved reference sequence on which the primer design is based [3]. The same problem can also result from inappropriate primer design.
• Alleles of different sizes may amplify with different efficiency, and "longer" alleles are sometimes not amplified at all [4].
• The amount of DNA in the sample can also prevent detection, as shown by the same DNA extract yielding a PCR product at some loci but not at others [5,6].
Null alleles are a problem in the use of genetic markers for several reasons. In the most common applications, pedigree analyses and paternity tests, they can cause the erroneous exclusion of one or both parents, because a putative parent appears homozygous at a locus when it is in fact heterozygous for a null allele [1]. For example, crossing a mother with genotype $A_iA_i$ and a father with genotype $A_jA_n$ gives a 50% chance of an offspring with genotype $A_iA_n$; such an offspring is scored as $A_iA_i$ while the father is scored as $A_jA_j$, which apparently excludes the true father [2,7].
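The exclusion scenario above can be checked by enumerating Mendelian offspring and mapping each true genotype to what an SSR assay would score. This is a minimal sketch: the allele labels and the helper names (`observed`, `offspring_distribution`, `apparently_excluded`) are hypothetical, and exclusion is judged on the putative father alone, as in the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

NULL = "An"  # hypothetical label for the non-amplifying (null) allele


def observed(genotype):
    """Return the genotype as an SSR assay would score it."""
    visible = sorted({a for a in genotype if a != NULL})
    if not visible:
        return ()                         # no PCR product at all
    if len(visible) == 1:
        return (visible[0], visible[0])   # a single band is scored as a homozygote
    return tuple(visible)


def offspring_distribution(mother, father):
    """Mendelian segregation: each parent transmits either allele with probability 1/2."""
    dist = Counter()
    for m, f in product(mother, father):
        dist[tuple(sorted((m, f)))] += Fraction(1, 4)
    return dist


def apparently_excluded(parent, child):
    """Naive exclusion on scored genotypes: no visible allele is shared."""
    return not set(observed(parent)) & set(observed(child))


mother, father = ("Ai", "Ai"), ("Aj", "An")
offspring = offspring_distribution(mother, father)
p_false_exclusion = sum(p for child, p in offspring.items()
                        if apparently_excluded(father, child))
print(p_false_exclusion)  # 1/2
```

Exact fractions are used so that the 50% figure from the text comes out without rounding.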
In population genetic studies, null alleles can cause an apparent reduction in the proportion of heterozygotes, which can seriously confound the assessment of a population's genetic diversity. A null allele can also lead to an overestimation of the frequencies of the detectable (non-null) alleles, and thus to a misrepresentation of population structure [2].
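Both effects can be reproduced by collapsing Hardy-Weinberg genotype frequencies into the band patterns an SSR assay shows, then analysing them naively: single bands are scored as homozygotes and blank samples are discarded as failed reactions. The allele frequencies below are hypothetical and the function names are illustrative only.

```python
from itertools import combinations_with_replacement

NULL = "null"  # hypothetical label for the non-amplifying allele


def scored_proportions(freqs):
    """Hardy-Weinberg genotype frequencies collapsed to SSR band patterns."""
    scored = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        bands = tuple(sorted({a, b} - {NULL}))   # () means no PCR product
        scored[bands] = scored.get(bands, 0.0) + p
    return scored


def naive_summary(scored):
    """Allele frequencies, Hobs and Hexp when single bands count as homozygotes
    and blanks are dropped."""
    typed = {bands: p for bands, p in scored.items() if bands}
    total = sum(typed.values())
    h_obs = sum(p for bands, p in typed.items() if len(bands) == 2) / total
    freqs = {}
    for bands, p in typed.items():
        for allele in (bands if len(bands) == 2 else bands * 2):
            freqs[allele] = freqs.get(allele, 0.0) + p / total / 2
    h_exp = 1 - sum(f ** 2 for f in freqs.values())
    return freqs, h_obs, h_exp


true_freqs = {"A": 0.4, "B": 0.4, NULL: 0.2}
freqs, h_obs, h_exp = naive_summary(scored_proportions(true_freqs))
print(freqs, round(h_obs, 3), round(h_exp, 3))
```

With these frequencies each visible allele is estimated at 0.5 instead of 0.4, and the scored sample shows an Hobs of about 0.33 against an Hexp of 0.5: a heterozygote deficit produced entirely by the null allele.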

Methods for the Estimation of Null Allele Frequencies
Several methods exist for estimating null allele frequencies. Each assumes that the individuals in the sample form an approximately ideal population (no ideal population exists in nature, but most populations approximate one). Accordingly, Hardy-Weinberg proportions are used as the basis for estimating the frequency of null alleles [8].

Chakraborty's Method
The method created by Chakraborty et al. to estimate the frequency of the null allele uses the expected (Hexp) and observed (Hobs) heterozygosity [9,10]:
$$r = \frac{H_{exp} - H_{obs}}{H_{exp} + H_{obs}}$$
where r is the estimated null allele frequency. This formula gives a reasonably accurate estimate, but it is not a maximum likelihood estimate, and it ignores individuals in which genotyping failed (no PCR product was detected); such failures may be caused by homozygosity for a null allele, but also by other factors [2].
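A minimal sketch of Chakraborty's estimator, r = (Hexp - Hobs) / (Hexp + Hobs), applied to a hypothetical locus in which single bands are scored as homozygotes. Hexp is computed here without a small-sample correction, and the function names are illustrative.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Expected and observed heterozygosity from scored (allele, allele) pairs."""
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    h_exp = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return h_exp, h_obs


def chakraborty_null(h_exp, h_obs):
    """Chakraborty et al.: r = (Hexp - Hobs) / (Hexp + Hobs)."""
    return (h_exp - h_obs) / (h_exp + h_obs)


# Hypothetical locus: 10 two-band samples and 20 single-band samples.
sample = [(241, 257)] * 10 + [(241, 241)] * 10 + [(257, 257)] * 10
h_exp, h_obs = heterozygosities(sample)
print(round(chakraborty_null(h_exp, h_obs), 3))  # 0.2
```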

Brookfield's Method
Brookfield's estimate is applicable when there are no missing data (no non-amplifying individuals) in the sample [4]:
$$r = \frac{H_{exp} - H_{obs}}{1 + H_{exp}}$$
The method of Summers and Amos (1997) is based on similar estimates and performs well on simulated data. However, all three estimators share the same flaw: they start from the genotype counts but condense the data into summary statistics before estimating the null allele frequency [11].
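Brookfield's first estimator, r = (Hexp - Hobs) / (1 + Hexp), takes the same two inputs as Chakraborty's. The sketch below uses hypothetical heterozygosity values; for the same inputs, Chakraborty's formula gives (0.5 - 1/3) / (0.5 + 1/3) = 0.2, so the choice of estimator alone can change the result noticeably.

```python
def brookfield1_null(h_exp, h_obs):
    """Brookfield's first estimator, intended for samples with no blank
    (non-amplified) individuals: r = (Hexp - Hobs) / (1 + Hexp)."""
    return (h_exp - h_obs) / (1 + h_exp)


# Hypothetical heterozygosities for one locus.
h_exp, h_obs = 0.5, 1 / 3
print(round(brookfield1_null(h_exp, h_obs), 3))  # 0.111
```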
In 2006, Kalinowski and Taper developed a new method for estimating null allele frequencies [2]. The method is based on maximum likelihood estimation and can handle incomplete data. Kalinowski and Taper built their algorithm on a slight modification of the EM (Expectation-Maximization) algorithm [12,13] (described below), which had been used to estimate the frequency of the "0" allele of the AB0 blood group.
Because a failed PCR reaction may be caused either by a technical problem or by a genotype homozygous for the null allele, a new parameter was introduced. This parameter (β) is the probability that a PCR failure is due not to the null homozygous genotype but to some other problem. Their algorithm is implemented in C#, and the software can be downloaded from the Internet [14]. The algorithm has also been implemented in the software MolMarker [15,16].

The EM (Expectation-Maximization) Algorithm and Its Use for Estimating Null Allele Frequencies
The EM algorithm was first formulated by Dempster and colleagues. It is an iterative method for obtaining maximum likelihood estimates of the parameters of statistical models that depend on missing or hidden data. Each EM iteration consists of two steps:
Step 1 E (Expectation): the missing data are replaced by their conditional expected values, given the observed data and the current parameter estimates.
Step 2 M (Maximization): based on the data calculated in the previous step and the existing data, a new estimate of the model parameters is made by maximizing the likelihood function.
The iterations are continued until the difference between the previous and the current value of the likelihood function is less than a predefined, sufficiently small value.
The EM algorithm can be used to estimate the frequency of null alleles in PCR-based genetic markers. In this case, heterozygotes carrying the null allele are indistinguishable from homozygotes carrying the detectable allele, so in this case the null allele can be considered as hidden data. The other problem is that if no product (missing data) is obtained in the PCR reaction, there are two possible reasons for this: (1) the tested individual is homozygous for the null allele at the locus or (2) the genotyping failed due to some other error.
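The two steps can be sketched for a single locus, assuming Hardy-Weinberg proportions and the observation model P(two bands i, j) = (1 - β)·2p_i·p_j, P(single band i) = (1 - β)(p_i² + 2p_i·p_n), P(blank) = β + (1 - β)p_n², where p_n is the null allele frequency and β the probability that a blank is a technical failure. This is our reading of the approach described above, not the published C# program or MolMarker; `em_null_allele` and its arguments are hypothetical.

```python
def em_null_allele(het, hom, n_blank, max_iter=10_000, tol=1e-12):
    """EM estimates of visible allele frequencies, null allele frequency and beta.

    het: {(allele_i, allele_j): count} of two-band samples
    hom: {allele_i: count} of single-band samples (true ii or hidden i/null)
    n_blank: number of samples with no PCR product
    """
    alleles = sorted(set(hom) | {a for pair in het for a in pair})
    n_total = sum(het.values()) + sum(hom.values()) + n_blank
    # Arbitrary, strictly positive starting values.
    p = {a: 0.9 / len(alleles) for a in alleles}
    p_null = 0.1
    beta = 0.5 * n_blank / n_total

    for _ in range(max_iter):
        # E-step: expected allele counts given the current parameters.
        counts = {a: 0.0 for a in alleles}
        null_count = 0.0
        for (a, b), n in het.items():
            counts[a] += n
            counts[b] += n
        for a, n in hom.items():
            share_hidden = 2 * p_null / (p[a] + 2 * p_null)  # P(a/null | single band a)
            counts[a] += n * (2 - share_hidden)
            null_count += n * share_hidden
        nn = (1 - beta) * p_null ** 2                  # P(blank because null/null)
        exp_null_hom = n_blank * nn / (beta + nn) if n_blank else 0.0
        exp_failed = n_blank - exp_null_hom
        null_count += 2 * exp_null_hom

        # M-step: frequencies among genotyped samples; beta = share of failures.
        typed = n_total - exp_failed
        new_p = {a: c / (2 * typed) for a, c in counts.items()}
        new_null = null_count / (2 * typed)
        new_beta = exp_failed / n_total

        change = max([abs(new_p[a] - p[a]) for a in alleles]
                     + [abs(new_null - p_null), abs(new_beta - beta)])
        p, p_null, beta = new_p, new_null, new_beta
        if change < tol:
            break
    return p, p_null, beta


# Hypothetical counts equal to the expected values for pA = 0.5, pB = 0.3,
# p_null = 0.2 and beta = 0.05 in a sample of 2000.
p, p_null, beta = em_null_allele({("A", "B"): 570}, {"A": 855, "B": 399}, n_blank=176)
```

Because these counts match the model exactly, the iteration should settle close to the generating values; with real data, trying several starting points is a sensible check.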

The Importance of Pedigree Reconstruction from the Grape Breeder's Point of View
Grape variety performance is a complex, genetically based, polyfactorial trait whose expression is strongly influenced by ecological and agronomic conditions and is ultimately reflected in yield and in the synthetic varietal value [17,18].
Crosses in grape breeding must be designed and executed with the chosen parent varieties in mind. The breeder should know which varieties are expected to pass on the desired characteristic [19,20].
Kozma [18] wrote a book on grape breeding that discusses at length inbreeding and the heterosis effect in crossing and selfing grape cultivars. He saw crossbreeding of grape varieties as the greatest opportunity for increasing heterozygosity.
Negrul [21,22] analyzed how crossing within and across ecological groups of varieties influences variability in the progeny population. He found that crossing within a convarietas did not produce considerable variability, whereas crossing across ecological groups could yield a highly diverse progeny population. Crosses between convar. pontica and convar. orientalis produced mostly intermediate offspring, with the pontica parent having a slight advantage. The hybrids generally grew faster than orientalis cultivars, but produced small, juicy berries suited for winemaking. In crosses between orientalis and occidentalis varieties, the traits of the occidentalis parent predominate in the progeny. Hybrids from crosses between the occidentalis and pontica convarietas have intermediate features; some of them have proven quite prolific and produce excellent wine.
Interspecific crosses can also increase genetic variability. The resulting interspecific hybrids are mostly used in resistance breeding, as cultivated species often lack the resistance of wild species or have lower resistance. The so-called Franco-American hybrids have been the most widely used in grapevine breeding to increase resistance to mildews and grey rot [23]. Vitis amurensis is also a significant source of genetic diversity. Its early ripening, resistance to mildews, Botrytis, and Agrobacterium, and excellent frost tolerance are all valuable qualities for grape breeding. Frost tolerance in particular makes it a useful source of genes in continental-climate grape breeding [24].
The correct selection of parent pairs is a cornerstone of combination breeding; a wrong choice can set a breeding programme back by up to 30 years. Combining ability tests have long been, and still are, carried out to address this issue. For example, in 1978 the combining ability of 'Bicane' was assessed by crossing it with 20 varieties from the pontica, orientalis, and occidentalis ecological groups, and a total of 4659 seedlings from these 20 crosses were investigated. Studies on the inheritance of flower type and other biological and economic characteristics revealed that 'Bicane' has a high combining ability and is heterozygous for flower type, which allowed the homozygous varieties 'White Riesling' and 'Muscat Hamburg' to be separated. Because of the wide range of variation in numerous traits, and the heterosis observed for vigor, cold hardiness of buds, cane maturity, crop level, cluster size, and berry size, highly valuable genotypes were selected for homologation [25].
The combining ability and genotype-environment interaction for average cluster weight were investigated in the F1 offspring of 20 crosses between seeded and seedless table grape cultivars. Significant differences between the cultivars were required for their successful use in combination breeding. The importance of genotype-environment interactions was clearly demonstrated, and these interactions should be taken into account in breeding practice [26].
It is clear from the foregoing that in grape breeding the origin and pedigree of varieties are often more important than the phenotypic characteristics of a given variety. However, experiments to test combining ability are very time-consuming. This work (and often time) can be saved by examining the pedigree of varieties with valuable characteristics. As recent research has shown, valuable varieties with a wide range of ecological tolerance often descend from the same ancestor or ancestors, such as 'Heunish' [27,28] or the Pinots [29,30]. Notably, the progeny identified by Bowers et al. [31] are all historically associated with northeastern France and with no other location, which suggests that the crosses took place in this region. 'Pinot' and 'Gouais blanc' evidently make a good parental combination; however, other varieties grown in the area are likely to be relatives of 'Pinot' or 'Gouais blanc', and crosses involving them would be less fit because of inbreeding depression.
The breeder should then go back to the successful ancestor to save time and money.

Consequences of the Presence of Null Alleles in Pedigree Studies: Some Examples
The most important consequence of the presence of null alleles is that they can result in a true parent and its offspring appearing homozygous for different alleles. This can lead to a rejection of the real parent-offspring relationship [1,32].
Precise pedigree records are essential in any grape breeding program, yet the breeder's records may contain errors. Parent-offspring relationships can be identified and validated with genetic markers, an approach often known as "DNA fingerprinting". Simple sequence repeat (SSR) markers were used to validate or correct the pedigrees of grape varieties developed in the Cornell breeding program. In this project, 'Ontario' was confirmed as a parent of 'Glenora', 'Himrod', and 'Alden' when null alleles were scored at the VVMD25 locus. 'Fredonia' and 'Black Kishmish' were also confirmed as the parents of 'Suffolk Red', allowing for a possible null allele at VVMD6 [33].
To prove that 'Muscat of Hamburg', a fine black table grape variety with a muscat flavor, is the progeny of a cross between 'Schiava Grossa' and 'Muscat of Alexandria', researchers used 2 isozymes (GPI and PGM), 30 nuclear, and 5 chloroplast microsatellite markers. The probability of null alleles was calculated and found to be extremely low or zero [34].
The findings supported earlier research demonstrating the distinction between the Italian and Swiss 'Cornalin' cultivars and the identity of 'Humagne Rouge' and 'Cornalin' from the Aosta Valley [35]. 'Goron', 'Petit Rouge', 'Mayolet', and 'Cornalin d'Aoste' all share at least one allele with 'Cornalin du Valais', suggesting parent-offspring relationships. Forty-nine of fifty microsatellite loci support 'Cornalin du Valais' as the offspring of 'Petit Rouge' and 'Mayolet', but at locus VVMD8 'Humagne Rouge' has genotype 257-241 instead of the 241-241 found in 'Cornalin d'Aoste'. This clonal variation is likely caused by a null allele in 'Cornalin d'Aoste'. This was the first grapevine parentage study to deal with a discrepancy at a microsatellite locus, and it showed that as the number of loci used in parentage decisions grows, so does the risk of encountering a locus with intra-cultivar variability. A single discrepancy, even one spanning several repeat units, should therefore not be considered sufficient to invalidate a parentage hypothesis.
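The growing risk with more loci can be made concrete under a simplifying assumption: if discrepancies (null alleles, mutations, scoring errors) arise independently at each locus with the same rate ε, the chance of at least one discrepancy among L loci is 1 - (1 - ε)^L, which for small ε rises roughly in proportion to L. The 1% rate below is purely hypothetical.

```python
def p_any_discrepancy(per_locus_rate, n_loci):
    """Probability that at least one of n_loci independent loci shows a
    discrepancy, given a constant per-locus discrepancy rate."""
    return 1 - (1 - per_locus_rate) ** n_loci


# Hypothetical per-locus rate of 1%.
for n_loci in (10, 25, 50):
    print(n_loci, round(p_any_discrepancy(0.01, n_loci), 3))
# 10 0.096 / 25 0.222 / 50 0.395
```

Under this assumption, a true parent scored at 50 loci would show at least one discrepancy in roughly 40% of cases.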
Initially, a parent-offspring relationship was assumed between 'Sangiovese', the most common red grape cultivar in Italy, and the ancient Tuscan variety 'Ciliegiolo' [36]. When 'Sangiovese' was tested as a parent of 'Ciliegiolo', the putative other parent was sought in a large, private, standardized database, but no candidate was found. When 'Ciliegiolo' was tested as a potential parent of 'Sangiovese', four candidate cultivars were found. Only one of the fifty microsatellites was inconsistent with this parentage, leading the researchers to conclude with a high level of confidence that 'Sangiovese' is the offspring of 'Ciliegiolo' and 'Calabrese di Montenuovo' [37]. In the same year, Staraz et al. suggested, on the basis of their own studies, that 'Ciliegiolo' was not the parent but the offspring of 'Sangiovese' [38]. This hypothesis was later confirmed, with the addition that 'Ciliegiolo' was probably the offspring of 'Sangiovese' × 'Moscato violetto' [39].
Bergamini et al. reported a pair of possible parents for 'Sangiovese' that had not been proposed before. The first putative parent is 'Ciliegiolo', already discussed as a relative of 'Sangiovese'. The second is 'Negrodolce', an old local variety that was considered lost during the last century but was recovered by the authors. The newly proposed parentage held up well under comprehensive molecular examination, apart from a single inconsistency at one of the 57 microsatellite markers examined. This discrepancy was attributed to a null allele and therefore should not undermine the hypothesis. It does, however, highlight the limitations of microsatellite profiling as a method of pedigree research, given that this was the third different parentage proposed so far for 'Sangiovese' [40,41].
Whereas a modest number of markers suffices to establish whether two grapevine samples are identical, a substantially larger number is required to reconstruct parentage and pedigrees across cultivars without incorrect relationship assignments. Most parentage and kinship reconstruction studies have used more than 25, and some more than 50, markers [42].
Another example in which a null allele caused a dilemma is the study of Bowers et al., who analyzed microsatellite loci in 300 grape cultivars to determine parentage relationships. Sixteen wine grapes grown in northeastern France, including 'Chardonnay', 'Gamay noir', 'Aligoté', and 'Melon', have microsatellite genotypes compatible with their being descendants of a single pair of parents, 'Pinot' and 'Gouais blanc', both of which were widespread in this region in the Middle Ages. 'Romorantin' does not share an allele with 'Pinot' at locus VVS2, carrying a 129-bp allele instead; this allele is found in 'Pinot fin teinturier', a red-juiced variant of 'Pinot', but in no other cultivar. 'Dameron' does not share an allele with 'Gouais blanc' at locus VVMD36, which suggests a mutation to a 254-bp allele or to a null allele [31].
To analyze genetic diversity and examine parentage, a collection of 1005 grapevine accessions was genotyped at 34 microsatellite (SSR) loci [39]. After a preliminary simulation to estimate critical values of the likelihood ratios, parentage analysis was carried out with the CERVUS program. To allow for genotyping errors, null alleles, and mutations, mismatches at up to two loci were permitted in each trio. The analysis showed that most mismatches occurred at locus VChr9b, which had a high frequency of null alleles; this locus was therefore eliminated and the analysis rerun. In most cases, incompatible profiles were found at loci with a high frequency of null alleles and could thus be explained by the presence of a null allele in either a parent or the offspring.
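How a null allele can turn an apparent mismatch into a compatible locus can be sketched with a simple exclusion count for a trio. This is not the likelihood approach implemented in CERVUS; it only counts loci at which no combination of candidate true genotypes is Mendelian-consistent, with a single band allowed to hide an allele/null heterozygote. The trio, loci, and helper names are hypothetical.

```python
NULL = "null"  # hypothetical label for a non-amplifying allele


def candidates(genotype, allow_null):
    """True genotypes that could underlie a scored genotype."""
    a, b = genotype
    options = [(a, b)]
    if allow_null and a == b:        # a single band may hide an a/null heterozygote
        options.append((a, NULL))
    return options


def mendelian_ok(child, mother, father):
    """One child allele must be present in each parent."""
    x, y = child
    return (x in mother and y in father) or (y in mother and x in father)


def count_mismatches(child, mother, father, allow_null=True):
    """Loci at which no combination of candidate true genotypes is consistent."""
    mismatches = 0
    for locus in child.keys() & mother.keys() & father.keys():
        if not any(mendelian_ok(c, m, f)
                   for c in candidates(child[locus], allow_null)
                   for m in candidates(mother[locus], allow_null)
                   for f in candidates(father[locus], allow_null)):
            mismatches += 1
    return mismatches


# Hypothetical trio scored at three loci (allele sizes in bp).
child = {"L1": (241, 241), "L2": (180, 190), "L3": (120, 130)}
mother = {"L1": (241, 241), "L2": (180, 184), "L3": (120, 124)}
father = {"L1": (257, 257), "L2": (190, 196), "L3": (126, 128)}
MAX_MISMATCHES = 2  # the tolerance used in the study described above
strict = count_mismatches(child, mother, father, allow_null=False)
with_null = count_mismatches(child, mother, father, allow_null=True)
print(strict, with_null, with_null <= MAX_MISMATCHES)  # 2 1 True
```

Locus L1 reproduces the pattern in which a single-band offspring and a single-band father share no allele; allowing a null allele resolves it, while L3 remains a genuine mismatch.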
In Croatia, analyses with 36 nuclear SSR, 4 chloroplast SSR (cpSSR), and 47 SNP markers revealed a large number of admixed varieties and synonyms, which was attributed to complex pedigrees and migrations. The VChr8b and VChr14b loci showed the highest fixation index, the greatest divergence from Hardy-Weinberg equilibrium, and the highest prevalence of null alleles, and they were therefore removed from subsequent parentage analyses. The remaining set of markers revealed 24 full parentages and 113 half-kinships [45].
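Screening loci by heterozygote deficit, as in the study above, can be sketched by computing F_IS = 1 - Hobs/Hexp for each locus and flagging loci above a cut-off. The 0.1 threshold and the data are hypothetical; a real screen would also test Hardy-Weinberg departures formally and estimate null allele frequencies.

```python
from collections import Counter


def f_is(genotypes):
    """Inbreeding coefficient F_IS = 1 - Hobs / Hexp for one locus."""
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    h_exp = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return 1 - h_obs / h_exp if h_exp > 0 else 0.0


def flag_loci(data, threshold=0.1):
    """Loci whose heterozygote deficit exceeds a (hypothetical) F_IS cut-off."""
    return sorted(locus for locus, genotypes in data.items()
                  if f_is(genotypes) > threshold)


# Hypothetical genotypes at two loci.
data = {
    "L1": [("A", "B")] * 10 + [("A", "A")] * 10 + [("B", "B")] * 10,
    "L2": [("A", "B")] * 20 + [("A", "A")] * 5 + [("B", "B")] * 5,
}
print(flag_loci(data))  # ['L1']
```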
Hardy-Weinberg (HW) equilibrium is a key underlying assumption in identity and parentage analysis [39]. Minor deviations from HW equilibrium, or deviations at only a few loci, are unlikely to distort likelihood estimates, whereas deviations at many loci may. In that case, identity and parentage assignments should be interpreted with great caution. Even so, the discrimination power of the loci that are in HW equilibrium may be strong enough to ensure the validity of the study as a whole.

Solutions for the Correct Pedigree Reconstruction
Mark R. Christie proposed a novel approach for detecting parent-offspring pairs in large data sets. To allow for genotyping errors, null alleles, and mutations, the number of loci permitted to mismatch should be estimated quantitatively from the study-specific error rate. The approach was proposed for methods that determine the probability of identity among genotypes, and null alleles, missing data, and mutations can additionally be accounted for simply by adding estimates of their rates to the study-specific error rate [46].
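One simple way to turn a study-specific error rate into a mismatch allowance, assuming errors occur independently across loci, is a binomial tail calculation: choose the smallest number of mismatches m such that a true parent-offspring pair would show more than m mismatches with probability at most α. This is an illustrative sketch, not necessarily Christie's exact procedure; the rates, the 34 loci, and α = 0.01 are hypothetical.

```python
from math import comb


def tail_prob(n_loci, error_rate, m):
    """P(X > m) for X ~ Binomial(n_loci, error_rate): the chance that a true
    parent-offspring pair mismatches at more than m loci."""
    return sum(comb(n_loci, k) * error_rate ** k * (1 - error_rate) ** (n_loci - k)
               for k in range(m + 1, n_loci + 1))


def allowed_mismatches(n_loci, error_rate, alpha=0.01):
    """Smallest m for which at most a fraction alpha of true pairs is rejected."""
    m = 0
    while tail_prob(n_loci, error_rate, m) > alpha:
        m += 1
    return m


# Hypothetical per-locus rates summed into one study-specific rate (0.02).
genotyping_error, null_rate, mutation_rate = 0.005, 0.014, 0.001
rate = genotyping_error + null_rate + mutation_rate
print(allowed_mismatches(34, rate))  # 3
```

Raising α to 0.05 lowers the allowance to 2 in this example, trading fewer false exclusions of true parents for more false inclusions of unrelated candidates.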
It has been suggested that well-established maximum likelihood approaches for estimating relationship and relatedness could be modified to account for null alleles by distinguishing between an observed genotype and the set of true genotypes that could have produced that observation. For example, the probability of observing the genotype pair ii/ii (where n denotes the null allele) is calculated by adding the probabilities that the true genotypes are ii/ii, in/ii, ii/in, or in/in, the four true genotype pairs that would be observed as ii/ii [7].
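The summation over the four hidden genotype pairs can be sketched for one relationship. Here, purely for illustration, the pair is parent-offspring under Hardy-Weinberg proportions: the parent transmits either allele with probability 1/2 and the offspring's other allele is drawn from the population frequencies. The allele labels, frequencies, and function names are hypothetical.

```python
from itertools import product


def hw_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def child_given_parent(child, parent, freqs):
    """P(offspring genotype | parent genotype) for a parent-offspring pair."""
    x, y = child
    prob = 0.0
    for transmitted in parent:          # each parental allele with probability 1/2
        if transmitted == x:
            prob += 0.5 * freqs[y]
        if transmitted == y and x != y:
            prob += 0.5 * freqs[x]
    return prob


def observed_pair_prob(allele, freqs, null="n"):
    """P(both members of a parent-offspring pair show the single band `allele`),
    summed over the true pairs ii/ii, in/ii, ii/in and in/in."""
    hidden = [(allele, allele), (allele, null)]
    return sum(hw_prob(g1, freqs) * child_given_parent(g2, g1, freqs)
               for g1, g2 in product(hidden, hidden))


freqs = {"i": 0.4, "j": 0.4, "n": 0.2}
print(round(observed_pair_prob("i", freqs), 3))  # 0.176
```

For comparison, an unrelated pair with the same frequencies would show ii/ii with probability (0.16 + 0.16)² ≈ 0.102; ignoring the null allele would misstate both values.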
Genetic data can be used to estimate the genealogical relationship or relatedness of individuals of unknown ancestry. ML-RELATE is a computer program that calculates maximum likelihood estimates of relatedness and relationship. Designed for microsatellite data, it can handle null alleles, and it uses simulation to determine which relationships are supported by the genetic data and to compare putative relationships with alternatives [47].
Pedigrees are used in many areas of genetic research because they allow genealogical ties between individuals to be resolved precisely. One example of their use is the estimation of the short-term effective population size (Ne), which is important in fields such as conservation genetics. Despite their usefulness, pedigrees are frequently unknown and must be inferred from genetic data. A Bayesian technique based on Markov chain Monte Carlo has been proposed for jointly estimating pedigrees and Ne from genetic markers. By using a composite likelihood, the method can handle a large number of markers and individuals within a single generation, considerably increasing computational efficiency. Simulated data showed that the approach can accurately estimate Ne and relationships up to first cousins [48].

Importance of Population Genetic Studies in Wild and Cultivated Grapevines
"Population genetics is the study of the genetic composition of populations, including distributions and changes in genotype and phenotype frequency in response to the processes of natural selection, genetic drift, mutation and gene flow" [49]. In the case of crops, population genetics research is mainly used to study and understand the process of domestication in wild relatives [50,51].
In the case of grapes, research in this area was initially limited to exploring the genetic diversity of the varieties grown in a country or region [52][53][54][55][56]. In the last decade, research has concentrated on assessing the genetic diversity of wild relatives, mainly Vitis vinifera ssp. sylvestris in the case of Vitis vinifera L. ssp. sativa cultivars [57][58][59][60][61][62][63], and of wild American species, mainly Vitis rupestris, Vitis berlandieri and Vitis riparia in the case of rootstocks [64][65][66][67][68][69]. At the same time, climate change has increased the need to search for wild relatives of different origins carrying resistance genes to mildews and, more recently, black rot [70].
The conservation of genetic diversity has become increasingly important in recent years because of climate change, and there is no doubt that genetic diversity is worth preserving for "worse times". During domestication, strong selection pressure reduces genetic diversity, which can significantly reduce the survivability of cultivated species.
A small number of genetic markers was thought to be adequate for detecting domestication and breeding loci, because positive selection for hermaphroditism, lighter skin color, and muscat aroma induced extended linkage disequilibrium (LD) at these loci [71]. Geographic isolation contributes to the differences between Eastern table grapes and Western wine grapes. However, the reduced genetic variation at a locus on chromosome 11 linked to berry size suggests that the size difference between table and wine grapes may be attributable not to geography alone but also to selection for larger table grapes in the East. Whether berry size became a breeding target was largely a result of the predominant religion in the area where the grapes were bred: table grapes were bred to be large in the East, where alcohol was prohibited, while in the predominantly Christian West wine grapes retained the ancestral smaller size, which is more desirable for winemaking. Religious constraints on alcohol consumption may thus have altered nucleotide variability in a genomic region linked to berry size [71,72].
Wine and table grape cultivars have self-fertile hermaphrodite flowers, whereas wild European, American, and Asian grapevines are dioecious. Battilana et al. [73] studied three SSR markers around the sex locus in 132 V. sylvestris accessions and 171 V. vinifera cultivars and proposed that one of 55 potential haplotypes, which accounted for 66% of hermaphrodite individuals, may be a product of domestication. Specific size variants of the VVIB23 microsatellite sequence within the 3′-UTR of a putative YABBY1 gene are associated with the sex alleles M, H, and f, so these markers can help define the status of wild grapevine germplasm. As Vitis vinifera L. ssp. sylvestris is still regarded as a gene pool for viticulture, allelic diversity at VVIB23 can be used to characterize wild germplasm and integrate it into marker-assisted breeding programs.

Consequences of the Presence of Null Alleles in Population Genetic Studies
The overestimation of visible allele frequencies leads to an underestimation of the genetic diversity present in a population. Because the non-visible alleles may themselves differ from one another, the underestimation could be even greater. The frequency of null alleles in grapevine (Vitis vinifera L. ssp. sativa and Vitis vinifera ssp. sylvestris) may be relatively high, with estimates exceeding 20% (Figure 1).
Martinez et al. [53] considered samples with only one allele per locus to be homozygous rather than heterozygous for a null allele. Based on their data, and omitting varieties with three alleles (possibly due to gene duplication), we estimated that the frequency of null alleles could be as high as 15% (VVMD32 at locus 2 in Figure 2). This may have led to an underestimation of genetic diversity; however, since the cited study was pioneering and examined relatively few genotypes, this does not detract from its value.
Figure 2. Ratio of alleles in the VVMD32 SSR locus with probable null alleles. Data from the study of Martinez et al. [53]; cultivars with triple alleles were ignored, and allele frequencies were calculated by MolMarker [16].
De Andres et al. [75] studied the genetic diversity of wild populations and cultivated varieties of grapevine. The problem caused by null alleles was not addressed, even though our calculations indicate that the estimated proportion of null alleles was 26.46% among Vitis vinifera L. ssp. sativa cultivars and 26.82% among woodland grape (Vitis vinifera L. ssp. sylvestris) accessions (Table 1). Emanuelli et al. [76] analyzed genetic diversity and population structure in a large grape germplasm collection using SSR and SNP markers. Although they did not specifically address null allele estimation, they noted that SSR diversity may be underestimated, because sequencing of some microsatellite loci showed that the polymorphism corresponded not only to variation in the number of repeats but also to changes in repeat architecture and in the flanking regions, including substitutions and long indels.
A broad 1.5-hectare germplasm collection encompassing most Iranian grape varieties was investigated using 23 SSR loci. The effective number of alleles ranged from 1.82 (VrZAG83) to 9.73 (VMCNG2G7), with a mean of 4.41. Because of their high polymorphism, five SSR markers with PIC values above 0.80 were selected for rapid fingerprinting of grape genotypes. Heterozygosity ranged from 0.49 (VVMD17) to 0.97 (VrZAG64) and did not deviate significantly from expected values. Null allele frequencies were estimated with Brookfield's method [54].
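The quantities in this paragraph follow from allele frequencies and heterozygosities. The sketch below computes expected heterozygosity, the effective number of alleles (n_e = 1/Σp_i²), and the commonly cited form of Brookfield's first estimator, r = (He − Ho)/(1 + He), which, as we understand it, is meant for data sets in which no samples fail to amplify. The frequencies and observed heterozygosity are invented, and we do not know which Brookfield variant [54] applied.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def effective_number_of_alleles(freqs):
    """n_e = 1 / sum(p_i^2): number of equally frequent alleles giving the same He."""
    return 1.0 / sum(p * p for p in freqs)


def brookfield1(h_obs, h_exp):
    """Brookfield's estimator 1 of the null allele frequency: r = (He - Ho) / (1 + He).

    A negative value (Ho > He) is usually read as no evidence for null alleles.
    """
    return (h_exp - h_obs) / (1.0 + h_exp)


# Hypothetical locus: allele frequencies and Ho are invented for illustration.
freqs = [0.4, 0.3, 0.2, 0.1]
h_exp = expected_heterozygosity(freqs)      # about 0.70
n_e = effective_number_of_alleles(freqs)    # about 3.33
r = brookfield1(h_obs=0.60, h_exp=h_exp)    # about 0.059
```

Because the estimator only uses the heterozygote deficit, any other cause of that deficit (inbreeding, population substructure) will also be read as null alleles.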
Cao et al. [44] took null alleles into account and treated problematic cases when they studied the pedigree and genetic diversity of muscadine (Vitis rotundifolia syn. Muscadinia rotundifolia) grapes. Duplicate accessions of a cultivar that had missing data at one or two loci but were identical at all other loci were regarded as the same cultivar, and the accession with the least missing data was chosen to represent it. Data from the other accession(s) of that cultivar were then used to fill any gaps at the representative's loci. Genotyping errors caused by null alleles (non-amplified alleles), short allele dominance (large allele dropout), and the scoring of stutter peaks were identified using MicroChecker v2.2.3 [43].
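The duplicate-handling procedure described for [44] can be sketched as follows. This is our reading of the text, not the authors' code: the function names and the data layout (one allele-size pair per locus, `None` for missing data) are hypothetical, and the limit of two missing loci is taken from the description above.

```python
from typing import List, Optional, Tuple

# One locus genotype as a pair of allele sizes; None marks missing data (no amplification).
Genotype = Optional[Tuple[int, int]]


def n_missing(profile: List[Genotype]) -> int:
    return sum(g is None for g in profile)


def same_cultivar(a: List[Genotype], b: List[Genotype], max_missing: int = 2) -> bool:
    """Duplicates: each lacks data at <= max_missing loci and they agree wherever both are scored."""
    if n_missing(a) > max_missing or n_missing(b) > max_missing:
        return False
    return all(x == y for x, y in zip(a, b) if x is not None and y is not None)


def merge_duplicates(profiles: List[List[Genotype]]) -> List[Genotype]:
    """Keep the accession with the least missing data and fill its gaps from the others."""
    merged = list(min(profiles, key=n_missing))
    for other in profiles:
        for i, g in enumerate(other):
            if merged[i] is None and g is not None:
                merged[i] = g
    return merged


# Hypothetical accessions of one cultivar at four loci (invented allele sizes).
acc1 = [(150, 156), None, (200, 204), (88, 88)]
acc2 = [(150, 156), (98, 102), None, None]

if same_cultivar(acc1, acc2):
    print(merge_duplicates([acc1, acc2]))  # [(150, 156), (98, 102), (200, 204), (88, 88)]
```

Note that the sketch takes apparent homozygotes such as (88, 88) at face value; whether they hide a null allele has to be checked separately, for example with the tools mentioned above.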
In Montenegro, modern viticulture coexists with traditional winemaking, which still follows historic practices and uses local varieties. The region therefore offers an opportunity to investigate the processes that increase genetic diversity. In total, 419 samples collected in situ around the country (cultivated plants from old orchards and wild vines) and 57 local varieties maintained in a grapevine collection were analyzed to determine the diversity of Montenegrin grapevines and the processes involved in their development. More than 100 of the 144 genetic profiles corresponded to cultivated grapevines, indicating a high level of diversity [77]. The analysis used the software STRUCTURE, which can handle missing data and null alleles.
Mihaljevic et al. [45] also estimated null allele frequencies in their analyses of the genetic diversity and population structure of Croatian grapevines. The Vchr8b and Vchr14b loci showed the highest fixation index, the largest deviation from Hardy-Weinberg equilibrium, and the highest estimated prevalence of null alleles, and they were excluded from further parentage analyses. The risk of null alleles was low for all other loci used in the parentage study. The informativeness of the SSR loci, measured as polymorphic information content (PIC), ranged from 0.29 for VVIn73 to 0.78 for VVMD28, with an average of 0.66.
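The source does not state which PIC formula was used in [45]; assuming the widely used formula of Botstein et al., PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², a minimal sketch is shown below. Since null alleles are invisible, the frequencies entering this formula are those of the scored alleles only.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content: 1 - sum_i p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


print(pic([0.5, 0.5]))                  # 0.375, the maximum for a biallelic locus
print(pic([0.25, 0.25, 0.25, 0.25]))    # 0.703125
```

The correction term makes PIC slightly smaller than He, because a mating between two identical heterozygotes does not reveal which allele a child inherited from a given parent.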

Solutions for Minimizing the Error of the Estimates
Nuclear SSRs are known for a high frequency of null alleles, that is, alleles that do not amplify and are therefore recessive and undetectable in heterozygotes. Two approaches were compared in [78]: maximum likelihood methods that compare observed and expected homozygote frequencies in the population under the assumption of Hardy-Weinberg equilibrium, and direct estimation of null allele frequencies from progeny whose parental genotypes are known. The two techniques gave comparable estimates, especially when the number of maternal half-sib progeny arrays sampled was large. With average null allele frequencies between 5% and 8% across loci, population genetic metrics such as genetic differentiation (FST) may be largely unbiased. However, in fine-scale population studies and parentage analyses, markers with such a high average prevalence of null alleles (up to 15% at some loci) can be seriously misleading [78].
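A common way to obtain the maximum likelihood estimate under Hardy-Weinberg equilibrium is an expectation-maximization (EM) algorithm that treats the true genotype behind each apparent homozygote as unknown. The sketch below illustrates that idea; it is not the implementation used in [78], the function name is hypothetical, and counting non-amplifying samples as null homozygotes is an assumption that fails when blanks have technical causes.

```python
from collections import Counter


def em_null_allele(genotypes, n_blank=0, max_iter=1000, tol=1e-12):
    """ML estimate of the null allele frequency r at one SSR locus under HWE, via EM.

    genotypes: scored (allele1, allele2) pairs; allele1 == allele2 is an apparent homozygote.
    n_blank:   samples with no amplification, treated here as null/null homozygotes.
    Returns (r, {allele: frequency}).
    """
    het, hom = Counter(), Counter()
    for a, b in genotypes:
        if a == b:
            hom[a] += 1
        else:
            het[a] += 1          # each heterozygote carries one copy of a ...
            het[b] += 1          # ... and one copy of b
    alleles = sorted(set(het) | set(hom))
    n = len(genotypes) + n_blank
    p = {a: 1.0 / (len(alleles) + 1) for a in alleles}
    r = 1.0 / (len(alleles) + 1)

    for _ in range(max_iter):
        # E-step: apparent homozygote "a" is a/null with probability 2pr / (p^2 + 2pr).
        copies = Counter(het)
        null_copies = 2.0 * n_blank
        for a, count in hom.items():
            w = 2 * r / (p[a] + 2 * r)
            copies[a] += count * (2 - w)   # a/a gives 2 copies of a, a/null gives 1
            null_copies += count * w       # a/null gives 1 null copy
        # M-step: new frequencies are expected allele counts over 2n gene copies.
        new_p = {a: copies[a] / (2 * n) for a in alleles}
        new_r = null_copies / (2 * n)
        converged = abs(new_r - r) < tol
        p, r = new_p, new_r
        if converged:
            break
    return r, p


# Toy data: 4 heterozygotes where HWE without nulls would predict 5 of 10.
r, freqs = em_null_allele([(100, 102)] * 4 + [(100, 100)] * 3 + [(102, 102)] * 3)
print(round(r, 4))  # 0.0667 (exactly 1/15 for this toy data)
```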
Microsatellite null alleles occur in all taxa to varying degrees. They are problematic because they can inflate measures of genetic differentiation and because carriers can be erroneously scored as homozygotes. Although numerous methods exist for correcting null allele frequencies and estimating FST, nothing was known about how null alleles affect assignment testing. Simulations (null allele frequencies ranging from 0.000 to 0.913) showed that the percentage of correctly assigned individuals in model-based clustering and Bayesian assignment approaches was marginally, but significantly, lowered in the presence of null alleles. The bias introduced by null alleles slightly reduced the power to assign individuals correctly: by 0.2 and 1.0 percentage points for STRUCTURE-based and by 2.4 percentage points for GENECLASS-based assignment tests. Null alleles also led to a small but statistically significant overestimation of FST. Consequently, microsatellite loci affected by null alleles would probably not change the overall outcome of assignment tests and could be included in such studies. However, loci prone to null alleles should be used with caution because they reduce the power of assignment tests and affect the accuracy of FST, and loci less prone to null alleles should always be preferred [79].
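How null alleles turn heterozygotes into apparent homozygotes can be made explicit with a small model. In the sketch below, the code `NULL = 0` for the null allele, the function names, and the allele frequencies are hypothetical. Under Hardy-Weinberg equilibrium the heterozygosity visible in the assay falls short of the true heterozygosity by exactly 2r(1 − r), the frequency of a/null heterozygotes.

```python
NULL = 0  # hypothetical code for the null allele; real allele sizes are positive integers


def scored_genotype(a, b):
    """What a fragment-length assay reports for the true genotype (a, b)."""
    visible = sorted(x for x in (a, b) if x != NULL)
    if not visible:
        return None                        # null/null: the sample fails to amplify
    if len(visible) == 1:
        return (visible[0], visible[0])    # a/null heterozygote is scored as a homozygote
    return tuple(visible)


def heterozygosity_true_vs_scored(visible_freqs, r):
    """Under HWE, return (true heterozygosity, heterozygosity visible in the assay).

    visible_freqs are population frequencies of the amplifiable alleles and r is the
    null allele frequency, so sum(visible_freqs) + r should equal 1.
    """
    sum_sq = sum(q * q for q in visible_freqs)
    true_het = 1.0 - sum_sq - r * r
    scored_het = (1.0 - r) ** 2 - sum_sq    # only pairs of two different visible alleles
    return true_het, scored_het


print(scored_genotype(152, NULL))          # (152, 152)
true_het, scored_het = heterozygosity_true_vs_scored([0.4, 0.3, 0.2], r=0.1)
# true_het is about 0.70, scored_het about 0.52; the gap is 2 * 0.1 * 0.9 = 0.18.
```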
Genetic clustering algorithms require a certain amount of data to achieve useful results. For the common situation in which individuals are sampled at multiple locations, it was shown how sampling-location information can be used to obtain better results when the amount of data is limited. New models were developed for the STRUCTURE program, for both the admixture and the no-admixture cases, in which the prior distribution for each individual's population assignment is modified. The revised priors allow the proportion of individuals assigned to each cluster to vary with sampling location. The models were validated with simulated data and illustrated with microsatellite data from the CEPH Human Genome Diversity Panel. They detected structure at lower levels of divergence, or with less data, than the original STRUCTURE models or principal components approaches, and they were not biased toward detecting structure when none was present. These models are included in a new version of STRUCTURE, which can be downloaded free of charge at http://pritch.bsd.uchicago.edu/structure.html (accessed on 28 June 2022) [80].

Conclusions
Ignoring SSR null alleles may lead to incorrect scientific conclusions. Researchers working with SSRs should therefore always ensure that potential problems are addressed in some way. Omitting loci that carry null alleles only appears to solve the problem and should be recommended only if it does not cause excessive data loss. This may be the case, for example, if the minimum number of loci recommended for pedigree analysis in a given species can be reached without the loci carrying undetectable alleles, and the remaining loci are well distributed across the genome.
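The decision rule above can be expressed as a short check. In this sketch the function name and all three thresholds (maximum tolerated null allele frequency, minimum number of loci, minimum number of chromosomes as a crude proxy for genome coverage) are hypothetical placeholders, not values recommended in the literature, and the locus data are invented.

```python
def loci_to_keep(null_freq, chromosome, max_null=0.05, min_loci=4, min_chromosomes=3):
    """Drop loci whose estimated null allele frequency exceeds max_null, but only if
    enough well-spread loci remain; otherwise return None, meaning all loci should be
    kept and null alleles handled explicitly instead.

    null_freq:  {locus: estimated null allele frequency}
    chromosome: {locus: chromosome number}
    """
    kept = [locus for locus, r in null_freq.items() if r <= max_null]
    spread = {chromosome[locus] for locus in kept}
    if len(kept) >= min_loci and len(spread) >= min_chromosomes:
        return kept
    return None


# Hypothetical loci and estimates, invented for illustration.
null_freq = {"locus_A": 0.01, "locus_B": 0.12, "locus_C": 0.00, "locus_D": 0.03, "locus_E": 0.02}
chromosome = {"locus_A": 1, "locus_B": 2, "locus_C": 3, "locus_D": 3, "locus_E": 7}

print(loci_to_keep(null_freq, chromosome))              # ['locus_A', 'locus_C', 'locus_D', 'locus_E']
print(loci_to_keep(null_freq, chromosome, min_loci=5))  # None
```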
Failure to address the null allele problem adequately could delay breeding for tolerance to the abiotic stresses that climate change will bring to the forefront.