Genetic Diversity, Population Structure, and Parentage Analysis of Croatian Grapevine Germplasm

Croatian viticulture was most extensive at the beginning of the 20th century, when about 400 varieties were in use. Autochthonous varieties are the result of spontaneous hybridization from the pre-phylloxera era and are still cultivated today on about 35 % of vineyard area, while some exist only in repositories. We present what is the most comprehensive genetic analysis of all major Croatian national repositories, with a large number of microsatellite, or simple sequence repeat (SSR) markers, and it is also the first study to apply single nucleotide polymorphism (SNP) markers. After 212 accessions were fingerprinted, 95 were classified as unique to Croatian germplasm. Genetic diversity of Croatian germplasm is rather high considering its size. SNP markers proved useful for fingerprinting but less informative and practical than SSRs. Analysis of the genetic structure showed that Croatian germplasm is predominantly part of the Balkan grape gene pool. A high number of admixed varieties and synonyms is a consequence of complex pedigrees and migrations. Parentage analysis confirmed 24 full parentages, as well as 113 half-kinships. Unexpectedly, several key genitors could not be detected within the present Croatian germplasm. The low number of reconstructed parentages (19%) points to severe genetic erosion and stresses the importance of germplasm repositories.


Introduction
Domesticated grapevine (Vitis vinifera L. subsp. sativa) is one of the major fruit crops in temperate regions [1]. Most of the world's vineyards are planted with varieties that have been perpetuated for centuries by vegetative propagation [2]. Until the beginning of the 20th century, the vast majority of these cultivars arose from seedlings of mainly unknown origin, and most of today's well-known cultivars are their clonal offspring preserved throughout history [3]. Preference for a limited number of elite cultivars recognized worldwide for wine production and globalization of wine and table grape production, as well as market forces, have led to uniformity and loss of grape biodiversity. The structure An additional set of 47 genotypes (Vitis SSR allele reference set), obtained from the Vassal collection, National Institute of Agricultural Research (INRA), France, was genotyped in order to ensure proper allele sizing and adjustment with external data sources (Table S2). The genotype Bombino bianco , a variety which in previous studies [14] was shown to be related to the Croatian gene pool, was obtained from The Research and Innovation Centre, Fondazione Edmund Mach, S. Michele all'Adige, Italy. DNA extraction was performed using the peqGOLD Plant DNA Mini Kit (PEQLAB Biotechnologie GmbH, Erlangen, Germany) according to the manufacturer's instructions.

SSR Genotyping
The selection of SSR loci was based on the level of their polymorphism, genome coverage, good amplification assessed through preliminary analyses, and compatibility with the previously published and available genetic profiles [66][67][68][69][70][71][72][73][74][75]. All selected loci were divided into four categories (cat.) according to their purpose (Table S3). Primers from cat.1 (which included nine SSRs proposed for cultivar identification [7]) were used for initial fingerprinting of all accessions and reference genotypes, cat.1 and cat.2 combined were used for analyses of genetic diversity and genetic structure, as well as parentage assignment, and primer pairs from cat.3 for analysis of genetic structure (just Croatian germplasm) and as additional support of reconstructed parentages. Cat.4 included primers needed for chlorotype determination. All SSR loci, except those from cat.1, were amplified using M13 (-21) tailed universal primers [76] in singleplex PCR for which the respective temperature regimes and PCR protocols were optimized. Additional data like primer sequences, associated annealing temperatures, and PCR and capillary electrophoresis arrangement of given loci are also presented in Table S3.
PCR amplifications were carried out in a Veriti TM Thermal Cycler (Applied Biosystems, Foster City, CA, USA). Multiplex PCR reactions were performed in a total volume of 10 µL containing 10 ng of DNA, 0.75 U Taq DNA polymerase (Sigma-Aldrich, St. Louis, MO, USA), 0.2 mM single dNTP, 0.3 µM forward and reverse primers, 2X PCR buffer (Sigma-Aldrich), 4.0 mM MgCl 2 , and 1 M Betaine solution, which served as a PCR enhancer. PCR temperature conditions were as follows: an initial denaturation of 2 min at 94 • C, followed by 35 successive cycles of denaturation (45 s at 94 • C), annealing (45 s at different T a ranging from 50 to 60 • C, depending upon primer used), and elongation (60 s at 72 • C), as well as a final elongation at 72 • C for 20 min. After fingerprinting, a non-redundant set was defined and further genotyped at additional SSR and cpSSR loci. A unique M13 PCR protocol was used. Singleplex PCR reactions were performed in a total volume of 9.5 µL containing 10 ng of DNA, 0.75 U Taq DNA polymerase (Sigma-Aldrich), 0.21 mM single dNTP, 0.26 µM universal M13 and reverse primers, 0.068 µM forward primer, 1X PCR buffer (Sigma-Aldrich), 4.0 mM MgCl 2 , and 1.05 M Betaine solution. After an initial denaturation of 3 min at 94 • C, 29 consecutive cycles of denaturation (30 s at 94 • C), followed by 45 s of annealing and 1 min of elongation at 72 • C, were performed and then followed by 10 cycles of 30 s denaturation at 94 • C, 45 s of annealing at 53 • C, and 1 min of elongation at 72 • C, with final elongation at 72 • C for 10 min. Amplified products were separated using an ABI3130 Genetic Analyzer (Applied Biosystems) with GeneScan-500 LIZ TM size standard. Sizing of the fragments was performed using GeneMapper 4.0 software (Applied Biosystems).

SNP Genotyping
SNP genotyping was performed on a set of non-redundant genotypes that differed to a lesser extent from that used in the SSR analysis due to technical reasons (highlighted in Table S1). The SNP set consisted of 47 SNP loci previously used and proposed for grapevine fingerprinting [38], to which the SNP locus (i.e., cp4527) for chlorotype determination was added [77] to validate the reactions (instead of locus SNP555_132). SNP loci names and allelic variants expected are listed in the supplement data (Table S4). A set of 47 SNP markers was run on a Biomark system applying the FluiDigm methodology. Primer pairs were used to amplify 47 DNA templates per run and one non-template control. The primers and fluorescently labeled oligo probes were designed and synthesized by FluiDigm (South San Francisco, CA, USA). Allele-specific PCR products were generated after pre-amplification of the target regions by duplex quantitative real time PCR using FluiDigm's FR48.48 dynamic arrays. Genotypes were assigned based on the relative fluorescence intensities of the two alternative dyes (HEX and FAM) labeling the allele-specific probes targeting each SNP locus. Data analysis was performed in SNP Genotyping Analysis v4, and at least one negative control was included in the analysis for data normalization and auto-calibration ('no-template' control -NTC normalization method). Obtained SNP profiles were compared with available published SNP profiles [40].

Microsatellite Database -Formation and Harmonization
Microsatellite profiles at nine common SSR loci were downloaded from the European Vitis database [8] for all available Vitis vinifera L. subsp. sativa accessions and adjusted using reference varieties and an Excel Macro Tool that converts actual allele lengths (bp) into encoded alleles, and vice versa, according to the principle previously described [6] and is freely available [78]. This resulted in the creation of a database of 2152 non-redundant genotypes, including 100 Croatian accessions previously genotyped through the GrapeGen06 project [7]. In addition, 1906 genetic profiles from publications containing shared loci [11,14,26,[29][30][31][32][33]36,37,60,[79][80][81][82][83][84][85][86][87][88][89][90][91][92][93][94] have been manually harmonized using common genotypes and added to the database for further analysis. Comparing all alleles across loci using the Excel-Microsatellite Toolkit [95], two conditions were identified: genotypes' uniqueness and/or synonymy/duplication. Accessions were considered as duplicates when they had an identical SSR profile, and assuming the existence of genotyping errors/spontaneous mutations, maximum difference of two alleles was allowed. Redundant accessions were later considered and discussed as known/unknown synonyms but excluded from further analyses. A non-redundant genotype set that included all genotypes of putative Croatian origin was used for allele and chlorotype frequency calculations, analysis of genetic structure, and parentage reconstructions. Apart from the determination of synonyms, published genetic profiles [14] were used to back up analysis of genetic structure and parentage reconstruction. As M13 primers extend the amplicon length, standardization of the data was performed matching the length of alleles recorded in Reference [14].

Data Analysis
Non-redundant data was analyzed for Polymorphism Information Content [96], null allele probabilities [97], and probability of identity (P( ID )), under the two assumptions: that the individuals within the test set are non-related (P (ID)unrelated ), and, in the other case (P (ID)sib ), that the individuals are related [98] as well as deviation tests from Hardy-Weinberg equilibrium (HW), were all calculated in Cervus [99]. In GenAlEx 6.503 [100,101], the average number of alleles per locus (N a ) was calculated, as well as the number of effective alleles per locus (N e ), expected heterozygosity, and observed heterozygosity (H O ), and fixation index (F). Allelic richness (N ar ) was calculated in FSTAT [102] using the formula of Reference [103]. The allele frequency calculation for SSRs, as well as SNP, markers (expressed as minor allele frequency) was performed in Microsatellite Toolkit [95]. Each Croatian accession was assigned to one of four regions, according to their cultivation area: Istria and the Croatian Littoral (n = 23), northwestern Croatia (n = 28), Slavonia and the Danube region (n = 1), and Dalmatia (n = 75), all detailed in Table S1. The genetic dissimilarity coefficients between each pair of genotypes were calculated as -ln(PSA), where PSA is the proportion of shared alleles distance [104] in the Microsat program [105]. The dissimilarity matrix was further used for clustering of genotypes with UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm (for SSR data). UPGMA construction and dendrogram visualization was done in MEGA software [106].

Genetic Structure
The model-based Bayesian method in STRUCTURE version 2.3.4 [107] was also used to imply the genetic structure and define the number of groups within the analyzed data set in order to assign Croatian germplasm to a certain ancestral group. The analysis was performed on a set of 771 accessions, which also included 664 individuals (defined as traditional) from Reference [14], together with Croatian assumed set genotyped at 20 SSR loci. Those that have been defined as modern are the result of deliberate crosses by man and as such interfere with the assessment of the natural structure of the population. Refined Vassal genotypes were categorized into countries/regions of their presumed origin (based on the ampelographic literature), respectively, as done in Reference [15]: Balkans-BALK; Eastern Mediterranean and Caucasus-EMCA; Iberian Peninsula-IBER; Apennine Peninsula-ITAP; Maghreb-MAGH; Middle and Far East-MFEAS; Croatia; Russia and Ukraine-RUUK, Western and Central Europe-WCEUR, and a group of undetermined varieties-"ND". A total of 10 repetitions were performed for each assumed k that ranged from 1 to 10. Each repetition consisted of a burn in a period of 150,000 steps followed by 150,000 Monte Carlo Markov Chain iterations under the assumption of the admixture model and the correlated allele frequencies. The choice of the most probable number of groups (k) was carried out in the Structure Harvester [108] program by comparing the average probability estimators, calculated as ln [Pr (X | K)] for each k value, as well as by the ∆K calculation based on the change in the probability log of data between successive values [109]. CLUMPP v.1.1.2 [110] was used to align data between independent repetitions for each k using the "greedy" algorithm with 10,000 repetitions. Structure Plot [111] was used for the graphical representation of the groups. Individuals with Q ≥ 75% of their genome were assigned to the group of question, while individuals with Q values < 75% were considered to be of admixed origin. In order to confirm the genetic structure estimated by Structure analysis, only individuals with Q values > 0.75, were included in the molecular variance analysis (AMOVA). AMOVA was performed in GenAlEx 6.503 on a genetic distance matrix also based on the proportion of shared alleles calculated in Microsat [105].

Parentage Analysis
Parentage analysis was performed in Cervus [99] using a non-redundant set of genetic profiles comprised of the Croatian set (n = 127) and Vassal genotypes (n = 1095). Based on the results of allele frequency analysis, parent pair (gender unknown) simulation and analysis were performed and, subsequently, determination of "single parent -paternity" was done in Cervus in the same way, thus revealing first-degree relationship (with possible types of relationship-parent, offspring, full-sib, or half-sib). A million of offspring were simulated with a proportion of 0.01 sampled parents with the possibility of self-fertilization and the existence of relatives among potential parents. The critical LOD (logarithm of odds) value, with a higher confidence level (95%), obtained by simulation, was used as the criterion for parentage assignment, as well as allowing for no more than 5% mismatching (one out of 20 SSRs). In the case of successful parentage reconstruction, the maternal type was designated, when possible, using chlorotype data.

Genetic Characterization of Croatian Grapevine Germplasm Repositories
The status of each studied accession was determined based on the genetic profile of 9 common SSRs compared versus the internal database and classified as follows: (I) unique genotype; (II) synonym/duplicate in Croatia; (III) synonym in neighboring countries; (IV) synonym in distant countries; (V) domesticated variety of known foreign origin; (VI) intruder-intruder found within the true to type vines; (VII) questionable varietal status-accession has a unique genotype, but, due to the lack of ampelographic or literature data, its varietal status is questionable; (VIII) mislabeled accession/incorrect collection-accession is planted under the wrong name in the collection and is not true to type (mostly, these accessions have never been fingerprinted); and IX) incorrect sampling of tissue for DNA analysis. Details for each accession can be found in Table S5.
Among 212 accessions, 19 duplicates were identified between collections (or within a single collection), so only one genetic profile (with positive ampelographic identification) was included in the non-redundant set. The same was done for 26 cases of synonymy among analyzed accessions. All identified synonyms and duplicates are highlighted in Table 1. In addition, one case of homonymy identified as a variety of the Istrian collection called Plavina istarska (5IPT) shares the same name, but not the genetic profile, with the homonymous variety Plavina (4-C) grown in Dalmatia. After elimination of duplicates, synonyms, and intruders, the unique (non-redundant) set of SSR profiles consisted of 145 accessions. Within the collections, 145 different genotypes were detected, and, for 46 genotypes (32% of them), a synonym was found in one of the foreign database or scientific publications. Considering the status of each of the 145 unique accessions, all those that did not have synonyms, or do have synonyms abroad but their foreign origin is not strictly proven, were assumed as a "Croatian set" of varieties, totaling 127 of such accessions. Their SSR profiles for 36 loci are presented in Table S6.   * European Vitis Database [8], accessions numbers also presented when available. ** matches obtained through single nucleotide polymorphism (SNP) comparison.

Descriptive Molecular Statistics for SSR And SNP Data
SRR alleles and their frequencies for non-redundant set are given in Table S7. The genotypic data were fitted into allelic classes based mostly on a difference of three or two base pairs, but there were some alleles with a consistent difference of just one base pair. In addition, novel alleles not previously reported were detected, e.g., a new second allele at locus VMC4f3 (246 bp) in Kraljevina . However, no new alleles were found for any of the 9 common fingerprinting SSR loci. The genetic diversity parameters for a total of 36 analyzed SSR loci were calculated for the entire set of 127 assumed Croatian varieties ( Table 2).
Percentage of missing data was below 5% for most loci with exceptions of VViv37, Vchr4a, Vchr11b, and Vchr2b loci, which had somewhat lower average amplification rate. On average, nine alleles per locus were amplified, with values ranging from two (Vchr17a) to 15 for Vchr8b and VMC4f3, which also had the highest allelic richness (13.59). The effective number of alleles ranged from 1.47 for locus VVIn73 to 8.20 for locus VVMD28, with a mean of 3.9. The same loci were at the end of range for gene diversity (0.32 and 0.88) and observed heterozygosity (0.32 and 0.94). Mean H O (0.71) and H e (0.70) were almost the same. For Vchr8b and Vchr14b loci, the highest fixation index, deviation from the Hardy-Weinberg equilibrium, and the highest incidence of null alleles was estimated (so they were therefore excluded from further parentage analyses). All other alleles used in the parentage analysis had a satisfactorily low chance of null allele's occurrence. In correlation with previous parameters, informativeness of the SSR loci expressed as polymorphic information content (PIC) ranged from 0.29 for VVIn73 to 0.78 for VVMD28, with the average PIC value of 0.66. Together with VVMD28, four more loci (ssrVrZAG79, ssrVrZAG62, VVMD5, and VVMD32) from the common set of nine SSRs were amongst the eight most informative ones. The single-locus P (ID) unrelated values ranged from 0.027 (VVMD28) to 0.496 (VVIn73), whereas the P (ID)sib values ranged from 0.318 (VVMD28) to 0.715 (VVIn73). The P (ID) unrelated across all 36 loci was 1.19 × 10 −34 , whereas P (ID)sib across loci was 5.29 × 10 −14 .
SNP analysis was performed for the 124 genotypes (defined in Table S1) across 47 SNP loci (data provided in Table S8). Two loci were excluded from further analysis, one due to being non-polymorphic (SNP579_187) and the other one due to its poor amplification (SNP1445_218). Genetic diversity parameters were calculated for all remaining SNP loci ( Table 3). The amplification success rate was very high, almost 100% except at the SNP259_199 locus, where only 0.81% of the data was missing. The lowest number of effective alleles (N e = 1.03) was recorded for the SNP425_205 locus and the highest (2.0) for the five SNP loci (SNP197_82, SNP879_308, SNP945_88, SNP325_65, SNP1215_138), with mean of 1.71. The highest observed heterozygosity (H O = 0.7) was detected for SNP581_114 and the lowest (H O =0.03) for SNP425_205. Significant deviation from the Hardy-Weinberg equilibrium was observed for SNP581_114 and VVI_10329, as well as a fixation index that was significantly greater or less than 0. At 9% of loci, the frequency of the minor allele was <0.1. The highest PIC (0.38) was observed for SNP879_308, SNP945_88, SNP197_82, and SNP325_65, while the lowest (0.03) was for the SNP425_205. The single-locus P (ID) unrelated values ranged from 0.38 to 0.94 (SNP425_205), whereas the P (ID)sib values ranged from 0.59 to 0.97, for the same loci, respectively. The P (ID) unrelated across all 45 loci was 1.19 × 10 −34 , whereas P (ID)sib across loci was 5.29 × 10 −14 . High frequency of null alleles (f > 0.05) was detected in 40% of the loci. ** significant at the 1 % level, *** significant at the 0.1 % level, NS = not significant, ND = not done. N = number of observed accessions; MD =% missing data; Na = number of alleles, Ne = effective number of alleles, Ar = allelic richness, He = expected heterozygosity, Ho = observed heterozygosity, F = fixation index, HW = exact test of departure from Hardy-Weinberg equilibrium, F null = estimated frequency of null alleles, PIC = polymorphic information content, P(ID) probability of identity = unrelated and sib .

Comparison of SNP and SSR for Variety Discrimination
A set of nine SSRs, as well as set of 45 SNPs, both proposed for cultivar identification, were able to discriminate all investigated genotypes (data not shown). To compare the performance of nine SSRs and 45 SNP loci in variety distinction, a comparative summary of the genetic variation and polymorphism statistics (Table 4)  Average PIC values were more than twice higher for SSR (0.76) compared to SNPs (0.31). However, cumulative P (ID)sib values were more favorable in both cases for SNP markers (cum P (ID)unrelated 4.57 × 10 −16 , cum P (ID)sib 1.00 × 10 −08 ) versus SSR markers (cum P (ID) unrelated 4.07 × 10 −11 , cum P (ID)sib 1.44 × 10 −04 ). To obtain equal values of cumulative P (ID) values for SNP markers, the 18-19 most informative SSR markers should be used. Having in mind these differences, nine common SSR markers were 2.5 times more informative than 45 proposed SNP markers. * set of nine recommended SSR loci [7]. N = number of observed accessions, MD = missing data, A = total number of amplified alleles, Na = average number of alleles per locus, Ho = observed heterozygosity, He = expected heterozygosity, PIC = polymorphic information content, P(ID) probability of identity = unrelated and sib.

Chlorotype Distribution in Croatian Germplasm
Fragment lengths for the four analyzed cpSSRs and associated chlorotypes are listed in Table S9. Chloroplast DNA analysis on a set of 127 putative Croatian varieties resulted in the reconstruction of the chloroplast haplotype for 93% genotypes (Table S6).
A total of seven different chlorotypes (A, B, C, D, E, F, and H) were detected. Graphical representation of chlorotype shares across regions within Croatia (Figure 2) shows differences in their distribution. Generally, in Croatia, the most prevalent chlorotype is C, which occurs in about half of the varieties, one-third of varieties have chlorotype D, and the third most common has chlorotype A, with differences among regions in their geographical distribution. In northwestern Croatia, chlorotype C dominates and is followed by chlorotype D. Interestingly, chlorotype A, in which occurrence in coastal regions (comprising regions Istria and Croatian Littoral, as well as Dalmatia) accounts for 17%, was not recorded in northwestern Croatia at all. For the region of Istria and the Croatian Littoral, four different chlorotypes (A, C, D, and F) have been observed, with chlorotypes C and D being dominant. A total of five different chlorotypes (A, B, C, D, and H) have been recorded in Dalmatia, making it the region of highest chlorotype diversity. The dominant chlorotype in this region was chlorotype C.

Cluster Analysis and Genetic Structure
The grouping of 127 presumably Croatian varieties based on genotyping with 36 SSR loci is shown in Figure 3. Clustering was mostly in correlation with regions of their cultivation. Dalmatian varieties are grouped separately, and, within existing subgroups, it is possible to notice grouping due to kinship, e.g., subgroups with Plavac mali or Tribidrag . The same was true for the varieties of northwestern Croatia, for example, a subgroup around Belina starohrvatska . Varieties from Istria and the Croatian Littoral also showed certain level of homogeneity within the sub-clusters, although their connections with both Dalmatian and continental varieties can be noticed.
Based on the Structure Harvester results, a decision was made on the best k (k = 5) as the number of original populations ( Figure S1). Membership coefficients (Q values) in each of the five proposed populations are given in Table S10 for all putative Croatian varieties.
The same is visualized in Figure 4, after the varieties have been sorted according to the sampling site or the assumed origin. With sampling location as the criterion, inferred original populations can be renamed as follows: population 1 (represented in red, marked Balkan) covers the largest number of varieties from Hungary, Croatia, Serbia, and Romania and could be broadly defined as the Balkans. Population 2 (represented in yellow and designated Iber) comprises the largest number of varieties from Spain, Portugal, Algeria, and Morocco and was defined as the Iberian Peninsula and Maghreb countries.   (Table S11) confirm the proposed Structure clustering since Φ PT was highly significant indicating the existence of a general structure and confirming that most genetic diversity results from differences between populations. Φ values for each population pair (Table S11) showed high significance for each pair of proposed populations and indicate that Balkan group is the most similar to East, while the most divergent is from the Gallica population.

Parentage Analysis
In Table 5, 24 proposed full parentages of Croatian varieties are presented, with 14 out of them having not been previously reported. The critical LOD values were determined by simulation of parentage (LOD = 14.5 for P = 0.05).
When possible, genotyping was further extended so number of matches (based on visual inspection) of additional 14 three-and tetra-SSR loci, as well as 45 SNP loci, are indicated. Chlorotypes, if known, are highlighted so that maternal genotypes are indicated where possible. Although cv. 'Sokol was not part of 127 Croatian set, his already well-known parentage is shown, as confirmation of his foreign origin. All proposed parentages were significant for P = 0.05. Offspring detected resulted from cross pollination of two different varieties, and self-pollination was not detected. For eight parentages ( Belina starohrvatska included), both parents and offspring are part of Croatian germplasm, while, for the 17 proposed trios, at least one parent is of a foreign origin. The two most common varieties that emerge as a parent in six proposed trios are the Italian variety Bombino bianco (syn. Slovenian Trevolina [30]) and Belina starohrvatska (syn. French Gouais blanc , German 'Heunisch weiss , Table 1). Bombino bianco can be characterized as a mother in at least half of the proposed trios. For two trios, Plavac mali is a second proposed parent, with Bombino bianco as a first, but since they share the same chlorotype, it is not possible to distinguish the maternal parent. The next most common parent in six crosses was Belina starohrvatska , which was in half of the cases a pollen recipient from donor Bljuzgavac (syn. Blank blauer [14]), while its crossing with the Romanian Alba imputotato resulted in two more offspring. The aforementioned Bljuzgavac is part of four parentages, and parents-Bratkovina crvena and Hungarian Gyöngy feher -have been proposed for the same. The most represented Dalmatian variety was Plavac mali (for which previously proposed parentage [64] was not confirmed [112]), and it was recorded as one of the parents in four proposed trios. For three varieties ( Kozjak bijeli , Muškatel , and Volovina ), both proposed parents were of foreign origin. A total of 113 half kinships (varieties sharing at least one allele at each of the 20 SSR loci) was reconstructed, among which: (i) 103 cases of putative direct (first-degree) relationships (with no possibility to determine if a cultivar is a parent, an offspring, or a full sibling of the second cultivar), out of which 61 relationships were not previously reported; and (ii) five half-sib relationships, of which two were not reported previously, as well as (iii) five not possible relationships due to the already known parentages, out of which two were not reported previously. The full list of first-degree relationships are given in Table S12, with some examples highlighted (Table 6). Varieties that are most interconnected within the presumable Croatian set through first-degree relationship are: Plavac mali with a total of twelve assigned relationships, Bombino bianco with a total of seven assigned relationships, Bljuzgavac with a total of seven and Tribidrag with a total of six assigned first-degree relationships. Within Croatian germplasm, for 71 varieties (56%) putative direct (first-degree) relationships was determined (most probably parent-offspring relationship) with one of the varieties from the total analyzed set. The aforementioned degree of relationship, but within Croatian germplasm only, was determined for 58 varieties (45%). For six varieties, Mijajuša , Mladenka , Muškatel , Silbijanac , Teran , and Žutozelen , possible first-degree relationships were determined exclusively with varieties of foreign origin.

Determined Status of Croatian Grapevine Collections
Although Croatian germplasm has been genotyped via SSRs through various projects [36,[57][58][59][60][61], this research represents the most comprehensive analysis of Croatian germplasm at the national level so far. Here, we covered all official grapevine repositories and defined latest status of all so far collected accessions in Croatia (Table S5). The process of germplasm collection requires years of reasoned management with occasional genetic verification of true to typeness, e.g., intruders have been identified within the National Collection of Autochthonous Varieties (Faculty of Agriculture in Zagreb) and were consequently excluded. Fingerprinting has proven particularly useful for the Collection of Istrian varieties (Poreč). Although collected in local vineyards through various conservation projects, Istrian accessions have not been adequately genotyped and compared with international databases [113]. Open issues of the mentioned collection were answered and defined mostly through mislabeled accession / incorrect collection (Table S5). Each germplasm collection will be able, based on the fingerprinting data, to review their duplicates/redundant or unnecessary accessions internally and make cost rationalization decisions. From the same data, the occurrence of duplicates between collections is visible, but such duplications are preferred for security reasons and, moreover, are standard procedure in most germplasm collections [114]. Finally, it should be noted that determining the status of "duplicates" in a collection using microsatellites does not necessarily mean that these accessions are phenotypically identical since they can represent different clones. Identified errors in the internal SSR database, as well as the revision and complementation of data on profiles at nine common SSR loci, enabled the correction of data for 20% of Croatian varieties in which profiles are stored within the largest public database of grapevine varieties, i.e., the European Vitis database. The accuracy and completeness of genetic profiles ensures proper estimation of the overall grapevines genetic variability [7].

Choice of Markers for Routine Fingerprinting
Using the M13-tailed primers, VViv37 locus could not be amplified. Successful amplification of the VViv37 locus was achieved only with primers that were both fluorescently labeled. Since this problem was addressed at a later research stage, weak amplification of the aforementioned locus (Table 2) was probably due to DNA degradation caused over time. Optimization of the PCR reaction with the M13 approach (proposed as cost effective) showed to be challenging due to the method requirements, primarily its temperature regime [76] and length difference of the reverse and forward primer.
On average, the amplification rates for tri-and tetra-nucleotide loci were poorer compared to other loci, as well as their detected allelic richness, PIC, and P (ID) values (Table 2), although the last three parameters were similar to those previously recorded [73]. In addition, the most polymorphic loci showed a high occurrence of null alleles, which has been observed also in other studies [10,73]. This can be an issue, especially in pedigree reconstructions where such loci need to be neglected. One of the main advocated advantages of three-and tetra-nucleotide markers is easier allele determination due to the smaller stuttering effect and easier binning. However, for certain loci, alleles different from the product of basic repeat number were detected, such as Vchr2b, Vchr11b, and Vchr16a locus (Table S7). As confirmed in the present study, irregularities in allele lengths at the Vchr11b and Vchr16a loci were identified earlier [73].
Loci recommended [13] and used in genotyping of world's largest germplasm collection [14], in our study, proved to be equally effective, e.g., the two most informative loci in both studies were VVip31 and VVMD28. The cumulative P (ID)sib value for the nine common SSRs today routinely used for genotyping was 1.44 × 10 −4 , and the same efficacy can be achieved using top eight most informative microsatellite markers (VVMD28, VVIp31, ssrVrZAG79, ssrVrZAG62, VViv67, VVMD5, VMC1b11, and VVMD32), except for the Vchr8b locus due to its high frequency of null alleles (Table 2). Given their high polymorphism, as well as other advantageous properties, like ease of amplification and well-known lab protocols and alleles expected, the recommended set of 9 SSRs proved once again to be a valuable and practical tool in grapevine fingerprinting.

Will SNP Markers Replace SSRs for Routine Identification of Grapevine Varieties?
In the last five years, SNPs have been the markers of choice in grapevine research, as well as for other crops [43,[115][116][117]; however, still, the most used type of markers for routine fingerprinting are SSRs. An alternative to the common set of 9 SSRs in the form of highly reproducible and well scattered 48 SNP markers was proposed [36]. In this research, 47 of the 48 proposed SNP loci were analyzed, and 45 of them have been used for further analyzes. Although SNP579_187 locus was polymorphic in other studies [38,40] where minor allele frequency (MAF) value as 0.17 was reported [38], in this study, it was nonpolymorphic, i.e., uninformative for Croatian germplasm. This was less often the case with microsatellite loci since those selected for fingerprinting detected more than just two alleles per locus. PIC values were, on average, 2.5 times higher for 9 SSRs (PIC = 0.76) than for SNP markers (PIC = 0.31), which is in agreement with previous results [20,38,118]. Comparing the SNP and SSR loci based on P (ID) unrelated (Table 4), it is evident that one SSR locus provides on average information as 3-4 SNP loci, respectively, which is identical to earlier findings [38]. However, bearing in mind the complexity of grapevine kinship [41], it is more appropriate to use a P (ID)sib value indicating that 18 SSR markers provide the same information as 45 SNP loci, that is, one SSR locus provides information as 2.5 SNP loci. As genetic similarity among genotypes increases, so does the informative nature of the SNPs.
Large grapevine collections have genotyped their germplasm in the last decade [20,38,41], but resulting SNP profiles are generally not publicly available. It can be assumed that, without an international consortium initiative to form a public SNP profile database, international comparison will not be possible, as also stressed by others [43]. It is likely that smaller laboratories and research groups will continue to use microsatellite markers for routine genotyping (e.g., genotype checking in nursery production) due to lower equipment costs and the abundance of available microsatellite profiles of already analyzed grapevine varieties on a standard 9-loci set.

Links to Other Countries through Synonyms
For a third of genotypes within Croatian collections, a synonym was detected in one of the neighboring or more distant countries (Table 1 and Table S5). For a total of 95 accessions, no synonym has been detected in any other country, so we can consider them to be unique for Croatian germplasm (Table 1 and Table S5). Although the analysis of the genetic structure suggested that origin of four unique varieties ( Bogdanuša , Trnjak , Krivaja crvena , and Cibib ) was from foreign gene pools, since they were not found in areas of their presumed origin and have a long history in Croatia, they can still be considered as part of the Croatian grapevine heritage. Conservation of these genotypes should be accentuated given their uniqueness. In addition, additional 7 varieties ( Garganja , Draganela , Hrvatica , Dolčin , Bilan , Tribidrag , and Bratkovina bijela ) might have Croatian origin because they have at least one Croatian variety as a first-degree relationship (most likely potential parent offspring pair), and other researchers have not reconstructed other first-degree relationships for the same.
For varieties of coastal area, links with Italian, Slovenian, and, to a lesser extent, Greek germplasm have been established through identified synonyms. The similar results were suggested in earlier comparisons of germplasms [11,57,60,62]. Historically and climatically, these Mediterranean countries are interconnected by the Adriatic Sea. The influence of the Greek germplasm probably extends from the time of the first ancient Greek colonization of Adriatic islands [65] till today in presence of varieties like Mijajuša or Cipar . Neighboring Bosnia and Herzegovina (B&H) is a country where many Croatian varieties are grown under the same name, e.g., as Plavac mali or Plavina from Croatia, or reciprocal example Žilavka from B&H in Croatia [33]. Synonyms of varieties from the continental part of Croatia have been confirmed primarily in Austria and Germany. The fact that numerous Muscat accessions grown in Croatia are synonyms of some well-known Muscat varieties has been confirmed in this study, as well as in previous ones, such as Muškat ruža porečki = Rosenmuskateller [57], Muškat momjanski = Muscat White [119], or Muškat ruža omiški = Muscat Hamburg [59,60]. To what extent Italian or Greek germplasm had impact on the formation of Croatian germplasm is evident from the established genetic structure and kinship, which will be discussed later. Numerous established and confirmed synonyms are probably the result of longtime migration of varieties via trade or migration routes. Migration direction is often difficult to prove, so their exact origin remains an open question.

Chlorotype Variation and Geographical Distribution in Croatia
The most extensive reconstruction of the chlorotypes of Croatian germplasm has detected seven different chlorotypes, revealed certain differences in their geographical distribution ( Figure 2) and contributed to the affirmation of reconstructed pedigrees where chlorotypes of the two parents were distinct. Abundance in chlorotype diversity was found, including infrequent chlorotypes, such as E, F, or H, that were previously found specific for Near East sylvestris populations [53]. Chlorotypes of two accessions ('Teran and Bogdanuša ) did not match those previously established [53] (Table S6). Determined chlorotypes allowed us to discriminate two varieties, Belina starohrvatska and Bombino bianco , as key maternal genitors in the formation of Croatian germplasm. Although the three most extensive grapevine pedigree reconstructions [14,19,41] did not include chloroplast DNA analysis in their methodology, in this research, it proved to be a useful diagnostic and supportive tool in pedigree reconstructions. Greater diversity of chlorotypes in coastal Croatia is correlated with greater genetic variability determined at nuclear DNA level (data not shown). Continental Croatia is dominated by chlorotype C, the one which is also dominant for a group of French and German varieties [53]. Chlorotype A was not recorded for varieties of continental Croatia, although its share in coastal germplasm was 17%. Chlorotype A is a dominant genotype of European sylvestris populations, and its distribution in sativa populations diminishes from west to east [53]. One possible explanation for the absence of this chlorotype in the continental part is the dominance of Belina starohrvatska (chlorotype C) as the mother and the absence of sylvestris introgression. In contrast to our coastal part where chlorotype C is dominating, in Italy, chlorotype C is represented in a small percentage, with chlorotype D being the dominant one [53,120]. This is intriguing given the existence of numerous connections and links with Italian germplasm.

Genetic Diversity and Structure of Croatian Germplasm
Even though most grapevine varieties are self-fertilizing, they are highly heterozygous as also confirmed in this paper. Average observed heterozygosity calculated for Croatian germplasm (36 SSRs) was 0.71 (Table 2), but when considering only 20 SSR common loci used for same parameter calculated for the Vassal Collection (H O = 0.76; n = 2096) [15], the H O value was the same. Average number of alleles per locus (N a = 9.6 for common 20 loci, data not shown; N a = 9 for 36 SSRs) was expectedly lower than that calculated for entire INRA collection (N a = 16.1). However, detected N a = 8.9 for group S-3.3 (Wine = Balkans & East Europe, n = 226) [15] is somewhat lower than values of the Croatian set (N a = 9.6). In fact, only eastern populations of table grape exhibit significantly more alleles per locus, so we can state that genetic diversity of Croatian germplasm is rather high considering its size. This is also supported by SNP genotyping results, where H O values in this study (H O = 0.416) were higher than those (H O = 0.397) previously determined [38] for same loci (but for a diverse panel of varieties from Imidra (Spain) repository) and those (H O = 0.349) determined by Reference [20] for a different SNP panel.
In order to gain insight into the genetic structure, it is necessary to include as many varieties as possible, as well as polymorphic, unlinked markers. Having in mind that the grape germplasm of a certain area does not represent necessarily a natural population, published profiles of only traditional varieties from the Vassal collection were included to reduce human impact through modern breeding. According to Evanno's ∆K statistics ( Figure S1) K = 5 was indicated as the most pertinent level of population subdivision and the assumed genetic structure was confirmed by molecular variance analysis (Table S11). As the optimal subdivision(s) of the Vassal collection, k = 3 and k = 5 were found [15]. For k = 5, two groups were defined as in the present study (Balkan and Iber group), and the largest difference was obtained for grouping of Western and Central European varieties that, unlike in this study (populations Biscay and Gallica), grouped into one group. In this study, 41% of varieties were admixed according to Q values. If we would use stricter criteria, Q≥ 0.85, as in Reference [15], the proportion of varieties of admixed origin for our established k rises to 53% versus 62%, in accordance with Reference [15]. The reason for the large percentage of varieties of admixed origin can be found in complex pedigrees and migration of grapevine material.
Almost half of the Croatian germplasm was assigned to population 1 designated Balkan (Table S10), in accordance with the eco-geographic grouping to proles Pontica (subproles Balcanica) proposed by Reference [121], thus confirming previous reports for a limited number of Croatian varieties that classified them as part of Balkan and East Europe [15] or Balkan only [43]. It is important to note that both continental and coastal varieties were grouped into the same Balkan population. Given the many links with Italian germplasm, it was expected that some varieties would be grouped with Italian ones. However, analyses of genetic structure [15,20] define the Italian population as admixed one and emphasize the role of the Roman Empire in the expansion of viticulture and, consequently, of varieties itself. The Balkan group shows the greatest similarity to the ancestral population East (Table S11), which agrees with previous results [43] and is the result of the grapevine migration via Mediterranean trade routes throughout history.
Cluster analysis performed based on 36 SSRs (Figure 3) implies the existence of a structure within Croatian germplasm. A group of Dalmatian varieties is most homogeneous and formed around key varieties, such as Plavac mali and Tribidrag , and the clusters follow established kinship relationships. In addition, a cluster containing both southern and northern Adriatic varieties grouped on the basis of kinship with the Bombino bianco variety is clearly distinguished, as well as a large cluster consolidating descendants of Belina starohrvatska and Bljuzgavac . Varieties that were not part of Balkan ancestral group also exhibit lower genetic similarity with the rest of Croatian germplasm and are thus separated into different clusters, such as cluster of Eastern (presumably Greek) varieties: Mijajuša , Krivaja crvena , Trnjak , and Brunac .

Origin of Presumably Croatian Varieties
All proposed parentages of Croatian varieties published in reference [14] were confirmed in this study, partly at additional SSR and SNP loci (Table 5). Full parentage was reconstructed for 24 varieties that make up 19% of the putative set of 127 Croatian varieties. For the Vassal collection, this was possible for as much as 35% of varieties [14]. The fact that almost twice as much pedigree trios could be detected for the worlds' largest grapevine collection can be partly explained by the fact that half of them refer to modern varieties, created in the last 150 years, mostly through crossing of well-established and widespread varieties, which can, of course, still be found in germplasm collections. Discovery and confirmation of numerous parentages within French germplasm speaks in favor of well-preserved regional germplasm and suggests that inclusion of regional varieties within a candidate parent set increases the likelihood for successful parentage reconstruction. Published SSR profiles [14] originating from neighboring countries, such as Italy and Hungary, represent only a small fraction of their germplasm, whilst some countries, such as B&H or Serbia, are represented with just a few, and neighboring Slovenia with no accessions. In only five parentages, all three members were part of Croatian germplasm (not including Belina starohrvatska ), indicating that the number of reconstructed parentages could be higher if broader regional germplasm would be included. For a total of 45% of the varieties, at least one first-degree relationship (most probably parent-offspring relationship) with one of the supposed autochthonous varieties was detected, and for some of them first-degree relationships with foreign varieties were also detected. Therefore, assumption that most of the Croatian germplasm included in the research originated on the territory of present-day Croatia could not be confirmed, although most of the varieties today exist exclusively here. The low number of reconstructed parentages points to severe genetic erosion of Croatian germplasm since the late 19th century, as well as to the importance and justification for preserving neglected local varieties, although most of them are not economically interesting at present.

Conclusions
This study is so far the most representative and comprehensive analysis of Croatian grape germplasm. Although Croatia is a small wine growing country, the grape genetic diversity of its germplasm is high. SSR marker data enabled identification of many duplicates and synonyms both within and between collections, which will improve curating the germplasm repositories. The selected 45 SNP loci, previously recommended for fingerprinting, have proven to be a good identification tool, but their future relevance remains questionable due to the absence of public databases for comparison. So, we believe that the broadly accepted 9 SSRs set will remain the gold standard in varietal fingerprinting. The inferred genetic structure has classified 50% of Croatian germplasm as part of the Balkan group of cultivars in accordance with the eco-geographic grouping. The large number of admixed varieties is a consequence of complex pedigrees and migration of varieties over wider territory surrounding Croatia today. For a total of 95 accessions, no synonym has been detected in any other country, so we can consider them to be unique to Croatian germplasm. However, complete parentage reconstruction was possible for only 19% of varieties. This confirms the severity of genetic erosion in Croatian germplasm since the end of the 19th century. Contrary to the expectations, foreign varieties, i.e., genotypes that we failed to find within present day Croatian germplasm, participated in many parentages, and the number of cases where both parents are exclusively of Croatian origin is rather low. Thus, considering the rich political history of the entire region and changes of administrative borders, vicinity and similarity with particular geographical and agro-ecological areas of neighboring countries seems to be more relevant for genetic structure of germplasm than administrative borders of single countries. Studied varieties had been derived from spontaneous crosses of the same parental combinations or just with few key genitors, rather than out of many chance combinations. The empirical selection process that has been performed in the past by humans was probably a result of the unconscious preference for high yielding or more adaptive cultivars. Interestingly, almost all key parents are today critically endangered varieties or have disappeared from our vineyards. All this data points to the importance and justification of preserving neglected, local varieties in germplasm repositories, in spite of their present low economic significance.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/7/737/s1, Figure S1: Determination of K values according to Reference [109], Evanno et al. (2005). The rate of change of the posterior probability of the data given the number of subgroups is plotted against K. The first peak (K = 5) corresponds to the optimum number of subgroups. Table S1: List of 212 Vitis vinifera L. subsp. sativa accessions including their codes, presumed varietal names, skin color, grape utilization, sampling collection, main cultivation region, conservation status and types of analyses conducted, Table S2: List of reference Vitis accessions, Table S3: Primer sequences, chromosome positions, fluorophore, range, associated annealing temperatures and PCR and capillary electrophoresis arrangement of SSR and cpSSR loci used in this study, Table S4: 47 SNP loci recommended for identification analyzed in this study and expected polymorphism,  Table S9: Determined chlorotype haplotypes based on 4 cpSSR markers. Table S10: Genome share of all varieties from the putative Croatian set in each of the five proposed gene pools. Q values > 0.75 are highlighted, Table S11: AMOVA results and pairwise population ΦPT values, Table S12: Complete list of Croatian cultivars and other Vitis vinifera cultivars in possible direct (first-degree) relationship.  Acknowledgments: SNP fingerprinting was performed at the Julius Kuehn Institute, Federal Research Centre for Cultivated Plants, Institute for Grapevine Breeding -Geilweilerhof, Siebeldingen, Germany. The authors would like to acknowledge the given opportunity and technical assistance. Authors thank Carole P. Meredith for critical reading.

Conflicts of Interest:
The authors declare no conflict of interest.